Circumvent errors in loop function (used to extract data from Twitter)

gywewytifew

New Member
I created a loop function that extracts tweets via the Search API at a fixed interval (let's say every 5 minutes). The function does what it is supposed to do: connect to Twitter, extract tweets that contain a certain keyword, and save them to a CSV file. However, occasionally (2-3 times a day) the loop stops because of one of these two errors:
  • Error in htmlTreeParse(URL, useInternal = TRUE) : error in creating parser for http://search.twitter.com/search.atom?q=6.95322e-310tst&rpp=100&page=10
  • Error in UseMethod("xmlNamespaceDefinitions") : no applicable method for 'xmlNamespaceDefinitions' applied to an object of class "NULL"
I hope you can help me deal with these errors by answering some of my questions:
  • What causes these errors to occur?
  • How can I adjust my code to avoid these errors?
  • How can I 'force' the loop to keep running when it hits an error (e.g. by using the try() function)? The sketch after this list shows roughly what I mean.
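To make that last question concrete, this is the kind of guard I have in mind, as a rough sketch only (fetch.entries is a placeholder name I made up, not something from my script):

\[code\]
library(XML)

# Wrap one fetch-and-parse attempt in tryCatch() so that a failure on a
# single request is logged and returns NULL instead of stopping the loop.
fetch.entries <- function(URL) {
  tryCatch({
    doc <- htmlTreeParse(URL, useInternal = TRUE)
    getNodeSet(doc, "//entry")
  }, error = function(e) {
    message("Request failed for ", URL, ": ", conditionMessage(e))
    NULL
  })
}

# Inside the while loop this would replace the direct calls:
# entry <- fetch.entries(URL)
# read.more <- (!is.null(entry) && length(entry) > 0)
\[/code\]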
My function (based on several scripts found online) is as follows:

\[code\]
library(XML) # for htmlTreeParse

twitter.search <- "Keyword"
QUERY <- URLencode(twitter.search)

# Set the loop interval (in seconds) and the number of iterations
d_time <- 300
number_of_times <- 3000

for (i in 1:number_of_times) {
  tweets <- NULL
  tweet.count <- 0
  page <- 1
  read.more <- TRUE
  while (read.more) {
    # construct Twitter search URL
    URL <- paste('http://search.twitter.com/search.atom?q=', QUERY,
                 '&rpp=100&page=', page, sep = '')
    # fetch remote URL and parse
    XML <- htmlTreeParse(URL, useInternal = TRUE, error = function(...) {})
    # extract the list of "entry" nodes
    entry <- getNodeSet(XML, "//entry")
    read.more <- (length(entry) > 0)
    if (read.more) {
      for (j in 1:length(entry)) {
        subdoc <- xmlDoc(entry[[j]]) # put entry in a separate object to manipulate
        published <- unlist(xpathApply(subdoc, "//published", xmlValue))
        published <- gsub("Z", " ", gsub("T", " ", published))
        # convert from GMT to Central European time
        time.gmt <- as.POSIXct(published, "GMT")
        local.time <- format(time.gmt, tz = "Europe/Amsterdam")
        title <- unlist(xpathApply(subdoc, "//title", xmlValue))
        author <- unlist(xpathApply(subdoc, "//author/name", xmlValue))
        tweet <- paste(local.time, " @", author, ": ", title, sep = "")
        entry.frame <- data.frame(tweet, author, local.time, stringsAsFactors = FALSE)
        tweet.count <- tweet.count + 1
        rownames(entry.frame) <- tweet.count
        tweets <- rbind(tweets, entry.frame)
      }
      page <- page + 1
      read.more <- (page <= 15) # there seems to be a 15-page limit
    }
  }
  names(tweets)
  # top 15 tweeters
  # sort(table(tweets$author), decreasing = TRUE)[1:15]
  write.table(tweets,
              file = paste("Twitts - ", format(Sys.time(), "%a %b %d %H_%M_%S %Y"), ".csv"),
              sep = ";")
  Sys.sleep(d_time)
} # end for
\[/code\]