Creating a data frame from a character vector in R

simsgreen · Jan 23, 2013

I have some data in text form, taken from a webpage. It's quite lengthy but follows the form:

\[code\]
Jan 2001 Foo text (2)
Nov 2006 Bar text (29) More bar text (4) Yet more bar text (102)
Apr 2004 Further foo text (1) Combination foo and bar text (41)
\[/code\]

I want to extract the relevant parts of this into a data frame, like so:

\[code\]
  monthyear          info  n
1  Jan 2001      Foo text  2
2  Nov 2006      Bar text 29
3  Nov 2006 More bar text  4
\[/code\]

...but I'm not sure how to do it. If I have the HTML in a character vector called text I can extract the monthyear data using a function from the stringr package:

\[code\]
monthyear <- str_extract_all(text[1], perl("(?<=\\\"monthyear\\\">).*?20[0-9]{2}"))
\[/code\]

and I could extract the info and n data in the same sort of way, but given that there are multiple info and n entries for each monthyear entry, I'm not sure how to combine them. Am I going about this all wrong?
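To make the combining step concrete, here is a minimal sketch in base R of one possible way to do it. It is not a definitive solution, and it rests on assumptions. It works on the plain-text form shown in the question rather than on the HTML, since the actual markup isn't given. It assumes one character string per monthyear group, each starting with a three-letter month and a four-digit year, and entries that always end in a count in parentheses. The names listing, rest, entries, flat and result are made up for illustration. The idea is to pull out the entries as a list with one element per group, then repeat each monthyear value once per entry in its group.

```r
# Hypothetical input: one string per monthyear group, as in the sample above.
listing <- c(
  "Jan 2001 Foo text (2)",
  "Nov 2006 Bar text (29) More bar text (4) Yet more bar text (102)",
  "Apr 2004 Further foo text (1) Combination foo and bar text (41)"
)

# Assumed header pattern: three-letter month, space, four-digit year.
header <- "^[A-Z][a-z]{2} [0-9]{4}"

# Split each string into the monthyear prefix and the remaining text.
monthyear <- regmatches(listing, regexpr(header, listing))
rest <- sub(paste0(header, " *"), "", listing)

# Each entry is a run of non-parenthesis characters followed by "(digits)".
# gregexpr() returns all matches per string, so entries is a list with one
# character vector per monthyear group.
entries <- regmatches(rest, gregexpr("[^()]+\\([0-9]+\\)", rest))
flat <- unlist(entries)

# Repeat each monthyear once per entry in its group, then split every
# entry into its text part and its count.
result <- data.frame(
  monthyear = rep(monthyear, times = sapply(entries, length)),
  info = trimws(sub("\\([0-9]+\\)$", "", flat)),
  n = as.integer(sub("^.*\\(([0-9]+)\\)$", "\\1", flat)),
  stringsAsFactors = FALSE
)
```

The same rep() step should carry over to the stringr approach, because str_extract_all() also returns a list with one element per input string, so the length of each element gives the repeat count for the matching monthyear value.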
