I have some data in text form, taken from a webpage. It's quite lengthy but follows this form:

\[code\]
<p><span class="monthyear">Jan 2001</span><br><b>Foo text (2)</b></p>
<p><span class="monthyear">Nov 2006</span><br><b>Bar text (29)</b><br><b>More bar text (4)</b><br><b>Yet more bar text (102)</b></p>
<p><span class="monthyear">Apr 2004</span><br><b>Further foo text (1)</b><br><b>Combination foo and bar text (41)</b></p>
\[/code\]

I want to extract the relevant parts of this into a data frame, like so:

\[code\]
  monthyear          info  n
1  Jan 2001      Foo text  2
2  Nov 2006      Bar text 29
3  Nov 2006 More bar text  4
\[/code\]

...but I'm not sure how to do it. If I have the HTML in a character vector called text, I can extract the monthyear data using a function from the stringr package:

\[code\]
library(stringr)
monthyear <- str_extract_all(text[1], perl("(?<=\\\"monthyear\\\">).*?20[0-9]{2}"))
\[/code\]

I could extract the info and n data in the same sort of way, but since each monthyear entry has several info and n entries, I'm not sure how to combine them so that every info/n pair stays attached to the right monthyear. Am I going about this all wrong?
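
To make the grouping problem concrete, here is a rough sketch in base R of one way it might be handled: split the page into its <p> blocks first, then parse each block on its own so the monthyear value can be repeated for every entry in that block. This is only an illustration under assumptions, not a confirmed solution: parse_block is a made-up helper name, the sample input is the snippet above pasted into one string, and it assumes every <b> entry ends with a count in parentheses and every block has exactly one monthyear span.

```r
# Sample input: the three <p> blocks from the example, as a single string
text <- paste0(
  '<p><span class="monthyear">Jan 2001</span><br><b>Foo text (2)</b></p>',
  '<p><span class="monthyear">Nov 2006</span><br><b>Bar text (29)</b>',
  '<br><b>More bar text (4)</b><br><b>Yet more bar text (102)</b></p>',
  '<p><span class="monthyear">Apr 2004</span><br><b>Further foo text (1)</b>',
  '<br><b>Combination foo and bar text (41)</b></p>'
)

# Split the page into one string per <p>...</p> block.
# (?s) lets "." match newlines in case the real page has line breaks.
blocks <- regmatches(text, gregexpr("(?s)<p>.*?</p>", text, perl = TRUE))[[1]]

# Hypothetical helper: turn one <p> block into a small data frame
parse_block <- function(block) {
  # The single month/year label for this block
  monthyear <- sub('.*<span class="monthyear">(.*?)</span>.*', "\\1",
                   block, perl = TRUE)
  # Every "<b>...</b>" entry in the block, without the tags
  items <- regmatches(block,
                      gregexpr("(?<=<b>).*?(?=</b>)", block, perl = TRUE))[[1]]
  data.frame(
    # Repeat the label once per entry so rows stay aligned
    monthyear = rep(monthyear, length(items)),
    # Entry text with the trailing "(n)" removed
    info = sub(" *\\([0-9]+\\)$", "", items),
    # The number inside the trailing parentheses
    n = as.integer(sub(".*\\(([0-9]+)\\)$", "\\1", items)),
    stringsAsFactors = FALSE
  )
}

# Parse each block and stack the pieces into one data frame
result <- do.call(rbind, lapply(blocks, parse_block))
print(result)
```

Working block by block means the monthyear value never has to be matched up with the info and n values after the fact, which is the step I could not see how to do when extracting each field from the whole page separately.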