Oralryclame
New Member
My problem is that I have extracted a lot of forum posts into separate txt files, which are now on my hard drive. Each file contains information I would like to extract, and I have already figured out how to extract some of it. The information I still need is in the following form, within the same "html block":

1: (x) messages in this thread
2: Message is in reply to (some html code) A HREF="http://stackoverflow.com/questions/12332335/link" (some html code)

In task 1 I simply need to extract x.
In task 2 I need to extract the links to which the message is in reply.

I have looked into the different tm and XML packages but have not been able to work out which to use. Any advice is appreciated.

This is what one of the txt files looks like:

[code]
</TABLE> <BR> <BR> <FONT FACE="Verdana,Geneva,Helvetica" SIZE="-1" COLOR="#990000"><B> Message has 2 Replies: </B></FONT><BR> <TABLE BORDER=0 CELLPADDING=0 CELLSPACING=0 WIDTH="100%"> <TR VALIGN=TOP BGCOLOR="#E0E0E0"><TD ALIGN=LEFT><A HREF="http://stackoverflow.com/dear-lego/?n=14"><IMG BORDER=5 HEIGHT=3 WIDTH=3 SRC="http://stackoverflow.com/news/x.gif"></A></TD><TD><FONT SIZE="-2"> </FONT></TD><TD ALIGN=LEFT><FONT FACE="Verdana,Geneva,Helvetica" SIZE="-2"><A HREF="http://stackoverflow.com/dear-lego/?n=14">Re: Plate Paks</A><BR></FONT></TD><TD ALIGN=RIGHT><FONT FACE="Verdana,Geneva,Helvetica" SIZE="-2"> Tom Stangl<BR></FONT></TD></TR><TR BGCOLOR="#F8F8F8"><TD COLSPAN=4 ALIGN=LEFT VALIGN=TOP><FONT FACE="Verdana,Geneva,Helvetica" SIZE="-2" 
[/code]
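For what it's worth, both tasks can probably be handled with plain regular expressions, without a full HTML parser. Below is a minimal sketch in Python (standard library only), offered as an illustration rather than a tested solution for these exact files. It assumes the phrases appear literally as "(x) messages in this thread" and "Message is in reply to", and that the link of interest is the first A HREF following the second phrase. The sample file above only shows "Message has 2 Replies:", so the patterns may need adjusting. The names `extract_fields` and `extract_from_dir` are made up for this example.

```python
import re
from pathlib import Path

# Task 1: capture x from "(x) messages in this thread".
# The parentheses around the number are treated as optional (an assumption).
THREAD_COUNT_RE = re.compile(
    r"\(?(\d+)\)?\s+messages?\s+in\s+this\s+thread",
    re.IGNORECASE,
)

# Task 2: after "Message is in reply to", skip any markup (non-greedy)
# and capture the URL of the first A HREF="..." that follows.
REPLY_TO_RE = re.compile(
    r'Message\s+is\s+in\s+reply\s+to.*?<A\s+HREF="([^"]+)"',
    re.IGNORECASE | re.DOTALL,
)


def extract_fields(html):
    """Return the thread message count and reply-to links found in one file's text."""
    count = THREAD_COUNT_RE.search(html)
    return {
        "thread_count": int(count.group(1)) if count else None,
        "reply_to": REPLY_TO_RE.findall(html),
    }


def extract_from_dir(folder):
    """Apply extract_fields to every .txt file in a folder (hypothetical layout)."""
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps going if a file has odd encoding
        text = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = extract_fields(text)
    return results
```

Calling `extract_from_dir("posts")` on a folder of txt files would return a dictionary keyed by file name. If the markup between the phrase and the link turns out to be irregular, an HTML parser is likely more robust than regexes, and the same two patterns should carry over to other regex engines with little change.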