HTML Cleanup and parse

orbZ · Jul 20, 2012

I am trying to scrape some data from an HTML document, but I'm running into some difficulties:

- I cannot parse it with NSXMLParser because it's malformed.
- I cannot properly use XPath because the code isn't very clean (lots of duplicated class names).

Changing the HTML is not an option; I have to deal with what I am given.

My thought at this point was to clean up the HTML so that I can use NSXMLParser, but I am not that great with C-style libraries and cannot figure out libtidy. Then I found TouchXML, which is supposed to tidy my HTML automatically, but I keep getting parse errors.

So what I would like is for someone to point me to something that would help me parse this without coding a parser from scratch.

Please help!
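To illustrate the difference between a strict XML parser (like NSXMLParser) and a lenient HTML parser, here is a minimal sketch in Python rather than Objective-C, using only the standard library. It is not Cocoa code and not what TouchXML does internally. The sample markup, the `val` class name, and the helper names (`strict_parse_ok`, `ClassTextCollector`, `extract_by_class`) are hypothetical. The strict parser rejects the unclosed tags. The lenient tokenizer keeps going and collects every element carrying a given class in document order, so duplicated class names become a list you index by position. In C on iOS, the analogous lenient option would be libxml2's HTML parser (for example `htmlReadMemory`), though that is not shown here.

```python
# Sketch: strict XML parsing fails on malformed HTML; a lenient HTML
# tokenizer still extracts data. Sample markup and names are hypothetical.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical malformed snippet: unclosed <p>, <br> without "/",
# and a duplicated class name ("val"), as described in the question.
SAMPLE = """<div class="row"><p>First<br><span class="val">42</span>
<div class="row"><p>Second<br><span class="val">7</span></div>"""


def strict_parse_ok(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class ClassTextCollector(HTMLParser):
    """Collect the text of every element whose class list contains class_name.

    Assumption: matching elements are not themselves void elements.
    """

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.results = []
        self._tag = None   # tag name of the element currently being captured
        self._nest = 0     # nesting depth of that same tag name
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._tag is None:
            # attrs is a list of (name, value); value may be None.
            classes = (dict(attrs).get("class") or "").split()
            if self.class_name in classes:
                self._tag, self._nest, self._buf = tag, 1, []
        elif tag == self._tag:
            self._nest += 1

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._nest -= 1
            if self._nest == 0:
                self.results.append("".join(self._buf).strip())
                self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)


def extract_by_class(markup, class_name):
    """Return the text of all elements with class_name, in document order."""
    collector = ClassTextCollector(class_name)
    collector.feed(markup)
    collector.close()  # flush any buffered trailing data
    return collector.results


if __name__ == "__main__":
    print(strict_parse_ok(SAMPLE))          # False
    print(extract_by_class(SAMPLE, "val"))  # ['42', '7']
```

The design choice here is to avoid "repairing" the document into valid XML at all. A tokenizer that tolerates unclosed tags sidesteps the tidy step, which is where the question's parse errors come from.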
