Intelligent XML Traversal Using Java SAX

PlutoXs

Background:
Recently, I have been tasked with parsing a large amount of data out of an HTML form and building it into a workable database table. The HTML page in question was generated a long time ago, and the original source data has been lost to the ages. Thus, I have decided to toss off a quick parser in Java to grab this data and format it appropriately. SAX is to be leveraged, as I do not need to modify the hierarchy in any way and a single pass is all that is needed. A very small sample of the HTML is included below:

\[code\]
<html>
  <table>
    <tr>
      <table>
        <tr>
          <td><div>District 1</div></td>
        </tr>
        <tr>
          <td><div>Valid Code 1</div></td>
          <td><div>Valid Code 2</div></td>
          <td><div>Valid Code 3</div></td>
        </tr>
      </table>
    </tr>
ETC...
\[/code\]

Obviously, there is more to the HTML than just what is outlined above, but this should give an idea of the structure.

Question:
I am looking for an intelligent, extensible, self-documenting, and (if possible) fast / lean method of tracking my current location in the XML hierarchy using a SAX parser. Since SAX gives me three discrete callbacks (start of element, character data, end of element), each triggered for a single element, this state must persist between calls and be storable.

The obvious and easiest method of doing this would be a mountain of Boolean variables, but that satisfies none of the four tenets I have laid out. I have also considered bitmasking to maintain a large number of flags, but that is hardly self-documenting or very extensible. Finally, I have considered a Finite State Automaton (or a similar derivative such as a Pushdown Automaton), but those seem somewhat overkill for a one-off.

Perhaps I am over-thinking the problem for a one-off bit of code, but I am always looking to expand my skill set for the times I have to write code that is not one-off.
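To make the pushdown idea mentioned above a little more concrete, here is a minimal sketch of one way it could look: a stack of open element names stands in for the pile of Boolean flags, and the joined stack serves as the "current location". This is only an illustration under some assumptions, not a definitive answer. The class name `PathTrackingHandler`, the constant `CELL_PATH`, and the `parse` helper are all made up for this example, and it assumes the page is well-formed XML (SAX will reject tag soup, so the real HTML may need tidying first).

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: tracks the current location as a stack of element names.
public class PathTrackingHandler extends DefaultHandler {
    // Assumed location of the data cells, taken from the small sample only;
    // the real page may need a different path.
    private static final String CELL_PATH = "html/table/tr/table/tr/td/div";

    private final Deque<String> path = new ArrayDeque<>();   // outermost element first
    private final StringBuilder text = new StringBuilder();  // text of the innermost element
    private final List<String> cells = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        path.addLast(qName);   // descend one level
        text.setLength(0);     // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so append rather than assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The joined stack is the self-documenting "where am I" state.
        if (CELL_PATH.equals(String.join("/", path))) {
            cells.add(text.toString().trim());
        }
        path.removeLast();     // climb back up one level
    }

    // Convenience wrapper: parses a string and returns the text of every matching cell.
    public static List<String> parse(String xml) {
        PathTrackingHandler handler = new PathTrackingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return handler.cells;
    }

    public static void main(String[] args) {
        // Closed-off version of the sample so that it is well-formed.
        String sample = "<html><table><tr><table>"
                + "<tr><td><div>District 1</div></td></tr>"
                + "<tr><td><div>Valid Code 1</div></td><td><div>Valid Code 2</div></td>"
                + "<td><div>Valid Code 3</div></td></tr>"
                + "</table></tr></table></html>";
        System.out.println(parse(sample));
        // prints [District 1, Valid Code 1, Valid Code 2, Valid Code 3]
    }
}
```

Compared with individual flags, adding a new location of interest here would mean adding another path constant rather than another Boolean, at the cost of a string join on every closing tag. Telling the district row apart from the code rows would still need extra state, for example a row counter per inner table.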
Thank you in advance for your time and assistance.

References:
http://www.mkyong.com/java/how-to-read-xml-file-in-java-sax-parser/
Design pattern for a large nested switch statements (not directly related to XML, but gives some ideas on designing with a large number of discrete conditions)
 