Java REGEX XML parse/cut-down while maintaining structure HowTo

.:J@ss::

New Member
I am writing a RESTful web service in Java. The idea is to "cut down" an XML document: strip away all the unneeded content (~98%) and keep only the tags we're interested in, while maintaining the document's structure, which is as follows (I cannot provide the actual XML content for confidentiality reasons):

\[code\]
<sear:SEGMENTS xmlns="http://www.exlibrisgroup.com/xsd/primo/primo_nm_bib" xmlns:sear="http://www.exlibrisgroup.com/xsd/jaguar/search">
  <sear:JAGROOT>
    <sear:RESULT>
      <sear:DOCSET IS_LOCAL="true" TOTAL_TIME="176" LASTHIT="9" FIRSTHIT="0" TOTALHITS="262" HIT_TIME="11">
        <sear:DOC SEARCH_ENGINE_TYPE="Local Search Engine" SEARCH_ENGINE="Local Search Engine" NO="1" RANK="0.086826384" ID="2347460">
          [
          <PrimoNMBib>
            <record>
              <display>
                <title></title>
              </display>
              <sort>
                <author></author>
              </sort>
            </record>
          </PrimoNMBib>
          ]
        </sear:DOC>
      </sear:DOCSET>
    </sear:RESULT>
  </sear:JAGROOT>
</sear:SEGMENTS>
\[/code\]

Of course, this is the structure of only the tags we are interested in; there are hundreds more tags, but they are irrelevant. The square brackets (\[code\][]\[/code\]) are not part of the XML; they indicate that the element \[code\]<PrimoNMBib></PrimoNMBib>\[/code\] is an item in a list of children and occurs more than once, one per match returned by the search from the RESTful service.

I've been trying to parse the document with regular expressions so as to leave only the segments of the structure shown above, along with the values of \[code\]<title>\[/code\] and \[code\]<author>\[/code\], while removing everything else in between the tags, including other tags. However, I can't get it to work for the life of me. Previously I tried it using XSLT, but for reasons I never resolved that didn't work either (I have already asked a separate question about the XSLT implementation).

Anyway, I would very much appreciate a tip, hint, or solution on how to solve this problem using regex and Java.
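For illustration, here is a rough sketch of one way the regex route could look. It is not a tested solution for the real data: the class name \[code\]PrimoCutDown\[/code\] and the method \[code\]cutDown\[/code\] are made up for this example, and it assumes the document contains no CDATA sections or comments that embed these tag names, that none of the elements nest inside an element of the same name, and that \[code\]<display>\[/code\], \[code\]<sort>\[/code\], \[code\]<title>\[/code\] and \[code\]<author>\[/code\] carry no attributes. Instead of deleting the unwanted 98%, it extracts the wanted pieces and rebuilds the skeleton around them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative only.
final class PrimoCutDown {
    // Opening tags whose attributes should be carried over (group 1 = whole tag).
    private static final Pattern SEGMENTS_OPEN = Pattern.compile("(<sear:SEGMENTS\\b[^>]*>)");
    private static final Pattern DOCSET_OPEN = Pattern.compile("(<sear:DOCSET\\b[^>]*>)");
    // One <sear:DOC ...>...</sear:DOC> block: group 1 = opening tag, group 2 = body.
    // The \b keeps this from matching <sear:DOCSET.
    private static final Pattern DOC =
            Pattern.compile("(<sear:DOC\\b[^>]*>)(.*?)</sear:DOC>", Pattern.DOTALL);
    private static final Pattern RECORD =
            Pattern.compile("<PrimoNMBib\\b[^>]*>(.*?)</PrimoNMBib>", Pattern.DOTALL);
    private static final Pattern DISPLAY = Pattern.compile("<display>(.*?)</display>", Pattern.DOTALL);
    private static final Pattern SORT = Pattern.compile("<sort>(.*?)</sort>", Pattern.DOTALL);
    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>", Pattern.DOTALL);
    private static final Pattern AUTHOR = Pattern.compile("<author>(.*?)</author>", Pattern.DOTALL);

    private PrimoCutDown() { }

    // Returns group 1 of the first match, or an empty string if there is none.
    private static String first(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : "";
    }

    // Rebuilds the reduced document from the pieces found in the full one.
    static String cutDown(String xml) {
        StringBuilder out = new StringBuilder();
        out.append(first(SEGMENTS_OPEN, xml))
           .append("<sear:JAGROOT><sear:RESULT>")
           .append(first(DOCSET_OPEN, xml));
        Matcher doc = DOC.matcher(xml);
        while (doc.find()) {
            out.append(doc.group(1));
            Matcher rec = RECORD.matcher(doc.group(2));
            while (rec.find()) {
                // Look for <title> only inside <display>, and <author> only inside <sort>,
                // so same-named tags elsewhere in the record are ignored.
                String title = first(TITLE, first(DISPLAY, rec.group(1)));
                String author = first(AUTHOR, first(SORT, rec.group(1)));
                out.append("<PrimoNMBib><record><display><title>").append(title)
                   .append("</title></display><sort><author>").append(author)
                   .append("</author></sort></record></PrimoNMBib>");
            }
            out.append("</sear:DOC>");
        }
        out.append("</sear:DOCSET></sear:RESULT></sear:JAGROOT></sear:SEGMENTS>");
        return out.toString();
    }
}
```

Calling \[code\]PrimoCutDown.cutDown(fullXml)\[/code\] on the response body would return the reduced document as a single string. If the input ever breaks the assumptions listed above, a streaming XML parser such as StAX is likely to be more robust than regex.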
 