Java REGEX XML parse/cut-down while maintaining structure HowTo

.:J@ss:: · Jul 21, 2012

I am writing a RESTful web service in Java. The idea is to "cut down" an XML document: strip away all the unneeded content (~98%) and leave only the tags we're interested in, while maintaining the document's structure, which is as follows (I cannot provide the actual XML content for confidentiality reasons):

\[code\]
<sear:SEGMENTS xmlns="http://www.exlibrisgroup.com/xsd/primo/primo_nm_bib" xmlns:sear="http://www.exlibrisgroup.com/xsd/jaguar/search">
  <sear:JAGROOT>
    <sear:RESULT>
      <sear:DOCSET IS_LOCAL="true" TOTAL_TIME="176" LASTHIT="9" FIRSTHIT="0" TOTALHITS="262" HIT_TIME="11">
        <sear:DOC SEARCH_ENGINE_TYPE="Local Search Engine" SEARCH_ENGINE="Local Search Engine" NO="1" RANK="0.086826384" ID="2347460">
          [
          <PrimoNMBib>
            <record>
              <display>
                <title></title>
              </display>
              <sort>
                <author></author>
              </sort>
            </record>
          </PrimoNMBib>
          ]
        </sear:DOC>
      </sear:DOCSET>
    </sear:RESULT>
  </sear:JAGROOT>
</sear:SEGMENTS>
\[/code\]

Of course, this is the structure of only the tags we are interested in. There are hundreds more tags, but they are irrelevant. The square brackets (\[code\][]\[/code\]) are not part of the XML; they indicate that the \[code\]<PrimoNMBib></PrimoNMBib>\[/code\] element is one item in a list of children and occurs more than once, one per match of the search from the RESTful service.

I've been trying to parse the document with regular expressions so as to leave only the segments of the structure shown above, along with the values of \[code\]<title>\[/code\] and \[code\]<author>\[/code\], while removing everything else in between the tags, including other tags. However, I can't get it to work for the life of me. Previously I tried it using XSLT, but for unresolved reasons that didn't work either (I'd already asked a question about the XSLT implementation).

Anyway, I would very much appreciate a tip, hint, or solution on how to solve this problem using regex and Java.
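To make the desired transformation concrete, here is a minimal sketch of the cut-down. Note that it deliberately uses the JDK's built-in DOM parser rather than regex, because nested, namespaced XML is hard to match reliably with regular expressions. The class name \[code\]XmlCutDown\[/code\], the method \[code\]cutDown\[/code\], and the two whitelisted paths are assumptions made for illustration, based only on the structure shown above; the real document may need different paths.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: prunes every node that is not on a whitelisted path.
final class XmlCutDown {

    // Paths are built from local names (namespace prefixes ignored).
    // These two paths are assumptions taken from the structure in the post.
    private static final String BASE =
            "SEGMENTS/JAGROOT/RESULT/DOCSET/DOC/PrimoNMBib/record/";
    private static final List<String> KEEP =
            Arrays.asList(BASE + "display/title", BASE + "sort/author");

    static String cutDown(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so getLocalName() works
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Element root = doc.getDocumentElement();
            prune(root, root.getLocalName());

            // Serialize the pruned tree back to a string.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not cut down XML", e);
        }
    }

    // Removes all children of el that do not lead to a whitelisted path.
    private static void prune(Element el, String path) {
        if (KEEP.contains(path)) {
            return; // wanted leaf (title/author): keep its content untouched
        }
        // Snapshot the children first, since we remove nodes while iterating.
        List<Node> children = new ArrayList<>();
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            children.add(n);
        }
        for (Node n : children) {
            if (n instanceof Element) {
                String childPath = path + "/" + n.getLocalName();
                if (isWanted(childPath)) {
                    prune((Element) n, childPath);
                    continue;
                }
            }
            // Unwanted element, or text/comment between structural tags.
            el.removeChild(n);
        }
    }

    // True if path is a whitelisted path or an ancestor of one.
    private static boolean isWanted(String path) {
        for (String keep : KEEP) {
            if (keep.equals(path) || keep.startsWith(path + "/")) {
                return true;
            }
        }
        return false;
    }
}
```

Calling \[code\]XmlCutDown.cutDown(xmlString)\[/code\] would return the reduced document as a string. Attributes on the kept elements (for example those on \[code\]<sear:DOCSET>\[/code\]) are left in place, and every repeated \[code\]<PrimoNMBib>\[/code\] is handled the same way, since the check is path-based rather than position-based.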
