How to extract HTML code from an XML file using Groovy

greenleaf · Jul 21, 2012

I have this XML file and I need to extract the HTML code from the "mono" element, keeping the HTML tags. I need to use the Groovy programming language. Everything inside the "mono" element is HTML, including the divs. Thank you in advance.

\[code\]
<dataset>
  <chapters>
    <chapter id="700" name="Immunology">
      <title>Immunology</title>
      <monos>
        <mono id="382727">
          <div>
            <h1>blah blah</h1>
          </div>
          <div>
            <p>blah blah</p>
          </div>
        </mono>
      </monos>
    </chapter>
    <chapter id="701" name="hematology">
      <title>Inmuno Hematology</title>
      <monos>
        <mono id="blah blah">
          <div>
            <h1>blah blah</h1>
          </div>
          <div>
            <div class="class1">blah blah</div>
          </div>
        </mono>
      </monos>
    </chapter>
  </chapters>
</dataset>
\[/code\]

I have tried:

\[code\]
import javax.xml.parsers.*;

xml = new XmlParser().parse("languages.xml")
println("There are " + xml.chapters.chapter.size() + " Chapters")

for (int i = 0; i < xml.chapters.chapter.size(); i++) {
    def chapter = xml.chapters.chapter[i]
    def chapterName = chapter.'@name'
    println chapterName
    println("---- Monos List ----\n\n")
    for (int j = 0; j < chapter.monos.mono.size(); j++) {
        def mono = chapter.monos.mono[j]
        println("Mono Content: " + mono.toString());
    }
    println("---- End Monos List ----\n\n")
}
\[/code\]

But I just get the following output:

There are 2 Chapters
Immunology
---- Monos List ----
Mono Content: mono[attributes={id=382727}; value=[div[attributes={}; value=[h1[attributes={}; value=[blah blah]]]], div[attributes={}; value=[p[attributes={}; value=[blah blah]]]]]]
---- End Monos List ----
hematology
---- Monos List ----
Mono Content: mono[attributes={id=blah blah}; value=[div[attributes={}; value=[h1[attributes={}; value=[blah blah]]]], div[attributes={}; value=[div[attributes={class=class1}; value=[blah blah]]]]]]
---- End Monos List ----
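One direction that might work, sketched under a few assumptions: the output above suggests that `toString()` on a parsed node gives a debug dump rather than markup, so the child nodes of each "mono" would have to be serialized back to XML, for example with `XmlNodePrinter`. The helper name `innerMarkup` is hypothetical, the XML is inlined with `parseText` instead of being read from languages.xml, and the two imports assume Groovy 3 or later (on Groovy 2.x both classes live in `groovy.util` and are imported by default, so the import lines would be dropped).

```groovy
import groovy.xml.XmlNodePrinter   // assumption: Groovy 3+; drop on Groovy 2.x
import groovy.xml.XmlParser        // assumption: Groovy 3+; drop on Groovy 2.x

// Sample input inlined for the sketch (shortened copy of the XML in the question).
def xmlText = '''
<dataset>
  <chapters>
    <chapter id="700" name="Immunology">
      <monos>
        <mono id="382727">
          <div><h1>blah blah</h1></div>
          <div><p>blah blah</p></div>
        </mono>
      </monos>
    </chapter>
    <chapter id="701" name="hematology">
      <monos>
        <mono id="blah blah">
          <div><h1>blah blah</h1></div>
          <div><div class="class1">blah blah</div></div>
        </mono>
      </monos>
    </chapter>
  </chapters>
</dataset>
'''

// Hypothetical helper: serializes the element children of a node back to markup.
// Text placed directly under the node (outside any child element) is skipped here.
String innerMarkup(Node node) {
    def writer = new StringWriter()
    def printer = new XmlNodePrinter(new PrintWriter(writer))
    node.children().findAll { it instanceof Node }.each { child ->
        printer.print(child)
    }
    return writer.toString()
}

def dataset = new XmlParser().parseText(xmlText)

dataset.chapters.chapter.each { chapter ->
    println "Chapter: ${chapter.'@name'}"
    chapter.monos.mono.each { mono ->
        println "Mono ${mono.'@id'}:"
        println innerMarkup(mono)
    }
}
```

`XmlNodePrinter` re-indents the markup as it prints, so the whitespace will not match the source file exactly; the tags, attributes and text should be the same.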
