Find all descendant text() nodes except in subsections

Kaopuic · Jul 20, 2012

My XML document has arbitrarily nested sections. Given a reference to a particular section, I need to find all the \[code\]TextNode\[/code\]s in that section, not including those in its subsections.

For example, given a reference to the \[code\]#a1\[/code\] node below, I need to find only the "A1 " and "A1" text nodes:

\[code\]
<root>
  <section id="a1">
    <b>A1 <c>A1</c></b>
    <b>A1 <c>A1</c></b>
    <section id="a1.1">
      <b>A1.1 <c>A1.1</c></b>
    </section>
    <section id="a1.2">
      <b>A1.2 <c>A1.2</c></b>
      <section id="a1.2.1">
        <b>A1.2.1</b>
      </section>
      <b>A1.2 <c>A1.2</c></b>
    </section>
  </section>
  <section id="a2">
    <b>A2 <c>A2</c></b>
  </section>
</root>
\[/code\]

In case it wasn't obvious, the above is made-up data. The \[code\]id\[/code\] attributes in particular may not exist in the real-world document.

The best I've come up with so far is to find all text nodes within the section and then use Ruby to subtract the ones I don't want:

\[code\]
require 'nokogiri'

# All text nodes under the node, minus those inside any nested section
def own_text(node)
  node.xpath('.//text()') - node.xpath('.//section//text()')
end

# mydoc holds the XML source shown above
doc = Nokogiri.XML(mydoc, &:noblanks)
p own_text(doc.at("#a1")).length #=> 4
\[/code\]

Can I craft a single XPath 1.0 expression to find these nodes directly? Something like:

\[code\]
.//text()[ancestor::section = self] # self being the original context node
\[/code\]
