How can I have XPath ignore nested nodes?

mollybee · Jul 21, 2012

There's probably a better way to do this than what I'm doing, because I'm stuck in a metaphorical pothole. I want to get some of the nodes beneath a particular node. I came up with this XPath expression:

\[code\]
>>> content_tags = 'h1 h2 h3 h4 h5 h6 p ol ul dl table'.split()
>>> content_xpath = './/*[%s]' % ' or '.join('self::%s' % i for i in content_tags)
>>> content_xpath
'.//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6 or self::p or self::ol or self::ul or self::dl or self::table]'
\[/code\]

Any of the listed content_tags can be the top of the hierarchy I want, and I want to ignore other elements that may be at the same or higher levels. Unfortunately, sometimes there's a \[code\]<p>\[/code\] inside a \[code\]<ul>\[/code\] or a \[code\]<table>\[/code\], or a \[code\]<table>\[/code\] inside an \[code\]<ol>\[/code\], etc., and I get the inner element as a separate result along with the outer one.

Is there a good way to perform a "cut" so that nodes nested inside one I've already found are ignored? Or is there some better way of doing this that I'm somehow missing?

Here's an example of what I'm trying to parse.

\[code\]
<div class="interesting">
  <img src="http://stackoverflow.com/questions/10522521/ignore-this.jpg"/>
  <h1>I want this.</h1>
  <p>I want this, too.</p>
  <div class="sidebar">
    <ul>
      <li><p>I only want one copy of this, inside the UL.</p></li>
      <li><p>Ditto.</p></li>
    </ul>
  </div>
</div>
\[/code\]

Thanks!

BTW, I found a few posts on a w3.org mailing list that advocated a "dont-include-any-descendant-or-self" filter, which I think would do exactly what I want, but it doesn't seem to have made it into the final spec.
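For illustration, here is a minimal sketch of two ways the "cut" described above might be expressed. It is not a confirmed solution from this thread. The names `build_outermost_xpath`, `outermost_content`, and `SAMPLE` are hypothetical. The first helper only builds an XPath 1.0 string that adds a `not(ancestor::*[...])` test to the original predicate; the standard library's ElementTree cannot evaluate `self::` or `or`, so the assumption is that the string would be passed to a full XPath 1.0 engine (for example lxml's `xpath()` method). Note that `ancestor::` also looks above the context node, so that expression assumes the starting node is not itself inside one of the content tags. The second helper does the same cut with a plain recursive walk, which runs as written.

```python
import xml.etree.ElementTree as ET

CONTENT_TAGS = 'h1 h2 h3 h4 h5 h6 p ol ul dl table'.split()


def build_outermost_xpath(tags):
    """Build an XPath 1.0 string selecting content elements that have no
    content-element ancestor. Intended for a full XPath engine; it is not
    evaluated in this sketch."""
    is_content = ' or '.join('self::%s' % t for t in tags)
    return './/*[(%s) and not(ancestor::*[%s])]' % (is_content, is_content)


def outermost_content(node, tags=CONTENT_TAGS):
    """Yield matching elements below `node` in document order, without
    descending into an element once it has matched (the "cut")."""
    wanted = set(tags)
    for child in node:
        if child.tag in wanted:
            yield child  # keep the outer element, skip everything inside it
        else:
            yield from outermost_content(child, wanted)


# The example markup from the question, which happens to be well-formed XML.
SAMPLE = """
<div class="interesting">
  <img src="http://stackoverflow.com/questions/10522521/ignore-this.jpg"/>
  <h1>I want this.</h1>
  <p>I want this, too.</p>
  <div class="sidebar">
    <ul>
      <li><p>I only want one copy of this, inside the UL.</p></li>
      <li><p>Ditto.</p></li>
    </ul>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE)
found = [el.tag for el in outermost_content(root)]
print(found)  # ['h1', 'p', 'ul']
```

The recursive walk avoids the ancestor caveat entirely, because it only ever looks below the starting node; the XPath variant keeps everything in a single expression, at the cost of repeating the tag list.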

