DOM and xpath giving me troubles....

admin

I'm trying to get a short script working. Basically, take an RSS 2.0 feed (specifically, one from a Trac bug tracking system) and use DOM to make an object of it, then use XPath to extract the one child that has the ticket ID of a specific value in it.

Here's what I've come up with, and I'm no genius at xpath, so the query is dead wrong.

$xml = new DOMDocument('1.0', 'utf8');
if(!@$xml->loadHTML($temp))
    exit('Unable to load XML properly.');

$query = '///item[title::contains(self, "'.$ticketID.'")]';
$xpath = new DOMXpath($xml);
$items = $xpath->query($query);

foreach($items as $item)
{
    $entries[] = array(
        'link' => $item->previousSibling->previousSibling->nodeValue,
        'guid' => $item->previousSibling->nodeValue,
        'title' => $item->nodeValue,
        'description' => $item->followingSibling->nodeValue,
        'category' => $item->followingSibling->followingSibling->nodeValue,
        'comments' => $item->followingSibling->followingSibling->followingSibling->nodeValue
    );
}

Now, I can get the file contents fine, and create the document. One thing you may notice is that I'm using loadHTML instead of loadXML. If I use loadXML, I get some invalid characters in the XML. loadHTML won't break that.
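To see which characters loadXML is objecting to, one option is to collect libxml's parse errors rather than suppress them with @. This is only a minimal sketch: the function name parseFeed and the sample string are made up for illustration, and the real feed contents would take the place of $temp.

```php
<?php
// Hypothetical helper (not part of the original script): parse a feed with
// loadXML() and return the document plus any libxml error messages.
function parseFeed(string $source): array
{
    // Collect errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $ok  = $doc->loadXML($source);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        // Each LibXMLError carries the line, column and message of a problem.
        $messages[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    // Restore the previous error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$ok ? $doc : null, $messages];
}

// Made-up input containing a bare ampersand, which is not well-formed XML.
$temp = '<rss version="2.0"><channel><item><title>#1: a & b</title></item></channel></rss>';

[$doc, $errors] = parseFeed($temp);
foreach ($errors as $message) {
    echo $message, "\n";
}
```

The reported line and column should point at the offending characters, which may make it possible to clean the input and stay with the XML parser.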

Thanks for any help you can offer me. You can use the demo tickets XML RSS Feed (http://trac.edgewall.org/report/1?format=rss&USER=anonymous) to test it out on.

I know I need to look inside of the <title> child of every <item> until I find one that matches the basic format of: #TicketID: Ticket Title / Summary

Each <title> node will follow that format. I figure xpath is the way to go here. Any help would be appreciated.

I notice that the ticket ID is also used in the GUID - I don't know if that would be more reliable (I'm considering the possibility that something that looks like a ticket ID might pop up in the title for some other reason).

But the XPath query: item elements (anywhere in the document) that have a title element with a text node that contains the supplied string. Since it's the item elements you want, the rest would be part of a predicate.
//item[contains(title/text(), '$ticketID')]
I think that's about it. You'll want to adjust your DOM traversal so that it looks at the children of each item (getElementsByTagName) because I notice some variation about which elements have which other elements as siblings (some have author elements and some don't, for example).
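As a rough illustration of that traversal, the sketch below runs the query and then looks up each wanted child by name, so a missing author element doesn't shift the other fields. The two-item feed, the example.com links and the helper name childText are all invented for this example; only the query itself comes from the post above.

```php
<?php
// Made-up two-item feed; the second item has no <author>, mirroring the
// variation between items mentioned above.
$temp = <<<XML
<rss version="2.0">
  <channel>
    <item>
      <title>#101: First sample ticket</title>
      <author>someone</author>
      <link>http://example.com/ticket/101</link>
    </item>
    <item>
      <title>#102: Second sample ticket</title>
      <link>http://example.com/ticket/102</link>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: text of the first descendant element named $tag,
// or null when the item has no such element.
function childText(DOMElement $item, string $tag): ?string
{
    $nodes = $item->getElementsByTagName($tag);
    return $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
}

$xml = new DOMDocument();
$xml->loadXML($temp);
$xpath = new DOMXPath($xml);

$ticketID = '102';
$entries  = [];
foreach ($xpath->query("//item[contains(title/text(), '$ticketID')]") as $item) {
    $entries[] = [
        'title'  => childText($item, 'title'),
        'link'   => childText($item, 'link'),
        'author' => childText($item, 'author'), // null when the element is missing
    ];
}

print_r($entries);
```

Looking elements up by name rather than by sibling position means the order of children inside each item no longer matters.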

Trying to get what you're saying here. I can keep the foreach loop (because otherwise I'd use a for() loop that goes up to $items->length - 1) and just use $item->nodeValue to echo out the information. I get that.

What I don't get is how I can get the element's tag name. I'd like to easily create a small array that has the tag name as the index, and the nodeValue as its value.

Never mind, I apparently had a bad ticketID ;)

But still, I'm having minor issues. Once I get the node-list back . . .
1.) How would I go about traversing it?
2.) How can I get the tag names so I can either get the info I want, or create an indexed array?

Still haven't figured those out. Obviously, the array would be easiest for me; however, since this information (once I get it) is going into a database, just getting the proper information will work as well. At this point, I'm lost....

Once you've got the item node, you could get the DOMNodeList of its childNodes. Then iterate over that, looking at localName and nodeValue properties.
Summat like:

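$query = "//item[contains(title/text(), '$ticketID')]";
$xpath = new DOMXpath($xml);
$items = $xpath->query($query);

foreach($items as $item)
{
    // childNodes is a DOMNodeList of everything directly inside this <item>
    $children = $item->childNodes;
    foreach($children as $child)
    {
        // Skip whitespace text nodes between elements; they have no localName
        if($child->nodeType !== XML_ELEMENT_NODE)
            continue;
        echo $child->localName,"\t=\t",$child->nodeValue,"\n";
    }
}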
(Incidentally, I tried this with ticket 4163 from that example, which happens to cite ticket 69 in the title; so searching for ticket 69 would have turned up both).

That's it. Thanks weedpacket. Always helpful.
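For the array indexed by tag name asked about earlier, the same loop can collect values instead of echoing them. This is a sketch only, assuming titles follow the "#TicketID: summary" format described above. The sample feed and the function name itemToArray are invented for illustration. It also swaps contains() for starts-with() on "#69:", so an item whose title merely cites ticket 69 is not picked up.

```php
<?php
// Made-up feed: the second title mentions #69 without being ticket 69.
$temp = <<<XML
<rss version="2.0">
  <channel>
    <item>
      <title>#69: Sample ticket</title>
      <link>http://example.com/ticket/69</link>
      <description>First sample</description>
    </item>
    <item>
      <title>#4163: Follow-up to #69</title>
      <link>http://example.com/ticket/4163</link>
      <description>Second sample</description>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: map each child element's tag name to its text value.
function itemToArray(DOMNode $item): array
{
    $fields = [];
    foreach ($item->childNodes as $child) {
        // Only element nodes have a tag name; whitespace text nodes are skipped.
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $fields[$child->localName] = $child->nodeValue;
        }
    }
    return $fields;
}

$xml = new DOMDocument();
$xml->loadXML($temp);
$xpath = new DOMXPath($xml);

$ticketID = 69;
// Casting to int keeps stray quotes out of the XPath expression.
$query = "//item[starts-with(title, '#" . (int) $ticketID . ":')]";

$entries = [];
foreach ($xpath->query($query) as $item) {
    $entries[] = itemToArray($item);
}

print_r($entries);
```

One caveat with this approach: if an item repeats a tag (several category elements, say), later values overwrite earlier ones, so such fields would need to be gathered into a nested array instead.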

I also changed it from title to guid since (per the spec at least) guid has to be unique. Thanks for the help.
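A guid-based match might look something like the sketch below. It rests on an assumption not confirmed anywhere in this thread: that each guid is the ticket's URL ending in "/ticket/" followed by the ID. The sample feed and example.com URLs are made up, so the expression would need adjusting to whatever the real guid values look like.

```php
<?php
// Made-up feed; the guid format (URL ending in /ticket/<id>) is an assumption.
$temp = <<<XML
<rss version="2.0">
  <channel>
    <item>
      <title>#69: Sample ticket</title>
      <guid>http://example.com/ticket/69</guid>
    </item>
    <item>
      <title>#4163: Follow-up to #69</title>
      <guid>http://example.com/ticket/4163</guid>
    </item>
  </channel>
</rss>
XML;

$xml = new DOMDocument();
$xml->loadXML($temp);
$xpath = new DOMXPath($xml);

$ticketID = 69;
// substring-after() returns whatever follows "/ticket/" in the guid, so the
// comparison is against the whole ID rather than a substring of it.
$query = "//item[substring-after(guid, '/ticket/') = '" . (int) $ticketID . "']";

$titles = [];
foreach ($xpath->query($query) as $item) {
    $titles[] = $item->getElementsByTagName('title')->item(0)->nodeValue;
}

print_r($titles);
```

Because the comparison is exact, ticket 4163 is not matched when searching for 69, even though its title mentions that number.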
 