Trying to pull content with tags from XML with PHP

darvenginzks · Jan 22, 2013

We use Acalog at our institution and want to use their (unsupported) API to pull catalog content from their site into ours. I can access their files and extract the information, but the formatting (paragraphs, bold, italics, line breaks) is stored as nodes (h:p, h:b, h:i, h:br). Unfortunately, the text I get from querying for a:content comes back as plain text and does not include the formatting nodes. How can I bring the nodes along with the text? Where am I going wrong?

The start of the XML (I cut it off at about the halfway mark):

\[code\]
<catalog xmlns="http://acalog.com/catalog/1.0" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:a="http://www.w3.org/2005/Atom" xmlns:xi="http://www.w3.org/2001/XInclude" id="acalog-catalog-6">
<hierarchy>
  <legend>
    <key id="acalog-entity-type-5">
      <name>Department</name>
      <localname>Department</localname>
    </key>
  </legend>
  <entity id="acalog-entity-239">
    <type xmlns:xi="http://www.w3.org/2001/XInclude">
      <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" xi:xpointer="xmlns(c=http://acalog.com/catalog/1.0) xpointer((//c:key[@id='acalog-entity-type-5'])[1])"/>
    </type>
    <a:title xmlns:a="http://www.w3.org/2005/Atom">American Studies</a:title>
    <code/>
    <a:content xmlns:a="http://www.w3.org/2005/Atom" xmlns:h="http://www.w3.org/1999/xhtml">
      <h:p xmlns:h="http://www.w3.org/1999/xhtml"> <h:span class="dept_intro"> <h:i>Chair of the Department of American Studies: </h:i> </h:span> <h:span class="dept_intro">John Smith</h:span> <h:br/> <h:span class="dept_intro"> <h:br/>
Professors: Jane Smith; Sarah Smith, <h:i class="dept_intro">The Douglas Family Chair in American Culture, History, and Literary and Interdisciplinary Studies</h:i> <h:br/><h:br/>
Associate Professor: Michael Smith </h:span> <h:span class="dept_intro"><h:br/></h:span> </h:p>
      <h:p xmlns:h="http://www.w3.org/1999/xhtml"> <h:span class="dept_intro">Assistant Professor: Rebecca Smith</h:span> </h:p>
      <h:p xmlns:h="http://www.w3.org/1999/xhtml"> <h:span class="dept_intro">Lecturer: * Leonard Smith</h:span></h:p>
      <h:p xmlns:h="http://www.w3.org/1999/xhtml"> <h:span class="dept_intro">Visiting Lecturer: * Robert Smith<h:br/><h:br/><h:br/><h:br/></h:span><h:strong>Department Overview</h:strong></h:p>
      <h:p xmlns:h="http://www.w3.org/1999/xhtml" class="MsoNormal">American studies is an interdiscipl
\[/code\]

Here's the code I've written thus far:

\[code\]
$xml = file_get_contents($url);
if ($xml === false) {
    return false;
} else {
    // Create an empty DOMDocument object to hold our service response
    $dom = new DOMDocument('1.0', 'UTF-8');

    // Load the XML
    $dom->loadXML($xml);

    // Create an XPath Object
    $xpath = new DOMXPath($dom);

    // Register the Catalog namespace (the c: prefix is used in the status query below)
    $xpath->registerNamespace('c', 'http://acalog.com/catalog/1.0');
    $xpath->registerNamespace('h', 'http://www.w3.org/1999/xhtml');
    $xpath->registerNamespace('a', 'http://www.w3.org/2005/Atom');
    $xpath->registerNamespace('xi', 'http://www.w3.org/2001/XInclude');

    // Check for error
    $status_elements = $xpath->query('//c:status[text() != "success"]');
    if ($status_elements->length > 0) {
        // An error occurred
        return false;
    }

    $x = $dom->documentElement;
    foreach ($x->childNodes AS $item) {
        //echo $item->nodeName . " = " . $item->nodeValue . " ";
    }

    // Retrieve all a:content elements
    $pageText = $xpath->query('//a:content');
    if ($pageText->length == 0) {
        // No text found
        return false;
    }

    foreach ($pageText AS $item) {
        // nodeValue returns only the text of the node, without any tags
        $txt = (string) $item->nodeValue;
        $txt = str_replace('<h:i>', '', $txt);
        $txt = str_replace('</h:i>', '', $txt);
        $txt = str_replace('<h:span class="dept_intro">', '', $txt);
        $txt = str_replace('</h:span>', '', $txt);
        if (strpos($txt, 'Department Overview')) {
            echo '' . str_replace('Department Overview', '', $txt) . '';
            break;
        } else {
            echo '' . $txt . '';
        }
        //echo $pageText->nodeValue;
    }
}
\[/code\]

The line $pageText = $xpath->query('//a:content'); pulls the content, but not the tags.
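One possible direction, offered as a sketch rather than a confirmed fix: `nodeValue` only ever returns the concatenated text of a node, so the tags have to be serialized explicitly, for example with `DOMDocument::saveXML($node)` on each child of `a:content`. The helper name `contentToHtml` and the sample string below are made up for illustration, and stripping the `h:` prefix assumes the target page wants plain HTML tags.

```php
<?php
/**
 * Hypothetical helper (not part of Acalog or PHP): serialize the children
 * of every a:content element, tags included, as plain HTML strings.
 */
function contentToHtml(string $xml): array
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        return [];
    }

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('a', 'http://www.w3.org/2005/Atom');

    $results = [];
    foreach ($xpath->query('//a:content') as $content) {
        $markup = '';
        foreach ($content->childNodes as $child) {
            // saveXML($node) serializes the node together with its tags;
            // nodeValue would return only the text.
            $markup .= $dom->saveXML($child);
        }
        // Assumption: the page wants plain HTML, so drop the xmlns:h
        // declarations and the h: prefix on opening and closing tags.
        $markup = preg_replace('~\s+xmlns:h="[^"]*"~', '', $markup);
        $markup = preg_replace('~<(/?)h:~', '<$1', $markup);
        $results[] = trim($markup);
    }
    return $results;
}

// Tiny sample modelled on the feed above; the real document is much larger.
$sample = '<catalog xmlns="http://acalog.com/catalog/1.0" xmlns:a="http://www.w3.org/2005/Atom">'
    . '<a:content><h:p xmlns:h="http://www.w3.org/1999/xhtml">'
    . '<h:i>Chair: </h:i>John Smith<h:br/></h:p></a:content></catalog>';

print_r(contentToHtml($sample));
```

With the real feed, the same function could be called on the string returned by `file_get_contents($url)`. The regex-based prefix removal is the crude part of this sketch; if the markup needs anything more than renaming tags, an XSLT transform or rebuilding the nodes in an HTML document may be a safer route.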
