We have the following code, which lists the XPaths of the nodes where \[code\]$value\[/code\] is found.

For a given URL (see the picture) we have detected a non-standard tag, \[code\]td1\[/code\], which in addition has no closing tag. The site developers probably put it there intentionally, as you can see in the screenshot below.

This element causes problems when identifying the correct XPath for nodes. An example of a broken XPath:

/html/body/div[2]/div[2]/table/tr[2]/td/table/tr1/td[2]/table/tr[2]/td[2]/table[3]/tr[2]/td1/td[2]/span/u1

(As you can see, td1 is identified and chained into the XPath.)

We think that removing this element would help us build the valid XPath we are after. A valid example is:

/html/body/div[2]/div[2]/table/tr[2]/td/table/tr1/td[2]/table/tr[2]/td[2]/table[3]/tr[2]/td[2]/span/u1

How can we remove such elements prior to loading the document into DOMXPath? Or do you have some other approach? We would like to remove all invalid tags, which may be something other than td1, such as h8, diw, etc.

\[code\]
private function extract($url, $value)
{
    $dom = new DOMDocument();
    $file = 'content.txt';
    //$current = file_get_contents($url);
    $current = CurlTool::downloadFile($url, $file);
    //file_put_contents($file, $current);

    // Suppress parser warnings caused by the malformed markup
    @$dom->loadHTMLFile($current);

    // Use DOMXPath to navigate the HTML through the DOM
    $dom_xpath = new DOMXPath($dom);
    $elements = $dom_xpath->query("//*[text()[contains(., '" . $value . "')]]");
    var_dump($elements);

    // query() returns false on failure, so check for that before iterating
    if ($elements !== false) {
        foreach ($elements as $element) {
            var_dump($element);
            echo "\n1.[" . $element->nodeName . "]\n";

            $nodes = $element->childNodes;
            foreach ($nodes as $node) {
                if (($node->nodeValue != null) && ($node->nodeValue == $value)) {
                    echo '2.' . $node->nodeValue . "\n";
                    // Drop the trailing /text() step to get the element's path
                    $xpath = preg_replace("/\/text\(\)/", "", $node->getNodePath());
                    echo '3.' . $xpath . "\n";
                }
            }
        }
    }
}
\[/code\]
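One possible way to drop unknown tags before the markup reaches DOMDocument is to filter the raw HTML string against a whitelist of tag names and only then call `loadHTML()`. The sketch below is a minimal illustration under assumptions: `stripUnknownTags()` and the `$allowed` list are hypothetical names that are not part of the code above, the sample markup is invented, and a regex-based filter can misfire on edge cases such as `<` inside scripts or `>` inside attribute values. Note that the "valid" path quoted above still contains `tr1` and `u1`, so those names would have to be added to the whitelist if they are meant to stay.

```php
<?php
// Hypothetical helper (not part of the original code): removes opening and
// closing tags whose names are not whitelisted, keeping their inner content.
function stripUnknownTags(string $html, array $allowed): string
{
    // Build a lowercase lookup table for case-insensitive matching
    $allowed = array_flip(array_map('strtolower', $allowed));

    $result = preg_replace_callback(
        '#</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>#',
        function (array $m) use ($allowed): string {
            // Keep the tag as-is when its name is whitelisted, otherwise drop it
            return isset($allowed[strtolower($m[1])]) ? $m[0] : '';
        },
        $html
    );

    // preg_replace_callback() returns null on error; fall back to the input
    return $result ?? $html;
}

// Invented sample markup with an unclosed, non-standard <td1> tag
$html = '<table><tr><td>a</td><td1><td><span>needle</span></td></tr></table>';

// Assumed whitelist; extend it with every tag the target pages legitimately use
$allowed = ['html', 'head', 'body', 'div', 'table', 'tr', 'td', 'span', 'u'];

$clean = stripUnknownTags($html, $allowed);

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($clean);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$paths = [];
foreach ($xpath->query("//*[text()[contains(., 'needle')]]") as $element) {
    $paths[] = $element->getNodePath();
}

echo implode("\n", $paths), "\n";
```

In `extract()` this would mean reading the downloaded file into a string (for example with `file_get_contents()`), passing it through the filter, and calling `loadHTML()` instead of `loadHTMLFile()`. A whitelist is used rather than a blacklist because the invalid names (td1, h8, diw, ...) are not known in advance.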