PHP DOM - Remove all elements EXCEPT…?

redmole

New Member
I am trying to use PHP to edit a DOM document tree, but I am stuck. After loading the HTML, I want to remove every element EXCEPT a select few that I specify (\[code\]<p>\[/code\] and \[code\]<b>\[/code\], for example). How can I do this? Is it even possible?

Below is my current code:

\[code\]
<?php
$url = 'http://en.wikipedia.org/w/index.php?title=Elephant&action=render';

// Fetch the rendered article body
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$html = '<html>' . curl_exec($curl) . '</html>';
echo $html;

$document = new DOMDocument;
$document->loadHTML($html);

// Tags I want to keep
$allowed_elements = array(
    'a',
    'b',
    'i',
    'p',
);

// Loop over every element and remove the ones that are not allowed
$parent = $document->getElementsByTagName('html')->item(0);
foreach ($parent->getElementsByTagName('*') as $element)
{
    $node = strtolower((string)$element->nodeName);
    if (!in_array($node, $allowed_elements))
    {
        $element->parentNode->removeChild($element);
    }
}

echo $document->saveHTML();
curl_close($curl);
?>
\[/code\]

My tinkering has shown me that it is possible to loop through the DOM tree, so I assumed I could just loop through it and remove whatever is not allowed. However, my code still isn't working! Ultimately, I'm trying to get the plain text of a Wikipedia article, so if someone knows an alternative tool that I don't have to write myself, that would be an acceptable answer.

Thanks!! :)
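One possible way to approach the loop above, as a rough sketch rather than a definitive fix: the list returned by getElementsByTagName() is live, so removing nodes while iterating over it tends to skip elements, and removeChild() also throws away any allowed tags nested inside the removed element. The sketch below copies the list into a plain array first, walks it deepest-first, and "unwraps" disallowed elements so their children survive. The function name keepOnlyAllowedTags and the $dropEntirely parameter are hypothetical names made up for this illustration, and it assumes the input is an HTML fragment rather than a full page.

```php
<?php
// Hypothetical helper (not part of PHP): keeps only the tags listed in
// $allowed. Other elements are unwrapped (their children are kept), except
// tags in $dropEntirely, which are removed together with their contents.
function keepOnlyAllowedTags(string $html, array $allowed, array $dropEntirely = ['script', 'style']): string
{
    $document = new DOMDocument();

    // Real-world markup usually triggers parser warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $document->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Snapshot the live node list so removals do not disturb the iteration.
    $elements = iterator_to_array($body->getElementsByTagName('*'), false);

    // Reverse document order means descendants are handled before ancestors.
    foreach (array_reverse($elements) as $element) {
        $name = strtolower($element->nodeName);
        if (in_array($name, $allowed, true) || $element->parentNode === null) {
            continue;
        }
        if (!in_array($name, $dropEntirely, true)) {
            // Unwrap: move each child up so it sits just before the element.
            while ($element->firstChild !== null) {
                $element->parentNode->insertBefore($element->firstChild, $element);
            }
        }
        $element->parentNode->removeChild($element);
    }

    // Serialize only what is left inside <body>.
    $output = '';
    foreach ($body->childNodes as $child) {
        $output .= $document->saveHTML($child);
    }
    return $output;
}

// Example usage with a small hard-coded fragment instead of the cURL response.
$sample = '<div><p>Hello <b>big</b><span> grey</span> elephant</p><script>alert(1)</script></div>';
$result = keepOnlyAllowedTags($sample, ['a', 'b', 'i', 'p']);
echo $result, "\n";
```

With the sample fragment, the div and span wrappers should disappear while the p and b tags and the text stay, and the script block is dropped along with its contents. If the goal is only plain text, PHP's built-in strip_tags() may already be enough: strip_tags($html) removes all tags, and strip_tags($html, '<a><b><i><p>') keeps the listed ones, although it leaves the text inside script and style blocks in place.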
 