How can I remove empty paragraphs from an HTML file using simple_html_dom.php?

jaketoronto2004

New Member
I want to remove empty paragraphs from an HTML document using simple_html_dom.php. I know how to do it with the DOMDocument class, but because the HTML files I work with are prepared in MS Word, DOMDocument's loadHTMLFile() function gives this exception: "Namespaces are not defined".

This is the code I use with the DOMDocument object for HTML files not prepared in MS Word:

\[code\]
<?php
/* Using the DOMDocument class */

/* Create a new DOMDocument object. */
$html = new DOMDocument("1.0", "UTF-8");

/* Load HTML code from an HTML file into the DOMDocument. */
$html->loadHTMLFile("HTML File With Empty Paragraphs.html");

/* Assign all the <p> elements into the $pars DOMNodeList object. */
$pars = $html->getElementsByTagName("p");

echo "The initial number of paragraphs is " . $pars->length . ".<br />";

/* The trim() function is used to remove leading and trailing spaces as well as
 * newline characters. */
for ($i = 0; $i < $pars->length; $i++) {
    if (trim($pars->item($i)->textContent) == "") {
        $pars->item($i)->parentNode->removeChild($pars->item($i));
        $i--;
    }
}

echo "The final number of paragraphs is " . $pars->length . ".<br />";

// Write the HTML code back into an HTML file.
$html->saveHTMLFile("HTML File WithOut Empty Paragraphs.html");
?>
\[/code\]

This is the code I use with the simple_html_dom.php module for HTML files prepared in MS Word:

\[code\]
<?php
/* Using simple_html_dom.php */
include("simple_html_dom.php");

$html = file_get_html("HTML File With Empty Paragraphs.html");
$pars = $html->find("p");

for ($i = 0; $i < count($pars); $i++) {
    if (trim($pars[$i]->plaintext) == "") {
        unset($pars[$i]);
        $i--;
    }
}

$html->save("HTML File without Empty Paragraphs.html");
?>
\[/code\]

It is almost the same, except that the $pars variable is a DOMNodeList when using DOMDocument and an array when using simple_html_dom.php. But this code does not work. First it runs for two minutes and then reports these errors: "Undefined offset: 1" and "Trying to get property of non-object" for this line: "if (trim($pars[$i]->plaintext) == "") {".

Does anyone know how I can fix this?

Thank you.

I also asked on php devnetwork.
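A likely cause of the errors above: unset($pars[$i]) removes one key from the plain PHP array returned by find() without reindexing it, so count($pars) shrinks while $i-- sends the loop back to the now-missing offset, and it keeps retrying that offset until the script times out. The unset() also leaves the document itself untouched, so nothing would be removed from the saved file even if the loop finished. Below is a minimal sketch of one way around this, assuming simple_html_dom.php is in the same directory: iterate with foreach and set outertext to an empty string, which the simple_html_dom manual describes as the way to remove an element. The names isBlankText() and removeEmptyParagraphs() are hypothetical helpers, and the &nbsp; handling is an assumption about what Word puts in empty paragraphs.

```php
<?php
/* Hypothetical helper: true when a paragraph's text is only whitespace
 * or the non-breaking spaces that Word tends to leave in empty paragraphs
 * (that second part is an assumption, not something the post states). */
function isBlankText($text)
{
    /* Turn the &nbsp; entity and the UTF-8 non-breaking space into plain spaces. */
    $text = str_replace(array("&nbsp;", "\xC2\xA0"), " ", (string) $text);
    return trim($text) === "";
}

/* Hypothetical wrapper: blanks out empty <p> elements and saves the result.
 * Returns the number of paragraphs removed. */
function removeEmptyParagraphs($inputFile, $outputFile)
{
    /* Assumes simple_html_dom.php can be found on the include path. */
    require_once "simple_html_dom.php";

    $html = file_get_html($inputFile);
    if (!$html) {
        /* file_get_html() can return false, for example for an empty file. */
        throw new RuntimeException("Could not load " . $inputFile);
    }

    $removed = 0;

    /* foreach visits every element once and never modifies the array. */
    foreach ($html->find("p") as $par) {
        if (isBlankText($par->plaintext)) {
            /* An empty outertext drops the element from the saved output. */
            $par->outertext = "";
            $removed++;
        }
    }

    $html->save($outputFile);
    return $removed;
}
```

It could then be called as removeEmptyParagraphs("HTML File With Empty Paragraphs.html", "HTML File without Empty Paragraphs.html"). Note that, like the original loops, this treats a paragraph containing only an image as empty, because its plaintext is blank.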
 