I have read several of the questions here that deal with undeclared entities. My problem is a little different.

I'm following this procedure to scrape various pages from the net:

1. First, run the PHP tidy function on the file.
2. Then create a DOM document out of it and use XPath to get the values of certain nodes (tables, paragraphs, and blockquotes only).

My problem is simple:

1. Warning: DOMDocument::loadHTML(): ID hp.global.servicebox.links.arztsuche already defined in Entity, line: 2112
2. XML error: Undeclared entity warning at line 2679

I realize that the first warning is probably due to the fact that I was passing the page through the tidy function first and loadHTML next. But the second problem is really troublesome: the parser simply refuses to produce any output, and I lose everything.

Reading up on this website has revealed that an undeclared entity ought to be declared beforehand, but that is not possible given the nature of my task (I'm scraping the web, for god's sake).

I have enabled \[code\]var_dump(libxml_use_internal_errors(true));\[/code\] but beyond the fact that it doesn't clutter my terminal, it doesn't help at all. For starters, I can't find any documentation on how to handle this error, or any error for that matter.

I realize that this can't be the first time someone has encountered this problem, and I'm sure the solution is out there. I just can't seem to find it.
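To make the procedure concrete, here is a minimal sketch of the DOM/XPath step with the libxml messages collected through `libxml_get_errors()` instead of being silently dropped. The function name `extract_nodes` and the sample markup are hypothetical and only for illustration; this is not the scraper's actual code, and it leaves out the tidy step.

```php
<?php
// Hypothetical helper: parse HTML leniently, then return the text of the
// nodes of interest together with whatever libxml complained about.
function extract_nodes(string $html): array
{
    // Buffer libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);

    // Copy the buffered messages, then empty the buffer and restore the
    // previous error-handling mode.
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Pull out tables, paragraphs and blockquotes in document order.
    $texts = [];
    if ($loaded) {
        $xpath = new DOMXPath($doc);
        foreach ($xpath->query('//table | //p | //blockquote') as $node) {
            $texts[] = trim($node->textContent);
        }
    }

    return ['texts' => $texts, 'messages' => $messages];
}

// Sample input with a duplicated id, similar in spirit to the first warning.
$result = extract_nodes('<p id="a">first</p><blockquote id="a">quoted</blockquote>');
print_r($result);
```

Each entry returned by `libxml_get_errors()` is a `LibXMLError` object, so the `level`, `code`, `line` and `message` fields can be inspected to decide whether a given message is fatal or can be ignored.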
Several thousand people warn you against using regex to parse HTML or XML, but very few have solutions to the problems we face with parsers, like the one I'm facing now.

Cheers,
Richard
(a disgruntled HTML parser aficionado)

EDIT: some additional information. This is the tidy function I'm using:

\[code\]
function cleaning($what_to_clean, $tidy_config = '')
{
    $config = array(
        'show-body-only' => false,
        'clean' => true,
        'char-encoding' => 'utf8',
        'add-xml-decl' => true,
        'add-xml-space' => true,
        'output-html' => false,
        'output-xml' => false,
        'output-xhtml' => true,
        'numeric-entities' => false,
        'ascii-chars' => false,
        'doctype' => 'strict',
        'bare' => true,
        'fix-uri' => true,
        'indent' => true,
        'indent-spaces' => 4,
        'tab-size' => 4,
        'wrap-attributes' => true,
        'wrap' => 0,
        'indent-attributes' => true,
        'join-classes' => false,
        'join-styles' => false,
        'enclose-block-text' => true,
        'fix-bad-comments' => true,
        'fix-backslash' => true,
        'replace-color' => false,
        'wrap-asp' => false,
        'wrap-jste' => false,
        'wrap-php' => false,
        'write-back' => true,
        'drop-proprietary-attributes' => false,
        'hide-comments' => false,
        'hide-endtags' => false,
        'literal-attributes' => false,
        'drop-empty-paras' => false, // don't drop empty paragraphs
        'enclose-text' => true,
        'quote-ampersand' => true,
        'quote-marks' => false,
        'quote-nbsp' => true,
        'vertical-space' => true,
        'wrap-script-literals' => false,
        'tidy-mark' => false,
        'merge-divs' => false,
        'repeated-attributes' => 'keep-last',
        'break-before-br' => false, // don't add line breaks before <br>
    );

    // Fall back to the default configuration when none is passed in.
    if ($tidy_config == '') {
        $tidy_config = $config;
    }

    $tidy = new tidy();
    $out = $tidy->repairString($what_to_clean, $tidy_config, 'UTF8');

    return $out;
}
\[/code\]
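One variation that may be worth trying, offered as an assumption rather than a confirmed fix: tidy has a `numeric-entities` option that writes entities such as `&nbsp;` in numeric form (`&#160;`), which an XML parser can read without any DTD declaration. The helper name `cleaning_numeric` and the trimmed-down config below are hypothetical, and the sample input is made up.

```php
<?php
// Hypothetical variant of cleaning(): same idea, but entities are written
// numerically so an XML parser does not need their declarations.
function cleaning_numeric(string $html): string
{
    $config = array(
        'output-xhtml'     => true,
        'numeric-entities' => true,  // the only deliberate change
        'quote-nbsp'       => true,
        'char-encoding'    => 'utf8',
        'wrap'             => 0,
        'tidy-mark'        => false,
    );

    $tidy = new tidy();

    // repairString() can return false on failure; cast so callers always
    // receive a string.
    return (string) $tidy->repairString($html, $config, 'utf8');
}

// Needs the tidy extension; the demo is skipped when it is not loaded.
if (extension_loaded('tidy')) {
    $xhtml = cleaning_numeric('<p>caf&eacute;&nbsp;au lait</p>');
    echo $xhtml;
}
```

If the undeclared-entity error comes from a named entity in the tidied XHTML, this output should no longer trigger it; if the error persists, the messages from `libxml_get_errors()` would show which entity is involved.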