I want to remove invalid characters from a would-be XML file using PHP

GRZYBO

New Member
I want to remove invalid characters from an HTML file that is fetched from the web and then converted to XML. I can't change the source code, and doing it manually is not an option because I have to deal with hundreds of files per day.

I had been doing well until some of the HTML files showed up with a special character that invalidates the markup. When I load the would-be XML file, the browser shows this warning:

\[code\]
This page contains the following errors:
error on line 137 at column 1: PCDATA invalid Char value 7
Below is a rendering of the page up to the first error.
\[/code\]

After digging for the invalid character with a text editor I found: ?, a character apparently named &rang, &lang, or maybe &#9679, which is causing the problems.

I have tried to remove it with PHP, but it doesn't work:

\[code\]
// create arrays
$find = array('# #', '#list#', '#⟩#');
$replace = array('', '', '');
// replace with array values
$list = preg_replace($find, $replace, $boletin_saveAsXml);
\[/code\]

Any advice will be greatly appreciated :)
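One possible approach, offered as a sketch rather than a confirmed fix: the "Char value 7" in the parser message is the code point of the offending character, which would be the BEL control character (0x07) rather than a visible symbol such as &rang. XML 1.0 allows only tab, line feed, and carriage return below 0x20, so instead of hunting for one specific glyph it may be simpler to strip everything outside the allowed ranges. The function name `stripInvalidXmlChars` and the sample string are hypothetical, and the code assumes the input is UTF-8.

```php
<?php
// Hypothetical helper: removes every character that XML 1.0 does not allow.
function stripInvalidXmlChars(string $text): string
{
    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $cleaned = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );

    // With /u, preg_replace() returns null when the subject is not valid UTF-8.
    // In that case fall back to a byte-level pass that drops only the
    // ASCII control characters forbidden in XML 1.0.
    if ($cleaned === null) {
        $cleaned = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $text);
    }

    return $cleaned;
}

// Sample input (an assumption for illustration): a BEL character (0x07) in the text.
$sample = "Precio\x07 total";
echo stripInvalidXmlChars($sample), PHP_EOL; // prints "Precio total"
```

In the original snippet this would replace the `preg_replace($find, $replace, ...)` call, for example `$list = stripInvalidXmlChars($boletin_saveAsXml);`. If the fetched pages are not UTF-8, they would need converting first (for instance with `mb_convert_encoding`), otherwise only the byte-level fallback applies.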
 