How to get DOMDocument to be nice to ASCII control characters?

vadimskorohod · Jun 10, 2012

The HTML document I am parsing contains some ASCII control codes. I noticed that PHP's DOMDocument parser truncates a text node when it finds an ASCII control character inside it, such as Device Control 3 (0x13), End of Medium (0x19), File Separator (0x1C), or Group Separator (0x1D). Is this a bug or a feature? Is there any way to make DOMDocument behave differently? I have resorted to removing these characters before DOM processing, but I wonder whether that is the right solution.
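For reference, the workaround mentioned above (removing the characters before DOM processing) might look roughly like the sketch below. It is only an illustration: `stripControlChars` is a hypothetical helper name, not part of DOMDocument, and the choice to keep tab, line feed, and carriage return while dropping the other C0 control characters and DEL is an assumption about what the document needs.

```php
<?php
// Hypothetical helper (not part of DOMDocument): removes ASCII control
// characters 0x00-0x1F and 0x7F, except tab (0x09), line feed (0x0A)
// and carriage return (0x0D), which are assumed to be wanted.
function stripControlChars(string $html): string
{
    // Bytes in this range never occur inside a UTF-8 multibyte sequence,
    // so a byte-wise pattern is safe for UTF-8 input.
    // preg_replace() returns null on failure; fall back to the input then.
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $html) ?? $html;
}

// Sample input containing DC3 (0x13) and File Separator (0x1C).
$raw = "<html><body><p>before\x13middle\x1Cafter</p></body></html>";

// Collect libxml warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML(stripControlChars($raw));
libxml_clear_errors();

// Read back the text of the first <p> element.
$text = $doc->getElementsByTagName('p')->item(0)->textContent;
echo $text, "\n";
```

Filtering the raw string before `loadHTML()` keeps the parser out of the picture entirely, at the cost of silently discarding the control characters. If they carry meaning in the source data, replacing them with a visible placeholder instead of an empty string may be a better fit.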
