PHP DOM UTF-8 problem

apophis

First of all, my database uses Windows-1250 as its native charset, and I output the data as UTF-8. I use the iconv() function all over my website to convert Windows-1250 strings to UTF-8, and it works perfectly.

The problem appears when I use PHP DOM to parse some HTML stored in the database. The HTML is output from a WYSIWYG editor and is not a valid document: it has no html, head or body tags.

The HTML could look something like this, for example:
\[code\]
<p>Hello</p>
\[/code\]

Here is the method I use to parse the HTML from the database:
\[code\]
private function ParseSlideContent($slideContent)
{
    // This outputs the HTML correctly, with all special characters intact.
    var_dump(iconv('Windows-1250', 'UTF-8', $slideContent));

    $doc = new DOMDocument('1.0', 'UTF-8');

    // Hack to preserve UTF-8 characters.
    $html = iconv('Windows-1250', 'UTF-8', $slideContent);
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    $doc->preserveWhiteSpace = false;

    // Route image sources through the image locator.
    foreach ($doc->getElementsByTagName('img') as $t) {
        $path = trim($t->getAttribute('src'));
        $t->setAttribute('src', '/clientarea/utils/locate-image?path=' . urlencode($path));
    }

    // Route Flash object parameters through the Flash locator.
    foreach ($doc->getElementsByTagName('object') as $o) {
        foreach ($o->getElementsByTagName('param') as $p) {
            $path = trim($p->getAttribute('value'));
            $p->setAttribute('value', '/clientarea/utils/locate-flash?path=' . urlencode($path));
        }
    }

    // Copy the live node list to an array first, because replacing nodes
    // while iterating over the live list can skip elements.
    foreach (iterator_to_array($doc->getElementsByTagName('embed')) as $e) {
        if (true === $e->hasAttribute('pluginspage')) {
            $path = trim($e->getAttribute('src'));
            $e->setAttribute('src', '/clientarea/utils/locate-flash?path=' . urlencode($path));
        } else {
            // end() takes its argument by reference, so it needs a variable.
            $parts = explode('data/media/video/', trim($e->getAttribute('src')));
            $path = 'data/media/video/' . end($parts);
            $path = '/clientarea/utils/locate-video?path=' . urlencode($path);
            $width = $e->getAttribute('width') . 'px';
            $height = $e->getAttribute('height') . 'px';

            // Replace the embed element with a link the video player hooks into.
            $a = $doc->createElement('a', '');
            $a->setAttribute('href', $path);
            $a->setAttribute('style', "display:block;width:$width;height:$height;");
            $a->setAttribute('class', 'player');
            $e->parentNode->replaceChild($a, $e);
            $this->slideContainsVideo = true;
        }
    }

    // Return only the markup between the body tags.
    $html = trim($doc->saveHTML());
    $html = explode('<body>', $html);
    $html = explode('</body>', $html[1]);
    return $html[0];
}
\[/code\]

The output from the method above is garbage: all special characters are replaced with weird stuff like ????.

One more thing: it works on my development server, but it does not work on the production server.

Any suggestions?

PHP version of the production server: PHP Version 5.2.0RC4-dev
PHP version of the development server: PHP Version 5.2.13

UPDATE:

I'm working on a solution myself, inspired by this PHP bug report (not really a bug, though): http://bugs.php.net/bug.php?id=32547

This is my proposed solution. I will try it tomorrow and let you know if it works:
\[code\]
private function ParseSlideContent($slideContent)
{
    // This outputs the HTML correctly, with all special characters intact.
    var_dump(iconv('Windows-1250', 'UTF-8', $slideContent));

    $doc = new DOMDocument('1.0', 'UTF-8');

    // Hack to preserve UTF-8 characters.
    $html = iconv('Windows-1250', 'UTF-8', $slideContent);
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    $doc->preserveWhiteSpace = false;

    // This might work.
    // It basically just adds head and meta tags to the document.
    $html = $doc->getElementsByTagName('html')->item(0);
    $head = $doc->createElement('head', '');
    $meta = $doc->createElement('meta', '');
    $meta->setAttribute('http-equiv', 'Content-Type');
    $meta->setAttribute('content', 'text/html; charset=utf-8');
    $head->appendChild($meta);
    $body = $doc->getElementsByTagName('body')->item(0);
    $html->removeChild($body);
    $html->appendChild($head);
    $html->appendChild($body);

    // Route image sources through the image locator.
    foreach ($doc->getElementsByTagName('img') as $t) {
        $path = trim($t->getAttribute('src'));
        $t->setAttribute('src', '/clientarea/utils/locate-image?path=' . urlencode($path));
    }

    // Route Flash object parameters through the Flash locator.
    foreach ($doc->getElementsByTagName('object') as $o) {
        foreach ($o->getElementsByTagName('param') as $p) {
            $path = trim($p->getAttribute('value'));
            $p->setAttribute('value', '/clientarea/utils/locate-flash?path=' . urlencode($path));
        }
    }

    // Copy the live node list to an array first, because replacing nodes
    // while iterating over the live list can skip elements.
    foreach (iterator_to_array($doc->getElementsByTagName('embed')) as $e) {
        if (true === $e->hasAttribute('pluginspage')) {
            $path = trim($e->getAttribute('src'));
            $e->setAttribute('src', '/clientarea/utils/locate-flash?path=' . urlencode($path));
        } else {
            // end() takes its argument by reference, so it needs a variable.
            $parts = explode('data/media/video/', trim($e->getAttribute('src')));
            $path = 'data/media/video/' . end($parts);
            $path = '/clientarea/utils/locate-video?path=' . urlencode($path);
            $width = $e->getAttribute('width') . 'px';
            $height = $e->getAttribute('height') . 'px';

            // Replace the embed element with a link the video player hooks into.
            $a = $doc->createElement('a', '');
            $a->setAttribute('href', $path);
            $a->setAttribute('style', "display:block;width:$width;height:$height;");
            $a->setAttribute('class', 'player');
            $e->parentNode->replaceChild($a, $e);
            $this->slideContainsVideo = true;
        }
    }

    // Return only the markup between the body tags.
    $html = trim($doc->saveHTML());
    $html = explode('<body>', $html);
    $html = explode('</body>', $html[1]);
    return $html[0];
}
\[/code\]
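
A related idea, as a minimal sketch only: instead of adding the meta tag after the document has been loaded, the charset declaration can be placed in the markup before it is handed to loadHTML(), so the parser sees it while reading the fragment. The helper name loadUtf8Fragment and the sample string are made up for illustration, and this has not been checked against the 5.2.0RC4-dev build on the production server.

```php
<?php
// Hypothetical helper for illustration; not part of the original method.
function loadUtf8Fragment($utf8Html)
{
    // Collect libxml parse warnings instead of printing them,
    // since WYSIWYG output is rarely valid HTML.
    libxml_use_internal_errors(true);

    $doc = new DOMDocument('1.0', 'UTF-8');

    // Declare the charset inside the markup itself, before parsing.
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
    $doc->loadHTML('<html><head>' . $meta . '</head><body>' . $utf8Html . '</body></html>');

    libxml_clear_errors();
    return $doc;
}

// Sample input (assumption): the letter "ž" stored as the single byte 0x9E in Windows-1250.
$slideContent = "<p>\x9E</p>";
$utf8Html = iconv('Windows-1250', 'UTF-8', $slideContent);

$doc = loadUtf8Fragment($utf8Html);
$text = $doc->getElementsByTagName('p')->item(0)->textContent;

// True when the character came back as UTF-8 (bytes C5 BE).
var_dump($text === "\xC5\xBE");
```

Because the fragment is wrapped in explicit html, head and body tags here, the existing explode() on the body tags at the end of ParseSlideContent() should still be able to pull out the inner markup.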
 