UTF-8 encoding problem with XSLT via PHP

Chansen

I'm facing a nasty encoding issue when transforming XML via XSLT through PHP.

The problem can be summarised as follows: when I copy a (UTF-8 encoded) XHTML file with an XSLT stylesheet, some characters are displayed wrong. When I just output the same XHTML file, all characters come out correctly.

The following files illustrate the problem:

XHTML
\[code\]
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>encoding test</title>
  </head>
  <body>
    <p>This is how we dïßπλǽ ‘special characters’</p>
  </body>
</html>
\[/code\]

XSLT
\[code\]
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" encoding="UTF-8"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
\[/code\]

PHP
\[code\]
<?php
  $xml_file = 'encoding_test.xml';
  $xsl_file = 'encoding_test.xsl';

  $xml_doc = new DOMDocument('1.0', 'utf-8');
  $xml_doc->load($xml_file);

  $xsl_doc = new DOMDocument('1.0', 'utf-8');
  $xsl_doc->load($xsl_file);

  $xp = new XSLTProcessor();
  $xp->importStylesheet($xsl_doc);

  // allow bypassing the XSLT transformation with the bypass=true request parameter
  // (!empty() avoids an "undefined index" notice when the parameter is absent)
  if (!empty($_GET['bypass'])) {
    echo file_get_contents($xml_file);
  } else {
    echo $xp->transformToXML($xml_doc);
  }
?>
\[/code\]

When this script is invoked as such (via e.g. http://localhost/encoding_test/encoding_test.php), all characters in the transformed XHTML document come out OK, except for the ‘ and ’ character entities (they're opening and closing single quotation marks).
I'm not a Unicode expert, but two things strike me:

[*] all other character entities are interpreted correctly (which could imply something about the UTF-8-ness of \[code\]‘\[/code\] and \[code\]’\[/code\])
[*] yet, when the XHTML file is displayed unmediated (via e.g. http://localhost/encoding_test/encoding_test.php?bypass=true), all characters are displayed properly.

I think I've declared UTF-8 encoding for the output everywhere I could. Does anyone see what's wrong and how it can be fixed?

Thanks in advance!

Ron Van den Branden
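
One way to narrow this down might be to look at the bytes the transformation actually emits, rather than at what the browser renders. The following is only a diagnostic sketch under a few assumptions: the inline strings are cut-down stand-ins for encoding_test.xml and encoding_test.xsl, the variable names `$out` and `$has_utf8_quote` are made up for illustration, and the quotation marks are written as literal characters (via `\u{2018}` and `\u{2019}`, which needs PHP 7 or later) rather than as named entities.

```php
<?php
// Cut-down stand-in for encoding_test.xml (hypothetical inline copy).
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
     . "<p>\u{2018}special characters\u{2019}</p>";

// Same identity transform as the stylesheet in the question.
$xsl = <<<'XSL'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" encoding="UTF-8"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xml_doc = new DOMDocument();
$xml_doc->loadXML($xml);

$xsl_doc = new DOMDocument();
$xsl_doc->loadXML($xsl);

$xp = new XSLTProcessor();
$xp->importStylesheet($xsl_doc);
$out = $xp->transformToXml($xml_doc);

// U+2018 is encoded as the three bytes E2 80 98 in UTF-8.
$has_utf8_quote = strpos($out, "\xE2\x80\x98") !== false;

// Hex dump of the raw output, independent of any browser decoding.
echo bin2hex($out), "\n";
echo $has_utf8_quote ? "UTF-8 bytes for U+2018 present\n"
                     : "UTF-8 bytes for U+2018 missing\n";
```

If the hex dump contains `e28098` and `e28099`, the transformation itself is producing valid UTF-8 for the quotation marks, and the display problem is more likely to lie in how the response is labelled or decoded. If those bytes are missing, the input handling (for example, how the entities are resolved) would be the next thing to check.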
 