Substr Function For Asian Characters?

liunx

Guest
Hi<br /><br />I am trying to port my web page to Chinese. One of the problems is that I use a substr function to give page summaries, and this does not work well for Chinese characters. A substr("chinesecharacters", 20) returns something like 3 Chinese characters and a broken one at the end. <br /><br />Is there an alternative substr-type function I could use to return, for example, exactly 5 Chinese characters? <br /><br />There was one last thing I wanted to say... oh yeah:<br /><br /> Rock Sign <br /><br />Cheers,<br />Roy<!--content-->
Hi Roy.<br /><br />What programming language are you using?<br /><br />I tried searching the web for your problem, assuming PHP, but I couldn't find anything relevant... :(<!--content-->
Hi,<br /><br />I haven't used PHP in a while, but I thought substr() does not support Multi-byte characters. Try using mb_substr() instead.<br /><br />For mb_substr() to work, PHP has to be compiled with the "--enable-mbstring" option. I have no idea if that's the case for your server. <br /><br />My best,<br />Tim<!--content-->
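For reference, a call along the lines Tim describes might look like the sketch below. The helper name `summary_prefix` and the `'EUC-CN'` default are assumptions for illustration (EUC-CN is the name mbstring uses for the usual byte encoding of GB2312); the guard covers servers where mbstring is not compiled in.

```php
<?php
// Hypothetical helper, not from the thread: return the first $maxChars
// characters of $text, counting characters rather than bytes.
// Assumption: the text is GB2312, which mbstring handles as 'EUC-CN'.
function summary_prefix($text, $maxChars, $encoding = 'EUC-CN')
{
    // mb_substr() only exists when PHP was built with --enable-mbstring.
    if (!function_exists('mb_substr')) {
        return null; // caller must fall back to another approach
    }
    return mb_substr($text, 0, $maxChars, $encoding);
}
```

A page summary would then be something like `summary_prefix($pageText, 5)`, with a `null` result signalling that mbstring is unavailable.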
Hey guys,<br /><br />I forgot to mention: I'm using PHP, as you guessed.<br /><br />I tried mb_substr(), but it does not look like it is compiled into PHP on my server. Would you know how I could get it installed?<br /><br />Thanks for the help. I have also tried researching this, but the answer still eludes me. I really wonder how programmers in East Asia handle this problem. I'm certain there is some everyday function that they can use.<br /><br />Roy<!--content-->
Hello again,<br /><br />Have you read the PHP manual about substr() and the comments posted by other users?<br /><br /><a href="http://www.php.net/manual/en/function.substr.php" target="_blank">http://www.php.net/manual/en/function.substr.php</a><br /><br />A user there named 'ken at wisers dot com' suggests the following replacement function:<br /><br /><!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->function dbyte_substr($str, $start, $len=''){<br />        if($len == ''){<br />                $outstr = substr($str, $start);<br />        }else{<br />                $outstr = substr($str, $start, $len);<br />                // Check the end bound is an double byte first byte or not<br />                if(preg_match("/[\x80-\xFF]$/", $outstr)){<br />                        $outstr = substr("$outstr", 0, -1);<br />                }<br />        }<br />        return $outstr;<br />}<!--c2--></div><!--ec2--><br /><br />I have never tried or tested this, so don't blame me if the server blows up or something like that! B) <br /><br />BTW, what encoding are you using for Chinese?<br /><br />My best,<br />Tim<!--content-->
Thanks so much for your help, Tim!<br /><br />The encoding is Chinese Simplified GB2312.<br /><br />Unfortunately, the function does not seem to work.<br />I will keep looking as well and post if I find anything.<br /><br /> :angry: <br /><br />Roy<!--content-->
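One likely reason the quoted dbyte_substr() misbehaves with GB2312: in that encoding both bytes of a Chinese character have the high bit set, so the `[\x80-\xFF]$` check also matches the final byte of a complete character and strips it. A minimal sketch that avoids mbstring is to walk the string from the start and count whole characters. The function name `gb2312_prefix` is hypothetical, and the code assumes well-formed GB2312 (EUC-CN) input that starts on a character boundary.

```php
<?php
// Hypothetical helper, not from the thread: return the first $maxChars
// characters of a GB2312 (EUC-CN) byte string without splitting a
// two-byte character. Assumes well-formed input.
function gb2312_prefix($str, $maxChars)
{
    $bytes = strlen($str);
    $pos = 0;    // current byte offset
    $chars = 0;  // characters consumed so far

    while ($pos < $bytes && $chars < $maxChars) {
        // A lead byte of 0x81 or above starts a two-byte character
        // (GB2312 proper uses 0xA1-0xF7; the wider test also covers GBK).
        // Anything below 0x80 is single-byte ASCII.
        $width = (ord($str[$pos]) >= 0x81) ? 2 : 1;
        if ($pos + $width > $bytes) {
            break; // truncated trailing character: drop it
        }
        $pos += $width;
        $chars++;
    }
    return substr($str, 0, $pos);
}
```

Calling `gb2312_prefix($pageText, 5)` would then return at most 5 whole characters, whether they are ASCII or Chinese. The count here is in characters, not bytes, which differs from the `$len` argument of substr().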
<!--QuoteBegin-Timbo+Oct 18 2003, 04:41 PM--><div class='quotetop'>QUOTE(Timbo @ Oct 18 2003, 04:41 PM)</div><div class='quotemain'><!--QuoteEBegin-->Hi,<br /><br />I haven't used PHP in a while, but I thought substr() does not support Multi-byte characters. Try using mb_substr() instead.<br /><br />For mb_substr() to work, PHP has to be compiled with the "--enable-mbstring" option. I have no idea if that's the case for your server. <br /><br />My best,<br />Tim<!--QuoteEnd--></div><!--QuoteEEnd--><br /> This is one of the major weaknesses of PHP: various internationalization and localization features are considered optional, and US-based servers tend not to have them compiled in.<br /><br />But if you can afford to port your code to Perl (especially 5.8.x, which TCH has, yeah!), the outlook is good. <br /><br />Rock Sign<!--content-->
 