Multi-byte (Japanese) character sets/MySQL

admin

Hi,

I'm working on a PHP/MySQL-based CMS and one of our clients needs to be able to publish Japanese content.

I've worked out how to run MySQL so that the default character set is SJIS, and was hoping this would solve the problem I'm having, but it hasn't:

It seems that if I store a long string of Shift_JIS text in a short field (e.g. I have a VARCHAR(30) field for a page name and someone enters a Japanese name that's more than 30 bytes long), MySQL will truncate it AT THE 30TH BYTE. This might be halfway through a multi-byte character.

This is what I think is happening anyway.

So then I print out a form (Content-Type: text/html; charset=Shift_JIS), and [simplified of course]

echo "<input type=text name=page_name value=\"$page_name\">";

Where I've extracted $page_name from the field in the database.

$page_name ends halfway through a character, and the web browser takes the quote mark at the end of the value= attribute to be the final byte in the character. Needless to say my form is pretty much ruined.

Does anyone know how to use Japanese characters in MySQL fields and PHP without this sort of thing happening? Is there a way to tell MySQL not to truncate strings wildly like that? Or are there PHP functions for cleaning the strings up myself? The multi-byte string functions in the PHP library seem to support all the character sets that MySQL doesn't support! And vice versa.
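
To make it concrete, here is a minimal sketch of the kind of PHP-side clean-up I have in mind. It assumes the mbstring extension is installed and that its 'SJIS' encoding matches what MySQL actually stores, which I have not confirmed. truncate_sjis() is a hypothetical helper name, not an existing function, and the \u{...} escapes need PHP 7 or later.

```php
<?php
// Hypothetical helper: trim a Shift_JIS string to at most $maxBytes bytes
// without cutting a multi-byte character in half.
function truncate_sjis(string $text, int $maxBytes): string
{
    // mb_strcut() counts in bytes but backs up to a character boundary.
    return mb_strcut($text, 0, $maxBytes, 'SJIS');
}

// Three kanji (2 bytes each in Shift_JIS), built from UTF-8 escapes so the
// example does not depend on the encoding of this source file.
$name = mb_convert_encoding("\u{65E5}\u{672C}\u{8A9E}", 'SJIS', 'UTF-8');

$naive = substr($name, 0, 5);      // plain byte cut: ends on a dangling lead byte
$safe  = truncate_sjis($name, 5);  // stops after the second whole character

// Escape the value before printing it into the attribute.
$field = '<input type="text" name="page_name" value="'
       . htmlspecialchars($safe, ENT_QUOTES, 'Shift_JIS') . '">';
```

The idea would be to call something like truncate_sjis($page_name, 30) before the INSERT, so MySQL never has a reason to cut the string itself.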

Anyway, any help would be greatly appreciated.


Steve
 