Solving UTF-8 & French accents incompatibility

I have a PHP script which saves user content into a MySQL database (PHP 5.4, MySQL 5.5.31).

All string-related fields in my database have utf8_unicode_ci as collation.

My (simplified) code looks like this:

$db_handle = mysql_connect('localhost', 'username', 'password');
mysql_select_db('my_db');

mysql_set_charset('utf8', $db_handle);

// ------ INSERT: First example -------
$s   = "je viens de télécharger et installer le logiciel";
$sql = "INSERT INTO my_table (post_id, post_subject, post_text) VALUES (1, 'subject 1', '$s')";
mysql_query($sql, $db_handle);

// ------ INSERT: Second example -------
$s   = "EPrints and العربية";
$sql = "INSERT INTO my_table (post_id, post_subject, post_text) VALUES (2, 'subject 2', '$s')";
mysql_query($sql, $db_handle);
// ------------- 

mysql_close($db_handle);

The problem is that the first insert (Latin text with the é accents) fails unless I comment out this line:

mysql_set_charset('utf8', $db_handle);

But the second query (a mix of Latin and Arabic content) fails unless I do call mysql_set_charset('utf8', $db_handle).

I've been struggling with this for 2 days now. I thought UTF8 does support characters like the French accents, but obviously it doesn't!

How can I fix this?


mysql_set_charset('utf8', $db_handle) tells the database that the data you're going to send it will be encoded in UTF-8. If the result is messed up, that means you did not in fact send UTF-8 encoded text. Double check the encoding of what you're sending.
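One way to double check what is actually being sent is to look at the raw bytes of the string. The following is only a minimal sketch: describe_bytes() is a hypothetical helper name, and the sample strings use explicit byte escapes so that the result does not depend on how the example file itself is saved.

```php
<?php
// Hypothetical helper: report whether a string is valid UTF-8 and show its bytes.
function describe_bytes(string $s): string
{
    // With the /u modifier, preg_match() returns false when the subject is not valid UTF-8.
    $valid = preg_match('//u', $s) === 1;
    return bin2hex($s) . ($valid ? ' (valid UTF-8)' : ' (not valid UTF-8)');
}

// "télé" written with explicit byte escapes, so this file's own encoding does not matter.
echo describe_bytes("t\xC3\xA9l\xC3\xA9"), "\n"; // é as the two UTF-8 bytes c3 a9
echo describe_bytes("t\xE9l\xE9"), "\n";         // é as the single ISO-8859-1 / CP1252 byte e9
```

If the string from the script shows e9 where c3a9 is expected, the text is not UTF-8, whatever the connection charset says.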

"I thought UTF8 does support characters like the French accents, but obviously it doesn't!"

It does just fine.


See "What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text" and "Handling Unicode Front To Back In A Web App".


Is the PHP source file itself saved as UTF-8? That depends on the encoding setting of the editor. If it is, the bytes in the string literal should already be okay. That seems to be the case here, since the Arabic text is written correctly too.

Use prepared statements for the SQL. This has several advantages: security (SQL injection), escaping of quotes and other special characters, and ... maybe ... encoding of the SQL string.
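As a rough illustration of the prepared-statement suggestion, here is a sketch using PDO rather than the mysql_* functions from the question (those were removed in PHP 7). insert_post() is a hypothetical helper name, and the DSN in the comment simply reuses the placeholder credentials from the question; the charset option in the DSN is assumed to take over the role of mysql_set_charset().

```php
<?php
// Hypothetical helper: insert one post through a prepared statement.
// The values travel separately from the SQL text, so quotes in $text need no manual escaping.
function insert_post(PDO $pdo, int $id, string $subject, string $text): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO my_table (post_id, post_subject, post_text) VALUES (?, ?, ?)'
    );
    $stmt->execute([$id, $subject, $text]);
}

// Assumed connection settings, mirroring the question:
// $pdo = new PDO('mysql:host=localhost;dbname=my_db;charset=utf8', 'username', 'password',
//     [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// insert_post($pdo, 1, 'subject 1', "je viens de télécharger et installer le logiciel");
```

A prepared statement does not convert anything by itself: the strings passed in still have to be encoded in the charset declared for the connection.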

Unlikely to help, but try:

$s   = utf8_encode("je viens de télécharger et installer le logiciel");

Though I can foresee another problem: utf8_encode is defined to expect an ISO-8859-1 string, which is feasible for French but not for Arabic. If this works, the PHP file is not actually encoded as UTF-8.
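To make that caveat concrete, the sketch below converts from ISO-8859-1 to UTF-8 with iconv(), which as far as I know performs the same byte-for-byte mapping as utf8_encode (itself deprecated since PHP 8.2). latin1_to_utf8() is a hypothetical helper name. The point is that the conversion is only right when the input really is ISO-8859-1; applied to text that is already UTF-8, it double-encodes.

```php
<?php
// Hypothetical helper: treat the input as ISO-8859-1 and convert it to UTF-8.
function latin1_to_utf8(string $s): string
{
    return iconv('ISO-8859-1', 'UTF-8', $s);
}

$latin1 = "\xE9";     // "é" as a single ISO-8859-1 byte
$utf8   = "\xC3\xA9"; // "é" as two UTF-8 bytes

echo bin2hex(latin1_to_utf8($latin1)), "\n"; // c3a9: correct UTF-8 for "é"
echo bin2hex(latin1_to_utf8($utf8)), "\n";   // c383c2a9: double-encoded, displays as "Ã©"
```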

(I find Java to be more consistent wrt Unicode, so I am not entirely sure for PHP.)


If you do not know the encoding of the incoming text and need to convert it when necessary, something like this can help. It makes sure that the encoding is CP1252; reverse it to make sure the encoding is UTF-8.

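function conv_text($value) {
    // If the input is detected as UTF-8, convert it to CP1252; otherwise assume it is already CP1252.
    // The appended space is only used for detection and is not part of the result.
    $is_utf8 = mb_detect_encoding($value . " ", "UTF-8,CP1252") == "UTF-8";
    return $is_utf8 ? iconv("UTF-8", "CP1252", $value) : $value;
}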
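Reversed, the idea might look like the sketch below. to_utf8() is a hypothetical name, and instead of mb_detect_encoding() it uses a UTF-8 validity check via preg_match() with the /u modifier. It rests on the assumption that anything which is not valid UTF-8 is CP1252, which PHP cannot verify for you.

```php
<?php
// Hypothetical helper: return $value as UTF-8, assuming that any input which is
// not already valid UTF-8 is CP1252.
function to_utf8(string $value): string
{
    // preg_match() with /u returns 1 only when the subject is valid UTF-8.
    if (preg_match('//u', $value) === 1) {
        return $value;
    }
    $converted = iconv('CP1252', 'UTF-8', $value);
    if ($converted === false) {
        throw new RuntimeException('Input is neither valid UTF-8 nor convertible from CP1252');
    }
    return $converted;
}

echo bin2hex(to_utf8("\xE9")), "\n";     // c3a9 (converted from CP1252)
echo bin2hex(to_utf8("\xC3\xA9")), "\n"; // c3a9 (already UTF-8, left unchanged)
```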