PHP – Best practices in PHP and MySQL with international strings


It often happens that characters such as é get transformed to é, even though the collation for the MySQL database, table and field is set to utf8_general_ci. The charset in the page's Content-Type header is also set to UTF-8.

I know about utf8_encode/decode, but I'm not quite sure about where and how to use it.

I have read the "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" article, but I need some MySQL / PHP specific pointers.

Question: How do I ensure that user entered data containing international characters doesn't get corrupted?

Best Solution

At first glance I think one important thing is missing (perhaps I overlooked it). Depending on your MySQL installation and/or configuration, you have to set the connection encoding so that MySQL knows which encoding to expect on the client side (meaning the client side of the MySQL connection, which should be your PHP script). You can do this by manually issuing a

SET NAMES utf8

query before any other query you send to the MySQL server.
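To illustrate what goes wrong without it, here is a minimal PHP sketch of the byte-level mix-up. It assumes the connection character set is latin1 (a common default in older MySQL setups), so the server reads each byte of the UTF-8 "é" as a separate Latin-1 character and encodes it as UTF-8 a second time for the utf8 column. The helper latin1ToUtf8() is hypothetical and written out by hand; it performs the same conversion as utf8_encode(), which is deprecated as of PHP 8.2.

```php
<?php
// Minimal sketch (assumption: a latin1 connection talking to a utf8 column).
// latin1ToUtf8() is a hypothetical helper: it treats every input byte as an
// ISO-8859-1 character and encodes it as UTF-8, just like utf8_encode().
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            // ASCII bytes are identical in Latin-1 and UTF-8
            $out .= chr($b);
        } else {
            // Latin-1 code points 0x80-0xFF become two-byte UTF-8 sequences
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$original  = "\xC3\xA9";               // "é", already UTF-8 encoded (2 bytes)
$corrupted = latin1ToUtf8($original);  // encoded a second time (4 bytes)
echo $corrupted, "\n";                 // prints "Ã©" on a UTF-8 terminal
```

The same mechanism explains why calling utf8_encode() on strings that are already UTF-8 makes the problem worse rather than fixing it.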

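If you're using PDO on the PHP side, you can set up the connection to automatically issue this query on every (re)connect by passing

// Run "SET NAMES utf8" as soon as the connection to MySQL is opened
$db = new PDO($dsn, $user, $pass, array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));

when initializing your database connection. Note that this option has to go into the driver options of the PDO constructor: setting it afterwards with setAttribute() has no effect, because by then the connection is already open.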