PDO + MySQL and broken UTF-8 encoding

Warning: This answer applies to PHP 5.3.5 and lower. Do not use it for PHP version 5.3.6 (released in March 2011) or later.

Compare with Palec's answer here.


Use:

$pdo = new PDO( 
'mysql:host=hostname;dbname=defaultDbName',
'username',
'password',
array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8")
);

It forces UTF-8 on the PDO connection. It worked for me.
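On PHP 5.3.6 and later, where the warning above says not to use the init command, the charset can be named in the DSN instead (the form listed in the PDO notes further down). The following is only a sketch: buildMysqlDsn() is a hypothetical helper, the host, database and credentials are placeholders, and the utf8mb4 default is an assumption that you can swap for utf8.

```php
<?php
// Hypothetical helper: builds a MySQL DSN that carries the connection charset.
// The charset part of the DSN is only honoured by PHP 5.3.6 and later.
function buildMysqlDsn($host, $dbName, $charset = 'utf8mb4')
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

$dsn = buildMysqlDsn('hostname', 'defaultDbName');
echo $dsn, PHP_EOL;

// Usage with placeholder credentials (not executed here):
// $pdo = new PDO($dsn, 'username', 'password');
```

Keeping the DSN in one helper means the charset is stated in exactly one place for every connection the application opens.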

PHP, PDO, UTF8 and MySQL not playing ball

For the word Générik, your "logged hex bytes" are 0x47c3a96ec3a972696b. This is indeed UTF-8 encoded. The client with which you are attempting to verify your stored data is almost certainly setting the wrong character set prior to fetching the table contents.
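The byte check described above can be reproduced in PHP without touching the database. This is a small sketch; the variable names are made up for the illustration, and the word is written with explicit byte escapes so the result does not depend on how the script file is saved.

```php
<?php
// "Générik" written with explicit UTF-8 byte escapes.
$word = "G\xC3\xA9n\xC3\xA9rik";

// Lowercase hex of the raw bytes, comparable to the logged value.
$hex = bin2hex($word);

// A pattern with the /u modifier fails on input that is not valid UTF-8.
$isUtf8 = preg_match('//u', $word) === 1;

echo '0x', $hex, PHP_EOL;
echo $isUtf8 ? 'valid UTF-8' : 'not valid UTF-8', PHP_EOL;
```

If the hex dump of your stored value matches this pattern (c3a9 for each é), the data is fine and the problem is in the client that displays it.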

Php + Mysql (UTF-8 ) some characters are still bug

Checklist for Problems with character/charset/collation

Including mysql, mysqli, PDO



Content

  1. DISCLAIMER
  2. My inserts into my DB don't work properly! What can I do?
  3. Change Charset and Collation of a Database or Table
  4. Set the encoding of your script files
  5. Set the charset of your page with php or meta tag
  6. What's the difference between UTF8 and UTF8mb4?
  7. Answer to this specific Question
  8. Further Information/Additional Links
  9. Side Notes


1. DISCLAIMER

This answer is meant not only to answer this question but also to be a bit more extensive, so that more people can quickly find one bundled, good answer!

!Important Notice!

If you change something in your database, always make sure you have a backup of it! Check it 2 times, or 3!

I'm open to improvements and comments, such as error corrections.

In addition, I apologize if the grammar is not perfect :D


If you get stuck on a question like this:

  • Php + Mysql (UTF-8, utf8mb4) some characters are still bug
  • How to convert an entire MySQL database characterset and collation to UTF-8?
  • “Incorrect string value” when trying to insert UTF-8 into MySQL
  • Change MySQL default character set to UTF-8 in my.cnf?
  • Using utf8mb4 with php and mysql
  • PDO + MySQL and broken UTF-8 encoding
  • Error in insertion data in php Mysql
  • PHP PDO: charset, set names?
  • SET NAMES utf8 in MySQL?
  • PHP mysql charset utf8 problems
  • UTF-8 all the way through
  • Manipulating utf8mb4 data from MySQL with PHP
  • ERROR 1115 (42000) : Unknown character set: 'utf8mb4' in mysql

...then my answer may help you!


2. My inserts into my DB don't work properly! What can I do?

If your inserts don't work properly and the inserted data looks something like this in your database, there can be various reasons!

Examples:

??????????
br>�??_ �?�
â_ ⬠⥠J

Here is a little checklist you can go through to check whether everything is as it should be!

(After the checklist there is some extra information for mysql, mysqli and PDO.)


Checklist:

  • Make sure the default character set is set on tables, client, server and text fields.

    • If not, see point 3.
  • Make sure the character set of your database connection is set.

    • If not, see the mysql/PDO notes below.
  • Make sure that, if you are displaying data, the charset of the document is set!

    • If not, see point 5.
  • Make sure your script files are saved with the right charset!

    • If not, see point 4.
  • Make sure you set your character set and your charset!

    • If not, see the mysql/PDO notes below.
  • Make sure your forms accept utf8!

    • If not, see point 5.
  • Make sure you have set the connection encoding.

    • If not, see the mysql/PDO notes below.
  • Make sure you have set the server character encoding correctly.

    • If not, see the mysql/PDO notes below.
  • ...

  • You have to be sure you're using utf8/utf8mb4 everywhere!


mysql:

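-mysql_query("SET NAMES 'utf8'"); Run SET NAMES once per connection, right after connecting. Use it only if the MySQL driver provides no mechanism of its own for setting the charset

-mysql_query("SET CHARACTER SET utf8"); Sets the client and result character sets to utf8

-mysql_set_charset('utf8'); Sets the connection charset to utf8. Prefer this over SET NAMES, because it also tells the client library which charset is in use

-utf8mb4 needs MySQL 5.5.3 or later; older versions report ERROR 1115 (42000): Unknown character set: 'utf8mb4'

-character_set_server=utf8 Sets the server's default character set (in my.cnf)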

PDO:

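-$dbh->exec("set names utf8"); If you're using PDO you can use this line to run SET NAMES

-$dbh = new PDO("mysql:host=$host;dbname=$db;charset=utf8"); This line sets the charset, but it requires PHP 5.3.6 or higher

-$dbh = new PDO($dsn, $user, $password, array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8mb4' COLLATE 'utf8mb4_unicode_ci'")); You can also run SET NAMES this way. The option has to be passed to the constructor; calling setAttribute() with it afterwards has no effect, because the connection already exists

-mb_internal_encoding('UTF-8'); Sets the encoding that PHP's mb_* string functions assume. It does not change the PDO connection itself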


3. Change Charset and Collation of a Database or Table

If you have to change the charset or collation of a database or table you can use these lines of code:

ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
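If several tables need converting, the ALTER TABLE statement above can be generated per table. This is a minimal sketch: buildConvertStatements() and the table names are hypothetical, and the charset and collation are inserted as-is, so they must come from trusted values and not from user input.

```php
<?php
// Hypothetical helper: returns one ALTER TABLE ... CONVERT TO statement per table.
function buildConvertStatements(array $tables, $charset, $collation)
{
    $statements = array();
    foreach ($tables as $table) {
        // Quote the identifier with backticks and double any embedded backtick.
        $quoted = '`' . str_replace('`', '``', $table) . '`';
        $statements[] = sprintf(
            'ALTER TABLE %s CONVERT TO CHARACTER SET %s COLLATE %s;',
            $quoted,
            $charset,
            $collation
        );
    }
    return $statements;
}

// Example table names, used only for illustration.
foreach (buildConvertStatements(array('users', 'posts'), 'utf8', 'utf8_unicode_ci') as $sql) {
    echo $sql, PHP_EOL;
}
```

Review the generated statements and take a backup before running them against a real database.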


4. Set the encoding of your script files

You may have to check that your script (PHP) files are saved with the right charset!

For this I would recommend Notepad++!

Once you have opened your file in Notepad++, go to the 'Encoding' menu and change the charset.
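The saved encoding of a file can also be checked from PHP itself. The sketch below assumes you only care whether the bytes are valid UTF-8 and whether a byte order mark is present; describeEncoding() is a hypothetical helper and the file path in the usage comment is a placeholder.

```php
<?php
// Hypothetical helper: inspects raw file contents.
function describeEncoding($bytes)
{
    return array(
        // A UTF-8 byte order mark is the three bytes EF BB BF at the start.
        'has_bom' => strncmp($bytes, "\xEF\xBB\xBF", 3) === 0,
        // A pattern with the /u modifier fails on input that is not valid UTF-8.
        'valid_utf8' => preg_match('//u', $bytes) === 1,
    );
}

// Usage on a real file (placeholder path, not executed here):
// print_r(describeEncoding(file_get_contents('index.php')));

print_r(describeEncoding("\xEF\xBB\xBFhello")); // UTF-8 with BOM
print_r(describeEncoding("caf\xE9"));           // "café" saved as Latin-1
```

A byte order mark at the start of a PHP file is sent as output before any header() call, so saving without a BOM is usually the safer choice.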


5. Set the charset of your page with php or meta tag

To display data in utf8/utf8mb4 you have to be sure your site is served with the right charset!

You can set the charset in 3 ways, like this:

//PHP
ini_set("default_charset", "UTF-8");
header('Content-type: text/html; charset=UTF-8');

//HTML
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Also, to accept utf8 in your forms use:

<form accept-charset="UTF-8">


6. What's the difference between UTF8 and UTF8mb4?

UTF8:

-utf8 (MySQL's utf8mb3) only supports characters that take up to 3 bytes

-...(many more)

UTF8MB4:

-utf8mb4 also supports characters that take 4 bytes, such as emoji

-...(many more)
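The 3-byte versus 4-byte distinction can be seen directly in PHP. This is a sketch; needsUtf8mb4() is a hypothetical helper that reports whether a UTF-8 string contains a character a 3-byte utf8 column cannot store.

```php
<?php
// Hypothetical helper: true if the UTF-8 string contains a character outside
// the Basic Multilingual Plane, i.e. one that takes 4 bytes in UTF-8.
function needsUtf8mb4($text)
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

$euro  = "\xE2\x82\xAC";     // U+20AC EURO SIGN, 3 bytes
$emoji = "\xF0\x9F\x98\x80"; // U+1F600 GRINNING FACE, 4 bytes

echo strlen($euro), ' bytes, needs utf8mb4: ', var_export(needsUtf8mb4($euro), true), PHP_EOL;
echo strlen($emoji), ' bytes, needs utf8mb4: ', var_export(needsUtf8mb4($emoji), true), PHP_EOL;
```

Inserting a string for which this returns true into a utf8 column is a typical cause of the "Incorrect string value" error.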


7. Answer to this specific Question

I think this should work since you're using PDO:

(Run it after you have created the PDO object, if you're using a PHP version lower than 5.3.6.)

$dbh->exec("set names utf8");

Otherwise try one of these:

ini_set("default_charset", "UTF-8");
header('Content-type: text/html; charset=UTF-8');

UPDATE:

To change the collation or charset of a database or table use this:

ALTER DATABASE databasename CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;


8. Further Information/Additional Links

  • default character set
  • character set
  • mysql_set_charset
  • error_reporting
  • pdo
  • mysql
  • mysqli


9. Side Notes

9.1 Error Reporting

If errors are not displayed, use this code snippet:

<?php
error_reporting(E_ALL);
ini_set("display_errors", 1);
?>

9.2 Unicode

To avoid mistakes, you really have to understand utf8!

9.3 One word to mysql, mysqli and PDO

My personal ranking is:

  1. PDO
  2. mysqli
  3. mysql

I would recommend using PDO or mysqli, because they have many benefits over mysql! The mysql extension was deprecated in PHP 5.5 and removed in PHP 7.0.

Fixing broken UTF-8 encoding

I've had to try to 'fix' a number of broken UTF-8 situations in the past, and unfortunately it's never easy, and often close to impossible.

Unless you can determine exactly how it was broken, and it was always broken in that exact same way, then it's going to be hard to 'undo' the damage.

If you want to try to undo the damage, your best bet would be to start writing some sample code, where you attempt numerous variations on calls to mb_convert_encoding() to see if you can find a combination of 'from' and 'to' that fixes your data. In the end, it's often best to not even bother worrying about fixing the old data because of the pain levels involved, but instead to just fix things going forward.
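The trial-and-error approach with mb_convert_encoding() might be sketched as follows. It assumes the mbstring extension is loaded and that the damage is a single round of double encoding; tryRepairs() and the list of candidate encodings are illustrative, not a complete repair tool.

```php
<?php
// Hypothetical helper: for each assumed intermediate encoding, convert the
// garbled UTF-8 text back to that encoding and keep the result only if the
// resulting bytes are themselves valid UTF-8.
function tryRepairs($garbled, array $assumedEncodings)
{
    $results = array();
    foreach ($assumedEncodings as $encoding) {
        $candidate = mb_convert_encoding($garbled, $encoding, 'UTF-8');
        if (mb_check_encoding($candidate, 'UTF-8')) {
            $results[$encoding] = $candidate;
        }
    }
    return $results;
}

// "Générik" after its UTF-8 bytes were misread as Latin-1 and encoded again.
$garbled = "G\xC3\x83\xC2\xA9n\xC3\x83\xC2\xA9rik";

foreach (tryRepairs($garbled, array('ISO-8859-1', 'Windows-1252')) as $encoding => $fixed) {
    echo $encoding, ' => ', bin2hex($fixed), PHP_EOL;
}
```

A candidate that comes back as valid UTF-8 is only a hint. Compare it against a value you know to be correct before rewriting any stored data, since plain ASCII passes every combination unchanged.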

However, before doing this, you need to make sure that you fix everything that is causing this issue in the first place. You've already mentioned that your DB table collation and editors are set properly. But there are more places where you need to check to make sure that everything is properly UTF-8:

  • Make sure that you are serving your HTML as UTF-8:
    • header("Content-Type: text/html; charset=utf-8");
  • Change your PHP default charset to utf-8:
    • ini_set("default_charset", 'utf-8');
  • If your database doesn't ALWAYS talk in utf-8, then you may need to tell it on a per-connection basis to ensure it's in utf-8 mode. In MySQL you do that by issuing:
    • SET NAMES utf8 (or charset utf8 inside the mysql command-line client)
  • You may need to tell your webserver to always try to talk in UTF-8. In Apache this directive is:
    • AddDefaultCharset UTF-8
  • Finally, you need to ALWAYS make sure that you are using PHP functions that are properly UTF-8 compliant. This means always using the mb_* styled 'multibyte aware' string functions. It also means that when calling functions such as htmlspecialchars(), you include the appropriate 'utf-8' charset parameter at the end to make sure that it doesn't encode them incorrectly.
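The last point in the list above can be illustrated with a short sketch. It assumes the mbstring extension is loaded; the sample string and variable names are made up for the example.

```php
<?php
// "<b>Générik</b>" written with explicit UTF-8 byte escapes.
$text = "<b>G\xC3\xA9n\xC3\xA9rik</b>";

$byteLength = strlen($text);             // counts bytes, so each é counts twice
$charLength = mb_strlen($text, 'UTF-8'); // counts characters

// Passing the charset explicitly keeps the multibyte characters intact.
$escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $byteLength, ' bytes, ', $charLength, ' characters', PHP_EOL;
echo $escaped, PHP_EOL;
```

The same difference applies to substr() versus mb_substr(): the byte-based function can cut a multibyte character in half.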

If you slip up on any one step in the whole process, the encoding can be mangled and problems arise. Once you get in the 'groove' of doing utf-8 though, this all becomes second nature. PHP 6 was supposed to be fully Unicode compliant from the get-go, but it was never released, so these steps still apply.

MySQL database migration UTF-8 issues with PHP

To check for double encoding, use SELECT HEX(col)... é should come back C3A9 (proper utf8), but instead shows C383C2A9 (double encoding).
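The two hex patterns can be reproduced in PHP to see where the doubled bytes come from. This sketch assumes the mbstring extension is loaded and uses ISO-8859-1 to stand in for the misread encoding.

```php
<?php
// "é" correctly encoded in UTF-8.
$correct = "\xC3\xA9";

// Misreading those two bytes as Latin-1 and converting them to UTF-8 again
// is exactly what produces double encoding.
$double = mb_convert_encoding($correct, 'UTF-8', 'ISO-8859-1');

$correctHex = strtoupper(bin2hex($correct));
$doubleHex  = strtoupper(bin2hex($double));

echo $correctHex, PHP_EOL;
echo $doubleHex, PHP_EOL;
```

Comparing HEX(col) from MySQL against these two values tells you which case a stored é falls into.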

See: Trouble with UTF-8 characters; what I see is not what I stored

If you have actually determined that you have double encoding, then the fix involves

UPDATE tbl SET col = CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4);

See http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases

Yes, "double encoding" is a silent bug -- two wrongs make a right (sort of).


