Strange Character Encoding of Stored Data, Old Script Is Showing Them Fine, New One Doesn't

strange character encoding of stored data , old script is showing them fine new one doesn't

In short, because this has been discussed a thousand times before:

  1. PHP holds a string, say "漢字", encoded in UTF-8. The bytes for this are E6 BC A2 E5 AD 97.
  2. It sends this string over a database connection which is set to latin1.
  3. The database receives the bytes E6 BC A2 E5 AD 97, thinking those represent latin1 characters.
  4. The database stores the characters æ¼¢å­ (the characters that E6 BC A2 E5 AD 97 maps to in latin1).
  5. The same process reversed makes PHP receive the same bytes, which it then treats as UTF-8. The roundtrip works fine for PHP, even though the database doesn't treat the characters as it should.
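The five steps above can be reproduced outside the database. The following Python sketch is only an illustration, not what MySQL does internally: it assumes Python's latin-1 codec (ISO-8859-1) is a close enough stand-in for MySQL's latin1, and all variable names are made up for the example. The last part is an assumption about why a newer script would show garbage: it supposes the new script opens its connection as utf8, so the database converts the stored characters for real.

```python
# Step 1: the application holds the string as UTF-8 bytes.
original = "漢字"
utf8_bytes = original.encode("utf-8")
print(utf8_bytes.hex(" ").upper())  # E6 BC A2 E5 AD 97

# Steps 2-4: a latin1 connection makes the database read every byte
# as one latin1 character, so six characters are stored instead of two.
stored = utf8_bytes.decode("latin-1")
print(len(original), len(stored))  # 2 6

# Step 5: over the same latin1 connection the characters turn back into
# the same six bytes, which the old script interprets as UTF-8 again.
old_script_sees = stored.encode("latin-1").decode("utf-8")
print(old_script_sees == original)  # True

# Assumption: a new script connects with utf8. The database now encodes
# the six stored characters as UTF-8, so the script receives mojibake.
new_script_bytes = stored.encode("utf-8")
new_script_sees = new_script_bytes.decode("utf-8")
print(len(new_script_bytes), new_script_sees == original)  # 12 False
```

Under these assumptions the old script only works because two mistakes cancel each other out; the data in the table was wrong all along.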

So the problem here was that the database connection was set incorrectly when the data was entered into the database. You'll have to convert the data in the database to the correct characters. Try this:

SELECT CONVERT(BINARY CONVERT(field_name USING latin1) USING utf8) FROM table_name

utf8 may not be the right target character set for your data, so experiment. If the SELECT returns the correct text, change it into an UPDATE statement to fix the data permanently.
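To see what the nested CONVERT is doing, here is a rough Python equivalent that can be used to experiment with charset pairs before touching the table. It is a sketch under assumptions: `repair` and `repair_or_keep` are hypothetical helper names, and Python codecs stand in for MySQL character sets, which do not match them byte for byte in every case.

```python
def repair(text: str, wrong_charset: str = "latin-1",
           real_charset: str = "utf-8") -> str:
    """Undo a wrong-charset interpretation of the original bytes."""
    # Inner CONVERT(... USING latin1) plus BINARY: get the raw bytes back.
    raw = text.encode(wrong_charset)
    # Outer CONVERT(... USING utf8): read those bytes as what they really are.
    return raw.decode(real_charset)


def repair_or_keep(text: str, wrong_charset: str = "latin-1",
                   real_charset: str = "utf-8") -> str:
    """Like repair(), but leave values alone that do not fit the pattern."""
    try:
        return repair(text, wrong_charset, real_charset)
    except UnicodeError:
        # Either the text has characters outside wrong_charset, or its
        # bytes are not valid real_charset: probably not broken this way.
        return text


broken = "漢字".encode("utf-8").decode("latin-1")
print(repair(broken))          # 漢字
print(repair_or_keep("café"))  # café (already correct, left untouched)
```

The second helper reflects a practical point: rows written after the connection was fixed are already correct, so it is worth checking the SELECT output on a sample of old and new rows before running the UPDATE on everything.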

MySQL data has wrong encoding when displaying using PHP

OK. Comments by sbondo1234 and Tausif helped me to realize that the new records work just fine. Only the old data in the database is messed up. It was easier for me to create a simple matrix and pair incorrectly encoded characters with correct ones and then update them with simple SQL queries. Thanks for your help.
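A pairing table like the one described can be generated rather than typed by hand. The sketch below is an assumption about how that might look, not the poster's actual matrix: it supposes the old data is UTF-8 that was read as latin1, and `CORRECT_CHARS`, `PAIRS` and `fix_with_pairs` are hypothetical names with a sample set of characters.

```python
# Characters expected in the data (hypothetical sample; extend as needed).
CORRECT_CHARS = "éèüöäñ"

# Map each mis-encoded form to the character it should have been,
# assuming UTF-8 bytes were interpreted as latin1.
PAIRS = {ch.encode("utf-8").decode("latin-1"): ch for ch in CORRECT_CHARS}

for wrong, right in PAIRS.items():
    print(repr(wrong), "->", right)


def fix_with_pairs(text: str) -> str:
    """Replace every known mis-encoded sequence with the correct character."""
    for wrong, right in PAIRS.items():
        text = text.replace(wrong, right)
    return text


print(fix_with_pairs("cafÃ© MÃ¼nchen"))  # café München
```

Each generated pair could then be turned into one replacement query against the affected column. This approach only fixes the characters that are listed, which is why it suits a small, known alphabet better than arbitrary text.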

Encoding error with polish charset during transfer of database / server seting up

In the end I found out that the problem was that the data had been written to SQL incorrectly on my original server.

I ended up transferring the DB using:

mysqldump --default-character-set=utf8 [ORIGINAL_DB] | mysql [TARGET_DB] --default-character-set=utf8

and then executing:

UPDATE [table name] SET [field] = CONVERT(BINARY CONVERT([field] USING latin2) USING utf8)

as was advised here:

strange character encoding of stored data , old script is showing them fine new one doesn't

I hope the above solution will be helpful for others too.
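The UPDATE above uses latin2 in the inner CONVERT where the earlier answer used latin1, and the choice matters: the inner charset has to be the one the data was misread as. The Python sketch below illustrates why, under the assumption that Python's iso8859_2 and latin-1 codecs approximate MySQL's latin2 and latin1; the sample word is made up.

```python
text = "zażółć"

# Assumption: the UTF-8 bytes were interpreted as latin2 when stored.
stored = text.encode("utf-8").decode("iso8859_2")
print(len(text), len(stored))  # 6 10

# Reversing with the matching charset recovers the original text.
recovered = stored.encode("iso8859_2").decode("utf-8")
print(recovered == text)  # True

# Reversing with latin1 fails: the stored text contains characters
# (such as U+0139) that do not exist in latin1 at all.
try:
    stored.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)  # False
```

Python raises an error where the charsets disagree, which makes a wrong guess easy to spot in a dry run before any data is rewritten.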


