Convert latin1 characters on a UTF8 table into UTF8
From what you describe, it seems you have data that was originally stored as Latin-1 and then not converted correctly to UTF-8. The data is recoverable; you'll need a MySQL expression like
convert(cast(convert(name using latin1) as binary) using utf8)
It's possible that you may need to omit the inner conversion, depending on how the data was altered during the encoding conversion.
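To see what that expression does at the byte level, here is a rough Python sketch (not MySQL itself). The helper name repair_double_encoded is made up for illustration, and it assumes the damage was exactly one round of "UTF-8 bytes read as latin1". MySQL's latin1 is closer to cp1252 than to strict ISO-8859-1, so characters in the 0x80-0x9F range may behave differently than shown here.

```python
def repair_double_encoded(text: str) -> str:
    """Undo one round of 'UTF-8 bytes misread as latin1'.

    Rough analogue of
    CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8):
    turn the visible characters back into latin1 bytes, then read
    those bytes as UTF-8.
    """
    return text.encode("latin-1").decode("utf-8")


# Simulate the damage: the UTF-8 bytes for "ñ" (C3 B1) read as latin1.
mangled = "Larrasoaña".encode("utf-8").decode("latin-1")
repaired = repair_double_encoded(mangled)

print(mangled)   # the mojibake form, with two characters where "ñ" should be
print(repaired)  # the original text again
```

A value that was stored correctly (a real "ñ") makes this helper raise UnicodeDecodeError, which is a hint that such a repair should be tried on a copy of the data and only on rows that are actually mangled.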
converting latin1 data into utf8 inside of an existing database
Following this answer:
MySQL - Convert latin1 characters on a UTF8 table into UTF8
you can make a function:
CONVERT(CAST(CONVERT(name USING latin1) AS binary) USING utf8)
and apply it.
How to convert mysql latin1 to utf8
I managed to solve it by running updates on text fields like this:
UPDATE `table` SET title = CONVERT(CONVERT(CONVERT(title USING latin1) USING binary) USING utf8);
MySQL: data being mangled while changing column to UTF8
F1 and FA are latin1 encodings (of ñ and ú). You need to tell MySQL that the data is latin1. One way is via SET NAMES latin1.
But note... That is independent of the setting for the column you are trying to store the data into. And, these days, utf8mb4 is the preferred setting for text. MySQL will convert between the column's encoding and the client's encoding. But you must tell it the client's encoding via connection parameters (or SET NAMES).
The pair of ALTER TABLEs works for certain situations, not all situations! You probably wanted the first entry in http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
Table is CHARACTER SET latin1 and correctly encoded in latin1; want utf8mb4:
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8mb4;
I don't happen to know if your data is irreparably hosed. Please provide one of the lines, together with HEX.
Hex
"Larrasoaña" is encoded as 4C61727261736F61F161, and "Jesús y María" as 4A6573FA732079204D6172ED6120
Those are latin1-encoded (or latin5 or dec8). If the table definition (SHOW CREATE TABLE) says latin1, then you could leave it alone. (latin1 handles Western European languages, but not Asian.)
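A quick way to check HEX output like the above outside of MySQL is to try decoding it as UTF-8. This is only a sketch; describe_hex is a hypothetical helper, and a string that happens to decode as UTF-8 (pure ASCII, for instance) is not proof that it was stored as UTF-8.

```python
def describe_hex(hex_string: str) -> str:
    """Report whether bytes given as HEX() output are valid UTF-8."""
    raw = bytes.fromhex(hex_string)
    try:
        return "valid utf8: " + raw.decode("utf-8")
    except UnicodeDecodeError:
        # Python's latin-1 codec defines every byte value, so this cannot fail.
        return "not utf8; as latin1: " + raw.decode("latin-1")


# The two samples from the question.
print(describe_hex("4C61727261736F61F161"))
print(describe_hex("4A6573FA732079204D6172ED6120"))
```

Both samples fall into the "not utf8" branch, because F1 and FA are not followed by valid UTF-8 continuation bytes.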
If you want to convert all the text columns to utf8 or utf8mb4, do an ALTER like the one I presented above. Your 3-Alter approach will not work correctly; it assumes the bytes in the latin1 column are really UTF-8 bytes (which they aren't).
But... You must specify the client's encoding based on what the client wants. And it does not matter whether the client and the table agree since conversion will be provided.
Why the 3-step Alter fails
ALTER TABLE clientes CHARACTER SET utf8;
-- This sets the default charset for new columns. It has no effect on the existing column definitions and any data in those columns.
ALTER TABLE clientes change nombre nombre varbinary(255);
-- This says "forget about any text encoding". That is, F1 is now just a bunch of bits, not the latin1 representation for ñ.
ALTER TABLE clientes change nombre nombre varchar(255) character set utf8;
-- This takes those varbinary bits and says "let's treat them as utf8". And that gives the error message because F1 is not a valid encoding for utf8.
That procedure is appropriate if the bytes are already utf8 bytes. That is, if it were already the 2-byte C3B1 for ñ. (By the way, this usually manifests itself as 'Mojibake', displaying as Ã± when interpreted as latin1.)
The 1-Alter procedure...
ALTER TABLE clientes CONVERT TO CHARACTER SET utf8;
(to convert the entire table) or
ALTER TABLE clientes MODIFY nombre varchar(255) character set utf8;
(to convert just one column). Either one does the following:
For each text (char/varchar/text) column, it reads the data according to its current encoding (latin1, F1), converts it to utf8 (or utf8mb4) (C3B1) and writes it back into the row. Meanwhile, it has changed the declaration to be CHARACTER SET utf8.
That is, it is the 'right' process for changing the CHARACTER SET without changing the "text". True, the encoding changed (F1 -> C3B1), but that is in keeping with the change to the CHARACTER SET.
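The difference between the two routes can be mimicked in a few lines of Python. This is an illustration of the byte handling only, not of what the server does internally; the variable names are invented for the example.

```python
latin1_bytes = bytes.fromhex("F1")  # "ñ" as stored in the latin1 column

# 3-step route: keep the bytes, change only the label to utf8.
try:
    latin1_bytes.decode("utf-8")
    reinterpret_ok = True
except UnicodeDecodeError:
    reinterpret_ok = False

# 1-step route: read with the old charset, write with the new one.
converted = latin1_bytes.decode("latin-1").encode("utf-8")

print(reinterpret_ok)           # False
print(converted.hex().upper())  # C3B1
```

Relabeling fails because F1 alone is not a complete UTF-8 sequence, while transcoding keeps the character and changes the bytes.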
Recovery
Your first 2 ALTERs worked, correct? Did the 3rd one succeed, fail, or leave a messed up table?
If it aborted, leaving varbinary in place, then do 2 more alters: First go back to latin1; then go straight to utf8.
If it left you with a messed up column, especially if rows are truncated, then you need to go back to a backup, or otherwise reload the data.