How to Convert Latin1_Swedish_Ci Data into Utf8_General_Ci

Converting latin1_swedish_ci to utf8 with PHP

The collation is NOT the same as the character set. The collation is only used for sorting and comparison of text (that's why there's a language term in there). The actual character set may be different.

The most common failure is not in the database but rather in the connection between PHP and MySQL. The default charset for the connection is usually ISO-8859-1. You need to change that the first thing you do after connecting, using either the SQL query SET NAMES 'utf-8'; or the mysql_set_charset function.

Also check the character set of your tables. This may be wrong as well if you have not specified UTF-8 to begin with (again: this is not the same as the collation). But make sure to take a backup before changing anything here. MySQL will try to convert the charset from the previous one, so you may need to reload the data from backup if you have actually saved UTF-8 data in ISO-8859-1 tables.

change encoding of database from latin1_swedish_ci to utf8

If you have any stored procedures that use database defaults, you must drop and recreate those stored procedures.

If the ALTER DATABASE command runs quickly, chances are the tables were not altered.

You may have to run ALTER TABLE on all your tables as follows:

ALTER TABLE tblname CONVERT TO CHARACTER SET charset_name [COLLATE collation_name];

How to change collation of all rows from latin1_swedish_ci to utf8_unicode_ci?

If the columns are using the default table character set then it's just one query per table to convert:

ALTER TABLE t CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

If the character set is set individually on each column, AFAIK there is no way to do that on all columns of all tables in the database directly in MySql, but you could write a tiny program in your language of choice that does so.

Your program would query the INFORMATION_SCHEMA.COLUMNS table and look at the CHARACTER_SET_NAME column:

SELECT * FROM `INFORMATION_SCHEMA.COLUMNS`
WHERE TABLE_SCHEMA = 'dbname' AND CHARACTER_SET_NAME = 'latin1'

For each result row it's trivial to synthesize and execute an ALTER TABLE query on the spot that changes the character set and collation appropriately:

ALTER TABLE t MODIFY col TEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

In the above query t, col and TEXT would be the values of the TABLE_NAME, COLUMN_NAME and DATA_TYPE columns from the INFORMATION_SCHEMA.COLUMNS result set.

Mysql changing the collation from latin1_swedish_ci to utf8mb4_bin

You are in for a rocky road. Hex E4 has nothing to do with Psi. Where did it come from?

Do SELECT title, HEX(title) ... on some title that has a non-Ascii character in it.

The UTF-8 encoding for psi is two hex characters CEA8.

E4, when interpreted as latin1 represents ä. Does that make any sense?

  • The column needs to be changed to CHARACTER SET utf8 (or utf8mb4). But there is a right way to do that and a way that messes things up worse. Let's see that HEX before discussing which is right.
  • The VARIABLEs that control what encoding the client is using. There are multiple settings that need to change when switching encodings.

More references:

  • http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
  • Trouble with UTF-8 characters; what I see is not what I stored

No charset maps E4 to Psi:

cp1250, cp1257, dec8, latin1, latin2, latin5, latin7  25 24
'Kaze no Tani no Nausicaä'
cp852 25 24 'Kaze no Tani no Nausicań'
cp850 25 24 'Kaze no Tani no Nausicaõ'
macce 25 24 'Kaze no Tani no Nausicaš'
hp8 25 24 'Kaze no Tani no Nausicað'
greek 25 24 'Kaze no Tani no Nausicaδ'
keybcs2 25 24 'Kaze no Tani no NausicaΣ'
cp1251 25 24 'Kaze no Tani no Nausicaд'
koi8r, koi8u 25 24 'Kaze no Tani no NausicaД'
cp866 25 24 'Kaze no Tani no Nausicaф'
armscii8 25 24 'Kaze no Tani no NausicaՊ'
hebrew 25 24 'Kaze no Tani no Nausicaה'
cp1256 25 24 'Kaze no Tani no Nausicaن'
tis620 26 24 'Kaze no Tani no Nausicaไ'
geostd8 26 24 'Kaze no Tani no Nausicaჰ'
macroman 26 24 'Kaze no Tani no Nausica‰'

So, I worry that two mistakes have been made. Do you have another example of mangled text?

Latin1 cannot handle Greek letters. Nor Cyrillic. Nor Chinese. Etc. So, ä is "correct"? (I have been chasing how to get between E4 and Psi.)

So you should probably convert to utf8mb4.

ALTER TABLE t CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;

will convert all the text columns in table t.

To change just one column:

ALTER TABLE t MODIFY COLUMN c VARCHAR(...)
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;

It must contain all the other stuff you have not (eg, NULL or NOT NULL).



Related Topics



Leave a reply



Submit