Converting latin1_swedish_ci to utf8 with PHP
The collation is NOT the same as the character set. The collation is only used for sorting and comparison of text (that's why there's a language term in there). The actual character set may be different.
The most common failure is not in the database but rather in the connection between PHP and MySQL. The default charset for the connection is usually ISO-8859-1. You need to change that the first thing you do after connecting, using either the SQL query SET NAMES 'utf-8';
or the mysql_set_charset function.
Also check the character set of your tables. This may be wrong as well if you have not specified UTF-8 to begin with (again: this is not the same as the collation). But make sure to take a backup before changing anything here. MySQL will try to convert the charset from the previous one, so you may need to reload the data from backup if you have actually saved UTF-8 data in ISO-8859-1 tables.
change encoding of database from latin1_swedish_ci to utf8
If you have any stored procedures that use database defaults, you must drop and recreate those stored procedures.
If the ALTER DATABASE command runs quickly, chances are the tables were not altered.
You may have to run ALTER TABLE on all your tables as follows:
ALTER TABLE tblname CONVERT TO CHARACTER SET charset_name [COLLATE collation_name];
How to change collation of all rows from latin1_swedish_ci to utf8_unicode_ci?
If the columns are using the default table character set then it's just one query per table to convert:
ALTER TABLE t CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
If the character set is set individually on each column, AFAIK there is no way to do that on all columns of all tables in the database directly in MySql, but you could write a tiny program in your language of choice that does so.Your program would query the INFORMATION_SCHEMA.COLUMNS
table and look at the CHARACTER_SET_NAME
column:
SELECT * FROM `INFORMATION_SCHEMA.COLUMNS`
WHERE TABLE_SCHEMA = 'dbname' AND CHARACTER_SET_NAME = 'latin1'
For each result row it's trivial to synthesize and execute an ALTER TABLE
query on the spot that changes the character set and collation appropriately:ALTER TABLE t MODIFY col TEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
In the above query t
, col
and TEXT
would be the values of the TABLE_NAME
, COLUMN_NAME
and DATA_TYPE
columns from the INFORMATION_SCHEMA.COLUMNS
result set. Mysql changing the collation from latin1_swedish_ci to utf8mb4_bin
You are in for a rocky road. Hex E4
has nothing to do with Psi
. Where did it come from?
Do SELECT title, HEX(title) ...
on some title that has a non-Ascii character in it.
The UTF-8 encoding for psi is two hex characters CEA8
.
E4
, when interpreted as latin1
represents ä
. Does that make any sense?
- The column needs to be changed to
CHARACTER SET utf8
(orutf8mb4
). But there is a right way to do that and a way that messes things up worse. Let's see that HEX before discussing which is right. - The
VARIABLEs
that control what encoding the client is using. There are multiple settings that need to change when switching encodings.
- http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
- Trouble with UTF-8 characters; what I see is not what I stored
cp1250, cp1257, dec8, latin1, latin2, latin5, latin7 25 24
'Kaze no Tani no Nausicaä'
cp852 25 24 'Kaze no Tani no Nausicań'
cp850 25 24 'Kaze no Tani no Nausicaõ'
macce 25 24 'Kaze no Tani no Nausicaš'
hp8 25 24 'Kaze no Tani no Nausicað'
greek 25 24 'Kaze no Tani no Nausicaδ'
keybcs2 25 24 'Kaze no Tani no NausicaΣ'
cp1251 25 24 'Kaze no Tani no Nausicaд'
koi8r, koi8u 25 24 'Kaze no Tani no NausicaД'
cp866 25 24 'Kaze no Tani no Nausicaф'
armscii8 25 24 'Kaze no Tani no NausicaՊ'
hebrew 25 24 'Kaze no Tani no Nausicaה'
cp1256 25 24 'Kaze no Tani no Nausicaن'
tis620 26 24 'Kaze no Tani no Nausicaไ'
geostd8 26 24 'Kaze no Tani no Nausicaჰ'
macroman 26 24 'Kaze no Tani no Nausica‰'
So, I worry that two mistakes have been made. Do you have another example of mangled text?Latin1 cannot handle Greek letters. Nor Cyrillic. Nor Chinese. Etc. So, ä
is "correct"? (I have been chasing how to get between E4 and Psi.)
So you should probably convert to utf8mb4.
ALTER TABLE t CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
will convert all the text columns in table t
.To change just one column:
ALTER TABLE t MODIFY COLUMN c VARCHAR(...)
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
It must contain all the other stuff you have not (eg, NULL or NOT NULL).
Related Topics
Programming Multi-Language PHP Applications
Iconv_Strlen Function Causing Execution Timeout, Running on Mamp
How to Retrieve Date from Table Cell Using PHPspreadsheet
Php: How to Read a File Live That Is Constantly Being Written To
Use MySQL_Fetch_Array() with Foreach() Instead of While()
How to Access Magento Customer's Session from Outside Magento
Laravel Errno 150 Foreign Key Constraint Is Incorrectly Formed
PHP Class Not Found But It's Included
Storing Leading Zeros of Integers in MySQL Database as Integer
What Does the Wordpress "_E()" Function Do
Convert Windows Timestamp to Date Using PHP on a Linux Box
Type Hinting: Default Parameters
Add a Custom Checkbox in Woocommerce Checkout Which Value Shows in Admin Edit Order