A script to change all tables and fields to the utf8_bin collation in MySQL
Be careful! If you actually have UTF-8 data stored under another encoding, you could have a real mess on your hands. Back up first. Then try some of the standard methods, for instance:
http://www.cesspit.net/drupal/node/898
http://www.hackszine.com/blog/archive/2007/05/mysql_database_migration_latin.html
I've had to resort to converting all text fields to binary, then back to varchar/text. This has saved my ass.
I had UTF-8 data stored as latin1. What I did:
- Drop indexes.
- Convert the fields to binary.
- Convert them to utf8 with the utf8_general_ci collation.
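To see why the binary detour works, here is a minimal Python model of the two conversion paths. It assumes the stored bytes were UTF-8 and that Python's latin-1 codec matches MySQL's latin1 for these particular bytes (MySQL's latin1 is really cp1252, which agrees with latin-1 on 0xC3 and 0xA9). It only simulates byte handling and does not talk to a server; `direct_convert` and `convert_via_binary` are hypothetical names.

```python
# Minimal model of the latin1 -> binary -> utf8 fix. No MySQL server involved;
# the function names are hypothetical and only mimic how the bytes are treated.

# The application sent UTF-8 bytes, but the column was declared latin1.
original = "café"
raw_bytes = original.encode("utf-8")            # b'caf\xc3\xa9'
stored_as_latin1 = raw_bytes.decode("latin-1")  # what the latin1 column "sees": 'cafÃ©'


def direct_convert(latin1_text: str) -> bytes:
    """Like converting latin1 straight to utf8: every latin1 character is
    re-encoded, so the mojibake is preserved (double encoding)."""
    return latin1_text.encode("utf-8")


def convert_via_binary(latin1_text: str) -> bytes:
    """Like latin1 -> binary -> utf8: the raw bytes are kept unchanged and
    simply relabelled as utf8."""
    return latin1_text.encode("latin-1")


print(direct_convert(stored_as_latin1).decode("utf-8"))      # cafÃ©
print(convert_via_binary(stored_as_latin1).decode("utf-8"))  # café
```

The direct conversion transcodes characters the server already misread, while the binary step skips transcoding entirely, which is exactly what mislabelled UTF-8 data needs.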
If you're on LAMP, don't forget to issue a SET NAMES command before interacting with the database, and make sure you set the character encoding headers.
How to convert all tables in a database to one collation?
You need to execute an ALTER TABLE statement for each table. The statement follows this form:
ALTER TABLE tbl_name
[[DEFAULT] CHARACTER SET charset_name]
[COLLATE collation_name]
To get all the tables in the database, execute the following query:
SELECT *
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'YourDatabaseName'
  AND TABLE_TYPE = 'BASE TABLE';
So now let MySQL write the code for you:
SELECT CONCAT('ALTER TABLE `', TABLE_SCHEMA, '`.`', TABLE_NAME, '` COLLATE your_collation_name_here;') AS ExecuteTheString
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'YourDatabaseName'
  AND TABLE_TYPE = 'BASE TABLE';
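You can copy the results and execute them. Note that ALTER TABLE ... COLLATE only changes the table's default collation, which applies to columns added later; to convert the existing columns too, generate ALTER TABLE ... CONVERT TO CHARACTER SET charset_name COLLATE collation_name instead. I have not tested the syntax, but you should be able to figure out the rest. Think of it as a little exercise.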
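If you'd rather build the statements outside MySQL, here is a minimal Python sketch. It assumes you already have the table names (for example, from the INFORMATION_SCHEMA query above); `quote_ident` and `alter_statements` are hypothetical helpers. It emits CONVERT TO so existing columns change as well, and it treats the charset and collation names as trusted input (only identifiers are quoted).

```python
from typing import Iterable, List


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def alter_statements(schema: str, tables: Iterable[str],
                     charset: str, collation: str) -> List[str]:
    """Build one ALTER TABLE ... CONVERT TO statement per table (hypothetical helper).

    charset and collation are inserted verbatim, so pass only known-good names.
    """
    return [
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        for table in tables
    ]


if __name__ == "__main__":
    # Example table names; replace them with the results of your own query.
    for stmt in alter_statements("YourDatabaseName", ["customers", "orders"],
                                 "utf8mb4", "utf8mb4_unicode_ci"):
        print(stmt)
```

Review the printed statements before running them, and back up first, just as with the CONCAT version.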
How to change collation of database, table, column?
You need to either convert each table individually:
ALTER TABLE mytable CONVERT TO CHARACTER SET utf8mb4;
(this converts the columns as well), or export the database with latin1 and import it back with utf8mb4.
How to convert an entire MySQL database characterset and collation to UTF-8?
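Use the ALTER DATABASE and ALTER TABLE commands. ALTER DATABASE only changes the default for tables created later; ALTER TABLE ... CONVERT TO converts an existing table and its columns.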
ALTER DATABASE databasename CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Or, if you're still on MySQL 5.5.2 or older, which didn't support 4-byte UTF-8, use utf8 instead of utf8mb4:
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
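To tell whether your data actually needs utf8mb4, it is enough to know whether any character takes 4 bytes in UTF-8, since MySQL's older utf8 stores at most 3. A minimal Python sketch; `max_utf8_bytes` and `needs_utf8mb4` are hypothetical helpers, and you would feed them your own text values.

```python
def max_utf8_bytes(text: str) -> int:
    """Largest number of UTF-8 bytes needed by any single character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def needs_utf8mb4(text: str) -> bool:
    """True if text holds a character (e.g. an emoji) that 3-byte utf8 cannot store."""
    return max_utf8_bytes(text) == 4


print(needs_utf8mb4("日本語"))             # False: these CJK characters take 3 bytes
print(needs_utf8mb4("smile \U0001F600"))  # True: U+1F600 takes 4 bytes
```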
How to change the CHARACTER SET (and COLLATION) throughout a database?
Change the database collation (utf8mb4_0900_ai_ci requires MySQL 8.0 or later):
ALTER DATABASE <database_name> CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
Change the table collation:
ALTER TABLE <table_name> CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
Change a column's collation (MODIFY replaces the whole column definition, so restate its type and any attributes such as NOT NULL or DEFAULT):
ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
What do the parts of utf8mb4_0900_ai_ci mean?
Character set:
utf8 -- up to 3 bytes per character
utf8mb4 -- up to 4 bytes per character (new)
Unicode Collation Algorithm (UCA) version:
_unicode_ -- UCA 4.0.0
_unicode_520_ -- UCA 5.2.0
_0900_ -- UCA 9.0.0 (new)
Comparison:
_bin -- just compare the bits; don't consider case folding, accents, etc.
_ci -- explicitly case insensitive (A=a) and implicitly accent insensitive (a=á)
_ai_ci -- explicitly case insensitive and accent insensitive
_as (etc.) -- accent sensitive (etc.)
Speed:
_bin -- simple, fast
_general_ci -- doesn't handle multi-letter equivalences such as ss=ß; somewhat fast
... -- slower
_0900_ -- (MySQL 8.0) much faster because of a rewrite
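As an illustration of the naming scheme above, here is a minimal Python sketch that splits a collation name into its parts. It is an assumption-laden toy: `parse_collation` is a hypothetical helper, it only knows the three UCA markers listed above, and language-specific collations (such as utf8mb4_de_pb_0900_ai_ci) come back with no UCA version.

```python
from typing import Dict, List, Optional

# UCA markers from the list above; anything else maps to None.
UCA_VERSIONS = {"unicode": "4.0.0", "unicode_520": "5.2.0", "0900": "9.0.0"}
# Trailing comparison suffixes (ks = kana sensitive, used by some 0900 collations).
FLAGS = {"ai", "as", "ci", "cs", "ks", "bin"}


def parse_collation(name: str) -> Dict[str, object]:
    """Split e.g. 'utf8mb4_0900_ai_ci' into charset, variant, UCA version and flags."""
    charset, _, rest = name.partition("_")
    tokens: List[str] = rest.split("_") if rest else []
    flags: List[str] = []
    # Peel comparison suffixes off the end, keeping their original order.
    while tokens and tokens[-1] in FLAGS:
        flags.insert(0, tokens.pop())
    variant: Optional[str] = "_".join(tokens) or None
    return {
        "charset": charset,
        "variant": variant,
        "uca_version": UCA_VERSIONS.get(variant) if variant else None,
        "flags": flags,
    }


print(parse_collation("utf8mb4_0900_ai_ci"))
print(parse_collation("utf8_general_ci"))
```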
More info:
- What are the differences between utf8_general_ci and utf8_unicode_ci?
- What's the difference between utf8_general_ci and utf8_unicode_ci?
- How to change collation of database, table, column?
How to make MySQL handle UTF-8 properly
Update:
Short answer: you should almost always be using the utf8mb4 charset and the utf8mb4_unicode_ci collation.
To alter database:
ALTER DATABASE dbname CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
See:
Aaron's comment on this answer How to make MySQL handle UTF-8 properly
What's the difference between utf8_general_ci and utf8_unicode_ci
Conversion guide: https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-conversion.html
Original Answer:
MySQL 4.1 and above support UTF-8, but the server's default character set was latin1 until MySQL 8.0, which defaults to utf8mb4. You can check and set this in your my.cnf file; remember to set both the client and the server options (default-character-set and character-set-server).
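A minimal Python sketch for checking those two options in a my.cnf. It assumes the file parses as plain INI: configparser rejects `!include` directives, and this sketch doesn't treat `-` and `_` in option names as interchangeable the way MySQL does. `charset_settings` and `SAMPLE_CNF` are hypothetical; the sample only shows which section each option belongs to.

```python
import configparser
from typing import Optional, Tuple

# Hypothetical example file: the client option lives in [client], the server one in [mysqld].
SAMPLE_CNF = """
[client]
default-character-set = utf8mb4

[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
skip-name-resolve
"""


def charset_settings(cnf_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (client default-character-set, server character-set-server), or None if unset."""
    # allow_no_value accepts bare flags like skip-name-resolve;
    # interpolation=None leaves any '%' in values alone.
    parser = configparser.ConfigParser(allow_no_value=True, strict=False,
                                       interpolation=None)
    parser.read_string(cnf_text)
    client = parser.get("client", "default-character-set", fallback=None)
    server = parser.get("mysqld", "character-set-server", fallback=None)
    return client, server


print(charset_settings(SAMPLE_CNF))
```

If either value comes back as None or as something other than your target charset, the client and server may not agree on the encoding.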
If you have existing data that you wish to convert to UTF-8, dump your database and import it back as UTF-8, making sure to:
- use SET NAMES utf8 before you query/insert into the database
- use DEFAULT CHARSET=utf8 when creating new tables
- check that your MySQL client and server are both set to UTF-8 (see my.cnf)
Remember that any languages you use (such as PHP) must be UTF-8 as well. Some versions of PHP use their own MySQL client library, which may not be UTF-8 aware.
If you do want to migrate existing data, remember to back up first! Lots of weird chopping of data can happen when things don't go as planned!
Some resources:
- complete UTF-8 migration (cdbaby.com)
- article on UTF-8 readiness of php functions (note some of this information is outdated)