A Script to Change All Tables and Fields to the Utf-8-Bin Collation in MySQL

Be careful! If you actually have UTF-8 data stored under a different encoding, you could have a real mess on your hands. Back up first. Then try some of the standard methods, for instance:
http://www.cesspit.net/drupal/node/898
http://www.hackszine.com/blog/archive/2007/05/mysql_database_migration_latin.html

I've had to resort to converting all text fields to binary, then back to varchar/text. This has saved my ass.
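
To see why going through binary helps, here is a small Python sketch of the failure and the fix. The sample string is made up, and Python's latin-1 codec stands in for MySQL's latin1 (which is really closer to cp1252), so treat it as an illustration rather than an exact model of the server.

```python
# Hypothetical sample text; any non-ASCII string behaves the same way.
original = "café"

# The application sent UTF-8 bytes...
raw_bytes = original.encode("utf-8")

# ...but the column was labelled latin1, so reading it back as text
# treats every byte as one latin1 character (mojibake).
misread = raw_bytes.decode("latin-1")
print(misread)

# Converting to binary drops the wrong label but keeps the bytes;
# relabelling those same bytes as UTF-8 recovers the text.
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered)
```

The point is that the bytes were never wrong, only the label on them, which is why a conversion that re-encodes the data makes things worse while a binary round trip does not.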

I had data that was UTF-8, stored as latin1. What I did:

Drop indexes.
Convert fields to binary.
Convert to utf8_general_ci.
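
The last two steps can be sketched as a pair of ALTER statements per column. The helper below is a hypothetical Python generator, not something from the original post: the table and column names are invented, the type mapping covers only VARCHAR and the TEXT family, and it does not handle the index step.

```python
# Text type -> binary type that holds the same bytes (assumed mapping).
BINARY_EQUIVALENT = {
    "VARCHAR": "VARBINARY",
    "TINYTEXT": "TINYBLOB",
    "TEXT": "BLOB",
    "MEDIUMTEXT": "MEDIUMBLOB",
    "LONGTEXT": "LONGBLOB",
}


def quote_ident(name):
    """Backtick-quote a MySQL identifier, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def roundtrip_statements(table, column, base_type, length=None,
                         charset="utf8", collation="utf8_general_ci"):
    """Return the two ALTER statements for one column, in execution order."""
    size = f"({length})" if length is not None else ""
    binary_type = BINARY_EQUIVALENT[base_type.upper()] + size
    text_type = base_type.upper() + size
    target = f"ALTER TABLE {quote_ident(table)} MODIFY {quote_ident(column)}"
    return [
        # Step 1: relabel the column as raw bytes.
        f"{target} {binary_type};",
        # Step 2: relabel the same bytes as text in the target charset.
        f"{target} {text_type} CHARACTER SET {charset} COLLATE {collation};",
    ]


# Hypothetical table and column names.
for statement in roundtrip_statements("articles", "title", "VARCHAR", 255):
    print(statement)
```

MODIFY replaces the whole column definition, so attributes such as NOT NULL or DEFAULT would have to be added to the generated statements by hand.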

If you're on LAMP, don't forget to issue a SET NAMES command before interacting with the database, and make sure you set the character encoding headers.

How to convert all tables in database to one collation?

You need to execute an ALTER TABLE statement for each table. The statement follows this form:

ALTER TABLE tbl_name
[[DEFAULT] CHARACTER SET charset_name]
[COLLATE collation_name]

Now to get all the tables in the database you would need to execute the following query:

SELECT *
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'YourDatabaseName'
AND TABLE_TYPE = 'BASE TABLE';

So now let MySQL write the code for you:

SELECT CONCAT('ALTER TABLE `', TABLE_SCHEMA, '`.`', TABLE_NAME, '` COLLATE your_collation_name_here;') AS ExecuteTheString
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'YourDatabaseName'
AND TABLE_TYPE = 'BASE TABLE';

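You can copy the results and execute them. I have not tested the syntax, but you should be able to figure out the rest; think of it as a little exercise. Note that ALTER TABLE ... COLLATE only changes the table's default collation for new columns; existing columns keep theirs unless you use CONVERT TO CHARACTER SET.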
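
If you would rather build the statements outside MySQL, the same idea can be sketched in Python. This is a minimal sketch under assumptions: the function names and the sample schema and table names are hypothetical, the rows are assumed to come from the INFORMATION_SCHEMA.TABLES query above, and it emits the CONVERT TO form so that existing columns are converted too.

```python
def quote_ident(name):
    """Backtick-quote a MySQL identifier, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def alter_statements(rows, charset, collation):
    """Build one ALTER TABLE per (schema, table) pair.

    rows is any iterable of (TABLE_SCHEMA, TABLE_NAME) pairs, for example
    the result of the INFORMATION_SCHEMA.TABLES query.
    """
    return [
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        for schema, table in rows
    ]


# Hypothetical rows; the second table name contains a space on purpose.
rows = [("shop", "orders"), ("shop", "order items")]
for statement in alter_statements(rows, "utf8mb4", "utf8mb4_bin"):
    print(statement)
```

Quoting the identifiers is the main reason to do this in code: table names with spaces or reserved words break the unquoted form.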

How to change collation of database, table, column?

You need to either convert each table individually:

ALTER TABLE mytable CONVERT TO CHARACTER SET utf8mb4;

(this converts the columns as well), or export the database as latin1 and import it back as utf8mb4.

How to convert an entire MySQL database characterset and collation to UTF-8?

Use the ALTER DATABASE and ALTER TABLE commands.

ALTER DATABASE databasename CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Or, if you're still on MySQL 5.5.2 or older, which didn't support 4-byte UTF-8, use utf8 instead of utf8mb4:

ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
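
A script that has to run against several servers could pick between the two variants from the server version. The sketch below assumes the version string has the shape returned by SELECT VERSION(), such as 5.5.2 or 8.0.36-log; the function names are hypothetical.

```python
def parse_version(version):
    """Turn '5.5.2' or '8.0.36-log' into a tuple of ints such as (5, 5, 2)."""
    numeric = version.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


def pick_charset(version):
    """Return (charset, collation) suited to the given MySQL version."""
    # utf8mb4 was added in MySQL 5.5.3, so 5.5.2 and older fall back to utf8.
    if parse_version(version) >= (5, 5, 3):
        return "utf8mb4", "utf8mb4_unicode_ci"
    return "utf8", "utf8_unicode_ci"


for version in ("5.5.2", "5.5.3", "8.0.36-log"):
    charset, collation = pick_charset(version)
    print(f"{version}: CHARACTER SET {charset} COLLATE {collation}")
```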

How to change the CHARACTER SET (and COLLATION) throughout a database?

change database collation:

ALTER DATABASE <database_name> CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

change table collation:

ALTER TABLE <table_name> CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

change column collation:

ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;


What do the parts of utf8mb4_0900_ai_ci mean?

Maximum bytes per character:
3 bytes       -- utf8
4 bytes       -- utf8mb4 (new)

Unicode version:
v4.0          -- _unicode_
v5.20         -- _unicode_520_
v9.0          -- _0900_ (new)

Sensitivity:
_bin          -- just compare the bits; don't consider case folding, accents, etc.
_ci           -- explicitly case insensitive (A=a) and implicitly accent insensitive (a=á)
_ai_ci        -- explicitly case insensitive and accent insensitive
_as (etc)     -- accent-sensitive (etc)

Speed:
_bin          -- simple, fast
_general_ci   -- fails to compare multiletters, e.g. ss=ß; somewhat fast
...           -- slower
_0900_        -- (8.0) much faster because of a rewrite
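
As a way to check your reading of a collation name, here is a small Python sketch that splits a name into the parts listed above. The function name and the returned dictionary keys are made up for illustration, and the rule for names without _ai or _as (that _ci implies accent insensitive and _cs implies accent sensitive) is the MySQL naming convention as I understand it.

```python
def describe_collation(name):
    """Split a collation name such as utf8mb4_0900_ai_ci into its parts."""
    charset, *suffixes = name.split("_")
    info = {
        "charset": charset,
        "unicode_version": None,
        "binary": "bin" in suffixes,
        "case_sensitive": None,
        "accent_sensitive": None,
    }

    # Unicode Collation Algorithm version encoded in the name, if any.
    if "0900" in suffixes:
        info["unicode_version"] = "9.0"
    elif "520" in suffixes:
        info["unicode_version"] = "5.20"
    elif "unicode" in suffixes:
        info["unicode_version"] = "4.0"

    # Case and accent sensitivity; _bin collations just compare the bits.
    if "ci" in suffixes or "cs" in suffixes:
        info["case_sensitive"] = "cs" in suffixes
        if "as" in suffixes:
            info["accent_sensitive"] = True
        elif "ai" in suffixes:
            info["accent_sensitive"] = False
        else:
            # No explicit accent flag: it follows the case flag.
            info["accent_sensitive"] = info["case_sensitive"]
    return info


print(describe_collation("utf8mb4_0900_ai_ci"))
print(describe_collation("utf8_general_ci"))
```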

More info:

  • What are the differences between utf8_general_ci and utf8_unicode_ci?
  • What's the difference between utf8_general_ci and utf8_unicode_ci?
  • How to change collation of database, table, column?

How to make MySQL handle UTF-8 properly

Update:

Short answer - You should almost always be using the utf8mb4 charset and utf8mb4_unicode_ci collation.

To alter database:

ALTER DATABASE dbname CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

See:

  • Aaron's comment on this answer How to make MySQL handle UTF-8 properly

  • What's the difference between utf8_general_ci and utf8_unicode_ci

  • Conversion guide: https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-conversion.html

Original Answer:

MySQL 4.1 and above supports UTF-8, but the server default character set was latin1 until MySQL 8.0, which defaults to utf8mb4. You can check and set this in your my.cnf file; remember to set both client and server (default-character-set and character-set-server).

If you have existing data that you wish to convert to UTF-8, dump your database and import it back as UTF-8, making sure that:

  • you use SET NAMES utf8 before you query/insert into the database
  • you use DEFAULT CHARSET=utf8 when creating new tables
  • your MySQL client and server are both in UTF-8 at this point (see my.cnf). Remember that any languages you use (such as PHP) must be UTF-8 as well. Some versions of PHP use their own MySQL client library, which may not be UTF-8 aware.

If you do want to migrate existing data, remember to back up first! Lots of weird chopping of data can happen when things don't go as planned!

Some resources:

  • complete UTF-8 migration (cdbaby.com)
  • article on UTF-8 readiness of php functions (note some of this information is outdated)

