How to Make MySQL Return UTF-8

How do I make MySQL return UTF-8?

You have to set the connection to your database to UTF-8:

// Set up your connection
// (legacy mysql_* API: deprecated in PHP 5.5, removed in PHP 7.0)
$connection = mysql_connect('localhost', 'user', 'pw');
mysql_select_db('yourdb', $connection);
mysql_query("SET NAMES 'utf8'", $connection);

// Now you get UTF-8 encoded results
$query = 'SELECT name FROM place WHERE id = 1';
$result = mysql_query($query, $connection);
$row = mysql_fetch_assoc($result);   // e.g. $row['name']
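Because the mysql_* functions no longer exist in PHP 7 and later, here is a minimal sketch of the same lookup using mysqli. It assumes the same `place` table and placeholder credentials as above, and it uses utf8mb4 as recommended further down. `fetch_place_name()` is a hypothetical helper name:

```php
<?php
// Sketch only: fetch_place_name() is a hypothetical helper, and the
// `place` table with its `id`/`name` columns comes from the example above.
function fetch_place_name(mysqli $db, int $id): ?string
{
    // Tell both the client library and the server that this connection is UTF-8.
    // Unlike a bare SET NAMES query, set_charset() also makes
    // real_escape_string() aware of the encoding.
    if (!$db->set_charset('utf8mb4')) {
        throw new RuntimeException('Could not set charset: ' . $db->error);
    }

    $stmt = $db->prepare('SELECT name FROM place WHERE id = ?');
    $stmt->bind_param('i', $id);
    $stmt->execute();
    $stmt->bind_result($name);
    $found = $stmt->fetch();   // true if a row was fetched, null if there was none
    $stmt->close();

    return $found ? $name : null;
}

// Usage (requires a running MySQL server; credentials are placeholders):
// mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
// $db = new mysqli('localhost', 'user', 'pw', 'yourdb');
// echo fetch_place_name($db, 1);
```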

Convert output of MySQL query to utf8

You can use CAST and CONVERT to convert strings between different character sets. See: http://dev.mysql.com/doc/refman/5.0/en/charset-convert.html

SELECT column1, CONVERT(column2 USING utf8)
FROM my_table
WHERE my_condition;

How to make MySQL handle UTF-8 properly

Update:

Short answer: you should almost always use the utf8mb4 charset and the utf8mb4_unicode_ci collation.

To alter the database:

ALTER DATABASE dbname CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
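The reason for utf8mb4 is that MySQL's older utf8 charset (an alias for utf8mb3) stores at most three bytes per character. Characters outside the Basic Multilingual Plane, such as most emoji, take four bytes in UTF-8. The following PHP sketch checks whether a string contains such characters and would therefore need a utf8mb4 column. `needs_utf8mb4()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: true if $s (valid UTF-8) contains a code point above
// U+FFFF, i.e. a character that is 4 bytes long in UTF-8 and cannot be
// stored in a utf8 (utf8mb3) column.
function needs_utf8mb4(string $s): bool
{
    // The /u modifier makes PCRE treat the subject as UTF-8;
    // \x{10000}-\x{10FFFF} covers all supplementary-plane code points.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $s) === 1;
}

var_dump(needs_utf8mb4('café'));             // bool(false): é is 2 bytes
var_dump(needs_utf8mb4("smile \u{1F600}"));  // bool(true): U+1F600 is 4 bytes
var_dump(strlen("\u{1F600}"));               // int(4)
```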

See:

  • Aaron's comment on this answer (How to make MySQL handle UTF-8 properly)

  • What's the difference between utf8_general_ci and utf8_unicode_ci

  • Conversion guide: https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-conversion.html

Original Answer:

MySQL 4.1 and above supports UTF-8, but the server's default character set was latin1 until MySQL 8.0 (which defaults to utf8mb4). You can check and change this in your my.cnf file. Remember to set both the client and the server options (default-character-set and character-set-server).

If you have existing data that you wish to convert to UTF-8, dump your database and import it back as UTF-8, making sure that:

  • you use SET NAMES utf8 before you query or insert into the database
  • you use DEFAULT CHARSET=utf8 when creating new tables
  • at this point your MySQL client and server are both set to UTF-8 (see my.cnf). Remember that any language you use (such as PHP) must handle UTF-8 as well. Some versions of PHP use their own bundled MySQL client library, which may not be UTF-8 aware.

If you do want to migrate existing data, remember to back up first! Lots of weird chopping of data can happen when things don't go as planned!

Some resources:

  • complete UTF-8 migration (cdbaby.com)
  • article on UTF-8 readiness of PHP functions (note that some of this information is outdated)

PHP MySQL UTF-8 encoding

Set the connection to use UTF-8:

<?php

// MySQLi:

$connection = new MySQLi( /* ... credentials ...*/);
$connection->set_charset("utf8");

// Legacy mysql_* extension (removed in PHP 7.0):
$connection = mysql_connect(/* ... credentials ... */);
mysql_set_charset("utf8", $connection);

?>
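If you use PDO instead, you normally pass the connection charset in the DSN. PDO_MYSQL honours the `charset` option since PHP 5.3.6. In this minimal sketch, `mysql_dsn()` is a hypothetical helper and the host and database names are placeholders:

```php
<?php
// Hypothetical helper: build a PDO_MYSQL DSN with an explicit charset,
// so the connection is UTF-8 from the very first query.
function mysql_dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

echo mysql_dsn('localhost', 'yourdb'), PHP_EOL;
// mysql:host=localhost;dbname=yourdb;charset=utf8mb4

// Usage (requires the pdo_mysql extension and a running server):
// $pdo = new PDO(mysql_dsn('localhost', 'yourdb'), 'user', 'pw',
//                [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
```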

reading utf-8 content from mysql table

Four good steps to always get correctly encoded UTF-8 text:

1) Run this query before any other query:

 mysql_query("set names 'utf8'");

2) Add this to your HTML head:

 <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

3) Add this at top of your PHP code:

 header("Content-Type: text/html;charset=UTF-8");

4) Save your file as UTF-8 without BOM, using Notepad++ or any other good text editor or IDE.
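Step 4 matters because PHP sends a UTF-8 BOM (the bytes EF BB BF) at the start of a file to the browser as output, before `<?php` is even reached. That output can make the header() call from step 3 fail with a "headers already sent" warning. Here is a small sketch for detecting and stripping a BOM. Both function names are hypothetical:

```php
<?php
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: does this byte string start with a UTF-8 BOM?
function has_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: remove a leading UTF-8 BOM, if there is one.
function strip_utf8_bom(string $bytes): string
{
    return has_utf8_bom($bytes) ? substr($bytes, 3) : $bytes;
}

// Example: clean a source file before deploying it (path is a placeholder).
// $src = file_get_contents('index.php');
// if (has_utf8_bom($src)) {
//     file_put_contents('index.php', strip_utf8_bom($src));
// }
```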

Getting UTF-8 strings from MySQL using PHP

Thanks Deceze, the culprit ended up being an htmlentities call that needed to be replaced with:

htmlspecialchars($row['col'], ENT_QUOTES, "UTF-8");
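The following illustration shows why the explicit "UTF-8" argument and the choice of function both matter. htmlspecialchars() returns an empty string when its input is not valid in the given encoding, unless ENT_SUBSTITUTE (or ENT_IGNORE) is set. This is one way text can seem to vanish on output:

```php
<?php
$utf8   = "café <b>";   // valid UTF-8
$latin1 = "caf\xE9";    // "café" encoded as ISO-8859-1: not valid UTF-8

// htmlspecialchars() only escapes &, <, >, " and ' (with ENT_QUOTES),
// so é passes through untouched.
var_dump(htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8'));    // string(15) "café &lt;b&gt;"

// htmlentities() converts every character that has a named entity.
var_dump(htmlentities('café', ENT_QUOTES, 'UTF-8'));       // string(11) "caf&eacute;"

// Invalid UTF-8 input yields an empty string...
var_dump(htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8'));  // string(0) ""

// ...unless ENT_SUBSTITUTE replaces the bad byte with U+FFFD.
var_dump(htmlspecialchars($latin1, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'));
```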

In the end I just misread my own code. After all this time it was something so trivial. Frustrating, but glad to have found the solution.

Thanks for all your help.

MySQL: character encoding used by SELECT INTO?

Many programs/standards (including MySQL) assume that "latin1" means "cp1252", so the 0x80 byte is interpreted as a Euro symbol, which is where that \xe2\x82\xac bit (U+20AC) comes from in the middle.
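You can reproduce the same effect outside MySQL. This sketch assumes PHP's mbstring extension is installed. It uses Windows-1252 as a stand-in for MySQL's latin1, and the two match only approximately:

```php
<?php
// The UTF-8 encoding of U+201C LEFT DOUBLE QUOTATION MARK (“) is E2 80 9C.
$utf8Bytes = "\xE2\x80\x9C";

// Misread those three bytes as Windows-1252 and convert each resulting
// character to UTF-8:
//   E2 -> â (U+00E2), 80 -> € (U+20AC), 9C -> œ (U+0153)
$mojibake = mb_convert_encoding($utf8Bytes, 'UTF-8', 'Windows-1252');

echo $mojibake, PHP_EOL;           // â€œ
echo bin2hex($mojibake), PHP_EOL;  // c3a2e282acc593 (e282ac is the €)
```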

When I try this, it works properly (but note how I put data in, and the variables set on the db server):

mysql> set names utf8; -- http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
mysql> create table sq (c varchar(10)) character set utf8;
mysql> show create table sq\G
*************************** 1. row ***************************
       Table: sq
Create Table: CREATE TABLE `sq` (
  `c` varchar(10) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.19 sec)

mysql> insert into sq values (unhex('E2809C'));
Query OK, 1 row affected (0.00 sec)

mysql> select hex(c), c from sq;
+--------+------+
| hex(c) | c    |
+--------+------+
| E2809C | “    |
+--------+------+
1 row in set (0.00 sec)

mysql> select * from sq into outfile '/tmp/x.csv';
Query OK, 1 row affected (0.02 sec)

mysql> show variables like "%char%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

And from the shell:

/tmp$ hexdump -C x.csv
00000000  e2 80 9c 0a                                       |....|
00000004

Hopefully there's a useful tidbit in there…

Detailed instructions on converting a MySQL DB and its data from latin1 to UTF-8. Too much different info out there

There are two different problems which are often conflated:

  1. change the specification of a table or column on how it should store data internally
  2. convert garbled mojibake data to its intended characters

Each text column in MySQL has an associated charset attribute, which specifies the encoding in which text in that column is stored internally. This really only influences which characters can be stored in the column and how efficiently the data is stored. For example, if you're storing a ton of Japanese text, sjis may be a lot more efficient than utf8 and save you a bit of disk space.

The column encoding does not in any way influence the encoding in which data is sent to and received from the database. That is a separate setting, the connection encoding, which is established for each client every time it connects to the database. MySQL converts data on the fly between the connection encoding and the column/table charset as needed. You can connect to the database with a utf8 connection, send it Japanese text destined for an sjis column, and MySQL will convert from utf8 to sjis on the fly (and back again on the way out).
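You can mimic that on-the-fly conversion with PHP's mbstring extension, which is assumed to be installed. This is only an analogy, not MySQL's own code. The same Japanese text takes 9 bytes in UTF-8 and 6 in Shift_JIS, and converting there and back loses nothing:

```php
<?php
$utf8 = '日本語';   // what a utf8 connection sends (3 bytes per character here)

// What an sjis column would store (2 bytes per character here).
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');

// What the client receives when reading back over the utf8 connection.
$roundTrip = mb_convert_encoding($sjis, 'UTF-8', 'SJIS');

var_dump(strlen($utf8));         // int(9)
var_dump(strlen($sjis));         // int(6)
var_dump($roundTrip === $utf8);  // bool(true)
```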

Now, if you've screwed up the connection encoding (as happens way too often) and inserted text in a different encoding than your connection encoding specified (e.g. your connection encoding was latin1 but you actually sent UTF-8 encoded data), then you're storing garbage in your database and you need to recover it. If that's your issue, see How to convert wrongly encoded data to UTF-8?
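The simplest case of that damage happens when UTF-8 bytes are misread as latin1 and then encoded to UTF-8 a second time. To repair it, you reverse the extra step. This sketch assumes mbstring and uses Windows-1252 as a stand-in for MySQL's latin1. `undo_double_encoding()` is a hypothetical name. The sketch only works when every value was damaged exactly once:

```php
<?php
// Hypothetical helper: reverse one round of "UTF-8 bytes read as cp1252,
// then encoded to UTF-8 again".
function undo_double_encoding(string $garbled): string
{
    // Map each character back to its single cp1252 byte; the resulting
    // byte sequence is the original UTF-8 text.
    return mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
}

$original = 'café';
// Simulate the damage: treat the UTF-8 bytes as cp1252 and re-encode them.
$garbled = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');

echo $garbled, PHP_EOL;                        // cafÃ©
echo undo_double_encoding($garbled), PHP_EOL;  // café
```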

However, if all your data is peachy and all you want to do is tell MySQL to store data in a different encoding from now on, you only need this:

ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

MySQL will convert the current data from its current charset to the new charset and store future data in the new charset. That's all.
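When many tables need the same change, you can generate the statements. This sketch uses a hypothetical helper that defaults to utf8mb4 / utf8mb4_unicode_ci, as recommended in the update above. The table names are placeholders; in practice they might come from SHOW TABLES:

```php
<?php
// Hypothetical helper: one ALTER TABLE ... CONVERT TO statement per table.
// Table names are wrapped in backticks, with embedded backticks doubled.
// $charset and $collation are inserted as-is, so pass only trusted values.
function convert_statements(
    array $tables,
    string $charset = 'utf8mb4',
    string $collation = 'utf8mb4_unicode_ci'
): array {
    $quote = fn(string $name): string => '`' . str_replace('`', '``', $name) . '`';

    return array_map(
        fn(string $t): string => sprintf(
            'ALTER TABLE %s CONVERT TO CHARACTER SET %s COLLATE %s;',
            $quote($t), $charset, $collation
        ),
        $tables
    );
}

foreach (convert_statements(['place', 'users']) as $sql) {
    echo $sql, PHP_EOL;
}
// ALTER TABLE `place` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
// ALTER TABLE `users` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```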

Get UTF-8 string from MySQL via PHP returned ????????

Make sure your table uses the utf8 character set (for example with the utf8_general_ci collation).
The database and its tables can have different character sets
(this can happen when you change one of them yourself).

Also be careful with some functions, like htmlentities,
whose default charset parameter is ISO-8859-1 before PHP 5.4.0
(and UTF-8 from PHP 5.4.0 on).

http://php.net/manual/en/function.htmlentities.php
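The question marks themselves usually come from converting text into a charset that cannot represent it. One example is Cyrillic or Japanese text going through a latin1 connection or column. MySQL typically substitutes '?' for each such character, and the original cannot be recovered afterwards. mbstring (assumed installed) behaves the same way with its substitute character, so you can use it to see the effect quickly:

```php
<?php
mb_substitute_character(0x3F);   // make the '?' substitute explicit

$text = 'Привет';   // 6 Cyrillic characters

// ISO-8859-1 has no Cyrillic letters, so each character becomes '?'.
$latin1 = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');

echo $latin1, PHP_EOL;   // ??????

// Converting back does not restore the original text.
var_dump(mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1') === $text);  // bool(false)
```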


