MySQL and PHP: UTF-8 with Cyrillic characters
You are mixing APIs here: mysql_* and mysqli_* functions don't mix. You should stick with mysqli_* (as it seems you are anyway), as the mysql_* functions were deprecated in PHP 5.5 and removed entirely in PHP 7.
Your actual issue is a charset problem somewhere. Here are a few pointers that can help you get the right charset for your application. They cover most of the general problems one can face when developing a PHP/MySQL application.
- ALL attributes throughout your application must be set to UTF-8.
- Save the document as UTF-8 w/o BOM (if you're using Notepad++, it's Format -> Convert to UTF-8 w/o BOM).
- The header in both PHP and HTML should be set to UTF-8.
  HTML (inside the <head></head> tags):
  <meta charset="UTF-8">
  PHP (at the top of your file, before any output):
  header('Content-Type: text/html; charset=utf-8');
- Upon connecting to the database, set the charset to UTF-8 for your connection object, directly after connecting:
  mysqli_set_charset($conn, "utf8"); /* Procedural approach */
  $conn->set_charset("utf8"); /* Object-oriented approach */
  This is for mysqli_*; there are similar calls for mysql_* and PDO (see the bottom of this answer).
- Also make sure your database and tables are set to UTF-8. You can do that like this:
  ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
  ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
  (Any data already stored won't be converted to the proper charset, so you'll need to do this with a clean database, or update the data afterwards if there are broken characters.)
- If you're using json_encode(), you might need to apply the JSON_UNESCAPED_UNICODE flag, otherwise it will convert non-ASCII characters to \uXXXX escape sequences (their hexadecimal code points).
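As a small illustration of the json_encode() point above, the sketch below encodes the same array with and without the flag. The array key and value are made up for the example; both outputs are valid JSON and decode to the same data, so the flag only matters when the raw JSON text needs to stay readable.

```php
<?php
// Sample data; the key and value are invented for illustration.
$data = ['name' => 'Ирина'];

// Default behaviour: non-ASCII characters become \uXXXX escapes.
$escaped = json_encode($data);

// With the flag: characters are emitted as literal UTF-8.
$readable = json_encode($data, JSON_UNESCAPED_UNICODE);

echo $escaped, PHP_EOL;   // {"name":"\u0418\u0440\u0438\u043d\u0430"}
echo $readable, PHP_EOL;  // {"name":"Ирина"}
```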
Remember that EVERYTHING in your entire pipeline of code needs to be set to UTF-8, otherwise you might experience broken characters in your application.
In addition to this list, there may be functions that have a specific parameter for specifying a charset. The manual will tell you about this (an example is htmlspecialchars()).
There are also special functions for multibyte characters. For example, strtolower() won't lowercase multibyte characters; for that you'll have to use mb_strtolower(), see this live demo.
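A minimal sketch of the difference, assuming the mbstring extension is loaded and this file is itself saved as UTF-8. The sample word is arbitrary. strtolower() works on bytes and, in the default locale, only maps ASCII letters, so it typically returns Cyrillic input unchanged.

```php
<?php
$word = 'ИРИНА';

// Byte-oriented: Cyrillic letters are normally left as they are.
$byteLower = strtolower($word);

// Multibyte-aware: lowercases using Unicode case mappings.
$mbLower = mb_strtolower($word, 'UTF-8');

// The same split applies to length: bytes versus characters.
$bytes = strlen($word);              // 10 (two bytes per Cyrillic letter)
$chars = mb_strlen($word, 'UTF-8');  // 5

echo $mbLower, PHP_EOL;              // ирина
echo $bytes, ' bytes, ', $chars, ' characters', PHP_EOL;
```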
Note 1: Notice that the encoding is written in some places as utf-8 (with a dash), and in others as utf8 (without it). It's important that you know when to use which, as they usually aren't interchangeable. For example, HTML and PHP want utf-8, but MySQL doesn't.
Note 2: In MySQL, "charset" and "collation" are not the same thing, see Difference between Encoding and collation?. Both should be set to UTF-8 though; generally the collation should be either utf8_general_ci or utf8_unicode_ci, see UTF-8: General? Bin? Unicode?.
Note 3: If you're using emojis, MySQL needs to be set to the utf8mb4 charset instead of the standard utf8, both in the database and on the connection. HTML and PHP will just use UTF-8.
Setting UTF-8 with mysql_ and PDO
PDO: This is done in the DSN of your object. Note the charset attribute:
$pdo = new PDO("mysql:host=localhost;dbname=database;charset=utf8", "user", "pass");
mysql_: This is done very similarly to mysqli_*, but it doesn't take the connection object as the first argument:
mysql_set_charset('utf8');
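One way to keep the charset from being forgotten is to build the DSN in a single place. The sketch below is only an illustration: buildMysqlDsn() is a hypothetical helper, not part of PDO, and defaulting to utf8mb4 is an assumption based on Note 3 above. The actual connection is left commented out because it needs a reachable server and the pdo_mysql extension.

```php
<?php
// Hypothetical helper (not a PDO function): builds a MySQL DSN that
// always carries an explicit charset.
function buildMysqlDsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

$dsn = buildMysqlDsn('localhost', 'database');
echo $dsn, PHP_EOL; // mysql:host=localhost;dbname=database;charset=utf8mb4

// Usage, with placeholder credentials:
// $pdo = new PDO($dsn, 'user', 'pass', [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
```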
How to encode cyrillic in mysql?
Make sure you call this after connecting to database.
mysql_query("SET NAMES UTF8");
Also make sure that the HTML file has its charset meta tag set to UTF-8, or send the header before any output.
header("Content-Type: text/html; charset=utf-8");
How to fix a cyrillic character/utf encoding issue in php
You must wrap identifiers in backticks (`), the MySQL identifier quote character, in SQL queries.
SELECT * FROM `Вильгельм Телль`
Recommendation: do not use anything other than ASCII characters for table names, column names, etc. You may face problems in other apps, the CLI, and so on.
POST Cyrillic letters PHP results in special characters
Have you tried mysqli_set_charset()?
<?php
$con=mysqli_connect("localhost","my_user","my_password","my_db");
// Check connection
if (mysqli_connect_errno())
{
echo "Failed to connect to MySQL: " . mysqli_connect_error();
exit(); // Stop here: the calls below need a valid connection
}
// Change character set to utf8
mysqli_set_charset($con,"utf8");
mysqli_close($con);
?>
MySQL doesn't store cyrillic charsets
Ирина
is Mojibake for Ирина
.
When trying to use utf8/utf8mb4, if you see Mojibake, check the following.
This discussion also applies to Double Encoding, which is not necessarily visible.
- The bytes to be stored need to be utf8-encoded.
- The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4. (new PDO('...;charset=UTF8', ...);)
- The column needs to be declared CHARACTER SET utf8 (or utf8mb4).
- HTML should start with <meta charset=UTF-8>.
To check that the data was stored correctly, run SELECT col, HEX(col) FROM ... . The hex for the utf8 encoding of Ирина is D098 D180 D0B8 D0BD D0B0. If, instead, you get C390 CB9C C391 E282AC C390 C2B8 C390 C2BD C390 C2B0, then the INSERT was messed up.
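The two hex strings can be reproduced in PHP without a database. This is a sketch, assuming the mbstring extension is available and the file is saved as UTF-8. It simulates double encoding by treating the UTF-8 bytes as if they were Windows-1252 and converting them to UTF-8 a second time, which is one common way the broken form arises.

```php
<?php
$name = 'Ирина';

// Correct UTF-8 bytes, as HEX(col) should show them.
$correctHex = bin2hex($name);

// Simulated double encoding: read the UTF-8 bytes as Windows-1252
// and encode the result as UTF-8 again.
$doubleEncoded = mb_convert_encoding($name, 'UTF-8', 'Windows-1252');
$doubleHex = bin2hex($doubleEncoded);

echo strtoupper($correctHex), PHP_EOL; // D098D180D0B8D0BDD0B0
echo strtoupper($doubleHex), PHP_EOL;
echo $doubleEncoded, PHP_EOL;          // the Mojibake form of the name
```

Comparing bin2hex() output in PHP with HEX(col) in MySQL helps narrow down whether the damage happened before the data reached the database or on the way back out.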
Insert Russian characters mysql
I guess you aren't checking the return value of mysqli::set_charset(). It must be returning false because utf-8 is not a valid encoding name in MySQL; the correct name is utf8 (no dash). Or, even better, utf8mb4.
You can get a list of supported encodings with:
SHOW CHARACTER SET;
and the collations available for each of them with:
SHOW COLLATION;
UTF-8 all the way through
Data Storage:
Specify the utf8mb4 character set on all tables and text columns in your database. This makes MySQL physically store and retrieve values encoded natively in UTF-8. Note that MySQL will implicitly use utf8mb4 encoding if a utf8mb4_* collation is specified (without any explicit character set).
In older versions of MySQL (< 5.5.3), you'll unfortunately be forced to use simply utf8, which only supports a subset of Unicode characters. I wish I were kidding.
Data Access:
In your application code (e.g. PHP), in whatever DB access method you use, you'll need to set the connection charset to utf8mb4. This way, MySQL does no conversion from its native UTF-8 when it hands data off to your application and vice versa.
Some drivers provide their own mechanism for configuring the connection character set, which both updates the driver's internal state and informs MySQL of the encoding to be used on the connection; this is usually the preferred approach. In PHP:
- If you're using the PDO abstraction layer with PHP ≥ 5.3.6, you can specify charset in the DSN:
  $dbh = new PDO('mysql:charset=utf8mb4');
- If you're using mysqli, you can call set_charset():
  $mysqli->set_charset('utf8mb4'); // object oriented style
  mysqli_set_charset($link, 'utf8mb4'); // procedural style
- If you're stuck with plain mysql but happen to be running PHP ≥ 5.2.3, you can call mysql_set_charset.
If the driver does not provide its own mechanism for setting the connection character set, you may have to issue a query to tell MySQL how your application expects data on the connection to be encoded: SET NAMES 'utf8mb4'.
The same consideration regarding utf8mb4/utf8 applies as above.
Output:
- UTF-8 should be set in the HTTP header, such as Content-Type: text/html; charset=utf-8. You can achieve that either by setting default_charset in php.ini (preferred), or manually using the header() function.
- If your application transmits text to other systems, they will also need to be informed of the character encoding. With web applications, the browser must be informed of the encoding in which data is sent (through HTTP response headers or HTML metadata).
- When encoding the output using json_encode(), add JSON_UNESCAPED_UNICODE as a second parameter.
Input:
- Browsers will submit data in the character set specified for the document, hence nothing particular has to be done on the input.
- In case you have doubts about request encoding (in case it could be tampered with), you may verify every received string as being valid UTF-8 before you try to store it or use it anywhere. PHP's mb_check_encoding() does the trick, but you have to use it religiously. There's really no way around this, as malicious clients can submit data in whatever encoding they want, and I haven't found a trick to get PHP to do this for you reliably.
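A minimal sketch of the validation step described above, assuming the mbstring extension is loaded. isValidUtf8() is a hypothetical wrapper name, and the byte strings are hand-written samples standing in for request data.

```php
<?php
// Hypothetical wrapper around mb_check_encoding() for incoming strings.
function isValidUtf8(string $value): bool
{
    return mb_check_encoding($value, 'UTF-8');
}

$valid     = isValidUtf8('Ирина');       // well-formed UTF-8
$truncated = isValidUtf8("\xD0");        // lead byte with no continuation byte
$latin1    = isValidUtf8("\xE9t\xE9");   // "été" sent as ISO-8859-1 bytes

var_dump($valid);     // bool(true)
var_dump($truncated); // bool(false)
var_dump($latin1);    // bool(false)
```

In a real application the check would be applied to each value read from $_POST or $_GET before it is stored or processed; rejecting the request is usually simpler than trying to guess the original encoding.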
Other Code Considerations:
Obviously enough, all files you'll be serving (PHP, HTML, JavaScript, etc.) should be encoded in valid UTF-8.
You need to make sure that every time you process a UTF-8 string, you do so safely. This is, unfortunately, the hard part. You'll probably want to make extensive use of PHP's mbstring extension.
PHP's built-in string operations are not by default UTF-8 safe. There are some things you can safely do with normal PHP string operations (like concatenation), but for most things you should use the equivalent mbstring function.
To know what you're doing (read: not mess it up), you really need to know UTF-8 and how it works on the lowest possible level. Check out any of the links from utf8.com for some good resources to learn everything you need to know.
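To make the "not UTF-8 safe" point concrete, here is a short sketch, assuming mbstring is loaded and the file is saved as UTF-8. It cuts the same string by bytes and by characters; the sample strings are arbitrary.

```php
<?php
$name = 'Ирина';

// Byte-based cut: 3 bytes is "И" plus the first half of "р",
// which leaves an incomplete UTF-8 sequence at the end.
$byteCut = substr($name, 0, 3);

// Character-based cut: the first three characters.
$charCut = mb_substr($name, 0, 3, 'UTF-8');

// Concatenation is safe because it never splits a sequence.
$joined = $name . ' ' . 'Петрова';

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
echo $charCut, PHP_EOL;                         // Ири
var_dump(mb_check_encoding($joined, 'UTF-8'));  // bool(true)
```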