MySQL and PHP: UTF-8 With Cyrillic Characters

You are mixing APIs here: mysql_* and mysqli_* don't mix. You should stick with mysqli_* (as it seems you are anyway), since the mysql_* functions are deprecated and were removed entirely in PHP 7.

Your actual issue is a charset problem somewhere. Here are a few pointers that can help you get the right charset for your application. They cover most of the general problems one can face when developing a PHP/MySQL application.

  • ALL attributes throughout your application must be set to UTF-8
  • Save the document as UTF-8 w/o BOM (If you're using Notepad++, it's Format -> Convert to UTF-8 w/o BOM)
  • The header in both PHP and HTML should be set to UTF-8

    • HTML (inside <head></head> tags):

      <meta charset="UTF-8">
    • PHP (at the top of your file, before any output):

      header('Content-Type: text/html; charset=utf-8');
  • Upon connecting to the database, set the charset to UTF-8 for your connection-object, like this (directly after connecting)

    mysqli_set_charset($conn, "utf8"); /* Procedural approach */
    $conn->set_charset("utf8"); /* Object-oriented approach */

    This is for mysqli_*, there are similar ones for mysql_* and PDO (see bottom of this answer).

  • Also make sure your database and tables are set to UTF-8. You can do that like this:

    ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
    ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

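    (ALTER DATABASE only changes the default for tables created afterwards. CONVERT TO re-encodes existing column values, but it cannot repair text that was already stored with the wrong encoding, so you'll need to start with a clean database, or fix the data afterwards if there are broken characters.)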

  • If you're using json_encode(), you might need to apply the JSON_UNESCAPED_UNICODE flag; otherwise it will escape non-ASCII characters as \uXXXX sequences.

Remember that EVERYTHING in your entire pipeline of code needs to be set to UTF-8, otherwise you might experience broken characters in your application.

In addition to this list, there may be functions that have a specific parameter for specifying a charset. The manual will tell you about this (an example is htmlspecialchars()).

There are also special functions for multibyte characters. For example, strtolower() won't lowercase multibyte characters; for that you'll have to use mb_strtolower().
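
As a rough illustration of the difference between byte-oriented and multibyte functions, here is a minimal sketch. It assumes the mbstring extension is loaded and that the file itself is saved as UTF-8; the variable names are only for illustration.

```php
<?php
$word = 'ПРИВЕТ'; // six Cyrillic letters, two bytes each in UTF-8

$byteLength = strlen($word);                 // counts bytes, not characters
$charLength = mb_strlen($word, 'UTF-8');     // counts characters
$lowered    = mb_strtolower($word, 'UTF-8'); // lowercases Cyrillic correctly

echo $byteLength, "\n"; // 12
echo $charLength, "\n"; // 6
echo $lowered, "\n";    // привет
```

Passing the encoding explicitly avoids depending on whatever mb_internal_encoding() happens to be configured to.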

Note 1: Notice that the name is written as utf-8 (with a dash) in some places and as utf8 (without it) in others. It's important that you know when to use which, as they usually aren't interchangeable. For example, HTML and PHP want utf-8, but MySQL doesn't.

Note 2: In MySQL, "charset" and "collation" are not the same thing, see Difference between Encoding and collation?. Both should be set to utf8 though; generally the collation should be either utf8_general_ci or utf8_unicode_ci, see UTF-8: General? Bin? Unicode?.

Note 3: If you're using emojis, MySQL needs to be set to the utf8mb4 charset instead of the standard utf8, both in the database and on the connection. HTML and PHP will still just use UTF-8.


Setting UTF-8 with mysql_ and PDO

  • PDO: This is done in the DSN of your object. Note the charset attribute:

    $pdo = new PDO("mysql:host=localhost;dbname=database;charset=utf8", "user", "pass");
  • mysql_: This is done very similarly to mysqli_*, but it doesn't take the connection-object as the first argument.

    mysql_set_charset('utf8');
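
To keep the charset from being forgotten in the DSN, one option is to build the DSN in a single place. The sketch below is only an illustration: buildMysqlDsn() and connectWithCharset() are hypothetical helper names, and the host, database and credentials are placeholders like the ones above.

```php
<?php
// Hypothetical helper: builds a MySQL DSN that always carries a charset.
function buildMysqlDsn(string $host, string $dbname, string $charset = 'utf8'): string
{
    // MySQL charset names have no dash (utf8, utf8mb4), so reject "utf-8" early.
    if (!preg_match('/^[a-z0-9_]+$/', $charset)) {
        throw new InvalidArgumentException("Unexpected charset name: $charset");
    }
    return "mysql:host=$host;dbname=$dbname;charset=$charset";
}

// Hypothetical helper: opens a PDO connection using the DSN built above.
function connectWithCharset(string $host, string $dbname, string $user, string $pass, string $charset = 'utf8'): PDO
{
    return new PDO(buildMysqlDsn($host, $dbname, $charset), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // throw on SQL errors
    ]);
}

echo buildMysqlDsn('localhost', 'database'), "\n";
// mysql:host=localhost;dbname=database;charset=utf8
```

Passing 'utf8mb4' as the charset argument would cover the emoji case from Note 3.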

How to encode cyrillic in mysql?

Make sure you call this after connecting to the database.

mysql_query("SET NAMES UTF8");

Also make sure that the HTML file has a charset meta tag set to UTF-8, or send this header before any output.

header("Content-Type: text/html; charset=utf-8");

How to fix a cyrillic character/utf encoding issue in php

You must quote identifiers with backticks (``) in SQL queries.

SELECT * FROM `Вильгельм Телль`

Recommendation: do not use anything other than ASCII characters for table names, column names, etc. You may face problems in other apps, on the command line, and so on.

POST Cyrillic letters PHP results in special characters

Have you tried mysqli_set_charset()?

<?php
$con = mysqli_connect("localhost", "my_user", "my_password", "my_db");

// Check connection
if (mysqli_connect_errno()) {
    echo "Failed to connect to MySQL: " . mysqli_connect_error();
    exit();
}

// Change character set to utf8 and check that the call succeeded
if (!mysqli_set_charset($con, "utf8")) {
    echo "Error loading character set utf8: " . mysqli_error($con);
}

mysqli_close($con);
?>

MySQL doesn't store cyrillic charsets

Ð˜Ñ€Ð¸Ð½Ð° is Mojibake for Ирина.

When trying to use utf8/utf8mb4, if you see Mojibake, check the following.
This discussion also applies to Double Encoding, which is not necessarily visible.

  • The bytes to be stored need to be utf8-encoded.
  • The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4. (new PDO('...;charset=UTF8', ...);)
  • The column needs to be declared CHARACTER SET utf8 (or utf8mb4).
  • HTML should start with <meta charset=UTF-8>.

To check that the data was stored correctly, SELECT col, HEX(col) FROM .... The hex for utf8-encoding of Ирина is D098 D180 D0B8 D0BD D0B0. If, instead, you get C390 CB9C C391 E282AC C390 C2B8 C390 C2BD C390 C2B0, then the INSERT was messed up.
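
The two hex strings can be reproduced outside the database. The sketch below assumes the mbstring extension is loaded and the file is saved as UTF-8. It simulates double encoding by treating the UTF-8 bytes as Windows-1252 and encoding them to UTF-8 a second time, which is one common way this kind of corruption arises (an assumption about the cause, not a diagnosis of any particular setup).

```php
<?php
$name = 'Ирина';

// Hex of the correct UTF-8 bytes.
$correctHex = strtoupper(bin2hex($name));

// Simulated double encoding: read the UTF-8 bytes as Windows-1252,
// then encode the resulting characters as UTF-8 again.
$doubleEncoded = mb_convert_encoding($name, 'UTF-8', 'Windows-1252');
$brokenHex     = strtoupper(bin2hex($doubleEncoded));

echo $correctHex, "\n"; // D098D180D0B8D0BDD0B0
echo $brokenHex, "\n";  // C390CB9CC391E282ACC390C2B8C390C2BDC390C2B0
```

If HEX(col) in your table looks like the second string, the bytes were converted once too often on the way in.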

Insert Russian characters mysql

I guess you aren't checking the return value of mysqli::set_charset(). It must be returning false because utf-8 is not a valid encoding name in MySQL; the correct name is utf8 (no dash). Or, even better, utf8mb4.

You can get a list of supported encodings with:

SHOW CHARACTER SET;

UTF-8 all the way through

Data Storage:

  • Specify the utf8mb4 character set on all tables and text columns in your database. This makes MySQL physically store and retrieve values encoded natively in UTF-8. Note that MySQL will implicitly use utf8mb4 encoding if a utf8mb4_* collation is specified (without any explicit character set).

  • In older versions of MySQL (< 5.5.3), you'll unfortunately be forced to use simply utf8, which only supports a subset of Unicode characters. I wish I were kidding.

Data Access:

  • In your application code (e.g. PHP), in whatever DB access method you use, you'll need to set the connection charset to utf8mb4. This way, MySQL does no conversion from its native UTF-8 when it hands data off to your application and vice versa.

  • Some drivers provide their own mechanism for configuring the connection character set, which both updates its own internal state and informs MySQL of the encoding to be used on the connection—this is usually the preferred approach. In PHP:

    • If you're using the PDO abstraction layer with PHP ≥ 5.3.6, you can specify charset in the DSN:

       $dbh = new PDO('mysql:charset=utf8mb4');
    • If you're using mysqli, you can call set_charset():

        $mysqli->set_charset('utf8mb4');       // object oriented style
        mysqli_set_charset($link, 'utf8mb4');  // procedural style
    • If you're stuck with plain mysql but happen to be running PHP ≥ 5.2.3, you can call mysql_set_charset.

  • If the driver does not provide its own mechanism for setting the connection character set, you may have to issue a query to tell MySQL how your application expects data on the connection to be encoded: SET NAMES 'utf8mb4'.

  • The same consideration regarding utf8mb4/utf8 applies as above.

Output:

  • UTF-8 should be set in the HTTP header, such as Content-Type: text/html; charset=utf-8. You can achieve that either by setting default_charset in php.ini (preferred), or manually using header() function.
  • If your application transmits text to other systems, they will also need to be informed of the character encoding. With web applications, the browser must be informed of the encoding in which data is sent (through HTTP response headers or HTML metadata).
  • When encoding the output using json_encode(), add JSON_UNESCAPED_UNICODE as a second parameter.
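
A small sketch of what the flag changes, assuming the file is saved as UTF-8; the array contents are just sample data.

```php
<?php
$data = ['name' => 'Ирина'];

// Default: non-ASCII characters are escaped as \uXXXX sequences.
$escaped = json_encode($data);

// With the flag: characters are emitted as literal UTF-8.
$unescaped = json_encode($data, JSON_UNESCAPED_UNICODE);

echo $escaped, "\n";   // {"name":"\u0418\u0440\u0438\u043d\u0430"}
echo $unescaped, "\n"; // {"name":"Ирина"}
```

Both forms are valid JSON and decode to the same value; the flag mainly makes the output smaller and readable.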

Input:

  • Browsers will submit data in the character set specified for the document, hence nothing particular has to be done on the input.
  • In case you have doubts about request encoding (in case it could be tampered with), you may verify every received string as being valid UTF-8 before you try to store it or use it anywhere. PHP's mb_check_encoding() does the trick, but you have to use it religiously. There's really no way around this, as malicious clients can submit data in whatever encoding they want, and I haven't found a trick to get PHP to do this for you reliably.
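
One way to apply that check consistently is to route every incoming string through a single function. This is a minimal sketch, assuming the mbstring extension is loaded; utf8OrNull() is a hypothetical helper name, not a built-in.

```php
<?php
// Hypothetical helper: returns the input unchanged if it is valid UTF-8,
// or null so the caller can reject the request.
function utf8OrNull(string $input): ?string
{
    return mb_check_encoding($input, 'UTF-8') ? $input : null;
}

$valid   = "\xD0\x98\xD1\x80\xD0\xB8\xD0\xBD\xD0\xB0"; // "Ирина" encoded as UTF-8
$invalid = "\xC8\xF0\xE8\xED\xE0";                     // "Ирина" encoded as Windows-1251

var_dump(utf8OrNull($valid) !== null);   // bool(true)
var_dump(utf8OrNull($invalid) !== null); // bool(false)
```

Rejecting invalid input outright is usually safer than trying to guess the original encoding and convert it.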

Other Code Considerations:

  • Obviously enough, all files you'll be serving (PHP, HTML, JavaScript, etc.) should be encoded in valid UTF-8.

  • You need to make sure that every time you process a UTF-8 string, you do so safely. This is, unfortunately, the hard part. You'll probably want to make extensive use of PHP's mbstring extension.

  • PHP's built-in string operations are not by default UTF-8 safe. There are some things you can safely do with normal PHP string operations (like concatenation), but for most things you should use the equivalent mbstring function.

  • To know what you're doing (read: not mess it up), you really need to know UTF-8 and how it works on the lowest possible level. Check out any of the links from utf8.com for some good resources to learn everything you need to know.


