Converting BLOB's to UTF-8 in MySQL
Blobs have no character set nor collation (the manual page points to varbinary where this is explained). The equivalent of a blob, but with character set and collation is a TEXT column (same manual page as the blob).
phpMyAdmin seems to be, to say the least, unclear in its operation. I have not been able to find a related bug report.
How to convert an entire MySQL database characterset and collation to UTF-8?
Use the ALTER DATABASE
and ALTER TABLE
commands.
ALTER DATABASE databasename CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Or if you're still on MySQL 5.5.2 or older which didn't support 4-byte UTF-8, use utf8
instead of utf8mb4
:
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
SQLAlchemy/MySQL binary blob is being utf-8 encoded?
Turns out that this was a driver issue. Apparently the default MySQL driver stumbles with Py3 and utf8 support. Installing cymysql into the virtual Python environment resolved this problem and the warnings disappear.
The fix: Find out if MySQL connects through socket or port (see here), and then modify the connection string accordingly. In my case using a socket connection:
mysql+cymysql://user:pwd@localhost/database?unix_socket=/var/run/mysqld/mysqld.sock
Use the port
argument otherwise.
Edit: While the above fixed the encoding issue, it gave rise to another one: blob size. Due to a bug in CyMySQL blobs larger than 8M fail to commit. Switching to PyMySQL fixed that problem, although it seems to have a similar issue with large blobs.
MySQL character encoding change. Is data integrity preserved?
Every (character string-type) column has its own character set and collation metadata.
If, when the column's data type was specified (i.e. when it was last created or altered), no character set/collation was explicitly given, then the table's default character set and collation would be used for the column.
If, when the table was specified, no default character set/collation was explicitly given, then the database's default character set and collation would be used for the table's default.
The commands that you quote in your question merely alter such default character sets/collations for the database and table respectively. In other words, they will only affect tables and columns that are created thereafter—they will not affect existing columns (or data).
To update existing data, you should first read the Changing the Character Set section of the manual page on ALTER TABLE
:
Changing the Character Set
To change the table default character set and all character columns (
CHAR
,VARCHAR
,TEXT
) to a new character set, use a statement like this:ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;
The statement also changes the collation of all character columns. If you specify no
COLLATE
clause to indicate which collation to use, the statement uses default collation for the character set. If this collation is inappropriate for the intended table use (for example, if it would change from a case-sensitive collation to a case-insensitive collation), specify a collation explicitly.For a column that has a data type of
VARCHAR
or one of theTEXT
types,CONVERT TO CHARACTER SET
changes the data type as necessary to ensure that the new column is long enough to store as many characters as the original column. For example, aTEXT
column has two length bytes, which store the byte-length of values in the column, up to a maximum of 65,535. For alatin1
TEXT
column, each character requires a single byte, so the column can store up to 65,535 characters. If the column is converted toutf8
, each character might require up to three bytes, for a maximum possible length of 3 × 65,535 = 196,605 bytes. That length does not fit in aTEXT
column's length bytes, so MySQL converts the data type toMEDIUMTEXT
, which is the smallest string type for which the length bytes can record a value of 196,605. Similarly, aVARCHAR
column might be converted toMEDIUMTEXT
.To avoid data type changes of the type just described, do not use
CONVERT TO CHARACTER SET
. Instead, useMODIFY
to change individual columns. For example:ALTER TABLE t MODIFY latin1_text_col TEXT CHARACTER SET utf8;
ALTER TABLE t MODIFY latin1_varchar_col VARCHAR(M) CHARACTER SET utf8;
If you specify
CONVERT TO CHARACTER SET binary
, theCHAR
,VARCHAR
, andTEXT
columns are converted to their corresponding binary string types (BINARY
,VARBINARY
,BLOB
). This means that the columns no longer will have a character set attribute and a subsequentCONVERT TO
operation will not apply to them.If
charset_name
isDEFAULT
in aCONVERT TO CHARACTER SET
operation, the character set named by thecharacter_set_database
system variable is used.Warning
The
CONVERT TO
operation converts column values between the original and named character sets. This is not what you want if you have a column in one character set (likelatin1
) but the stored values actually use some other, incompatible character set (likeutf8
). In this case, you have to do the following for each such column:ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;
The reason this works is that there is no conversion when you convert to or from
BLOB
columns.
To change only the default character set for a table, use this statement:
ALTER TABLE tbl_name DEFAULT CHARACTER SET charset_name;
The word
DEFAULT
is optional. The default character set is the character set that is used if you do not specify the character set for columns that you add to a table later (for example, withALTER TABLE ... ADD column
).When the
foreign_key_checks
system variable is enabled, which is the default setting, character set conversion is not permitted on tables that include a character string column used in a foreign key constraint. The workaround is to disableforeign_key_checks
before performing the character set conversion. You must perform the conversion on both tables involved in the foreign key constraint before re-enablingforeign_key_checks
. If you re-enableforeign_key_checks
after converting only one of the tables, anON DELETE CASCADE
orON UPDATE CASCADE
operation could corrupt data in the referencing table due to implicit conversion that occurs during these operations (Bug #45290, Bug #74816).
Reading č, ć and đ from MySQL using .NET Core service
So, the problem was that the stored procedure retrieving data referenced a column in another table which had swedish_latin_ci collation.
utf8 charset and utf8_general_ci collation is enough.
Related Topics
Given an Email as Raw Text, How to Send It Using PHP
Convert Hex to Ascii Characters
Simulate PHP Array Language Construct or Parse with Regexp
How to Get the Shortest Rather Than Longest Possible Regex Match with Preg_Match()
How to Prevent Self Closing Tag in PHP Simplexml
What Do \R and \N Mean in PHP (And Other Languages)
Symfony: How to Refresh the Authenticated User from the Database
Get/Set Dpi with PHP Gd/Imagick
How to Connect to the Database in Openshift Application
In PHP, How to Generate a Big Pseudo-Random Number
Get Order Increment Id in Magento
Check How Many Times Specific Value in Array PHP
PHP Array, Are Array Indexes Case Sensitive
Pdoexception' with Message 'Sqlstate[22001]: String Data, Right Truncated: 0
Finding Common Value Among Sub-Arrays in a Multidimensional Array