Unable to retrieve UTF-8 accented characters from Access via PDO_ODBC
The Problem
When using native PHP ODBC features (PDO_ODBC or the older odbc_
functions) and the Access ODBC driver, text is not UTF-8 encoded, even though it is stored in the Access database as Unicode characters. So, for a sample table named "Teams"
Team
-----------------------
Boston Bruins
Canadiens de Montréal
Федерация хоккея России
the code
<?php
header('Content-Type: text/html; charset=utf-8');
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Access character test</title>
</head>
<body>
<?php
$connStr =
'odbc:' .
'Driver={Microsoft Access Driver (*.mdb)};' .
'Dbq=C:\\Users\\Public\\__SO\\28311687.mdb;' .
'Uid=Admin;';
$db = new PDO($connStr);
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$sql = "SELECT Team FROM Teams";
foreach ($db->query($sql) as $row) {
$s = $row["Team"];
echo $s . "<br/>\n";
}
?>
</body>
</html>
displays this in the browser
Boston Bruins
Canadiens de Montr�al
????????? ?????? ??????
The Easy but Incomplete Fixes
The text returned by Access ODBC actually matches the Windows-1252 character encoding for the characters in that character set, so simply changing the line
$s = $row["Team"];
to
$s = utf8_encode($row["Team"]);
will allow the second entry to be displayed correctly
Boston Bruins
Canadiens de Montréal
????????? ?????? ??????
but the utf8_encode() function converts from ISO-8859-1, not Windows-1252, so some characters (notably the Euro symbol '€') will disappear. A better solution would be to use
$s = mb_convert_encoding($row["Team"], "UTF-8", "Windows-1252");
but that still wouldn't solve the problem with the third entry in our sample table.
The Complete Fix
For full UTF-8 support we need to use COM with ADODB Connection and Recordset objects like so
<?php
header('Content-Type: text/html; charset=utf-8');
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Access character test</title>
</head>
<body>
<?php
$connStr =
'Driver={Microsoft Access Driver (*.mdb)};' .
'Dbq=C:\\Users\\Public\\__SO\\28311687.mdb';
$con = new COM("ADODB.Connection", NULL, CP_UTF8); // specify UTF-8 code page
$con->Open($connStr);
$rst = new COM("ADODB.Recordset");
$sql = "SELECT Team FROM Teams";
$rst->Open($sql, $con, 3, 3); // adOpenStatic, adLockOptimistic
while (!$rst->EOF) {
$s = $rst->Fields("Team");
echo $s . "<br/>\n";
$rst->MoveNext;
}
$rst->Close();
$con->Close();
?>
</body>
</html>
How to use PHP PDO ODBC with SQL Server and Unicode Characters?
No, I don't think you forgot to configure anything. In fact, your explanation is the best I've found so far regarding the long-standing "issues" between PHP and Microsoft ODBC drivers. Those issues are especially puzzling given that the PDO_ODBC page says:
On Windows, PDO_ODBC ... is the recommended driver for connecting to Microsoft SQL Server databases.
However, on Windows they also offer PDO_SQLSRV which actually does appear to work correctly.
So it seems that PDO_ODBC "doesn't have any notion of NVARCHAR", rather than PDO as a whole.
(Similar problems arise when trying to use PHP with Microsoft Access ODBC if Unicode characters are involved)
Conclusion: PHP support for ODBC continues to be a bit of a mess, at least where Microsoft databases are concerned.
How to fix encoding when connecting via php-odbc?
PHP-ODBC is not Unicode driver. So PHP is treating the connection as ASCII-only. You not able to send Unicode characters to a database via PHP-ODBC.
The only workaround is to turn on the option "Use Unicode UTF-8 for worldwide language support" in Windows 10 Region Settings (Note: this experimental feature is not available prior Windows 10). In this case Windows 10 uses Wide-String API for non-Unicode programs.
Problems with special chars encoding with an access mdb database using php
I finally found the solution.
I had the solution from the beginning but I was doing my tests wrong.
My bad.
The right way to do it for me is mb_convert_encoding($string, "UTF-8", "Windows-1252")
But i was checking the result like this:
$stringUTF8 = mb_convert_encoding($string, "UTF-8", "Windows-1252");
echo json_encode($stringUTF8);
that's why it was returning unicode chars like \u20ac
, if I would have done:
$stringUTF8 = mb_convert_encoding($string, "UTF-8", "Windows-1252");
echo $stringUTF8;
I should have seen the solution from the beginning but I was wrong. It was json_encode()
what was turning special chars into unicode chars.
Thanks everybody for your help!!
Related Topics
Execute Commands on Remote MAChine via PHP
Least Memory Intensive Way to Read a File in PHP
Angularjs Upload Multiple Files with Formdata API
Any Decent PHP Parser Written in PHP
How to Run PHP Files on My Computer
PHP Move_Uploaded_File() Error
Warning: MySQLi_Query() Expects Parameter 1 to Be MySQLi, Resource Given
Laravel - Pass More Than One Variable to View
Access PHP Variable in JavaScript
Two Simultaneous Ajax Requests Won't Run in Parallel
How to Add All of My Array Values Together in PHP
PHP JSON Encode - Malformed Utf-8 Characters, Possibly Incorrectly Encoded