How to Retrieve Utf-8 Accented Characters from Access via Pdo_Odbc

Unable to retrieve UTF-8 accented characters from Access via PDO_ODBC

The Problem

When using native PHP ODBC features (PDO_ODBC or the older odbc_ functions) and the Access ODBC driver, text is not UTF-8 encoded, even though it is stored in the Access database as Unicode characters. So, for a sample table named "Teams"

Team
-----------------------
Boston Bruins
Canadiens de Montréal
Федерация хоккея России

the code

<?php
header('Content-Type: text/html; charset=utf-8');
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Access character test</title>
</head>
<body>
<?php
$connStr =
'odbc:' .
'Driver={Microsoft Access Driver (*.mdb)};' .
'Dbq=C:\\Users\\Public\\__SO\\28311687.mdb;' .
'Uid=Admin;';
$db = new PDO($connStr);
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$sql = "SELECT Team FROM Teams";
foreach ($db->query($sql) as $row) {
$s = $row["Team"];
echo $s . "<br/>\n";
}
?>
</body>
</html>

displays this in the browser

Boston Bruins
Canadiens de Montr�al
????????? ?????? ??????

The Easy but Incomplete Fixes

The text returned by Access ODBC actually matches the Windows-1252 character encoding for the characters in that character set, so simply changing the line

$s = $row["Team"];

to

$s = utf8_encode($row["Team"]);

will allow the second entry to be displayed correctly

Boston Bruins
Canadiens de Montréal
????????? ?????? ??????

but the utf8_encode() function converts from ISO-8859-1, not Windows-1252, so some characters (notably the Euro symbol '€') will disappear. A better solution would be to use

$s = mb_convert_encoding($row["Team"], "UTF-8", "Windows-1252");

but that still wouldn't solve the problem with the third entry in our sample table.

The Complete Fix

For full UTF-8 support we need to use COM with ADODB Connection and Recordset objects like so

<?php
header('Content-Type: text/html; charset=utf-8');
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Access character test</title>
</head>
<body>
<?php
$connStr =
'Driver={Microsoft Access Driver (*.mdb)};' .
'Dbq=C:\\Users\\Public\\__SO\\28311687.mdb';
$con = new COM("ADODB.Connection", NULL, CP_UTF8); // specify UTF-8 code page
$con->Open($connStr);

$rst = new COM("ADODB.Recordset");
$sql = "SELECT Team FROM Teams";
$rst->Open($sql, $con, 3, 3); // adOpenStatic, adLockOptimistic
while (!$rst->EOF) {
$s = $rst->Fields("Team");
echo $s . "<br/>\n";
$rst->MoveNext;
}
$rst->Close();
$con->Close();
?>
</body>
</html>

How to use PHP PDO ODBC with SQL Server and Unicode Characters?

No, I don't think you forgot to configure anything. In fact, your explanation is the best I've found so far regarding the long-standing "issues" between PHP and Microsoft ODBC drivers. Those issues are especially puzzling given that the PDO_ODBC page says:

On Windows, PDO_ODBC ... is the recommended driver for connecting to Microsoft SQL Server databases.

However, on Windows they also offer PDO_SQLSRV which actually does appear to work correctly.

So it seems that PDO_ODBC "doesn't have any notion of NVARCHAR", rather than PDO as a whole.

(Similar problems arise when trying to use PHP with Microsoft Access ODBC if Unicode characters are involved)

Conclusion: PHP support for ODBC continues to be a bit of a mess, at least where Microsoft databases are concerned.

How to fix encoding when connecting via php-odbc?

PHP-ODBC is not Unicode driver. So PHP is treating the connection as ASCII-only. You not able to send Unicode characters to a database via PHP-ODBC.

The only workaround is to turn on the option "Use Unicode UTF-8 for worldwide language support" in Windows 10 Region Settings (Note: this experimental feature is not available prior Windows 10). In this case Windows 10 uses Wide-String API for non-Unicode programs.

Problems with special chars encoding with an access mdb database using php

I finally found the solution.
I had the solution from the beginning but I was doing my tests wrong.

My bad.

The right way to do it for me is mb_convert_encoding($string, "UTF-8", "Windows-1252")

But i was checking the result like this:

$stringUTF8 = mb_convert_encoding($string, "UTF-8", "Windows-1252");
echo json_encode($stringUTF8);

that's why it was returning unicode chars like \u20ac, if I would have done:

$stringUTF8 = mb_convert_encoding($string, "UTF-8", "Windows-1252");
echo $stringUTF8;

I should have seen the solution from the beginning but I was wrong. It was json_encode() what was turning special chars into unicode chars.

Thanks everybody for your help!!



Related Topics



Leave a reply



Submit