PHP: Replace umlauts with closest 7-bit ASCII equivalent in a UTF-8 string
iconv("utf-8","ascii//TRANSLIT",$input);
How to remove accents and turn letters into plain ASCII characters?
If you have iconv installed, try this (the example assumes your input string is in UTF-8):
echo iconv('UTF-8', 'ASCII//TRANSLIT', $string);
iconv is a library for converting between all kinds of encodings; it's efficient and included with many PHP distributions by default. Most of all, it's definitely easier and less error-prone than trying to roll your own solution. (Did you know that there's a "Latin letter N with a curl"? Me neither.)
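One caveat: with glibc's iconv (the usual one on Linux), //TRANSLIT takes its rules from the current LC_CTYPE locale, so in the default "C" locale many accented letters come out as "?". Below is a minimal sketch that sets a UTF-8 locale around the call and falls back to dropping non-ASCII bytes if iconv() gives up. The function name translitToAscii() and the locale names are assumptions; use whichever UTF-8 locale your system actually has installed.

```php
<?php
// Sketch: iconv transliteration with an explicit LC_CTYPE locale and a fallback.
// Assumption: one of the listed UTF-8 locales exists (the names are examples).
function translitToAscii(string $input): string
{
    $previous = setlocale(LC_CTYPE, '0');           // "0" only queries the current locale
    setlocale(LC_CTYPE, 'en_US.UTF-8', 'C.UTF-8');  // glibc reads //TRANSLIT rules from LC_CTYPE

    $result = iconv('UTF-8', 'ASCII//TRANSLIT', $input);

    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);             // undo the side effect for the caller
    }

    // iconv() returns false on invalid UTF-8, or when the platform's iconv
    // rejects the "//TRANSLIT" suffix; in that case drop every non-ASCII byte.
    if ($result === false) {
        return preg_replace('/[^\x00-\x7F]/', '', $input);
    }
    return $result;
}
```

Switching the locale per call keeps the helper self-contained; if your application already runs under a UTF-8 locale, the plain one-liner above is enough.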
Convert special character (e.g. Umlaut) to most likely representation in ascii
I find iconv completely unreliable, and I dislike preg_match solutions and big arrays ... so my favorite way is ...
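function toASCII( $str )
{
    // utf8_decode() (deprecated since PHP 8.2) only knows ISO-8859-1 and turns
    // Š Œ Ž š œ ž Ÿ into "?", so convert both sides to Windows-1252 instead.
    // This file must be saved as UTF-8; characters outside the list are left
    // as Windows-1252 bytes, not UTF-8.
    $from = 'ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ';
    return strtr(mb_convert_encoding($str, 'Windows-1252', 'UTF-8'),
        mb_convert_encoding($from, 'Windows-1252', 'UTF-8'),
        'SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy');
}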
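If you would rather avoid re-encoding altogether, strtr() also accepts an array of "from" => "to" pairs, and then works on whole UTF-8 keys. As a side benefit, one character can map to several letters, which matters for German (ä → ae, ß → ss). The map below is a deliberately small, hypothetical subset, and foldGerman() is an illustrative name; extend the table as needed.

```php
<?php
// Sketch: UTF-8-safe transliteration with strtr()'s array form.
// GERMAN_FOLD is a small illustrative subset, not a complete table.
const GERMAN_FOLD = [
    'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
    'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
    'ß' => 'ss',
];

function foldGerman(string $str): string
{
    // With an array argument, strtr() tries the longest keys first and
    // never re-scans text it has already replaced.
    return strtr($str, GERMAN_FOLD);
}

echo foldGerman('Größe und Übermaß'), PHP_EOL; // prints "Groesse und Uebermass"
```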
Convert 2 similarly-looking German characters of different kinds to same ASCII string in PHP
You could first convert your input to UTF-8 using iconv and then apply your conversion to ASCII. To detect the input's current encoding, you can use mb_detect_encoding.
$aUTF8 = iconv(mb_detect_encoding($a, 'UTF-8, ISO-8859-1', true), 'UTF-8', $a);
$bUTF8 = iconv(mb_detect_encoding($b, 'UTF-8, ISO-8859-1', true), 'UTF-8', $b);
$aASCII = iconv("utf-8", "ascii//TRANSLIT", $aUTF8);
$bASCII = iconv("utf-8", "ascii//TRANSLIT", $bUTF8);
Please note that you might have to add additional encodings to the encoding list of mb_detect_encoding.
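A small sketch of the same first step using mbstring for the conversion as well, assuming (as above) that only UTF-8 and ISO-8859-1 input is expected. toUtf8() is a hypothetical helper name; it throws when no candidate fits instead of handing false on to the conversion. With this particular list that cannot actually happen, since every byte sequence is valid ISO-8859-1, but it matters once you change the list.

```php
<?php
// Sketch: normalise input of unknown encoding to UTF-8 before transliterating.
// Assumption: the only encodings you expect are the ones in $candidates.
function toUtf8(string $input, string $candidates = 'UTF-8, ISO-8859-1'): string
{
    // Strict mode: only report an encoding in which $input is actually valid.
    $encoding = mb_detect_encoding($input, $candidates, true);
    if ($encoding === false) {
        throw new InvalidArgumentException('Input matches none of: ' . $candidates);
    }
    return mb_convert_encoding($input, 'UTF-8', $encoding);
}

// "\xE4" is ä in ISO-8859-1; "\xC3\xA4" is ä in UTF-8.
var_dump(toUtf8("\xE4") === "\xC3\xA4"); // bool(true)
```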
Replace worldwide diacritics characters
You can achieve this by using iconv, available in PHP, and requesting an encoding conversion with transliteration. (This actually works for many different scripts!) If you only want basic European characters, make the target Latin-1, or even ASCII.
From the manual page:
iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text)
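Keep in mind that the result of that call is ISO-8859-1 bytes, not UTF-8. A minimal sketch that converts it back so it can be shown on a UTF-8 page; foldToLatin1() is a hypothetical name, and it returns null when iconv() fails, which can happen on invalid input or with iconv implementations that do not accept the //TRANSLIT suffix.

```php
<?php
// Sketch: fold text to Latin-1 with iconv transliteration, then convert the
// result back to UTF-8. Assumes the iconv and mbstring extensions are loaded.
function foldToLatin1(string $text): ?string
{
    $latin1 = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $text);
    if ($latin1 === false) {
        return null; // invalid input, or //TRANSLIT not supported here
    }
    // iconv's output is ISO-8859-1 bytes; re-encode before mixing with UTF-8.
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}
```

If you output the raw Latin-1 string instead, the page itself has to be served as ISO-8859-1.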
What changes my UTF-8 string to ASCII?
In PHP, strings have no actual associated encoding; they're merely byte arrays. mb_detect_encoding doesn't tell you what encoding the string has, it merely tries to detect it. That means it takes a few guesses (your second argument) and tells you the first that is valid.
Your original string probably contains some non-ASCII characters, so ASCII isn't a valid encoding for it, but UTF-8 is. When you're later testing a substring of the original, that substring probably contains only characters which are valid in ASCII, and since ASCII is the first encoding that's tested, that's the guessed result. Any ASCII string is also valid UTF-8, so there's no actual problem or "conversion" which happened.
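The following sketch reproduces that behaviour, assuming the file is saved as UTF-8 and using the candidate list "ASCII, UTF-8" for illustration: the full string is only valid as UTF-8, while an ASCII-only substring already matches the first candidate.

```php
<?php
// Sketch: mb_detect_encoding() reports the first valid candidate,
// not "the" encoding of a string.
$original  = 'Grüße aus Köln';         // UTF-8 literal (this file is saved as UTF-8)
$substring = substr($original, 0, 2);  // "Gr": ASCII bytes only

var_dump(mb_detect_encoding($original, 'ASCII, UTF-8', true));  // string(5) "UTF-8"
var_dump(mb_detect_encoding($substring, 'ASCII, UTF-8', true)); // string(5) "ASCII"

// Every ASCII string is also valid UTF-8, so nothing was "converted":
var_dump(mb_check_encoding($substring, 'UTF-8'));               // bool(true)
```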
Find specific UTF8 chars independent of php code charset?
- You should be in control of what your source code is encoded as; it'd be very weird to suddenly have its encoding change out from under you.
- If that is actually a legitimate concern you want to counteract, then you can't even rely on your source code being either Latin-1 or UTF-8; it could be any number of other encodings (though admittedly in practice Latin-1 is a pretty common guess). So utf8_encode is not guaranteed to fix your problem at all. To be 100% agnostic of your source code file's encoding, denote your characters as raw bytes:
$search = "\xC3\xA4,\xC3\xB6,\xC3\xBC"; // ä, ö and ü in UTF-8
Note that this still won't guarantee what encoding $string will be in; you'll need to know and/or control its encoding separately from the issue at hand. At some point you just have to nail down the encodings you use; you can't be agnostic of them all the way through.
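Since PHP 7.0 there is also the "\u{...}" escape in double-quoted strings, which always produces the UTF-8 bytes of the given code point, so it is a more readable alternative to hand-written \x bytes. A sketch, assuming $string is already UTF-8 (the sample value and the replacement list are made up for illustration):

```php
<?php
// Sketch: search strings built from escapes, independent of the file's encoding.
// "\u{00E4}" yields the same bytes as "\xC3\xA4" (ä in UTF-8).
$search  = ["\u{00E4}", "\u{00F6}", "\u{00FC}"]; // ä, ö, ü in UTF-8
$replace = ['ae', 'oe', 'ue'];

// Assumption: $string is UTF-8; this code does not verify that for you.
$string = "M\u{00FC}nchen";
echo str_replace($search, $replace, $string), PHP_EOL; // prints "Muenchen"
```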
strtr() partially not working
As others have noted, the most likely cause is a character encoding mismatch. Since the titles you're trying to convert are apparently in UTF-8, the problem is most likely that your PHP source code isn't. Try re-saving the file as UTF-8 text, and see if that fixes the problem.
BTW, a simple way to debug this would be to print out both your data rows and your transliteration array into the same output file using e.g. print_r() or var_dump(), and look at the output to see if the non-ASCII characters in it look correct. If the characters look right in the data but wrong in the transliteration table (or vice versa), that's a sign that the encodings don't match.
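That comparison can also be done on the raw bytes. A sketch, where $table and $title are hypothetical stand-ins for your own transliteration array and data: bin2hex() shows the actual bytes, and the helper findNonUtf8Keys() (an illustrative name) uses mb_check_encoding() to flag table keys that are not valid UTF-8, as happens when the file was saved as Latin-1.

```php
<?php
// Sketch: dump raw bytes and flag transliteration keys that are not UTF-8.
function findNonUtf8Keys(array $table): array
{
    $bad = [];
    foreach (array_keys($table) as $key) {
        // Cast because PHP turns numeric-looking string keys into integers.
        if (!mb_check_encoding((string) $key, 'UTF-8')) {
            $bad[] = bin2hex((string) $key);
        }
    }
    return $bad;
}

$table = [
    "\u{00E4}" => 'ae', // ä as a UTF-8 file stores it: bytes c3 a4
    "\xE4"     => 'ae', // ä as a Latin-1 file stores it: byte e4
];
$title = "K\u{00E4}se";

echo bin2hex($title), PHP_EOL;    // 4bc3a47365
print_r(findNonUtf8Keys($table)); // lists "e4"
```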
P.S. If you have the PHP iconv extension installed (and you probably do), consider using it to automatically convert your titles to ASCII.
downgrade non-ascii symbols to closest 7-bit ASCII equivalent (preferably Java)
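Have a look at java.text.Normalizer. It can help you with transforming equivalent characters (http://en.wikipedia.org/wiki/Unicode_equivalence): decomposing a string to normal form NFD splits an accented letter such as "ä" into the base letter "a" plus a combining mark, and you can then strip the combining marks.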
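In PHP, the intl extension's Normalizer class exposes the same Unicode normalization forms. A sketch of the decompose-then-strip approach, assuming intl is installed; stripDiacritics() is a hypothetical name.

```php
<?php
// Sketch: PHP counterpart of the java.text.Normalizer approach (needs ext-intl).
function stripDiacritics(string $text): string
{
    // NFD splits e.g. "è" into "e" followed by U+0300 (combining grave accent).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // \p{Mn} matches non-spacing combining marks; /u makes PCRE work on UTF-8.
    return preg_replace('/\p{Mn}+/u', '', $decomposed);
}
```

For example, stripDiacritics('Crème brûlée') should return "Creme brulee". Letters without a canonical decomposition, such as ß, æ, ø or Œ, pass through unchanged, so you may still need a small strtr() map for those.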