Cyrillic transliteration in PHP
Try the following code:
$textcyr="Тествам с кирилица";
$textlat="I pone dotuk raboti!";
$cyr = ['Љ', 'Њ', 'Џ', 'џ', 'ш', 'ђ', 'ч', 'ћ', 'ж', 'љ', 'њ', 'Ш', 'Ђ', 'Ч', 'Ћ', 'Ж','Ц','ц', 'а','б','в','г','д','е','ё','ж','з','и','й','к','л','м','н','о','п', 'р','с','т','у','ф','х','ц','ч','ш','щ','ъ','ы','ь','э','ю','я', 'А','Б','В','Г','Д','Е','Ё','Ж','З','И','Й','К','Л','М','Н','О','П', 'Р','С','Т','У','Ф','Х','Ц','Ч','Ш','Щ','Ъ','Ы','Ь','Э','Ю','Я'
];
$lat = ['Lj', 'Nj', 'Dž', 'dž', 'š', 'đ', 'č', 'ć', 'ž', 'lj', 'nj', 'Š', 'Đ', 'Č', 'Ć', 'Ž','C','c', 'a','b','v','g','d','e','io','zh','z','i','y','k','l','m','n','o','p', 'r','s','t','u','f','h','ts','ch','sh','sht','a','i','y','e','yu','ya', 'A','B','V','G','D','E','Io','Zh','Z','I','Y','K','L','M','N','O','P', 'R','S','T','U','F','H','Ts','Ch','Sh','Sht','A','I','Y','E','Yu','Ya'
];
$textcyr = str_replace($cyr, $lat, $textcyr);
$textlat = str_replace($lat, $cyr, $textlat);
echo("$textcyr $textlat");
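One caveat with the str_replace() version: the pairs are applied in array order, and later pairs can act on text produced by earlier ones, so in the Latin-to-Cyrillic direction single letters such as 's' are replaced before 'sh' or 'sht' can match. Below is a minimal sketch of an alternative, assuming a small hypothetical map ($cyrToLat, abbreviated for illustration and not the full table above). strtr() with an array tries the longest keys first and does not re-scan text it has already replaced.

```php
<?php
// Hypothetical, abbreviated map for illustration only; extend as needed.
$cyrToLat = [
    'Т' => 'T', 'а' => 'a', 'в' => 'v', 'е' => 'e', 'и' => 'i',
    'к' => 'k', 'л' => 'l', 'м' => 'm', 'р' => 'r', 'с' => 's',
    'т' => 't', 'ц' => 'ts', 'ш' => 'sh', 'щ' => 'sht',
];

// Reverse map; only safe because every Latin value above is unique.
$latToCyr = array_flip($cyrToLat);

// strtr() picks the longest matching key at each position ('ts' before 't').
$latin    = strtr('Тествам с кирилица', $cyrToLat);
$cyrillic = strtr($latin, $latToCyr);

echo $latin, "\n";    // Testvam s kirilitsa
echo $cyrillic, "\n"; // Тествам с кирилица
```

The round trip only works when the Latin values are unique; a full table that maps several Cyrillic letters to the same Latin letter cannot be reversed losslessly.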
PHP Transliteration
You can use iconv, which has a special transliteration encoding.
When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several characters that look similar to the original character.
-- http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html
See here for a complete example that matches your use case.
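As a rough sketch of the //TRANSLIT suffix in use, assuming a hypothetical helper name (toAsciiIconv) and ASCII as the target character set: the approximations iconv picks depend on the underlying iconv implementation and may depend on the current locale, so no fixed output is shown.

```php
<?php
// Hypothetical helper; returns null when iconv cannot perform the conversion.
function toAsciiIconv(string $text): ?string
{
    // "//TRANSLIT" asks iconv to approximate characters that ASCII lacks.
    // The result varies between iconv implementations, so warnings are
    // suppressed and failure is reported as null.
    $result = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    return $result === false ? null : $result;
}

var_dump(toAsciiIconv('Crème brûlée'));
```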
PHP transliterate specify locale
Yes, Han-Latin means pinyin. ICU transliterators come from CLDR (I'll update the user guide to make this clear). ICU can already convert kana (hiragana and katakana) to Latin, but kanji have more than one reading, so you won't find what you are looking for with a simple table-based conversion.
Edit: to summarize, ICU will not do what you want without custom rules, and writing your own rules is unlikely to be simple because of how the Japanese language works.
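The difference between kana and kanji can be seen with a small sketch, assuming the intl extension is installed. Hiragana-Latin and Han-Latin are standard ICU transliterator IDs; the sample strings are only illustrative.

```php
<?php
// Requires the intl extension.
$kana = Transliterator::create('Hiragana-Latin');
$han  = Transliterator::create('Han-Latin');

// Kana have fixed readings, so a rule table is enough.
echo $kana->transliterate('ひらがな'), "\n";

// Han-Latin produces a pinyin (Chinese) reading, not a Japanese one.
echo $han->transliterate('国内'), "\n";
```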
PHP convert cyrillic
You can take the Drupal Transliteration module (http://drupal.org/project/transliteration) and adapt it to your project. It is one of the best implementations of transliteration.
You can also transliterate using iconv:
echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text), PHP_EOL;
Exclude specific characters from Transliterator conversion
Given your example with input and output:
$transliterator = Transliterator::create("Any-Latin; Latin-ASCII");
$str = "AŠAàèìòù Chén Hǎi yáo München Faißt Финиш 国内 - 镜像";
echo $transliterator->transliterate($str), "\n";
ASAaeiou Chen Hai yao Munchen Faisst Finis guo nei - jing xiang
Applying the transliteration only to the segments that do not contain the characters you want to keep (the Italian accented characters [àèìòù]) should give the desired result.
One option is to use preg_replace_callback for that.
It requires a callback that applies the transliteration:
$transliterate = static function (array $match) use ($transliterator) {
    return $transliterator->transliterate($match[0]);
};
It also requires a pattern that matches everything except the characters to keep. The pattern must be defined carefully and be compatible with Unicode:
([^\xE0\xE8\xEC\xF2\xF9]+)ui
(...)                 : delimiters; the regular expression is inside
u                     : modifier u, Unicode mode (UTF-8 encoding in PHP, PCRE_UTF8)
i                     : modifier i, letters in the pattern match both upper- and lower-case letters (PCRE_CASELESS)
[^...]                : negated character class (`^`); matches any character not listed
\xE0\xE8\xEC\xF2\xF9  : the Italian accented characters àèìòù written in a stable notation (easy to copy and paste, for example)
Last but not least, the subject to operate on must be compatible with the characters to keep. As there can be many ways to write the same character in Unicode, the input is normalized to be compatible with the PCRE pattern:
echo preg_replace_callback(
    '([^\xE0\xE8\xEC\xF2\xF9]+)ui',
    $transliterate,
    Normalizer::normalize($str, Normalizer::NFC)
), "\n";
The output:
ASAàèìòù Chen Hai yao Munchen Faisst Finis guo nei - jing xiang
Example across PHP versions.
Addendum:
- \xE0\xE1\xE8\xE9\xEC\xED\xF2\xF3\xF9\xFA
  lower-case list of Italian accented characters (can be used with the i modifier)
- \xC0\xC1\xC8\xC9\xCC\xCD\xD2\xD3\xD9\xDA\xE0\xE1\xE8\xE9\xEC\xED\xF2\xF3\xF9\xFA
  lower- and upper-case list of Italian accented characters (can be used without the i modifier)
- PCRE Syntax CHARACTERS (excerpt):
  \xhh character with hex code hh
  \x{hhh..} character with hex code hhh..
- Link to the full PCRE syntax: https://www.pcre.org/original/doc/html/pcresyntax.html
Transliterate any convertible utf8 char into ascii equivalent
The toAscii() function of Patchwork\Utf8 does exactly this, see:
https://github.com/nicolas-grekas/Patchwork-UTF8/blob/master/src/Patchwork/Utf8.php
It leverages iconv and intl's Normalizer to remove accents, split ligatures and do many other generic transliterations.
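The accent-removal part of that idea can be sketched with intl's Normalizer alone. This is a simplified illustration with a hypothetical helper name (stripAccents), not Patchwork's actual code, and it does not split ligatures or handle letters such as ß or ø.

```php
<?php
// Hypothetical helper; requires the intl extension.
function stripAccents(string $text): string
{
    // NFD splits "é" into "e" plus a combining acute accent (U+0301).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // input was not valid UTF-8; leave it untouched
    }

    // \p{Mn} matches non-spacing combining marks; dropping them keeps the base letters.
    return preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $text;
}

echo stripAccents('Crème brûlée'), "\n"; // Creme brulee
```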
Intelligent transliteration in PHP
I know with Japanese at least, you have a set number of letter combinations.
So you could create a lookup array like this:
array(
    'oo' => 'おう',
    'oh' => 'おう',
    'ou' => 'おう'
)
You would continue along those lines, making sure you don't match 'su' where the text actually contains 'tsu'.
This would only be a starting point, of course.
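One way to avoid the 'su' versus 'tsu' clash is to let strtr() do the matching, since with an array argument it tries the longest keys first. A minimal sketch, assuming a tiny hypothetical table ($romajiToKana) that covers only the syllables used here:

```php
<?php
// Hypothetical, deliberately tiny romaji-to-hiragana table.
$romajiToKana = [
    'tsu' => 'つ', 'su' => 'す', 'shi' => 'し',
    'na'  => 'な', 'mi' => 'み',
];

// 'tsu' wins over 'su' because strtr() prefers the longest matching key.
echo strtr('tsunami', $romajiToKana), "\n"; // つなみ
echo strtr('sushi', $romajiToKana), "\n";   // すし
```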
Machine learning is probably most practical with Chinese...but here's a rough start to hiragana: https://gist.github.com/1154969
Where can I find a list of IDs or rules for the PHP transliterator (Intl)?
The IDs that Transliterator::listIDs() returns are the "basic IDs". The example you gave is a "compound ID". You can see the ICU docs on this.
You can also create your own rules with Transliterator::createFromRules().
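A short sketch of both options, assuming the intl extension is installed. The rule set below is a hypothetical example written in ICU rule syntax, and the compound ID chains two basic IDs with a semicolon.

```php
<?php
// Requires the intl extension.

// Custom rules: each "source > target;" pair is applied to the input.
$rules  = 'ä > ae; ö > oe; ü > ue; ß > ss;';
$german = Transliterator::createFromRules($rules);
echo $german->transliterate('München Faißt'), "\n"; // Muenchen Faisst

// Compound ID: basic IDs chained with ";" and applied left to right.
$ascii = Transliterator::create('Any-Latin; Latin-ASCII');
echo $ascii->transliterate('Финиш'), "\n";
```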
You can take a look at the predefined rules:
<?php
// Open ICU's transliteration resource bundle for the installed ICU version.
$a = new ResourceBundle(NULL, sprintf('icudt%dl-translit', INTL_ICU_VERSION), true);
foreach ($a['RuleBasedTransliteratorIDs'] as $name => $v) {
    $file = @$v['file'];
    if (!$file) {
        // Internal transliterators have no rule resource to print.
        $file = $v['internal'];
        echo $name, " (direction $file[direction]; internal)\n";
    } else {
        echo $name, " (direction: $file[direction])\n";
        echo $file['resource'];
    }
    echo "\n--------------\n";
}
After formatting, the result looks like this.