Cyrillic Transliteration in PHP

Try the following code:

$textcyr="Тествам с кирилица";
$textlat="I pone dotuk raboti!";
$cyr = ['Љ', 'Њ', 'Џ', 'џ', 'ш', 'ђ', 'ч', 'ћ', 'ж', 'љ', 'њ', 'Ш', 'Ђ', 'Ч', 'Ћ', 'Ж','Ц','ц', 'а','б','в','г','д','е','ё','ж','з','и','й','к','л','м','н','о','п', 'р','с','т','у','ф','х','ц','ч','ш','щ','ъ','ы','ь','э','ю','я', 'А','Б','В','Г','Д','Е','Ё','Ж','З','И','Й','К','Л','М','Н','О','П', 'Р','С','Т','У','Ф','Х','Ц','Ч','Ш','Щ','Ъ','Ы','Ь','Э','Ю','Я'
];
$lat = ['Lj', 'Nj', 'Dž', 'dž', 'š', 'đ', 'č', 'ć', 'ž', 'lj', 'nj', 'Š', 'Đ', 'Č', 'Ć', 'Ž','C','c', 'a','b','v','g','d','e','io','zh','z','i','y','k','l','m','n','o','p', 'r','s','t','u','f','h','ts','ch','sh','sht','a','i','y','e','yu','ya', 'A','B','V','G','D','E','Io','Zh','Z','I','Y','K','L','M','N','O','P', 'R','S','T','U','F','H','Ts','Ch','Sh','Sht','A','I','Y','E','Yu','Ya'
];
$textcyr = str_replace($cyr, $lat, $textcyr);
$textlat = str_replace($lat, $cyr, $textlat);
echo("$textcyr $textlat");
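One caveat with the snippet above: str_replace() applies the pairs one after another, so an earlier pair can consume input that a later pair needed (in the Latin-to-Cyrillic direction 's' is replaced before 'sh' is ever tried). A possible alternative is strtr() with a single associative map, which scans the string once and prefers the longest matching key. The sketch below is only an illustration: cyr_to_lat() is a hypothetical name, the map is the Bulgarian-style lowercase subset of the arrays above, and it assumes the mbstring extension is loaded.

```php
<?php
// Sketch only: cyr_to_lat() is a hypothetical helper, not part of the answer above.
function cyr_to_lat(string $text): string
{
    static $map = null;
    if ($map === null) {
        // Lowercase Bulgarian-style subset taken from the arrays above.
        $lower = [
            'а' => 'a', 'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd',
            'е' => 'e', 'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'y',
            'к' => 'k', 'л' => 'l', 'м' => 'm', 'н' => 'n', 'о' => 'o',
            'п' => 'p', 'р' => 'r', 'с' => 's', 'т' => 't', 'у' => 'u',
            'ф' => 'f', 'х' => 'h', 'ц' => 'ts', 'ч' => 'ch', 'ш' => 'sh',
            'щ' => 'sht', 'ъ' => 'a', 'ь' => 'y', 'ю' => 'yu', 'я' => 'ya',
        ];
        $map = $lower;
        // Derive the uppercase pairs, e.g. 'Щ' => 'Sht'.
        foreach ($lower as $cyr => $lat) {
            $map[mb_strtoupper($cyr, 'UTF-8')] = ucfirst($lat);
        }
    }
    // strtr() makes a single pass and never re-replaces its own output.
    return strtr($text, $map);
}

echo cyr_to_lat('Тествам с кирилица'), PHP_EOL;
```

The reverse direction can use the flipped map in the same way, but it stays lossy wherever two Cyrillic letters share one Latin spelling (here 'а' and 'ъ' both become 'a').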

PHP Cyrillic transliteration: full html content (without damaging html tags)?

/**
* Converts SR-Latin to SR-Cyr
*
* @param String $word
* @return String $word
*/
public static function filterWord($word)
{
$lat = [
'a','b','v','g','d','đ','e','ž','z','i','j','k','l','lj','m','n','nj','o','p','r','s','t','ć','u','f','h','c','č','dž','š','š',
'A','B','V','G','D','Đ','E','Ž','Z','I','J','K','L','Lj','M','N','Nj','O','P','R','S','T','Ć','U','F','H','C','Č','Dž','Š','Š'
];

$cyr = [
'а','б','в','г','д','ђ','е','ж','з','и','ј','к','л','љ','м','н','њ','о','п','р','с','т','ћ','у','ф','х','ц','ч','џ','ш','ш',
'А','Б','В','Г','Д','Ђ','Е','Ж','З','И','Ј','К','Л','Љ','М','Н','Њ','О','П','Р','С','Т','Ћ','У','Ф','Х','Ц','Ч','Џ','Ш','Ш'
];

$spec1 = [
'Нј','нј','Лј','лј','ДЖ','Дж','дж'
];
$spec2 = [
'Њ','њ','Љ','љ','Џ','Џ','џ'
];

$tags_cyr = [
'<х1','</х1>','<х2','</х2>','<х3','</х3>','<х4','</х4>','<х5','</х5>','<х6','</х6>','<п','</п>','<спан','</спан>','<б','</б>','<стронг','</стронг>','<смалл','</смалл>','<и','</и>','<у','</у>','<имг','</имг>','<див','</див>','<а','</а>','<суб','</суб>','<суп','</суп>','<бр>','<бр/>','<bр','<хр/>','&нбсп;','&лт;','&гт;','&ндасх;','&мдасх;','хреф','срц','&лдqуо;','&бдqуо;','&лсqуо;','&рсqуо;','&сцарон;','&Сцарон;','&тилде;'
];

$tags_lat = [
'<h1','</h1>','<h2','</h2>','<h3','</h3>','<h4','</h4>','<h5','</h5>','<h6','</h6>','<p','</p>','<span','</span>','<b','</b>','<strong','</strong>','<small','</small>','<i','</i>','<u','</u>','<img','</img>','<div','</div>','<a','</a>','<sub','</sub>','<sup','</sup>','<br>','<br/>','<br','<hr/>','&nbsp;','&lt;','&gt;','&ndash;','&mdash;','href','src','&ldquo;','&bdquo;','&lsquo;','&rsquo;','ш','Ш','&tilde;'
];

$word = str_replace($tags_cyr, $tags_lat, str_replace($spec1, $spec2, str_replace($lat, $cyr, $word)));

// Walk the string tag by tag and convert each tag's contents back to Latin.
// Offsets are recomputed on every pass because a replacement can change the string length.
$offset = 0;
while (($start = mb_strpos($word, '<', $offset, 'UTF-8')) !== false) {
    $end = mb_strpos($word, '>', $start, 'UTF-8');
    if ($end === false) {
        break; // unterminated tag: leave the rest of the string untouched
    }
    $tag = mb_substr($word, $start, $end - $start + 1, 'UTF-8');
    $tag_lat = str_replace($cyr, $lat, $tag);
    $word = mb_substr($word, 0, $start, 'UTF-8') . $tag_lat . mb_substr($word, $end + 1, null, 'UTF-8');
    $offset = $start + mb_strlen($tag_lat, 'UTF-8');
}

return $word;
}
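Another way to keep tags intact is to never transliterate them in the first place: split the input into tags, entities and text, and convert only the text pieces. The following is a rough sketch of that idea, not the method used above. lat_to_cyr_html() is a hypothetical name, the regular expression is a simplification that assumes reasonably well-formed markup, and the mbstring extension is assumed to be loaded.

```php
<?php
// Sketch only: lat_to_cyr_html() is a hypothetical helper.
function lat_to_cyr_html(string $html): string
{
    static $map = null;
    if ($map === null) {
        $lower = [
            'a' => 'а', 'b' => 'б', 'v' => 'в', 'g' => 'г', 'd' => 'д',
            'đ' => 'ђ', 'e' => 'е', 'ž' => 'ж', 'z' => 'з', 'i' => 'и',
            'j' => 'ј', 'k' => 'к', 'l' => 'л', 'lj' => 'љ', 'm' => 'м',
            'n' => 'н', 'nj' => 'њ', 'o' => 'о', 'p' => 'п', 'r' => 'р',
            's' => 'с', 't' => 'т', 'ć' => 'ћ', 'u' => 'у', 'f' => 'ф',
            'h' => 'х', 'c' => 'ц', 'č' => 'ч', 'dž' => 'џ', 'š' => 'ш',
        ];
        $map = $lower;
        foreach ($lower as $lat => $cyr) {
            $upperCyr = mb_strtoupper($cyr, 'UTF-8');
            $map[mb_strtoupper($lat, 'UTF-8')] = $upperCyr;               // 'LJ' => 'Љ'
            $map[mb_convert_case($lat, MB_CASE_TITLE, 'UTF-8')] = $upperCyr; // 'Lj' => 'Љ'
        }
    }
    // With one capture group, even indexes are text and odd indexes are
    // the captured tags or entities.
    $parts = preg_split('/(<[^>]*>|&[a-zA-Z0-9#]+;)/u', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = strtr($part, $map); // longest key wins, so 'nj' beats 'n'
        }
    }
    return implode('', $parts);
}

echo lat_to_cyr_html('<p class="a">Zdravo&nbsp;<b>narode</b></p>'), PHP_EOL;
```

Attributes, tag names and entities are left alone because they are never passed to strtr(). The contents of script and style elements would still be converted, so a real page probably needs a proper HTML parser instead of a regular expression.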

PHP convert cyrillic

You can take the Drupal Transliteration module (http://drupal.org/project/transliteration) and adapt it to your project. It is one of the best transliteration implementations.

You can also transliterate using iconv:

echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text), PHP_EOL;

PHP Transliteration

You can use iconv, which has a special transliteration encoding.

When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several characters that look similar to the original character.

-- http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html

See here for a complete example that matches your use case.
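As a minimal sketch of the iconv approach, the helper below wraps the call and turns a failed conversion into null. iconv_translit() is a hypothetical name, and the result is deliberately not shown: what //TRANSLIT produces depends on the iconv implementation (glibc, GNU libiconv, musl) and, on some systems, on the current locale, so Cyrillic input may come back as Latin letters, as question marks, or fail outright.

```php
<?php
// Sketch only: iconv_translit() is a hypothetical helper.
function iconv_translit(string $text, string $target = 'ASCII'): ?string
{
    if (!function_exists('iconv')) {
        return null; // the iconv extension is not loaded
    }
    // @ suppresses the notice iconv() raises when a character cannot be converted.
    $out = @iconv('UTF-8', $target . '//TRANSLIT', $text);
    return $out === false ? null : $out;
}

// Output depends on the platform, so inspect it rather than assume it.
var_dump(iconv_translit('Crème brûlée'));
var_dump(iconv_translit('Привет'));
```

If the output has to be identical on every server, an explicit mapping table or the Intl Transliterator class is usually the more predictable choice.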

Russian to English transliteration in symfony2

Symfony has no native tool for this, but PHP itself does, in the Intl extension.

For example:

$transliterator = \Transliterator::create('Any-Latin');
$transliteratorToASCII = \Transliterator::create('Latin-ASCII');
$transliterateTitle = $transliteratorToASCII->transliterate($transliterator->transliterate($title));

First we transliterate Russian to Latin, then we convert the Latin result to ASCII.
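The two steps can also be chained in a single compound transliterator ID. The sketch below assumes the intl extension may or may not be available and returns null when it is not; to_ascii() is a hypothetical helper name.

```php
<?php
// Sketch only: to_ascii() is a hypothetical helper; it needs the intl extension.
function to_ascii(string $text): ?string
{
    if (!class_exists('Transliterator')) {
        return null; // intl is not loaded
    }
    // A compound ID runs both transforms in order: any script to Latin, then Latin to ASCII.
    $transliterator = \Transliterator::create('Any-Latin; Latin-ASCII');
    if ($transliterator === null) {
        return null; // the ID was rejected by this ICU build
    }
    $result = $transliterator->transliterate($text);
    return $result === false ? null : $result;
}

var_dump(to_ascii('Привет мир'));
```

Creating the transliterator once and reusing it is likely cheaper than building two objects per call when many titles are processed.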

If you don't like this method, you can use something like the example at http://htmlweb.ru/php/example/translit.php: create a class with such a method and register it as a service.

Intelligent transliteration in PHP

With Japanese, at least, there is a fixed set of letter combinations.

So you could build a lookup array like this:

array(
'oo' => 'おう',
'oh' => 'おう',
'ou' => 'おう'
)

You would continue the table from there, making sure you don't match 'su' where the text actually contains 'tsu'.

This would only be a starting point, of course.
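One way to avoid the 'su' versus 'tsu' problem is strtr() with an array, because it tries the longest matching key at each position. The table below is a tiny illustrative subset, not a full romanization scheme, and romaji_to_hiragana() is a hypothetical name.

```php
<?php
// Sketch only: a handful of syllables to show longest-match behaviour.
function romaji_to_hiragana(string $romaji): string
{
    $table = [
        'tsu' => 'つ',
        'su'  => 'す',
        'shi' => 'し',
        'na'  => 'な',
        'mi'  => 'み',
    ];
    // At each position strtr() picks the longest key that matches,
    // so 'tsu' is consumed as a whole before 'su' is considered.
    return strtr($romaji, $table);
}

echo romaji_to_hiragana('tsunami'), PHP_EOL;
echo romaji_to_hiragana('sushi'), PHP_EOL;
```

Anything that is not in the table passes through unchanged, which makes gaps in the table easy to spot in the output.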

Machine learning is probably most practical with Chinese, but here's a rough start for hiragana: https://gist.github.com/1154969

Why Inflector::slug in Yii2 generates wrong latin string from cyrillic?

Because it uses ISO 9 for handling Cyrillic, and apparently the ISO 9:1995 version.
When I run:

$a = 'Cyrillic;  Any-Latin'; // assumed: the original snippet did not show how $a was defined
echo \yii\helpers\Inflector::transliterate('автоматизация', $a);
echo \yii\helpers\Inflector::transliterate('зачислить', $a);

I get:

avtomatizaciâ 
začislitʹ

This matches ISO 9:1995. The slug method then converts to ASCII character by character, so, for example, č becomes c.

You can still get the result you want by applying str_replace where needed.
Or you can do the transliteration another way, like this.
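As a rough illustration of the "replace where needed" idea, the sketch below post-processes ISO 9 style output before it is slugged. The function name and the target spellings ('ts', 'ya', 'ch' and so on) are assumptions chosen for the example, and it assumes the input uses precomposed characters such as č rather than a base letter followed by a combining mark.

```php
<?php
// Sketch only: iso9_to_plain_ascii() and its table are hypothetical.
function iso9_to_plain_ascii(string $iso9): string
{
    return strtr($iso9, [
        'č' => 'ch',
        'š' => 'sh',
        'ž' => 'zh',
        'c' => 'ts',
        'â' => 'ya',
        'û' => 'yu',
        "\u{02B9}" => '', // modifier prime, used for the soft sign
        "\u{02BA}" => '', // modifier double prime, used for the hard sign
    ]);
}

echo iso9_to_plain_ascii('avtomatizaciâ'), PHP_EOL;
echo iso9_to_plain_ascii('začislit' . "\u{02B9}"), PHP_EOL;
```

The mapping is applied to the transliterated string, so it has to run before any step that strips diacritics.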

Convert cyrillic 1251 to UTF-8

What you have is a UTF-8 string made up of CP1252 characters, which are a misreading of bytes that were actually CP1251.

The true answer is to fix what produced this mistake so that your data doesn't get corrupted like this.

The worse answer is to repeat the mis-translation in reverse to recover the original string, and then convert it properly.

$input = 'Íó è ÿ ñäåëàëà âûâîäû...';

// convert back to source string via CP1252 single-byte encoding
$out = mb_convert_encoding($input, 'CP1252', 'UTF-8');

// correctly convert source string to UTF8 using CP1251
$out = mb_convert_encoding($out, 'UTF-8', 'CP1251');

var_dump($out);
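To see why the reverse trip works, it can help to reproduce the corruption first and then undo it. The sketch below assumes the mbstring extension is loaded; both function names are hypothetical.

```php
<?php
// Sketch only: make_mojibake() and repair_mojibake() are hypothetical helpers.
function make_mojibake(string $utf8): string
{
    $cp1251Bytes = mb_convert_encoding($utf8, 'CP1251', 'UTF-8');
    // The mistake: CP1251 bytes are decoded as if they were CP1252.
    return mb_convert_encoding($cp1251Bytes, 'UTF-8', 'CP1252');
}

function repair_mojibake(string $broken): string
{
    // Step 1: recover the raw bytes by encoding back to CP1252.
    $cp1251Bytes = mb_convert_encoding($broken, 'CP1252', 'UTF-8');
    // Step 2: decode those bytes with the encoding they were really in.
    return mb_convert_encoding($cp1251Bytes, 'UTF-8', 'CP1251');
}

$original = 'Ну и я сделала выводы...';
$broken = make_mojibake($original);

var_dump($broken === 'Íó è ÿ ñäåëàëà âûâîäû...');
var_dump(repair_mojibake($broken) === $original);
```

The round trip is only reliable for bytes that CP1252 defines. A few byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) have no CP1252 character, so text containing the CP1251 letters stored there may not survive the repair.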