Transliteration in Ruby

How do I transliterate Cyrillic characters in a string into Latin in Ruby?

You can use the translit gem:

require 'translit'

str = "Кириллица"
Translit.convert(str, :english)
#=> "Kirillica"

Transliteration in ruby

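Ruby used to ship an Iconv library in its stdlib, which converts encodings in much the same way as the usual iconv command. It was deprecated in Ruby 1.9.3 and removed from the stdlib in Ruby 2.0; on newer versions it is available as the iconv gem, and String#encode handles plain encoding conversion.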
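
On a Ruby without Iconv, part of what its //TRANSLIT mode did for accented Latin text can be approximated with the standard library alone. The following is a minimal sketch, not a drop-in replacement: strip_diacritics is a hypothetical helper name, it needs Ruby 2.2 or later for String#unicode_normalize, and it does not transliterate Cyrillic at all (it would even reduce й to и, because й decomposes into и plus a combining mark).

```ruby
# Sketch: dropping accents from Latin text without Iconv (Ruby 2.2+).
# strip_diacritics is a hypothetical helper, not part of Ruby or of Iconv.
def strip_diacritics(text)
  # NFD decomposes "é" into "e" plus a combining accent;
  # \p{Mn} matches those non-spacing combining marks so they can be removed.
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts strip_diacritics("Café déjà vu")   # prints "Cafe deja vu"
```

For Cyrillic input a mapping table is still required, because the letters have no ASCII base character to fall back to.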

Transliteration with Iconv in Ruby

That solution seems too tricky for me. I solved the problem using the stringex gem.

Ruby transliteration using hash

The problem with .each_char is that the block variable (c in your question) is a copy of the character, not a reference back into the string, so assigning to it does not alter the string in place. You could make a per-character mapping work from there (using .map followed by .join, for instance), but that is inefficient compared to .tr! or .gsub! for your purpose, because breaking the string into an array of characters and reassembling it creates many Ruby objects.

I think you need to do something like

file_name.tr!( 'абвгдилмнпрстуфхцыю', 'abvgdilmnprstufhcyu' )

which covers the single-letter conversions very efficiently. You then have some multi-letter conversions. I would use gsub! for those, with an inverted copy of your hash:

latin_of = {"ё"=>"jo", "ж"=>"zh", "з"=>"th", "ч"=>"ch",
            "ш"=>"sh", "щ"=>"sch", "я"=>"ja"}
file_name.gsub!( /[ёжзчшщя]/ ) { |cyrillic| latin_of[ cyrillic ] }

Note, unlike each_char, the return value of the block in .gsub! is used to replace whatever you matched in the original string. The above code uses an inversion of your original hash to quickly find the correct Latin replacement for the matched Cyrillic character.

You don't strictly need tr!. If you prefer, put every mapping into the inverted hash and do the whole conversion in one pass with the gsub! block form shown above. The cost of calling two methods probably means you don't gain much from .tr! anyway. Still, you should know about the String#tr! method; it can be very handy.
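
The two-step approach (tr for single letters, then gsub for the multi-letter ones) can be sketched as a self-contained method. The tables are copied unchanged from the answer, so adjust them to your own scheme; the method name transliterate and the constant names are hypothetical. The sketch uses the non-bang tr and gsub so the calls can be chained, since the bang versions return nil when nothing was changed.

```ruby
# Tables copied unchanged from the answer above.
SINGLE_CYRILLIC = "абвгдилмнпрстуфхцыю"
SINGLE_LATIN    = "abvgdilmnprstufhcyu"
MULTI_LATIN = { "ё" => "jo", "ж" => "zh", "з" => "th", "ч" => "ch",
                "ш" => "sh", "щ" => "sch", "я" => "ja" }.freeze

# transliterate is a hypothetical helper name.
def transliterate(text)
  # tr maps one character to one character; gsub handles the letters
  # that expand to several Latin characters.
  text.tr(SINGLE_CYRILLIC, SINGLE_LATIN)
      .gsub(/[ёжзчшщя]/) { |cyrillic| MULTI_LATIN[cyrillic] }
end

puts transliterate("душа")    # prints "dusha"
p "abc".dup.tr!("x", "y")     # prints nil: tr! found nothing to change
```

Letters that appear in neither table (е, к, о and so on) pass through untouched, so a production table would need to be completed first.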


Edit: As suggested in the comments, .gsub! can do a lot more for you here. Assuming latin_of is the complete hash with Cyrillic keys and Latin values, you could do this:

file_name.gsub!( Regexp.union(latin_of.keys), latin_of )

Two things to note:

  • Regexp.union(latin_of.keys) takes the array of keys you want to convert and builds a single regular expression that matches any one of them, so gsub! finds every key that needs replacing in the String.

  • gsub! accepts a hash as the second parameter, and converts each match by looking it up as a key and replacing it with the associated value - exactly the behaviour you are looking for.
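
As a small self-contained illustration of the single-pass form, here is a sketch with a deliberately tiny, hypothetical table; a real latin_of would list every letter you need.

```ruby
# Hypothetical, deliberately tiny table.
latin_of = { "щ" => "sch", "у" => "u", "к" => "k", "а" => "a" }

# Builds one regexp that matches any key, equivalent to /щ|у|к|а/.
pattern = Regexp.union(latin_of.keys)

file_name = "щука.txt".dup            # dup so the string is safe to mutate
file_name.gsub!(pattern, latin_of)    # each match is looked up in the hash
puts file_name                        # prints "schuka.txt"
```

Characters that match no key, such as ".txt" here, are left alone. If one key is ever a prefix of another, sort the keys longest first before passing them to Regexp.union, because the alternatives are tried from left to right.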

Transliterating chars in Ruby

You can use casecmp to do a case insensitive compare:

if word.casecmp("for") == 0
  word = "4"
end
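
One caveat when the input may contain Cyrillic: casecmp only ignores case for the ASCII letters A-Z and a-z. The sketch below contrasts it with casecmp?, which is available from Ruby 2.4 and uses Unicode case folding. The sample words are made up for illustration.

```ruby
puts "FOR".casecmp("for")       # prints 0: equal once ASCII case is ignored
puts "FOR".casecmp?("for")      # prints true (Ruby 2.4+)

# casecmp does not fold non-ASCII letters, so a Cyrillic capital
# does not match its lowercase form.
puts "Ёж".casecmp("ёж") == 0    # prints false
puts "Ёж".casecmp?("ёж")        # prints true: Unicode case folding
```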

Rails I18n transliteration rules configuration

It looks like you have a Cyrillic 'а' (char code 1072) mapped to a Cyrillic 'а' in your en.yml. Do you just need to map the Cyrillic 'а' to the Latin 'a'?

en:
  i18n:
    transliterate:
      rule:
        а: "a"
        б: "b"

(The two characters look identical on this site, I think.)
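
Because the two letters are visually indistinguishable, it can help to check code points rather than trust your eyes. A minimal sketch in plain Ruby follows; cyrillic_letters is a hypothetical helper name, and the Cyrillic letter is written as an escape so that it cannot be mistyped.

```ruby
cyrillic_a = "\u0430"   # Cyrillic small letter а
latin_a    = "a"        # Latin small letter a

puts cyrillic_a.ord            # prints 1072
puts latin_a.ord               # prints 97
puts cyrillic_a == latin_a     # prints false

# cyrillic_letters is a hypothetical helper: it lists any Cyrillic
# characters hiding in a string that is supposed to be plain Latin.
def cyrillic_letters(text)
  text.scan(/\p{Cyrillic}/)
end

p cyrillic_letters("tr\u0430nslit")   # one match: the Cyrillic а
```

Running a check like this over the values in en.yml would show whether a replacement is really Latin.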


