Transliterate Cyrillic symbols in a string into Latin in Ruby?
You can use the translit gem:
require 'translit'
str = "Кириллица"
Translit.convert(str, :english)
#=> "Kirillica"
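Transliteration in Ruby
Ruby had an Iconv library in its stdlib, which converts encodings in much the same way as the usual iconv command. Iconv was deprecated in Ruby 1.9.3 and removed from the stdlib in Ruby 2.0; it is now available as the iconv gem.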
Transliteration with Iconv in Ruby
That solution seems too tricky for me. I solved the problem using the stringex gem instead.
Ruby transliteration using hash
The problem with .each_char is that the block variable (c in your question) does not point back to the character in the string, so you cannot use it to alter the string in place. There are ways to make that per-character mapping work from there (using a .map followed by a .join, for instance), but they are inefficient compared to .tr! or .gsub! for your purpose, because breaking the string out into an array of characters and reconstructing it involves creating many Ruby objects.
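As a rough illustration of the point about .each_char, the sketch below uses a hypothetical three-letter mapping and sample string (neither comes from the question). It shows that reassigning the block variable leaves the string alone, while .map followed by .join produces a converted copy.

```ruby
# Hypothetical sample data: a tiny illustrative subset of a mapping.
mapping = { "к" => "k", "о" => "o", "т" => "t" }
name = "кот"

# Reassigning c only rebinds the block-local variable;
# the characters inside `name` are not touched.
name.each_char { |c| c = mapping.fetch(c, c) }
unchanged = name.dup

# map + join builds a brand new string from the converted characters,
# at the cost of one intermediate object per character.
converted = name.each_char.map { |c| mapping.fetch(c, c) }.join

puts unchanged
puts converted
```

Here unchanged still holds the Cyrillic text, and only converted contains the Latin version.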
I think you need to do something like:
file_name.tr!( 'aбвгдилмнпрстуфхцыю', 'abvgdilmnprstufhcyu' )
which covers the single-letter conversions very efficiently. You then have some multi-letter conversions. I would use gsub! for that, and an inverted copy of your hash:
latin_of = {"ё"=>"jo", "ж"=>"zh", "з"=>"th", "ч"=>"ch",
"ш"=>"sh", "щ"=>"sch", "я"=>"ja"}
file_name.gsub!( /[ёжзчшщя]/ ) { |cyrillic| latin_of[ cyrillic ] }
Note that, unlike each_char, the return value of the block in .gsub! is used to replace whatever you matched in the original string. The above code uses an inversion of your original hash to quickly find the correct Latin replacement for the matched Cyrillic character.
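Putting the two steps together might look like the sketch below. The file name is a hypothetical example, and the non-bang tr and gsub are used here only so that each intermediate result can be inspected; the character lists and the hash are copied from the snippets above.

```ruby
# Hypothetical sample input; not taken from the question.
file_name = "жир_шум.txt"

# Multi-letter replacements, as in the hash above.
latin_of = { "ё" => "jo", "ж" => "zh", "з" => "th", "ч" => "ch",
             "ш" => "sh", "щ" => "sch", "я" => "ja" }

# Step 1: one-to-one conversions with tr (returns a new string).
step_one = file_name.tr("aбвгдилмнпрстуфхцыю", "abvgdilmnprstufhcyu")

# Step 2: multi-letter conversions; the block's return value
# replaces each matched character.
step_two = step_one.gsub(/[ёжзчшщя]/) { |cyrillic| latin_of[cyrillic] }

puts step_one
puts step_two
```

Characters that appear in neither list (the underscore, the dot, the Latin extension) pass through unchanged.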
You don't need tr! at all. Instead, if you prefer, just use an inversion of your original hash in one pass using this second syntax. The cost of using two methods probably means you don't really gain that much from using .tr!. But you should know about the String#tr! method; it can be very handy.
Edit: As suggested in comments, .gsub! can do a lot more for you here. Assuming latin_of was the complete hash with Cyrillic keys and Latin values, you could do this:
file_name.gsub!( Regexp.union(latin_of.keys), latin_of )
Two things to note:
Regexp.union(latin_of.keys) takes an array of the keys you want to convert and builds a pattern from them, ensuring gsub! will find them in the String, ready for replacement.
gsub! accepts a hash as the second parameter, and converts each match by looking it up as a key and replacing it with the associated value, which is exactly the behaviour you are looking for.
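A minimal sketch of the single-pass version follows. The hash here is a small hypothetical subset chosen to cover the sample file name, not the complete mapping from the question.

```ruby
# Hypothetical, deliberately incomplete mapping for illustration.
latin_of = {
  "ж" => "zh", "щ" => "sch",
  "и" => "i", "р" => "r", "т" => "t"
}

# Hypothetical sample input.
file_name = "жир_щит.txt"

# Regexp.union turns the array of keys into one alternation pattern.
pattern = Regexp.union(latin_of.keys)

# With a hash as the second argument, each match is looked up as a key
# and replaced by the associated value.
result = file_name.gsub(pattern, latin_of)

puts result
```

Because the pattern is built from the hash keys, every match is guaranteed to have an entry in the hash, so no match gets replaced by an empty string.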
Transliterating chars in Ruby
You can use casecmp to do a case-insensitive comparison:
if word.casecmp("for") == 0
  word = "4"
end
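For a list of words, the same comparison could be applied as in the sketch below (the word list is a hypothetical example). Note that casecmp only folds ASCII letters; for non-ASCII text such as Cyrillic, casecmp? (Ruby 2.4 and later) applies Unicode case folding instead.

```ruby
# Hypothetical sample words.
words = %w[FOR For for four]

# Replace any case variant of "for" with "4"; leave other words alone.
replaced = words.map { |word| word.casecmp("for") == 0 ? "4" : word }

# casecmp ignores case for A-Z/a-z only, so Cyrillic case differences count.
ascii_only = "Привет".casecmp("привет") == 0

# casecmp? uses Unicode case folding and returns true or false directly.
unicode_aware = "Привет".casecmp?("привет")

p replaced
p ascii_only
p unicode_aware
```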
Rails I18n transliteration rules configuration
It looks like you have a Cyrillic 'а' (char code 1072) mapped to a Cyrillic 'а' in your en.yml. Do you just need to map the Cyrillic 'а' to the Roman 'a'?
en:
  i18n:
    transliterate:
      rule:
        а: "a"
        б: "b"
(Both characters look the same on this site, I think.)
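Since the two letters are visually identical, one way to check which one actually sits in a file is to compare code points. This is only a sketch; the variable names are illustrative.

```ruby
cyrillic_a = "\u0430" # Cyrillic small letter a
latin_a    = "a"      # Latin small letter a

# The glyphs look alike, but the code points differ.
cyrillic_code = cyrillic_a.ord
latin_code    = latin_a.ord
same_char     = cyrillic_a == latin_a

puts cyrillic_code
puts latin_code
puts same_char
```

Calling .ord on the key and on the value copied from en.yml should show whether the rule maps a character to itself.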