Transliteration with Iconv in Ruby
It seems the solution is too tricky for me. I solved the problem using the stringex gem.
Transliteration in Ruby
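Ruby has an Iconv library in its stdlib that converts encodings in much the same way as the usual iconv command. (Iconv was removed from the standard library in Ruby 2.0 and is now available as the separate iconv gem.)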
Are Iconv.convert return values in wrong order?
Running it through Unicode decomposition first (as people hinted at in the forum thread you linked to) seems to do the trick on my OS X machine:
iex> :iconv.convert "utf-8", "ascii//translit", String.normalize("árboles más grandes", :nfd)
"arboles mas grandes"
Decomposition means the string is normalized so that, e.g., "á" is represented as two Unicode codepoints ("a" plus a combining accent) rather than the composed form, where it is a single codepoint. So I guess iconv's ASCII transliteration removes standalone accents/diacritics but converts composed characters to things like 'a.
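The same decomposition idea can be sketched in plain Ruby (2.2+) without Iconv: decompose to NFD with String#unicode_normalize, then delete the combining marks (Unicode category Mn). This is a swapped-in technique, not what iconv does internally, and it only strips accents; letters that have no decomposition, such as "ß" or "ø", pass through unchanged.

```ruby
# Strip accents by decomposing (NFD) and removing combining marks.
# Requires Ruby 2.2+ for String#unicode_normalize.
def strip_diacritics(str)
  # NFD turns "á" (U+00E1) into "a" followed by U+0301 COMBINING ACUTE ACCENT
  decomposed = str.unicode_normalize(:nfd)
  # \p{Mn} matches nonspacing marks, i.e. the now-detached accents
  decomposed.gsub(/\p{Mn}/, "")
end

puts strip_diacritics("árboles más grandes") # prints: arboles mas grandes
```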
Iconv and Kconv on Ruby (1.9.2)
As https://stackoverflow.com/users/23649/jtbandes says, it looks like Kconv is similar to Iconv but specialized for Japanese text, i.e. Kanji ("the logographic Chinese characters that are used in the modern Japanese writing system along with hiragana" http://en.wikipedia.org/wiki/Kanji). Unless you are working on something specifically Japanese, I'm guessing you don't need Kconv.
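If you only need to move Japanese text between encodings, String#encode (Ruby 1.9+) already handles common Japanese encodings such as Shift_JIS and EUC-JP, which is one reason Kconv is rarely needed. A minimal sketch:

```ruby
# Round-trip Japanese text through legacy encodings with String#encode
text = "日本語のテキスト" # UTF-8 source: 8 characters, 3 bytes each

sjis  = text.encode("Shift_JIS")
eucjp = text.encode("EUC-JP")

puts sjis.encoding  # Shift_JIS
puts eucjp.encoding # EUC-JP
puts sjis.bytesize  # 16 (2 bytes per character in Shift_JIS)

# Converting back to UTF-8 recovers the original characters
puts sjis.encode("UTF-8") == text # true
```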
If you're using Ruby 1.9, you can use the built-in encoding support instead of Iconv most of the time. I tried for hours to understand what I was doing until I read this:
http://www.joelonsoftware.com/articles/Unicode.html
Then you can start to use stuff like
String#encode # Ruby 1.9
String#encode! # Ruby 1.9
String#force_encoding # Ruby 1.9
with confidence. If you have more complex needs, do read http://blog.grayproductions.net/categories/character_encodings
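A short sketch of how these three methods differ: encode transcodes (the bytes change, the characters stay the same), encode! does the same in place, and force_encoding only relabels the existing bytes. The last line shows that encode cannot transliterate; an unmappable character either raises or is replaced.

```ruby
s = "caf\u00e9" # "café" as UTF-8: 4 characters, 5 bytes

# encode transcodes: new bytes, same characters
latin1 = s.encode("ISO-8859-1")
puts latin1.bytesize              # 4 ("é" is one byte in Latin-1)
puts latin1.encode("UTF-8") == s  # true

# force_encoding only relabels: same bytes, new label
raw = s.dup.force_encoding("ASCII-8BIT")
puts raw.bytesize                      # 5
puts raw.force_encoding("UTF-8") == s  # true

# encode! behaves like encode but modifies the receiver in place
t = s.dup
t.encode!("UTF-16LE")
puts t.encoding # UTF-16LE

# No transliteration: "é" has no US-ASCII mapping, so it is replaced
puts s.encode("US-ASCII", undef: :replace, replace: "?") # caf?
```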
UPDATE: thanks to JohnZ in the comments.
Iconv is still useful in Ruby 1.9 because it can transliterate characters (something that String#encode et al. can't do). Here's an example of how to extend String with a method that transliterates to UTF-8:
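require 'iconv' # stdlib in Ruby 1.8/1.9; on Ruby 2.0+ install the iconv gem first
class ::String
  # Return a new String that has been transliterated into UTF-8
  # Should work in Ruby 1.8 and Ruby 1.9 thanks to http://po-ru.com/diary/fixing-invalid-utf-8-in-ruby-revisited/
  def as_utf8(from_encoding = 'UTF-8')
    # Iconv.conv(to, from, str). The space appended here and stripped by [0..-2]
    # is the end-of-string workaround described in the article linked above.
    ::Iconv.conv('UTF-8//TRANSLIT', from_encoding, self + ' ')[0..-2]
  end
end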
"foo".as_utf8 #=> "foo"
"foo".as_utf8('ISO-8859-1') #=> "foo"
Thanks JohnZ!
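Since Iconv is no longer in the standard library from Ruby 2.0 on, here is a hedged sketch of an Iconv-free variant for Ruby 2.1+. The method name as_utf8_noiconv is hypothetical. Note that it does not transliterate: invalid or unmappable bytes are replaced with "?" using String#encode and String#scrub.

```ruby
class ::String
  # Hypothetical Iconv-free variant of as_utf8 (Ruby 2.1+).
  # Converts from from_encoding to UTF-8, replacing invalid or
  # unmappable bytes with "?" (no transliteration).
  def as_utf8_noiconv(from_encoding = "UTF-8")
    if Encoding.find(from_encoding) == Encoding::UTF_8
      # UTF-8 to UTF-8 encode is a no-op, so scrub invalid bytes directly
      dup.force_encoding(Encoding::UTF_8).scrub("?")
    else
      dup.force_encoding(from_encoding)
         .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
    end
  end
end

p "foo".as_utf8_noiconv                       # "foo"
p "caf\xE9".as_utf8_noiconv("ISO-8859-1")     # "café"
p "caf\xE9".as_utf8_noiconv                   # "caf?" (invalid UTF-8 byte)
```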
iconv utf-8 to ascii transliteration in mod_php/apache2
I figured out that the locale wasn't set up correctly, and my attempts to set it failed because the locales available on the system were actually named differently from the man page examples (according to their encoding!).
Running a simple locale -a revealed that ;O)
setlocale(LC_ALL, "en_US.utf8");
This actually did the job! Now the function works perfectly.
It's also clear now why it worked from the console: the locale was inherited from the current user's shell settings ;)
It actually just needs any locale to be set up. It doesn't really matter which one, since we convert to ASCII, where everybody is equal, only some are more equal than others :)
Be careful to set a locale that is actually installed on your system and check the result of setlocale, because nothing changes if the locale is not installed or its name is misspelled.
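The naming mismatch described above (en_US.utf8 on the system versus en_US.UTF-8 in the man page) can be handled by comparing locale names with the codeset spelling normalized. A sketch in Ruby; locale_key and find_installed_locale are hypothetical helpers, and on a real system the list would come from locale -a. You should still check what setlocale returns.

```ruby
# Hypothetical helpers: match a wanted locale name against the names
# actually installed, ignoring how the codeset is spelled.
def locale_key(name)
  lang, codeset = name.split(".", 2)
  return name if codeset.nil?
  # "UTF-8", "utf8" and "Utf-8" all normalize to "utf8"
  "#{lang}.#{codeset.downcase.delete('^a-z0-9')}"
end

def find_installed_locale(installed, wanted)
  installed.find { |name| locale_key(name) == locale_key(wanted) }
end

# On a real system: installed = IO.popen(["locale", "-a"], &:read).split("\n")
installed = ["C", "C.utf8", "POSIX", "en_US.utf8"]
p find_installed_locale(installed, "en_US.UTF-8") # "en_US.utf8"
p find_installed_locale(installed, "de_DE.UTF-8") # nil (not installed)
```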