How do I replace accented Latin characters in Ruby?
Rails already has a built-in for this: normalize your string to form KD (compatibility decomposition), then remove the remaining non-ASCII characters (i.e. the accent marks), like this:
>> "àáâãäå".mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n,'').downcase.to_s
=> "aaaaaa"
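Outside Rails, a similar result can be sketched in plain Ruby (2.2 or later) with String#unicode_normalize. To my knowledge, mb_chars.normalize was deprecated in Rails 6.0 in favor of that method, so this form may also suit newer Rails apps. The helper name strip_combining_marks is made up for illustration. It decomposes with NFD and drops only combining marks (\p{Mn}), so other non-ASCII text is kept rather than deleted.

```ruby
# Hypothetical helper, not part of Ruby or Rails.
# NFD splits "à" into "a" + U+0300; \p{Mn} then matches the combining mark.
def strip_combining_marks(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts strip_combining_marks("àáâãäå")          # aaaaaa
puts strip_combining_marks("Crème brûlée 囧")  # Creme brulee 囧
```

Letters such as ł or ß have no decomposition, so this approach leaves them unchanged.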
Replace accented character in Ruby
The é in your name is actually two different Unicode codepoints: U+0065 (LATIN SMALL LETTER E) and U+0301 (COMBINING ACUTE ACCENT).
p "e\u0301".each_codepoint.map { |e| "U+#{e.to_s(16).upcase.rjust(4, '0')}" } * ' ' # => "U+0065 U+0301"
However, the é in your regex is only one: U+00E9 (LATIN SMALL LETTER E WITH ACUTE). Wikipedia has an article about Unicode equivalence. The official Unicode FAQ also contains explanations and information about this topic.
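The difference between the two forms can be checked directly. This is a small sketch using explicit \u escapes so the result does not depend on how an editor stored the character. The constant names are only illustrative.

```ruby
DECOMPOSED  = "e\u0301"  # U+0065 followed by U+0301
PRECOMPOSED = "\u00E9"   # single codepoint U+00E9

puts DECOMPOSED == PRECOMPOSED                          # false
puts DECOMPOSED.unicode_normalize(:nfc) == PRECOMPOSED  # true
puts PRECOMPOSED.unicode_normalize(:nfd) == DECOMPOSED  # true
puts [DECOMPOSED.length, PRECOMPOSED.length].inspect    # [2, 1]
```

Both strings render the same on screen, which is why a plain comparison or regex match can fail without any visible reason.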
How to normalize Unicode strings in Ruby depends on the Ruby version. Ruby has had built-in Unicode normalization support since 2.2, so you don't have to require a library or install a gem as in previous versions. To normalize name, simply call String#unicode_normalize with :nfc or :nfkc as argument to compose é (U+0065 and U+0301) into é (U+00E9):
name = File.basename(Dir.getwd)
name.unicode_normalize! # thankfully :nfc is the default
name.downcase!
Of course, you could also use decomposed characters in your regular expressions, but that probably won't work on other file systems, and you would then still have to normalize, this time with :nfd or :nfkd to decompose.
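The decomposed variant could look roughly like this. The pattern and the sample name "café" are invented for illustration. The point is only that the input and the regex must be in the same normalization form.

```ruby
# Pattern written with a decomposed é (e followed by U+0301).
DECOMPOSED_PATTERN = /\Acafe\u0301\z/

# Hypothetical helper: bring the input to NFD before matching.
def matches_decomposed?(name)
  name.unicode_normalize(:nfd).match?(DECOMPOSED_PATTERN)
end

composed_name = "caf\u00E9"                          # precomposed é
puts composed_name.match?(DECOMPOSED_PATTERN)        # false
puts matches_decomposed?(composed_name)              # true
```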
I should also point out that converting é to e or ü to u causes information loss. For example, the German word Müll (trash) would be converted to Mull (mull / forest humus).
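One way to limit that loss for a specific language is an explicit replacement table instead of blind accent stripping. The sketch below uses the common German convention (ü becomes ue, ß becomes ss). The table and the helper name transliterate_german are assumptions for illustration, not something Ruby provides.

```ruby
# Hypothetical, deliberately small table; keys are written as escapes
# so they are guaranteed to be the precomposed codepoints.
GERMAN_REPLACEMENTS = {
  "\u00E4" => "ae", "\u00F6" => "oe", "\u00FC" => "ue",  # ä ö ü
  "\u00C4" => "Ae", "\u00D6" => "Oe", "\u00DC" => "Ue",  # Ä Ö Ü
  "\u00DF" => "ss"                                       # ß
}.freeze

GERMAN_PATTERN = Regexp.union(GERMAN_REPLACEMENTS.keys)

def transliterate_german(str)
  # Compose first so that "u" + U+0308 is also recognized as ü.
  str.unicode_normalize(:nfc).gsub(GERMAN_PATTERN, GERMAN_REPLACEMENTS)
end

puts transliterate_german("M\u00FCll")  # Muell
```

With this mapping Müll becomes Muell rather than Mull, so the two words stay distinguishable.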
Ruby method to remove accents from UTF-8 international characters
I generally use I18n to handle this:
1.9.3p392 :001 > require "i18n"
=> true
1.9.3p392 :002 > I18n.transliterate("Hé les mecs!")
=> "He les mecs!"
How to check if a string contains accented Latin characters like é in Ruby?
I would first strip out all plain ASCII characters with gsub, and then check with a regex whether any Latin characters remain. This should detect accented Latin characters.
def latin_accented?(str)
  str.gsub(/\p{Ascii}/, "") =~ /\p{Latin}/
end
latin_accented?("é") #=> 0 (truthy)
latin_accented?("囧") #=> nil (falsy)
latin_accented?("ジ") #=> nil (falsy)
latin_accented?("e") #=> nil (falsy)
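One caveat: as far as I can tell, a decomposed é (e followed by U+0301) slips past this check, because the ASCII e is stripped and the remaining combining mark is not classified as Latin script. A possible variant, under that assumption, composes the string with NFC first. The method name latin_accented_normalized? is made up for this sketch.

```ruby
# Hypothetical variant of the check above that normalizes to NFC first,
# so "e" + U+0301 is treated the same as the precomposed U+00E9.
def latin_accented_normalized?(str)
  composed = str.unicode_normalize(:nfc)
  composed.gsub(/\p{ASCII}/, "").match?(/\p{Latin}/)
end

puts latin_accented_normalized?("e\u0301")  # true
puts latin_accented_normalized?("\u00E9")   # true
puts latin_accented_normalized?("e")        # false
```

Like the original, this flags any non-ASCII Latin letter (for example ß or æ), not only letters carrying an accent.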
How to match latin and not latin characters by normalised version of string?
It may be prohibitively hard to normalize the thing you match against, so I recommend changing the regex.
I don't know if Ruby supports the [=o=] POSIX bracket expression syntax (which matches o and all its accented versions), but there is also another way.
Replace every letter that has an accented variant with a character class listing both forms. For example:
/Bart[lł]omiej [ZŻ][oó][lł][cć]/
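Writing those character classes by hand gets tedious, so they could be generated from the plain spelling. This is a minimal sketch: the variant table covers only the letters of this example, and accent_tolerant_regexp is a hypothetical helper name. It assumes the text being searched is in NFC form.

```ruby
# Hypothetical table: plain letter => all forms that should match.
ACCENT_VARIANTS = {
  "l" => "l\u0142",  # l, ł
  "Z" => "Z\u017B",  # Z, Ż
  "o" => "o\u00F3",  # o, ó
  "c" => "c\u0107"   # c, ć
}.freeze

def accent_tolerant_regexp(plain)
  source = plain.each_char.map do |ch|
    variants = ACCENT_VARIANTS[ch]
    # Letters with known variants become a character class;
    # everything else is matched literally.
    variants ? "[#{variants}]" : Regexp.escape(ch)
  end.join
  Regexp.new(source)
end

regexp = accent_tolerant_regexp("Bartlomiej Zolc")
puts regexp.match?("Bart\u0142omiej \u017B\u00F3\u0142\u0107")  # true
puts regexp.match?("Bartlomiej Zolc")                           # true
```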
Removing accents/diacritics from string while preserving other special chars (tried mb_chars.normalize and iconv)
"it also removes spaces, dots, dashes, and who knows what else."
It shouldn't.
string.mb_chars.normalize(:kd).gsub(/[^x00-\x7F]/n, '').to_s
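You've mistyped: there should be a backslash before the x00, to refer to the NUL character. Without it, the class is read as the literal x, the literal 0, and the range from 0 to \x7F, so every character that sorts below 0 (space, dot, dash, and so on) falls outside it and gets removed.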
/[^\-x00-\x7F]/n # So it would leave the dash alone
You've put the ‘-’ between the ‘\’ and the ‘x’, which will break the reference to the null character, and thus break the range.
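The effect of the missing backslash can be reproduced in plain Ruby. This sketch swaps mb_chars.normalize(:kd) for String#unicode_normalize(:nfkd) so it runs without ActiveSupport, and omits the /n flag. The helper names and the sample string are invented for illustration.

```ruby
SAMPLE = "Zo\u00EB - d\u00E9j\u00E0 vu, no. 1"  # "Zoë - déjà vu, no. 1"

# Intended version: the class covers all of ASCII (\x00 through \x7F).
def ascii_fold(str)
  str.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]/, "")
end

# Mistyped version: the class is "x", "0" and the range "0" through \x7F,
# so spaces and punctuation below "0" are removed as well.
# (Ruby may warn about a duplicated range in this class in verbose mode.)
def ascii_fold_with_typo(str)
  str.unicode_normalize(:nfkd).gsub(/[^x00-\x7F]/, "")
end

puts ascii_fold(SAMPLE)            # Zoe - deja vu, no. 1
puts ascii_fold_with_typo(SAMPLE)  # Zoedejavuno1
```

An ASCII dash is already inside \x00-\x7F, so the corrected pattern keeps it without any special handling.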