Ruby 2.0 Iconv Replacement

Iconv was deprecated in Ruby 1.9.3 and removed from the standard library in Ruby 2.0.
You can still install it as a gem.

Reference material if you are unsure:
https://rvm.io/packages/iconv/

However, the recommendation is not to install it and to use String#encode instead:

string.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "?")


Equivalent of Iconv.conv('UTF-8//IGNORE', ...) in Ruby 1.9.X?

I thought this was it:

string.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "?")

will replace all unknowns with '?'.

To drop all unknowns instead, pass :replace => '':

string.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "")
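To make the difference between the two :replace values concrete, here is a small sketch. It converts to US-ASCII rather than UTF-8 (my choice for illustration, not from the question) so that a non-ASCII character reliably triggers the :undef replacement.

```ruby
# "caf\u00E9" is "café"; é has no US-ASCII equivalent, so :undef applies.
cafe = "caf\u00E9"

# Mark each unconvertible character with "?".
marked = cafe.encode("US-ASCII", :invalid => :replace, :undef => :replace, :replace => "?")

# Drop each unconvertible character entirely.
dropped = cafe.encode("US-ASCII", :invalid => :replace, :undef => :replace, :replace => "")

marked   #=> "caf?"
dropped  #=> "caf"
```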

Edit:

I'm not sure this is reliable. I've gone into paranoid-mode, and have been using:

string.encode("UTF-8", ...).force_encoding('UTF-8')

The script seems to be running OK now, but I'm pretty sure I had gotten errors with this earlier.

Edit 2:

Even with this, I continue to get intermittent errors. Not every time, mind you. Just sometimes.
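One plausible cause, which the posts above do not confirm: as far as I know, on Ruby 1.9 and 2.0 encode does nothing when the string is already tagged with the target encoding, so invalid bytes in a "UTF-8" string survive the call. Ruby 2.1 added String#scrub for this case. The sketch below uses a hypothetical helper, strip_invalid_utf8, that prefers scrub and otherwise round-trips through UTF-16BE to force a real conversion.

```ruby
# strip_invalid_utf8 is a hypothetical helper name, not part of Ruby.
def strip_invalid_utf8(string)
  if string.respond_to?(:scrub)
    # Ruby 2.1+: scrub replaces invalid byte sequences directly.
    string.scrub("")
  else
    # Older Rubies: going through a different encoding forces an actual
    # transcoding step, so the :invalid option gets applied.
    utf16 = string.encode("UTF-16BE", :invalid => :replace, :undef => :replace, :replace => "")
    utf16.encode("UTF-8")
  end
end

# Bytes for "caf", a lone 0xE9 (not valid UTF-8 here), then " ok".
bad = [99, 97, 102, 233, 32, 111, 107].pack("C*").force_encoding("UTF-8")

bad.valid_encoding?    #=> false
clean = strip_invalid_utf8(bad)
clean.valid_encoding?  #=> true
clean                  #=> "caf ok"
```

Checking valid_encoding? before and after is a cheap way to find out whether a given call actually cleaned the string.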

How to convert UTF-8 to ISO-8859-1 in Ruby 2.0?

The encode method does work.

Let's create a string with U+00FC (ü):

uuml_utf8 = "\u00FC"       #=> "ü"

Ruby encodes this string in UTF-8:

uuml_utf8.encoding         #=> #<Encoding:UTF-8>

In UTF-8, ü is represented as 195 188 (decimal):

uuml_utf8.bytes            #=> [195, 188]

Now let's convert the string to ISO-8859-1:

uuml_latin1 = uuml_utf8.encode("ISO-8859-1")

uuml_latin1.encoding #=> #<Encoding:ISO-8859-1>

In ISO-8859-1, ü is represented as 252 (decimal):

uuml_latin1.bytes          #=> [252]

In UTF-8, however, the single byte 252 is an invalid sequence. That's why your terminal/console displays the replacement character "�" (U+FFFD) or no character at all.

In order to display ISO-8859-1 encoded characters, you'll have to switch your terminal/console to that encoding, too.
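As a quick check that the conversion itself is lossless, here is a sketch that converts back to UTF-8 and contrasts that with merely relabeling the bytes. The variable names back_to_utf8 and relabeled are mine.

```ruby
uuml_utf8   = "\u00FC"
uuml_latin1 = uuml_utf8.encode("ISO-8859-1")

# encode transcodes: the bytes change, the character stays the same.
back_to_utf8 = uuml_latin1.encode("UTF-8")
back_to_utf8.bytes          #=> [195, 188]
back_to_utf8 == uuml_utf8   #=> true

# force_encoding only relabels: the bytes stay, so the result is not valid UTF-8.
relabeled = uuml_latin1.dup.force_encoding("UTF-8")
relabeled.bytes             #=> [252]
relabeled.valid_encoding?   #=> false
```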

Transliteration in ruby

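Up to Ruby 1.9.3, the stdlib included an Iconv library that converts encodings in much the same way as the iconv command-line tool, including transliteration. From Ruby 2.0 on, it is only available as the separate iconv gem.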
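As far as I know, String#encode has no direct counterpart to iconv's //TRANSLIT. A rough approximation without Iconv, assuming Ruby 2.2 or later for String#unicode_normalize, is to decompose accented characters and drop the combining marks. The helper name ascii_approximation is hypothetical, and the output will not always match what iconv produces (iconv may turn "ü" into "ue", for instance).

```ruby
# ascii_approximation is a hypothetical helper name, not part of Ruby.
def ascii_approximation(string)
  # NFD splits "ü" into "u" followed by U+0308 (combining diaeresis).
  decomposed = string.unicode_normalize(:nfd)
  # Remove the combining marks (Unicode category Mn).
  stripped = decomposed.gsub(/\p{Mn}/, "")
  # Anything still outside US-ASCII becomes "?".
  stripped.encode("US-ASCII", :invalid => :replace, :undef => :replace, :replace => "?")
end

ascii_approximation("\u00FCber caf\u00E9")  #=> "uber cafe"
```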

Rails 3.2.21 and Ruby 2.0 Performance Test Issues

The gcdata patch is only available for the latest version of 1.9.3. There is no gcdata patch for Ruby >= 2.0.0.
In my opinion, you have two options for this issue:

  1. Keep a branch of your application running a patched version of Ruby 1.9.3, and run the tests there.
    The downside is that the results may not be 100% accurate, and if you use Ruby 2 syntax your branch will break.

  2. Find another way to test the memory usage and created objects, or don't use the option at all with your new application.
    The good news is that Ruby 2+ handles garbage collection better, so running out of memory is less of a concern. This option only falls short if you need actual numbers to compare for your app.
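For option 2, one possible way to count created objects without the gcdata patch is Ruby's built-in GC.stat. This is only a sketch, not a replacement for the Rails performance test options: the helper name allocated_objects is hypothetical, and it assumes Ruby 2.2 or later, where the :total_allocated_objects key exists.

```ruby
# allocated_objects is a hypothetical helper name, not part of Ruby or Rails.
# It returns how many objects were allocated while the block ran.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

allocations = allocated_objects { 1_000.times { Object.new } }
puts "objects allocated: #{allocations}"
```

The counter is cumulative for the whole process, so the number includes anything else allocated during the block and is best read as an approximation.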

How would you write a test for the `Iconv.new(UTF8//IGNORE, ...)` idiom?

You can construct Strings from a byte array using the #pack method. This way, you can easily generate an invalid/bad string and use it in a test.

Example:

describe "#normalize" do
  it "should remove/ignore invalid characters" do
    # this "string" equals "Mandados de busca do caso Megaupload considerados inv\xE1lidos - Tecnologia - Sol"
    # 'C*' packs each number as an unsigned byte, which fits the value 225 (0xE1)
    bad_string = [77, 97, 110, 100, 97, 100, 111, 115, 32, 100, 101, 32, 98, 117, 115, 99, 97, 32, 100, 111, 32, 99, 97, 115, 111, 32, 77, 101, 103, 97, 117, 112, 108, 111, 97, 100, 32, 99, 111, 110, 115, 105, 100, 101, 114, 97, 100, 111, 115, 32, 105, 110, 118, 225, 108, 105, 100, 111, 115, 32, 45, 32, 84, 101, 99, 110, 111, 108, 111, 103, 105, 97, 32, 45, 32, 83, 111, 108].pack('C*').force_encoding('UTF-8')

    normalize(bad_string).should == 'Mandados de busca do caso Megaupload considerados invlidos - Tecnologia - Sol'
  end
end

(I'm sorry for the rather long test string, I just couldn't find a shorter example in my code)
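A shorter input exercises the same case. The sketch below is framework-free and uses a stand-in normalize built on String#scrub (an assumption: the real method under test is not shown in the answer, and scrub needs Ruby 2.1 or later).

```ruby
# Hypothetical stand-in for the method under test; the original
# implementation is not shown. Assumes Ruby 2.1+ for String#scrub.
def normalize(string)
  string.scrub("")
end

# Bytes for "inv\xE1lidos": 0xE1 is "á" in ISO-8859-1, but on its own
# it is not a valid UTF-8 sequence.
bad_string = [105, 110, 118, 225, 108, 105, 100, 111, 115].pack("C*").force_encoding("UTF-8")

bad_string.valid_encoding?   #=> false
normalize(bad_string)        #=> "invlidos"
```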
