Simple Conversion Of String To UTF-8 in Ruby 1.8
James Edward Gray II has a detailed collection of posts dealing with encoding and character set issues in Ruby 1.8. The post entitled Encoding Conversion with iconv contains detailed information.
Summary: the iconv gem does all the work of converting encodings. Make sure it's installed with:
gem install iconv
Now, you need to know what encoding your string is currently in, because Ruby 1.8 treats a String as an array of bytes with no intrinsic encoding. For example, say your string is in Latin-1 and you want to convert it to UTF-8:
require 'iconv'
string_in_utf8_encoding = Iconv.conv("UTF-8", "LATIN1", string_in_latin1_encoding)
The order of arguments is:
- Target encoding
- Source encoding
- String to convert
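As a rough sketch, the conversion above can be wrapped in a helper that also runs on Ruby 1.9 and later, where strings carry their own encoding and iconv may not be available. The name to_utf8 is hypothetical, and the choice to fall back to iconv only when String#encode is missing is an assumption, not something taken from the linked post.

```ruby
# Hypothetical helper: convert `str` from the encoding named `from` to UTF-8.
def to_utf8(str, from)
  if str.respond_to?(:encode)
    # Ruby 1.9+: label the bytes with their real encoding, then transcode.
    str.dup.force_encoding(from).encode("UTF-8")
  else
    # Ruby 1.8: strings are plain bytes, so let iconv do the conversion.
    require 'iconv'
    Iconv.conv("UTF-8", from, str)
  end
end

latin1 = [0xE9].pack("C*")             # the single byte 0xE9, "é" in Latin-1
utf8   = to_utf8(latin1, "ISO-8859-1")
utf8.unpack("C*")                      # => [195, 169], the UTF-8 bytes for "é"
```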
How to convert hex into UTF-8 on ruby (1.8.7)?
Have you tried Ruby's String#unpack? See http://ruby-doc.org/core/classes/String.src/M001112.html.
For example (note that unpack returns an array):
"\x68\x65\x6c\x6c\x6f".unpack("Z*") --> ["hello"]
Handling string encoding with the same code in Ruby 1.8 and 1.9
I'm with Mike Lewis in using respond_to?, but don't do it on the variable res everywhere throughout your code.
I took a look at your code in gateway.rb, and it looks like everywhere you use res, it gets set by a call to make_api_request, so you could add this before the return statement in that method:
doc = doc.force_encoding("UTF-8") if doc.respond_to?(:force_encoding)
Even if it also happens in other places, as long as it's not literally every string you encounter, I'm sure you can find a way to refactor the code so that the problem is solved in one place instead of everywhere you run into it.
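One way to keep the check in a single place is a small helper that every entry point calls. This is only a sketch: as_utf8 is a hypothetical name, and it assumes the incoming bytes are already valid UTF-8 and merely carry the wrong label.

```ruby
# Hypothetical helper: label a string as UTF-8 where the runtime supports it.
# On Ruby 1.8 strings have no force_encoding, so the string is returned as-is.
def as_utf8(str)
  str.respond_to?(:force_encoding) ? str.force_encoding("UTF-8") : str
end

raw = [0xC2, 0xBD].pack("C*")   # the UTF-8 bytes for "½", tagged as binary on 1.9+
doc = as_utf8(raw)
doc.unpack("U*")                # => [189]
```

Note that force_encoding only changes the label on the string; it does not convert any bytes.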
Are you having a problem with other places?
How do pack and unpack guess the character encoding when converting to and from UTF-8?
This actually has nothing to do with how \xBD is represented in ISO-8859-x. The critical part is the pack into UTF-8.
The pack receives [189]. The code point 189 is defined in UTF-8 (more precisely, Unicode) as being ½. Don't think of this as the Unicode spec writers "preferring" ISO-8859-1 over ISO-8859-9. They had to choose which code point represents ½, and they chose 189.
Since you're trying to learn more about pack/unpack, let me explain more:
When you unpack with the C directive, Ruby treats the string as raw bytes (ASCII-8BIT) and extracts each byte as an unsigned integer. In this case \xBD translates to 0xBD, a.k.a. 189. This is a really basic conversion.
When you pack with the U directive, Ruby treats each integer in the array as a Unicode code point and writes out the UTF-8 encoding of that code point.
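The two steps can be seen side by side in a short sketch. The variable names are only for illustration.

```ruby
bytes = "\xBD".unpack("C*")      # C: read each byte as an unsigned integer
# bytes => [189]

utf8 = bytes.pack("U*")          # U: treat each integer as a Unicode code point
utf8_bytes = utf8.unpack("C*")   # look at the bytes that pack produced
# utf8_bytes => [194, 189], i.e. 0xC2 0xBD, the UTF-8 encoding of ½ (U+00BD)
```

The single byte 0xBD goes in and two bytes come out because UTF-8 needs two bytes for every code point from 128 to 2047.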
pack/unpack have very specific behavior depending on the directives you give them. I suggest reading up on them at ruby-doc.org. Some of the directives still don't make sense to me, so don't be discouraged.
Converting UTF8 to ANSI with Ruby
ascii_str = yourUTF8text.unpack("U*").map{|c|c.chr}.join
This assumes that your text really does fit in the ASCII character set.
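If that assumption might not hold, a variation on the same idea can substitute a placeholder for anything outside ASCII instead of producing unexpected bytes or raising. This is a sketch only: to_ascii and the "?" placeholder are illustrative choices, not part of the original answer.

```ruby
# Hypothetical helper: keep ASCII characters, replace everything else.
def to_ascii(utf8_text, placeholder = "?")
  utf8_text.unpack("U*").map { |c| c < 128 ? c.chr : placeholder }.join
end

ascii = to_ascii("caf\xC3\xA9")   # "café" written as UTF-8 bytes
# ascii => "caf?"
```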