How to decode a string in Ruby
This is MIME encoded-word syntax as defined in RFC 2047. From Wikipedia:
The form is: "=?charset?encoding?encoded text?=".
- charset may be any character set registered with IANA. Typically it would be the same charset as the message body.
- encoding can be either "Q", denoting Q-encoding, which is similar to quoted-printable encoding, or "B", denoting base64 encoding.
- encoded text is the Q-encoded or base64-encoded text.
Fortunately you don't need to write a decoder for this. The Mail gem comes with a Mail::Encodings.value_decode method that handles it and is well tested:
subject = "=?UTF-8?B?TWlzc2lvbmFyecKgRmFpdGjCoFByb21pc2XCoGFuZMKgQ2FzaMKgUmVjZWlwdHPCoFlURMKgMjUzNQ==?= =?UTF-8?B?OTnCoEp1bHktMjAxNS5jc3Y=?="
Mail::Encodings.value_decode(subject)
# => "Missionary Faith Promise and Cash Receipts YTD 253599 July-2015.csv"
It gracefully handles lots of edge cases you probably wouldn't think of (until your app tries to handle them and falls over):
subject = "Re:[=?iso-2022-jp?B?GyRCJTAlayE8JV0lcyEmJTglYyVRJXMzdDwwMnEbKEI=?=\n =?iso-2022-jp?B?GyRCPFIbKEI=?=] =?iso-2022-jp?B?GyRCSlY/LiEnGyhC?=\n =?iso-2022-jp?B?GyRCIVolMCVrITwlXSVzIVskKkxkJCQ5ZyRvJDsbKEI=?=\n =?iso-2022-jp?B?GyRCJE43byRLJEQkJCRGIUolaiUvJSglOSVIGyhC?=#1056273\n =?iso-2022-jp?B?GyRCIUsbKEI=?="
Mail::Encodings.value_decode(subject)
# => "Re:[グルーポン・ジャパン株式会社] 返信:【グルーポン】お問い合わせの件について(リクエスト#1056273\n )"
If you're using Rails you already have the Mail gem. Otherwise just add gem "mail" to your Gemfile, run bundle install and, in your script, require "mail".
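To see the encoded-word format from the other direction, here is a minimal stdlib-only sketch that builds one from a Ruby string. encoded_word is a hypothetical helper, not part of the Mail gem. It assumes the string's own encoding is the charset to declare, keeps literal only the conservative character set RFC 2047 allows in Q-encoded phrases (letters, digits, !*+-/), and does not split text that would exceed the 75-character limit for a single encoded-word:

```ruby
require 'base64'

# Hypothetical helper: builds an RFC 2047 encoded-word "=?charset?encoding?encoded text?=".
# Assumes the string's own encoding is the charset to declare; does not split
# text that would exceed the 75-character limit for a single encoded-word.
def encoded_word(text, encoding: 'B')
  payload =
    case encoding
    when 'B' # base64 of the raw bytes, no line breaks
      Base64.strict_encode64(text)
    when 'Q' # Q-encoding: space becomes "_", anything outside the safe set becomes "=XX"
      text.bytes.map do |byte|
        char = byte.chr
        if char == ' '
          '_'
        elsif char.match?(%r{\A[A-Za-z0-9!*+\-/]\z})
          char
        else
          format('=%02X', byte)
        end
      end.join
    else
      raise ArgumentError, "encoding must be 'B' or 'Q'"
    end
  "=?#{text.encoding.name}?#{encoding}?#{payload}?="
end

puts encoded_word('é')                                 # =?UTF-8?B?w6k=?=
puts encoded_word('J. Pablo Fernández', encoding: 'Q') # =?UTF-8?Q?J=2E_Pablo_Fern=C3=A1ndez?=
```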
Ruby decode string
Turns out it’s CP850.
Proper solution (Ruby 2.2+)
Normalize the Unicode string to composed form (NFC) and then encode it into CP850:
"bürger".unicode_normalize(:nfc).encode(Encoding::CP850)
#⇒ "b\x81rger"
This works whether "ü" is a single precomposed character or a "u" followed by a combining diaeresis.
Fallback solution (Ruby < 2.2)
Encode directly and hope the umlaut is already precomposed:
"bürger".encode(Encoding::CP850)
#⇒ "b\x81rger"
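A short sketch of why the normalization step matters, using explicit escapes so the two forms of "bürger" are unambiguous (the variable names are illustrative). CP850 has no byte for U+0308 COMBINING DIAERESIS, so the fallback raises on decomposed input, while the NFC form maps "ü" to the single byte 0x81:

```ruby
decomposed  = "bu\u0308rger" # "u" followed by U+0308 COMBINING DIAERESIS
precomposed = "b\u00FCrger"  # U+00FC LATIN SMALL LETTER U WITH DIAERESIS

# Visually identical, but different code points
puts decomposed == precomposed                         # false
puts decomposed.unicode_normalize(:nfc) == precomposed # true

# The fallback fails on decomposed input: CP850 has no combining diaeresis
begin
  decomposed.encode(Encoding::CP850)
rescue Encoding::UndefinedConversionError => e
  puts "fallback failed: #{e.class}"
end

# After NFC, "ü" becomes the single CP850 byte 0x81 (129)
p decomposed.unicode_normalize(:nfc).encode(Encoding::CP850).bytes # [98, 129, 114, 103, 101, 114]
```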
Ruby: How to decode strings which are partially encoded or fully encoded?
This looks like HTML encoding, not URL encoding.
require 'cgi'
CGI.unescapeHTML("info&#64;cloudag.com")
#=> "info@cloudag.com"
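Because CGI.unescapeHTML leaves plain text untouched, the same call works whether the string is fully encoded, partially encoded, or not encoded at all. A short sketch; the sample strings are made up, and CGI.unescape is shown only for contrast with URL (percent) encoding:

```ruby
require 'cgi'

samples = [
  "info&#64;cloudag&#46;com", # "@" and "." both encoded (decimal references)
  "info&#x40;cloudag.com",    # only "@" encoded (hex reference)
  "info@cloudag.com"          # not encoded at all
]
samples.each { |s| puts CGI.unescapeHTML(s) } # each prints info@cloudag.com

# URL encoding is a different scheme and needs a different decoder
puts CGI.unescape("info%40cloudag.com") # info@cloudag.com
```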
Is there a way to decode q-encoded strings in Ruby?
You could try the following; I use it to parse email subjects:
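require 'base64'

str = "=?UTF-8?Q?J=2E_Pablo_Fern=C3=A1ndez?="
if m = /=\?([A-Za-z0-9\-]+)\?(B|Q)\?([!->@-~]+)\?=/i.match(str)
  charset, encoding, text = m[1], m[2].upcase, m[3] # /i also matches "b" and "q"
  decoded = case encoding
            when "B" # Base64 encoded
              Base64.decode64(text)
            when "Q" # Q encoded; "_" stands for a space, so swap it before unpacking "=XX"
              text.tr('_', ' ').unpack1("M")
            end
  # Iconv was removed in Ruby 2.0; label the bytes with their charset and transcode
  decoded.force_encoding(charset).encode('UTF-8') # => "J. Pablo Fernández"
end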
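The snippet above decodes only the first encoded-word it finds. Real subjects often contain several, and RFC 2047 says whitespace between adjacent encoded-words is not part of the text. A minimal stdlib-only sketch built on the same regex; decode_header is a hypothetical name, and it assumes every charset it meets is one Ruby knows (force_encoding raises ArgumentError otherwise):

```ruby
require 'base64'

ENCODED_WORD = /=\?([A-Za-z0-9\-]+)\?([BQ])\?([!->@-~]+)\?=/i

# Hypothetical helper: decodes every encoded-word in a header value into UTF-8
# and leaves the plain-text parts alone.
def decode_header(value)
  # Drop whitespace between adjacent encoded-words ("?= =?" becomes "?==?")
  joined = value.gsub(/(\?=)\s+(?==\?)/, '\1')
  joined.gsub(ENCODED_WORD) do
    charset, enc, text = Regexp.last_match.captures
    bytes = if enc.upcase == 'B'
              Base64.decode64(text)
            else
              text.tr('_', ' ').unpack1('M') # "_" is a space in Q-encoding
            end
    bytes.force_encoding(charset).encode(Encoding::UTF_8)
  end
end

puts decode_header("Re: =?UTF-8?Q?J=2E_Pablo?= =?UTF-8?Q?_Fern=C3=A1ndez?=")
# Re: J. Pablo Fernández
```

For production mail handling, the Mail gem's value_decode covers far more edge cases than this sketch.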
How to properly decode string with quoted-printable encoding in Ruby
Try to avoid reaching for Rails helpers when plain old Ruby can do the job. String#unpack is your friend:
"Demarcation by Theresa Castel=E3o-Lawless".
unpack("M").first. # unpack as quoted printable
force_encoding(Encoding::ISO_8859_1).
encode(Encoding::UTF_8)
#⇒ "Demarcation by Theresa Castelão-Lawless"
Or, as @Stefan suggested in the comments, you can pass the source encoding as the second argument to encode:
"Demarcation by Theresa Castel=E3o-Lawless".
unpack("M").first. # unpack as quoted printable
encode('utf-8', 'iso-8859-1')
Note: in the first version, force_encoding is needed to tell Ruby that the unpacked bytes are single-byte ISO-8859-1 (Latin-1, which covers Western European accents) before transcoding them into UTF-8.
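The same idea can be wrapped in a small helper that takes the source charset as a parameter, as in @Stefan's variant. A sketch, assuming the charset is known from context such as a Content-Type header; decode_qp is a hypothetical name. It also shows that unpack("M") removes the "=" soft line breaks quoted-printable uses to wrap long lines:

```ruby
# Hypothetical helper: decodes quoted-printable text whose bytes are in +charset+
# and returns a UTF-8 string. encode(dst, src) reinterprets the bytes as +charset+.
def decode_qp(text, charset)
  text.unpack1("M").encode(Encoding::UTF_8, charset)
end

# "=\n" at the end of a line is a soft line break; unpack("M") drops it
wrapped = "Demarcation by Theresa Castel=E3o-=\nLawless"
puts decode_qp(wrapped, Encoding::ISO_8859_1) # Demarcation by Theresa Castelão-Lawless
```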
How to decode utf8 characters with ruby Base64
When you decode64 you get back a string with BINARY (a.k.a. ASCII-8BIT) encoding:
Base64.decode64(encode).encoding
# => #<Encoding:ASCII-8BIT>
The trick is to force-apply a particular encoding:
Base64.decode64(encode).force_encoding('UTF-8')
# => "éééé"
This assumes the decoded bytes are valid UTF-8, which they might not be, so check the result with valid_encoding? before relying on it.
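One way to act on that caution is to check valid_encoding? after forcing the encoding, and either reject the data or replace the bad bytes with scrub. A minimal sketch; decode64_utf8 is a hypothetical name, and the sample inputs are made up (w6nDqcOpw6k= is the UTF-8 bytes of "éééé", /w== is the single invalid byte 0xFF):

```ruby
require 'base64'

# Hypothetical helper: decodes Base64 and labels the result UTF-8,
# raising instead of returning a string with invalid bytes.
def decode64_utf8(b64)
  str = Base64.decode64(b64).force_encoding(Encoding::UTF_8)
  raise ArgumentError, 'decoded bytes are not valid UTF-8' unless str.valid_encoding?

  str
end

puts decode64_utf8('w6nDqcOpw6k=') # éééé

# Lenient alternative: scrub replaces invalid bytes with U+FFFD instead of raising
p Base64.decode64('/w==').force_encoding(Encoding::UTF_8).scrub
```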
How do I encode/decode HTML entities in Ruby?
The htmlentities gem can do it:
: jmglov@laurana; sudo gem install htmlentities
Successfully installed htmlentities-4.2.4
: jmglov@laurana; irb
irb(main):001:0> require 'htmlentities'
=> []
irb(main):002:0> HTMLEntities.new.decode "&iexcl;I'm highly annoyed with character references!"
=> "¡I'm highly annoyed with character references!"
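If you only need the handful of characters HTML escaping produces, Ruby's standard library covers both directions without a gem. This is a swapped-in alternative, not what HTMLEntities does: CGI.unescapeHTML only understands &amp;, &lt;, &gt;, &quot;, &apos; and numeric references, so named entities such as &iexcl; still need htmlentities. The sample strings are illustrative:

```ruby
require 'cgi'

original = %(<a href="x">Tom & Jerry's</a>)

escaped = CGI.escapeHTML(original)
puts escaped # &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;

# unescapeHTML reverses escapeHTML and also decodes numeric references
puts CGI.unescapeHTML(escaped) == original # true
puts CGI.unescapeHTML("&#161;Hola!")       # ¡Hola!
```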