How to Decode Q-Encoded Strings in Ruby

How to decode a string in Ruby

This is MIME encoded-word syntax as defined in RFC 2047. From Wikipedia:

The form is: "=?charset?encoding?encoded text?=".

  • charset may be any character set registered with IANA. Typically it would be the same charset as the message body.
  • encoding can be either "Q" denoting Q-encoding that is similar to the quoted-printable encoding, or "B" denoting base64 encoding.
  • encoded text is the Q-encoded or base64-encoded text.

Fortunately you don't need to write a decoder for this. The Mail gem comes with a Mail::Encodings.value_decode method that works perfectly and is very well-tested:

subject = "=?UTF-8?B?TWlzc2lvbmFyecKgRmFpdGjCoFByb21pc2XCoGFuZMKgQ2FzaMKgUmVjZWlwdHPCoFlURMKgMjUzNQ==?= =?UTF-8?B?OTnCoEp1bHktMjAxNS5jc3Y=?="
Mail::Encodings.value_decode(subject)
# => "Missionary Faith Promise and Cash Receipts YTD 253599 July-2015.csv"

It gracefully handles lots of edge cases you probably wouldn't think of (until your app tries to handle them and falls over):

subject = "Re:[=?iso-2022-jp?B?GyRCJTAlayE8JV0lcyEmJTglYyVRJXMzdDwwMnEbKEI=?=\n =?iso-2022-jp?B?GyRCPFIbKEI=?=] =?iso-2022-jp?B?GyRCSlY/LiEnGyhC?=\n  =?iso-2022-jp?B?GyRCIVolMCVrITwlXSVzIVskKkxkJCQ5ZyRvJDsbKEI=?=\n =?iso-2022-jp?B?GyRCJE43byRLJEQkJCRGIUolaiUvJSglOSVIGyhC?=#1056273\n =?iso-2022-jp?B?GyRCIUsbKEI=?="
Mail::Encodings.value_decode(subject)
# => "Re:[グルーポン・ジャパン株式会社] 返信:【グルーポン】お問い合わせの件について(リクエスト#1056273\n )"

If you're using Rails you already have the Mail gem. Otherwise just add gem "mail" to your Gemfile, then bundle install and, in your script, require "mail".

Ruby decode string

Turns out it’s CP850.

Proper solution (Ruby 2.5+)

Normalize the unicode string and then encode it into CP850:

"bürger".unicode_normalize(:nfc).encode(Encoding::CP850)
#⇒ "b\x81rger"

Works for both special characters and combined diacritics.

Fallback solution (Ruby 2.5-)

Encode and pray it’s a composed umlaut:

"bürger".encode(Encoding::CP850)
#⇒ "b\x81rger"
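A minimal sketch of why the normalization step matters, assuming the input arrives in decomposed form (a plain "u" followed by U+0308). The variable names are illustrative only. It also shows the reverse direction, from CP850 bytes back to UTF-8.

```ruby
# "u" followed by U+0308 COMBINING DIAERESIS (decomposed form of "ü")
decomposed = "bu\u0308rger"

# Without normalization the combining mark has no CP850 equivalent.
begin
  decomposed.encode(Encoding::CP850)
  raised = false
rescue Encoding::UndefinedConversionError
  raised = true
end

# NFC composes the pair into U+00FC, which CP850 can represent as byte 0x81.
cp850 = decomposed.unicode_normalize(:nfc).encode(Encoding::CP850)

# Going back: the string is already tagged as CP850, so encode transcodes it.
round_trip = cp850.encode(Encoding::UTF_8)
```

Here `raised` ends up `true`, and `round_trip` is the composed form of the original word.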

Ruby: How to decode strings which are partially encoded or fully encoded?

This looks like HTML encoding, not URL encoding.

require 'cgi'

CGI.unescapeHTML("info&#64;cloudag.com")
#=> "info@cloudag.com"
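As a rough illustration of the "partially or fully encoded" case, the sketch below feeds CGI.unescapeHTML several variants of the same address. The sample strings are made up for the example; the point is that already-decoded text passes through unchanged, so the call is safe on mixed input.

```ruby
require 'cgi'

plain     = "info@cloudag.com"
partially = "info&#64;cloudag.com"                                   # only "@" is encoded
fully     = plain.each_char.map { |c| format("&#%d;", c.ord) }.join  # every character encoded
hex       = "info&#x40;cloudag.com"                                  # hexadecimal reference

# Each variant should decode to the same plain string.
results = [plain, partially, fully, hex].map { |s| CGI.unescapeHTML(s) }
```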

Is there a way to decode q-encoded strings in Ruby?

To parse email subjects, you could try the following:

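require 'base64'

str = "=?UTF-8?Q?J=2E_Pablo_Fern=C3=A1ndez?="
if m = /=\?([A-Za-z0-9\-]+)\?(B|Q)\?([!->@-~]+)\?=/i.match(str)
  decoded =
    case m[2].upcase # the regex is case-insensitive, so normalize "b"/"q"
    when "B" # Base64 encoded
      Base64.decode64(m[3])
    when "Q" # Q encoded: "_" means space, so translate it before unpacking "=XX"
      m[3].tr('_', ' ').unpack("M").first
    end
  # Iconv is no longer in the standard library (removed in Ruby 2.0),
  # so tag the bytes with the declared charset and transcode to utf-8
  decoded.force_encoding(m[1]).encode('utf-8')
end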
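The snippet above handles a single encoded-word. A header can contain several, mixed with plain text, so here is a hedged sketch of one way to extend the same idea with gsub. The helper name decode_encoded_words and the ENCODED_WORD constant are hypothetical, not part of any library, and this does not cover the edge cases the Mail gem handles.

```ruby
# Same pattern as above: =?charset?encoding?text?=
ENCODED_WORD = /=\?([A-Za-z0-9\-]+)\?([BbQq])\?([!->@-~]+)\?=/

# Hypothetical helper: decodes every encoded-word in a header value to UTF-8.
def decode_encoded_words(header)
  # Whitespace that only separates two adjacent encoded-words is not significant.
  joined = header.gsub(/(\?=)\s+(=\?)/, '\1\2')

  joined.gsub(ENCODED_WORD) do
    charset, encoding, text = $1, $2, $3
    bytes =
      if encoding.upcase == "B"
        text.unpack1("m")                 # the same decoding Base64.decode64 performs
      else
        text.tr("_", " ").unpack1("M")    # "_" is a space, "=XX" is a hex-escaped byte
      end
    bytes.force_encoding(charset).encode("UTF-8")
  end
end

name    = decode_encoded_words("=?UTF-8?Q?J=2E_Pablo_Fern=C3=A1ndez?=")
joined  = decode_encoded_words("=?UTF-8?B?SGVsbG8=?= =?UTF-8?B?V29ybGQ=?=")
subject = decode_encoded_words("Re: =?ISO-8859-1?Q?Castel=E3o?= (draft)")
```

An unknown charset name makes force_encoding raise ArgumentError, so real input would need a rescue around the block.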

How to properly decode string with quoted-printable encoding in Ruby

Try to avoid weird Rails stuff when you have plain old good ruby to accomplish a task. String#unpack is your friend.

"Demarcation by Theresa Castel=E3o-Lawless".
unpack("M").first. # unpack as quoted printable
force_encoding(Encoding::ISO_8859_1).
encode(Encoding::UTF_8)
#⇒ "Demarcation by Theresa Castelão-Lawless"

or, as suggested in comments by @Stefan, one can pass the source encoding as the 2nd argument:

"Demarcation by Theresa Castel=E3o-Lawless".
unpack("M").first. # unpack as quoted printable
encode('utf-8', 'iso-8859-1')

Note: unpack("M") returns a binary (ASCII-8BIT) string, so force_encoding is needed to tell Ruby that the bytes are single-byte ISO-8859-1 (with European accented characters) before transcoding into the target UTF-8.
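To make the note concrete, this small sketch shows what happens when the source encoding is left out. The variable names are illustrative only.

```ruby
# unpack("M") yields raw bytes tagged as ASCII-8BIT (binary).
raw = "Castel=E3o".unpack("M").first
raw_encoding = raw.encoding

# Transcoding binary data with high bytes straight to UTF-8 fails,
# because Ruby has no idea which character byte 0xE3 stands for.
begin
  raw.encode(Encoding::UTF_8)
  error = nil
rescue Encoding::UndefinedConversionError => e
  error = e.class
end

# Declaring the source encoding first makes the conversion well defined.
fixed = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
```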

How to decode utf8 characters with ruby Base64

When you decode64 you get back a string with BINARY (a.k.a. ASCII-8BIT) encoding:

Base64.decode64(encode).encoding
# => #<Encoding:ASCII-8BIT>

The trick is to force-apply a particular encoding:

Base64.decode64(encode).force_encoding('UTF-8')
# => "éééé"

This assumes that your string is valid UTF-8, which it might not be, so use with caution.
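One cautious way to act on that warning is to check valid_encoding? after forcing the encoding. The sketch below uses made-up sample data; replacing bad bytes with "?" via scrub is just one possible policy, not a recommendation.

```ruby
require 'base64'

good_payload = Base64.strict_encode64("éééé")     # valid UTF-8 bytes
bad_payload  = Base64.strict_encode64("\xE9".b)   # a lone Latin-1 byte, not valid UTF-8

good = Base64.decode64(good_payload).force_encoding('UTF-8')
bad  = Base64.decode64(bad_payload).force_encoding('UTF-8')

good_ok = good.valid_encoding?   # the bytes really are UTF-8
bad_ok  = bad.valid_encoding?    # force_encoding only relabels, it does not validate

# Replace invalid byte sequences instead of letting them blow up later.
cleaned = bad.scrub("?")
```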

How do I encode/decode HTML entities in Ruby?

HTMLEntities can do it:

: jmglov@laurana; sudo gem install htmlentities
Successfully installed htmlentities-4.2.4
: jmglov@laurana; irb
irb(main):001:0> require 'htmlentities'
=> []
irb(main):002:0> HTMLEntities.new.decode "&iexcl;I&#39;m highly annoyed with character references!"
=> "¡I'm highly annoyed with character references!"

