How to Handle UTF-8 Email Headers (Like Subject:) Using Ruby

How to handle UTF-8 email headers (like Subject:) using Ruby?

Ahah! ActionMailer::Quoting has a quoted_printable method.

So here's what I did:

def my_email(foo)
  ...
  @subject = quoted_printable(foo.some_subject_with_accented_chars, 'utf-8')
  ...
end

Doing this convinced Mail.app to display the rest of the email using UTF-8. Now to test the rest!
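Where `quoted_printable` is not available, a similar header value can be built by hand. This is only a minimal sketch, assuming the RFC 2047 B (Base64) form is acceptable to the receiving client. `encode_header_value` is a hypothetical helper name, not part of ActionMailer, and it does not split long subjects into several encoded-words.

```ruby
require 'base64'

# Hypothetical helper: wraps a non-ASCII string in a single RFC 2047
# encoded-word using the B (Base64) encoding over its UTF-8 bytes.
# Assumption: the value is short enough to fit in one encoded-word.
def encode_header_value(text)
  return text if text.ascii_only? # plain ASCII needs no encoding
  "=?UTF-8?B?#{Base64.strict_encode64(text.encode('UTF-8'))}?="
end

puts encode_header_value('Hello')       # printed unchanged
puts encode_header_value('Café crème')  # printed as =?UTF-8?B?...?=
```

The result is pure ASCII, so it can be assigned to the subject the same way as the `quoted_printable` result above.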

Rails, sending mail to an address with accented characters

Finally solved this: if you wrap the local part of the email address in quotes and leave the domain part unquoted, it works a treat. It seems the mailer encodes the full email address if you don't wrap the local part in quotes, which breaks the address on its way to the server.

e.g.
somébody@here.com won't work

whereas
"somébody"@here.com will work

It routes through fine and displays fine in all clients.
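The quoting step can be sketched as a small helper. `quote_local_part` is a hypothetical name, and the sketch assumes the address contains no display name and that the last `@` separates the local part from the domain.

```ruby
# Hypothetical helper: wraps the local part (everything before the last "@")
# in double quotes when it contains non-ASCII characters.
def quote_local_part(address)
  local, at, domain = address.rpartition('@')
  return address if at.empty?          # no "@" found: leave untouched
  return address if local.ascii_only?  # nothing that needs protecting
  return address if local.start_with?('"') && local.end_with?('"') # already quoted
  '"' + local + '"@' + domain
end

puts quote_local_part('somébody@here.com')  # "somébody"@here.com
puts quote_local_part('somebody@here.com')  # somebody@here.com
```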

Best way to find potential UTF-8 characters from an imported email

There are two things to consider. First, the original string format using = is called "quoted-printable"; decode it and force the result to UTF-8 encoding. Then, use htmlentities to convert the characters to HTML entities. Here is an example:

require 'htmlentities'
coder = HTMLEntities.new
string = '=C2=A9'.unpack("M").first.force_encoding('UTF-8')

coder.encode(string) # => "©"
coder.encode(string, :named) # => "&copy;"

I hope you find that helpful.
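If the goal is only to find which non-ASCII characters an imported message contains, the standard library may be enough. The following is a sketch under the assumption that the text is quoted-printable over UTF-8 bytes; `non_ascii_chars` is a hypothetical helper and the sample inputs are made up.

```ruby
# Hypothetical helper: decodes quoted-printable text and returns the
# non-ASCII characters it contains, or nil if the bytes are not valid UTF-8.
def non_ascii_chars(quoted_printable_text)
  decoded = quoted_printable_text.unpack1('M').force_encoding('UTF-8')
  return nil unless decoded.valid_encoding?
  decoded.scan(/[^\x00-\x7F]/)
end

p non_ascii_chars('Copyright =C2=A9 Fern=C3=A1ndez')  # => ["©", "á"]
p non_ascii_chars('caf=E9')                            # => nil (a Latin-1 byte, not UTF-8)
```

A `nil` result suggests the message was written in some other charset, in which case forcing UTF-8 would not be the right step.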

Unicode in X-Message only correct in Chrome

HTTP headers are specified by RFC 2616 to carry ISO-8859-1 data. Safari and Firefox are doing the right thing. It is, according to the HTTP standard, not possible to put a Chinese character in an HTTP header value.

Browser behaviour is inconsistent here (Opera also uses UTF-8, and IE uses whatever the locale's ANSI code page is), so it is unsafe to use non-ASCII characters in HTTP headers.

The usual workaround if you must use headers is to apply an ad-hoc encoding at each end, typically URL-encoding or base64-encoding over UTF-8 bytes. But it's usually best not to wrap arbitrary data in HTTP headers; put it in the body instead.
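Both workarounds can be sketched in Ruby with the standard library. The helper names below are hypothetical, and the sample message is made up; the only point is that the value placed in the header is pure ASCII and that the other end reverses the same step.

```ruby
require 'erb'
require 'uri'
require 'base64'

# Hypothetical helpers for a custom header such as X-Message.
def header_encode_url(value)
  ERB::Util.url_encode(value)             # percent-encodes the UTF-8 bytes
end

def header_decode_url(encoded)
  URI.decode_www_form_component(encoded)  # returns a UTF-8 string
end

def header_encode_b64(value)
  Base64.strict_encode64(value)           # no line breaks in the output
end

def header_decode_b64(encoded)
  Base64.strict_decode64(encoded).force_encoding('UTF-8')
end

message = '你好, world'
puts header_encode_url(message)
puts header_encode_b64(message)
```

In a browser, the percent-encoded form can be reversed with `decodeURIComponent`.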

UTF-8 characters mangled in HTTP Basic Auth username

I want to allow any valid UTF-8 characters in usernames and passwords.

Abandon all hope. Basic Authentication and Unicode don't mix.

There is no standard(*) for how to encode non-ASCII characters into a Basic Authentication username:password token before base64ing it. Consequently every browser does something different:

  • Opera uses UTF-8;
  • IE uses the system's default codepage (which you have no way of knowing, other than it's never UTF-8), and silently mangles characters that don't fit into it using the Windows ‘guess a random character that looks a bit like the one you wanted or maybe just not’ secret recipe;
  • Mozilla uses only the lower byte of character codepoints, which has the effect of encoding to ISO-8859-1 and mangling the non-8859-1 characters irretrievably... except when doing XMLHttpRequests, in which case it uses UTF-8;
  • Safari and Chrome encode to ISO-8859-1, and fail to send the authorization header at all when a non-8859-1 character is used.

*: some people interpret the standard to say that either:

  • it should always be ISO-8859-1, due to that being the default encoding for raw 8-bit characters included directly in headers;
  • it should be encoded using RFC2047 rules, somehow.

But neither of these proposals is on topic for inclusion in a base64-encoded auth token, and the RFC2047 reference in the HTTP spec really doesn't work at all, since all the places it might potentially be used are explicitly disallowed by the ‘atom context’ rules of RFC2047 itself, even if HTTP headers honoured the rules and extensions of the RFC822 family, which they don't.

In summary: ugh. There is little-to-no hope of this ever being fixed in the standard or in the browsers other than Opera. It's just one more factor driving people away from HTTP Basic Authentication in favour of non-standard and less-accessible cookie-based authentication schemes. Shame really.
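To see why the server cannot rely on what it receives, here is a small illustration with made-up credentials. It only shows that the same username:password pair produces different tokens under the two most common encodings mentioned above; it is not a fix.

```ruby
require 'base64'

credentials = 'josé:secret' # sample credentials, made up for illustration

# The same text, serialised to bytes in two different encodings.
utf8_token   = Base64.strict_encode64(credentials.encode('UTF-8'))
latin1_token = Base64.strict_encode64(credentials.encode('ISO-8859-1'))

puts utf8_token
puts latin1_token
puts utf8_token == latin1_token # false

# A character outside ISO-8859-1 cannot be encoded to it at all.
begin
  '日本:secret'.encode('ISO-8859-1')
rescue Encoding::UndefinedConversionError => e
  puts "cannot encode: #{e.message}"
end
```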

Is there a way to decode q-encoded strings in Ruby?

I use this to parse email subjects:

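require 'base64'

str = "=?UTF-8?Q?J=2E_Pablo_Fern=C3=A1ndez?="
if m = /=\?([A-Za-z0-9\-]+)\?(B|Q)\?([!->@-~]+)\?=/i.match(str)
  decoded =
    case m[2].upcase # the regex is case-insensitive, so normalise "b"/"q"
    when "B" # Base64 encoded
      Base64.decode64(m[3])
    when "Q" # Q encoded: "_" stands for a space, so swap it before decoding
      m[3].tr('_', ' ').unpack("M").first
    end
  # Iconv is no longer in Ruby's standard library (since 2.0); String#encode converts instead
  decoded.force_encoding(m[1]).encode('utf-8') # => "J. Pablo Fernández"
end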
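A header can contain more than one encoded-word, possibly in different charsets. The following sketch extends the same regular expression to that case; `decode_header` and `ENCODED_WORD` are hypothetical names, and the sketch assumes every charset label is one that Ruby's `Encoding` recognises (an unknown label would raise an `ArgumentError`).

```ruby
require 'base64'

# Same pattern as above: =?charset?B-or-Q?payload?=
ENCODED_WORD = /=\?([A-Za-z0-9\-]+)\?([BQ])\?([!->@-~]+)\?=/i

# Hypothetical helper: decodes every encoded-word found in a header value.
def decode_header(value)
  # Whitespace between two adjacent encoded-words is not part of the text.
  joined = value.gsub(/(\?=)\s+(=\?)/, '\1\2')
  joined.gsub(ENCODED_WORD) do
    charset, kind, payload = $1, $2.upcase, $3
    bytes = if kind == 'B'
              Base64.decode64(payload)
            else
              payload.tr('_', ' ').unpack1('M') # "_" means space in Q encoding
            end
    bytes.force_encoding(charset).encode('UTF-8')
  end
end

puts decode_header('=?UTF-8?Q?J=2E_Pablo?= =?ISO-8859-1?Q?_Fern=E1ndez?=')
# prints: J. Pablo Fernández
```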

