How to handle UTF-8 email headers (like Subject:) using Ruby?
Ahah! ActionMailer::Quoting has a quoted_printable method.
So here's what I did:
def my_email(foo)
...
@subject = quoted_printable(foo.some_subject_with_accented_chars, 'utf-8')
...
end
Doing this convinced Mail.app to display the rest of the email using UTF-8. Now to test the rest!
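Where ActionMailer::Quoting is not available, the same kind of header encoding can be sketched in plain Ruby. The helper name q_encode_header is hypothetical (it is not part of ActionMailer), and the sketch assumes a short subject, because it does not split the result to respect the 75-character limit that RFC 2047 places on each encoded-word:

```ruby
# Hypothetical helper, not part of ActionMailer: builds a single RFC 2047
# "Q" encoded-word. Assumes the result is short enough that it never has
# to be split across several encoded-words.
def q_encode_header(text, charset = 'UTF-8')
  encoded = text.encode(charset).bytes.map do |byte|
    case byte
    when 0x20
      '_'                      # Q encoding writes a space as "_"
    when 0x30..0x39, 0x41..0x5A, 0x61..0x7A
      byte.chr                 # ASCII digits and letters pass through
    else
      format('=%02X', byte)    # every other byte becomes =XX
    end
  end.join
  "=?#{charset}?Q?#{encoded}?="
end

puts q_encode_header('Café au lait') # prints =?UTF-8?Q?Caf=C3=A9_au_lait?=
```

Encoding more bytes than strictly necessary (punctuation, for instance) is allowed by the Q encoding and keeps the sketch simple.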
Rails, sending mail to an address with accented characters
Finally solved this: if you wrap the local part of the email address in quotes and leave the domain part unquoted, it works a treat. It seems the mailer encodes the full email address if you don't wrap the local part in quotes, which breaks the address on its way to the server.
e.g.
somébody@here.com won't work
whereas
"somébody"@here.com will work
It routes through fine and displays fine in all clients.
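The quoting trick can be wrapped in a small helper. This is only a sketch: quote_local_part is a hypothetical name, and it assumes the local part contains no double quotes or backslashes that would themselves need escaping:

```ruby
# Hypothetical helper: wraps the local part (everything before the last "@")
# in double quotes and leaves the domain untouched.
def quote_local_part(address)
  local, at, domain = address.rpartition('@')
  # Leave the address alone if it has no "@" or is already quoted.
  return address if at.empty? || local.start_with?('"')
  %("#{local}"@#{domain})
end

puts quote_local_part('somébody@here.com') # prints "somébody"@here.com
```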
Best way to find potential UTF-8 characters from an imported email
There are two things to consider. First, the original string format using = is called "quoted-printable". Decode it with unpack("M") and force UTF-8 encoding. Then, use htmlentities to convert to HTML entities. Here is an example:
require 'htmlentities'
coder = HTMLEntities.new
string = '=C2=A9'.unpack("M").first.force_encoding('UTF-8')
coder.encode(string) # => "©"
coder.encode(string, :named) # => "&copy;"
I hope you find that helpful.
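If the htmlentities gem is not an option, a standard-library-only variation might look like the sketch below. It assumes the decoded bytes really are valid UTF-8, and it emits numeric entities rather than named ones; the sample string is made up for illustration:

```ruby
# Sample input (invented for this sketch): quoted-printable text whose
# =XX sequences carry the UTF-8 bytes of "©" and "é".
raw = 'Copyright =C2=A9 Caf=C3=A9'.unpack1('M')

# unpack returns raw bytes, so the encoding has to be set explicitly.
text = raw.force_encoding(Encoding::UTF_8)

# List the non-ASCII characters that were found.
non_ascii = text.scan(/[^[:ascii:]]/)

# Replace each non-ASCII character with a numeric HTML entity.
entities = text.gsub(/[^[:ascii:]]/) { |ch| format('&#%d;', ch.ord) }

p non_ascii
puts entities
```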
Unicode in X-Message only correct in chrome
HTTP headers are specified by RFC 2616 to carry ISO-8859-1 data. Safari and Firefox are doing the right thing. It is, according to the HTTP standard, not possible to put a Chinese character in an HTTP header value.
Browser behaviour is inconsistent here (Opera also uses UTF-8, and IE uses whatever the locale ANSI code page is), so it is unsafe to use non-ASCII characters in HTTP headers.
The usual workaround if you must use headers is to apply an ad-hoc encoding at each end, typically URL-encoding or base64-encoding over UTF-8 bytes. But it's usually best not to wrap arbitrary data in HTTP headers; put it in the body instead.
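A minimal sketch of that workaround, with both ends written in Ruby for brevity. The helper names are hypothetical, and a receiver written in another language would need the matching decoder (for example, one that treats "+" as a space for the URL-encoded variant):

```ruby
require 'uri'

# Hypothetical helpers: turn arbitrary text into an ASCII-only header value.
def header_value_urlencoded(text)
  URI.encode_www_form_component(text.encode(Encoding::UTF_8))
end

def header_value_base64(text)
  [text.encode(Encoding::UTF_8)].pack('m0') # 'm0' is base64 without line breaks
end

# Matching decoders for the receiving end.
def text_from_urlencoded(value)
  URI.decode_www_form_component(value)
end

def text_from_base64(value)
  value.unpack1('m0').force_encoding(Encoding::UTF_8)
end

message = '漢字 message'
puts header_value_urlencoded(message)
puts header_value_base64(message)
```

Either form survives the trip as plain ASCII, so it no longer matters which encoding a browser assumes for header values.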
UTF-8 characters mangled in HTTP Basic Auth username
I want to allow any valid UTF-8 characters in usernames and passwords.
Abandon all hope. Basic Authentication and Unicode don't mix.
There is no standard(*) for how to encode non-ASCII characters into a Basic Authentication username:password token before base64ing it. Consequently every browser does something different:
- Opera uses UTF-8;
- IE uses the system's default codepage (which you have no way of knowing, other than it's never UTF-8), and silently mangles characters that don't fit into it using the Windows ‘guess a random character that looks a bit like the one you wanted or maybe just not’ secret recipe;
- Mozilla uses only the lower byte of character codepoints, which has the effect of encoding to ISO-8859-1 and mangling the non-8859-1 characters irretrievably... except when doing XMLHttpRequests, in which case it uses UTF-8;
- Safari and Chrome encode to ISO-8859-1, and fail to send the authorization header at all when a non-8859-1 character is used.
*: some people interpret the standard to say that either:
- it should always be ISO-8859-1, due to that being the default encoding for raw 8-bit characters included directly in headers;
- it should be encoded using RFC2047 rules, somehow.
But neither of these proposals is on topic for inclusion in a base64-encoded auth token, and the RFC2047 reference in the HTTP spec really doesn't work at all since all the places it might potentially be used are explicitly disallowed by the ‘atom context’ rules of RFC2047 itself, even if HTTP headers honoured the rules and extensions of the RFC822 family, which they don't.
In summary: ugh. There is little-to-no hope of this ever being fixed in the standard or in the browsers other than Opera. It's just one more factor driving people away from HTTP Basic Authentication in favour of non-standard and less-accessible cookie-based authentication schemes. Shame really.
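To see why the browsers disagree, the sketch below builds the token for the same credentials under two of the encodings listed above, then shows one heuristic a server might apply: try UTF-8 first and fall back to ISO-8859-1. Both helper names are hypothetical, and the fallback is a guess rather than a fix, since some ISO-8859-1 byte sequences are also valid UTF-8 and code page mangling cannot be undone:

```ruby
# Hypothetical helper: the base64 token a client would send for the given
# credentials if it encoded them with `encoding` first.
def basic_token(user, password, encoding)
  ["#{user}:#{password}".encode(encoding)].pack('m0')
end

# Hypothetical server-side heuristic: accept the bytes as UTF-8 when they
# are valid UTF-8, otherwise assume they were ISO-8859-1.
def decode_basic_token(token)
  raw = token.unpack1('m0')
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?
  raw.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

utf8_token   = basic_token('josé', 'secret', Encoding::UTF_8)
latin1_token = basic_token('josé', 'secret', Encoding::ISO_8859_1)

puts utf8_token == latin1_token       # the two clients send different tokens
puts decode_basic_token(utf8_token)
puts decode_basic_token(latin1_token)
```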
Is there a way to decode q-encoded strings in Ruby?
I use this to parse email subjects; you could try the following:
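require 'base64'

str = "=?UTF-8?Q?J=2E_Pablo_Fern=C3=A1ndez?="
if m = /=\?([A-Za-z0-9\-]+)\?(B|Q)\?([!->@-~]+)\?=/i.match(str)
  decoded =
    case m[2].upcase # the regex is case-insensitive, so normalise the flag
    when "B" # Base64 encoded
      Base64.decode64(m[3])
    when "Q" # Q encoded: swap "_" for a space first so an encoded =5F stays "_"
      m[3].tr('_', ' ').unpack("M").first
    end
  # Iconv left the standard library in Ruby 2.0; String#encode does the conversion
  decoded.force_encoding(m[1]).encode('utf-8') # to convert to utf-8
end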