Mail Gem - How to Clean Up the Body String


If you have a properly formatted email, you can use Mail helper methods:

mail = Mail.new(email_string)
mail.text_part # finds the first text/plain part
mail.html_part # finds the first text/html part

This doesn't always work: single-part messages (text only) have no separate parts to find, and when you receive email from the internet at large you can't rely on every client formatting its messages correctly. Believe me, I've learned this the hard way.

Ruby Mail gem extract headers and clean up body

Yes, it's because real-world email has all kinds of surprises that don't fit the protocol.

To get the header part and body part:

header_part, body_part = message.body.decoded.split(/\n\s*\n/, 2)
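
As a rough illustration of the same split, here is a sketch in plain Ruby that works on a raw String, so it runs without the gem. The sample text and the split_headers helper are made up for this example; it also unfolds continuation lines, which real-world headers often use. A full parser such as Mail handles many more edge cases than this.

```ruby
# Hypothetical helper: split a "headers + blank line + body" string.
def split_headers(raw)
  # Normalize CRLF so the blank-line pattern behaves the same for every client.
  text = raw.gsub("\r\n", "\n")
  header_part, body_part = text.split(/\n\s*\n/, 2)

  # Unfold continuation lines: a line starting with whitespace
  # continues the previous header.
  unfolded = header_part.to_s.gsub(/\n[ \t]+/, " ")

  headers = {}
  unfolded.each_line do |line|
    name, value = line.chomp.split(/:\s*/, 2)
    headers[name.downcase] = value if name && value
  end

  [headers, body_part.to_s]
end

# Sample data, invented for the example.
sample = "From: alice@example.com\r\nSubject: Hello\r\n world\r\n\r\nFirst line\r\n\r\nSecond paragraph\r\n"
headers, body = split_headers(sample)
puts headers["subject"]
puts body
```

Only the first blank line is treated as the separator, so blank lines inside the body are left alone.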

You may find some useful patterns for your parsing in this file:

lib/mail/patterns.rb

How to clean up a string (email body) with regards to special characters?

You need to use a MIME parser, which takes care of separating the headers and decoding the quoted-printable encoding. Depending on the layout of your email, BODY[TEXT] might return a lot more than you want. Either fetch the BODYSTRUCTURE and pick out the parts you want, or download the entire message (BODY[]) and run it through a MIME parser.
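
To see what the quoted-printable clean-up involves, this small sketch decodes a quoted-printable body using only Ruby's built-in String#unpack1. The sample text, and the assumption that the part is labeled UTF-8, are invented for the example; a MIME parser does this per part, based on that part's own headers.

```ruby
# Quoted-printable text as it might appear in a raw message body (sample data).
raw_body = "R=C3=A9sum=C3=A9 attached, see the very long line that was wrapped =\nby the sending client."

# "M" is the quoted-printable directive: it decodes =XX escapes and
# removes soft line breaks ("=" at the end of a line).
decoded = raw_body.unpack1("M")

# unpack1 returns binary data; label it with the charset the part declared
# (assumed to be UTF-8 here).
decoded.force_encoding("UTF-8")

puts decoded
```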

Rails - Mail, getting the body as Plain Text

The following line:

message = Mail.new(params[:message])

creates a new Mail::Message instance from the full message. You can then call any of the methods on that message to get the content. For example, you can get the plain-text part using:

message.text_part

or the HTML with

message.html_part

These methods simply find the first part of a multipart message that has a text/plain or text/html content type. CloudMailin also provides the same content through the convenience parameters params[:plain] and params[:html]. Remember that a message is never guaranteed to have a plain or HTML part, so it may be worth using something like the following to be sure:

plain_part = message.multipart? ? (message.text_part ? message.text_part.body.decoded : nil) : message.body.decoded
html_part = message.html_part ? message.html_part.body.decoded : nil

As a side note, it's also important to read the character encoding (charset) from the message when you use these methods, and to make sure the output is converted into the encoding you want (such as UTF-8).
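
One way to do that conversion, sketched in plain Ruby: take the decoded bytes plus the charset the message declares and transcode to UTF-8. Here to_utf8 is a hypothetical helper, and falling back to ISO-8859-1 for a missing or unknown charset is only an assumption that you may want to change for your own mail.

```ruby
# Hypothetical helper: convert decoded body bytes to UTF-8.
# `charset` is whatever the message or part declares, e.g. message.charset.
def to_utf8(bytes, charset)
  source = begin
    Encoding.find(charset.to_s)
  rescue ArgumentError
    Encoding::ISO_8859_1 # assumed fallback for missing or unknown charsets
  end

  # dup leaves the caller's string untouched. Bytes that cannot be converted
  # are replaced instead of raising, and scrub covers the case where the
  # source is already labeled UTF-8 but contains invalid bytes.
  bytes.dup
       .force_encoding(source)
       .encode("UTF-8", invalid: :replace, undef: :replace)
       .scrub
end

latin1 = "R\xE9sum\xE9".b # bytes as they might arrive in an ISO-8859-1 mail
puts to_utf8(latin1, "ISO-8859-1")
puts to_utf8(latin1, nil)
```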

Mail gem determine whether plaintext or html

You could look at maildata.content_type:

maildata.content_type
#=> "text/plain; charset=us-ascii"

If it's a multipart e-mail, you could have both plain text and HTML. You could then look at the parts array to see which content types it includes:

maildata.content_type
#=> "multipart/alternative; boundary=\"--==_mimepart_4f848491e618f_7e4b6c1f3849940\"; charset=utf-8"

maildata.parts.collect { |part| part.content_type }
#=> ["text/plain; charset=utf-8", "text/html; charset=utf-8"]
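
A small sketch of that check follows. The available_formats helper is hypothetical, and the Struct only stands in for a parsed message so the snippet runs without the gem; it relies on nothing beyond content_type and parts.

```ruby
# Stand-in for a parsed message: only content_type and parts are used.
FakeMail = Struct.new(:content_type, :parts)

# Strip parameters such as charset or boundary:
# "text/plain; charset=utf-8" becomes "text/plain".
def mime_type_of(content_type)
  content_type.to_s.split(";").first.to_s.strip.downcase
end

# Hypothetical helper: report which of plain text / HTML a message carries.
def available_formats(maildata)
  types = if maildata.parts.empty?
    [mime_type_of(maildata.content_type)]
  else
    maildata.parts.map { |part| mime_type_of(part.content_type) }
  end
  { plain: types.include?("text/plain"), html: types.include?("text/html") }
end

single = FakeMail.new("text/plain; charset=us-ascii", [])
multi = FakeMail.new(
  "multipart/alternative; charset=utf-8",
  [FakeMail.new("text/plain; charset=utf-8", []),
   FakeMail.new("text/html; charset=utf-8", [])]
)

p available_formats(single)
p available_formats(multi)
```

This only looks at the top-level parts; a message with nested multipart sections would need a recursive walk.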

Rails gmail gem: Get correct address of sender and message body

A workaround that I found is to use Net::IMAP.

Note that the sender's email address is taken from the From header with mail.from[0].

require 'net/imap'
require 'mail'

# Connect over TLS on the IMAPS port
imap = Net::IMAP.new('imap.gmail.com', port: 993, ssl: true)
imap.login(USERNAME, PASSWORD)
imap.select('INBOX')

imap.search(['ALL']).each do |message_id|
  # Fetch the raw RFC 822 source of the message and parse it with Mail
  raw = imap.fetch(message_id, 'RFC822')[0].attr['RFC822']
  mail = Mail.read_from_string(raw)
  @email = Email.create(:subject => mail.subject, :message => mail.body.decoded, :sender => mail.from[0], :date => mail.date)
end

imap.logout
imap.disconnect

Character encoding with Ruby 1.9.3 and the mail gem

After playing a bit, I found this:

body.decoded.force_encoding("ISO-8859-1").encode("UTF-8") # => "This reply has accents: Résumé..."
message.parts.map { |part| part.decoded.force_encoding("ISO-8859-1").encode(part.charset) } # multi-part

You can extract the charset from the message like so.

message.charset #=> for simple, non-multipart
message.parts.map { |part| part.charset } #=> for multipart, each part can have its own charset

Be careful with non-multipart, as the following can cause trouble:

body.charset #=> returns "US-ASCII" which is WRONG!
body.decoded.force_encoding(body.charset).encode("UTF-8") #=> Conversion error...

body.decoded.force_encoding(message.charset).encode("UTF-8") #=> Correct conversion :)
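
The failure mode is easy to reproduce in plain Ruby, without the gem. The bytes below are a made-up ISO-8859-1 sample: labeling them US-ASCII makes the conversion raise, while the correct label converts cleanly.

```ruby
bytes = "R\xE9sum\xE9".b # ISO-8859-1 bytes for "Résumé" (sample data)

begin
  # Wrong label: 0xE9 is not a valid US-ASCII byte, so transcoding fails.
  bytes.dup.force_encoding("US-ASCII").encode("UTF-8")
  wrong_label_failed = false
rescue EncodingError => e
  wrong_label_failed = true
  puts "US-ASCII label: #{e.class}"
end

# Correct label: every ISO-8859-1 byte maps to a Unicode character.
converted = bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts "ISO-8859-1 label: #{converted}"
```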

