How to Parse a Mailbox File in Ruby

How to parse a mailbox file in Ruby?

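The good news is that the Mbox format is dead simple, though its simplicity is why it was eventually replaced. Parsing a large mailbox file to extract a single message is not especially efficient, because there is no index: you have to scan the file from the top until you reach the message you want.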

If you can split apart the mailbox file into separate strings, you can pass these strings to the Mail library for parsing.

An example starting point:

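require 'mail'

def parse_message(message)
  mail = Mail.new(message)

  do_other_stuff!(mail) # placeholder for your own processing
end

message = nil

while (line = STDIN.gets)
  if line.start_with?('From ')
    # A "From " separator line starts a new message, so hand off the previous one
    parse_message(message) if message
    message = +''
  elsif message
    # Undo the ">From" quoting that mbox applies to body lines
    message << line.sub(/\A>From/, 'From')
  end
end

# The last message is not followed by a separator, so parse it here
parse_message(message) if message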

The key is that each message starts with a line beginning with "From " (the space after it matters, since headers are written as "From:" with a colon). Any line that starts with ">From" is to be treated as actually being "From". Quirks like this make the format really inadequate, but if Maildir isn't an option, this is what you have to do.

Need help parsing an mbox file and storing certain attributes in a MySQL database?

The "mbox format" is essentially a bunch of email messages concatenated together. If Ruby doesn't have a library for parsing the format, you can parse it manually by splitting the file on each line that starts with "From ", then undoing any "From " quoting (turning ">From " back into "From ") within each message. This leaves you with a collection of individual emails.

Parsing emails is left as an exercise for the student.
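A minimal sketch of that splitting step, using only the standard library. It assumes the whole file fits in memory and treats every line beginning with "From " as a separator, exactly as described above; split_mbox is a hypothetical helper name, not part of any gem.

```ruby
# Hypothetical helper: split an mbox string into individual message strings.
# Assumes every line starting with "From " is a separator line and that body
# lines beginning with "From " were quoted as ">From ".
def split_mbox(mbox)
  messages = []
  current = nil

  mbox.each_line do |line|
    if line.start_with?("From ")
      # Separator line: close the previous message and start a new one
      # (the separator itself is not kept)
      messages << current if current
      current = +""
    elsif current
      # Undo the quoting of body lines that began with "From "
      current << line.sub(/\A>From /, "From ")
    end
  end

  messages << current if current
  messages
end

mbox = <<~MBOX
  From alice@example.com Tue Dec 12 13:30:14 2017
  Subject: First

  >From the desk of Alice
  From bob@example.com Wed Dec 13 09:00:00 2017
  Subject: Second

  Hi
MBOX

split_mbox(mbox).each { |m| puts m.lines.first }
```

Each resulting string can then be handed to a mail parser and the attributes you need written to the database. In real files each message is usually followed by a blank line before the next separator; this sketch leaves that line on the end of the previous message.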

How can I parse the full name from an email using Ruby?

You should be able to use the display_names property like so:

mail[:from].display_names.first
mail[:to].display_names.first

Note that display_names returns an array, because a header can list more than one address. The code above gets the display name of the first address.
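If you only have the raw header value and don't want to pull in the Mail gem, a rough stdlib-only sketch can pull out the display name. It assumes the simple forms Name <address> and "Name" <address>; it is not a full RFC 5322 address parser (no groups, comments, or encoded words), and display_name_from is a hypothetical helper.

```ruby
# Hypothetical helper: extract the display name from a simple address header value.
# Handles only `Name <addr>` and `"Name" <addr>`; returns nil when there is no name.
def display_name_from(value)
  match = value.strip.match(/\A(.*?)\s*<[^>]*>\z/)
  return nil unless match

  name = match[1].strip
  # Drop surrounding double quotes, if any
  name = name[1...-1] if name.length >= 2 && name.start_with?('"') && name.end_with?('"')
  name.empty? ? nil : name
end

puts display_name_from('John Doe <user2@example.com>')
puts display_name_from('"Doe, John" <user2@example.com>')
p display_name_from('user2@example.com')
```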

Why doesn't this code using the ruby-mbox gem parse mbox files?

Sorry people, I should have looked harder before posting here...

Fixed it:

Just inserted the line require 'stringio' to give this:

#!/usr/bin/ruby
require 'rubygems'
require 'stringio' # ruby-mbox appears to use StringIO without requiring it
require 'mbox'

raw = File.read('test.eml')
puts raw.size

mbox = Mbox.new(raw)
puts mbox

It looks like ruby-mbox assumes stringio is already loaded but never requires it explicitly...

Oddly, the example scripts don't load it either...

Parse /var/email/username file in Ruby

Yes, that looks like more or less the right way to parse the mbox format, judging from a quick scan of the RFC:

The structure of the separator lines vary across implementations, but usually contain the exact character sequence of "From", followed by a single Space character (0x20), an email address of some kind, another Space character, a timestamp sequence of some kind, and an end-of-line marker.

And...

Many implementations are also known to escape message body lines that begin with the character sequence of "From ", so as to prevent confusion with overly-liberal parsers that do not search for full separator lines. In the common case, a leading Greater-Than symbol (0x3E) is used for this purpose (with "From " becoming ">From "). However, other implementations are known not to escape such lines unless they are immediately preceded by a blank line or if they also appear to contain an email address and a timestamp. Other implementations are also known to perform secondary escapes against these lines if they are already escaped or quoted, while others ignore these mechanisms altogether.
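To follow the RFC's description more closely and avoid the overly-liberal parsing it warns about, a separator check can require an address and a timestamp instead of just "From ". The sketch below assumes the traditional asctime-style timestamp (e.g. "Tue Dec 12 13:30:14 2017"), which is common but, as quoted above, not universal. It also shows the secondary-escape style of unquoting (usually called mboxrd), where exactly one ">" is removed from any line matching ">+From ". Both method names are hypothetical.

```ruby
# Hypothetical helpers; the timestamp format is an assumption (asctime style),
# so files from other writers may need a looser pattern.
DAY       = /(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)/
MONTH     = /(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/
SEPARATOR = /\AFrom \S+ +#{DAY} #{MONTH} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}\s*\z/

# True only for "From <address> <timestamp>" lines, not for any "From " line.
def separator_line?(line)
  SEPARATOR.match?(line)
end

# Undo ">From " quoting, including secondary escapes: strip exactly one ">"
# from lines such as ">From " or ">>From ".
def unescape_from(line)
  line.sub(/\A>(>*From )/, '\1')
end

puts separator_line?("From user@example.com Tue Dec 12 13:30:14 2017\n")
puts separator_line?("From here on, things get weird\n")
puts unescape_from(">>From the archive\n")
```

Because separator_line? rejects a body line such as "From here on, things get weird", it still splits correctly when a writer did not escape that line.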

Update:
There's also this: https://github.com/meh/ruby-mbox

How can I parse an email passed as a string in Rails 3?

The simplest way I've found to parse raw email text is:

mail = Mail.new raw_mail

In that case, mail is an instance of Mail::Message, with methods for easily accessing the mail's attributes and attachments.

Parsing lines of text from external file in Ruby

Try this

parsed_email = {}

# File.foreach reads line by line and closes the file when done
File.foreach("sample-email.txt") do |line|
  # The text before the first ":" is the header name
  case line.split(":")[0]
  when "Delivered-To"
    parsed_email[:to] = line
  when "From"
    parsed_email[:from] = line
  when "Date"
    parsed_email[:date] = line
  when "Subject"
    parsed_email[:subject] = line
  end
end

puts parsed_email
# => {:to=>"Delivered-To: user1@example.com\n", :from=>"From: John Doe <user2@example.com>\n", :date=>"Date: Tue, 12 Dec 2017 13:30:14 -0500\n", :subject=>"Subject: Testing the parser\n"}

Explanation
You need to split each line on ":" and take the first element, which is the header name: line.split(":")[0]
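The loop above looks at every line of the file, so a body line that happens to start with "From:" would overwrite the header, and a folded header (one continued on an indented line) loses its continuation. A slightly more careful stdlib-only sketch stops at the first blank line, which ends the header block, joins continuation lines, and splits only on the first colon. parse_headers is a hypothetical helper; for real-world mail (encoded words, multipart bodies) the Mail gem is the safer choice.

```ruby
# Hypothetical helper: parse the header block of a raw email into a Hash.
# Keys are lowercased header names; repeated headers keep the last value.
def parse_headers(raw)
  headers = {}
  last_key = nil

  raw.each_line do |line|
    line = line.chomp
    break if line.empty? # a blank line ends the header block

    if line.start_with?(" ", "\t") && last_key
      # Folded header: append the continuation to the previous value
      headers[last_key] << " " << line.strip
    else
      name, value = line.split(":", 2)
      next unless value # skip lines without a colon
      last_key = name.strip.downcase
      headers[last_key] = value.strip
    end
  end

  headers
end

raw = <<~EMAIL
  Delivered-To: user1@example.com
  From: John Doe <user2@example.com>
  Subject: Testing
    the parser

  From: this body line is not a header
EMAIL

p parse_headers(raw)
```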
