Python: How to Parse the Body from a Raw Email, Given That Raw Email Does Not Have a "Body" Tag or Anything


Use Message.get_payload():

import email

b = email.message_from_string(a)  # a: the raw email as a str
if b.is_multipart():
    for payload in b.get_payload():
        # if payload.is_multipart(): ...
        print(payload.get_payload())
else:
    print(b.get_payload())
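The commented-out line hints that a sub-part can itself be multipart. Below is a minimal recursive sketch that descends through nested parts. It assumes you want every text/plain leaf. The helper name collect_text_parts and the sample message are made up for illustration.

```python
import email
from email.message import Message


def collect_text_parts(msg: Message) -> list:
    """Recursively gather decoded text/plain payloads (hypothetical helper)."""
    if msg.is_multipart():
        texts = []
        # get_payload() on a multipart returns the list of sub-messages
        for sub in msg.get_payload():
            texts.extend(collect_text_parts(sub))
        return texts
    if msg.get_content_type() == "text/plain":
        # decode=True undoes base64/quoted-printable and returns bytes
        raw = msg.get_payload(decode=True)
        return [raw.decode(msg.get_content_charset("utf-8"), errors="replace")]
    return []


raw_email = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Hi Bob, this is the plain body.\n"
    "--XYZ\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    "<p>Hi Bob, this is the <b>HTML</b> body.</p>\n"
    "--XYZ--\n"
)

msg = email.message_from_string(raw_email)
print(collect_text_parts(msg))
```

The HTML alternative is skipped because only text/plain leaves are collected.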

How can I read the body of an email with Python?

You have everything in place; you just need to understand a few concepts.

The "email" library lets you convert raw email bytes into an easy-to-use object called Message through its parser APIs, such as message_from_bytes() and message_from_string().

Here the error comes from passing the wrong type of input:

email.message_from_bytes(data[0][1].decode())

message_from_bytes() takes bytes as input, not str. Decoding data[0][1] first hands it a str it cannot handle. The decode is also unnecessary, because the bytes parser works on the raw message directly.

In short, you are parsing the original email message twice: once with message_from_bytes(data[0][1]) and again with message_from_string(email_message_raw). Drop one of them and you will be all set!
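Here is a small sketch of the two equivalent entry points. It assumes the raw message arrives as bytes, which is how imaplib returns it. Either parse the bytes once with message_from_bytes(), or decode once and use message_from_string(), but not both. The sample message is made up.

```python
import email

# imaplib's fetch returns the raw message as bytes, e.g. data[0][1]
raw_bytes = b"From: alice@example.com\nSubject: Test\n\nBody text\n"

# Option 1: hand the bytes straight to the bytes parser
msg_a = email.message_from_bytes(raw_bytes)

# Option 2: decode once yourself, then use the str parser
msg_b = email.message_from_string(raw_bytes.decode("utf-8"))

print(msg_a["Subject"], msg_b["Subject"])

# Mixing them up fails: message_from_bytes() expects bytes, not str
try:
    email.message_from_bytes(raw_bytes.decode("utf-8"))
except (TypeError, AttributeError) as exc:
    print("wrong input type:", type(exc).__name__)
```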

Try this approach:

import email
import imaplib
from email.header import decode_header, make_header

HOST = 'imap.host'
USERNAME = 'name@domain.com'
PASSWORD = 'password'

m = imaplib.IMAP4_SSL(HOST, 993)
m.login(USERNAME, PASSWORD)
m.select('INBOX')

result, data = m.uid('search', None, "UNSEEN")
if result == 'OK':
    for num in data[0].split()[:5]:
        result, msg_data = m.uid('fetch', num, '(RFC822)')
        if result != 'OK':
            continue
        # message_from_bytes() already returns a Message object with many
        # methods (is_multipart(), walk(), etc.), so don't parse it a second time
        email_message = email.message_from_bytes(msg_data[0][1])
        # from Edward Chapman -> https://stackoverflow.com/questions/7314942/python-imaplib-to-get-gmail-inbox-subjects-titles-and-sender-name
        email_from = str(make_header(decode_header(email_message['From'])))
        subject = str(make_header(decode_header(email_message['Subject'])))
        # from Todor Minakov -> https://stackoverflow.com/questions/17874360/python-how-to-parse-the-body-from-a-raw-email-given-that-raw-email-does-not
        b = email_message
        body = b""

        if b.is_multipart():
            for part in b.walk():
                ctype = part.get_content_type()
                cdispo = str(part.get('Content-Disposition'))

                # skip any text/plain (txt) attachments
                if ctype == 'text/plain' and 'attachment' not in cdispo:
                    body = part.get_payload(decode=True)  # decode
                    break
        # not multipart - i.e. plain text, no attachments, keeping fingers crossed
        else:
            body = b.get_payload(decode=True)

        print("###########################################################")
        print(subject)
        print("###########################################################")
        # get_payload(decode=True) returns bytes; this assumes UTF-8 text
        print(body.decode('utf-8', errors='replace'))
        print("###########################################################")

m.close()
m.logout()
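get_payload(decode=True) returns bytes, with the transfer encoding (such as base64) already undone. To get a str you still need the part's charset. Here is a minimal sketch that falls back to UTF-8 when no charset is declared. The helper name body_as_text and the sample message are hypothetical.

```python
import email
from email.message import Message


def body_as_text(msg: Message) -> str:
    """Return the first non-attachment text/plain part as str (hypothetical helper)."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        if "attachment" in str(part.get("Content-Disposition")):
            continue
        payload = part.get_payload(decode=True)  # bytes, base64 already undone
        charset = part.get_content_charset("utf-8")  # fall back to UTF-8
        return payload.decode(charset, errors="replace")
    return ""


raw = (
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: base64\n"
    b"\n"
    b"Q2Fmw6kgb3BlbiB0b2RheQ==\n"
)
print(body_as_text(email.message_from_bytes(raw)))
```

For a non-multipart message, walk() yields only the message itself, so the same helper covers both branches of the script above.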

Python: How to parse things such as from, to, and body from a raw email source

I don't really understand what your final code snippet has to do with anything - you haven't mentioned anything about HTML until that point, so I don't know why you would suddenly be giving an example of parsing HTML (which you should never do with a regex anyway).

In any case, to answer your original question about getting the headers from an email message, Python includes code to do that in the standard library:

import email
msg = email.message_from_string(email_string)
msg['from'] # 'root@a1.local.tld'
msg['to'] # 'ooo@a1.local.tld'
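Header values come back as raw strings. Below is a short sketch using only standard-library helpers. It splits address headers into (name, address) pairs with email.utils.parseaddr() and getaddresses(), and decodes RFC 2047 encoded words with email.header. The sample message is made up.

```python
import email
from email.header import decode_header, make_header
from email.utils import getaddresses, parseaddr

email_string = (
    "From: Root <root@a1.local.tld>\n"
    "To: ooo@a1.local.tld, Ops Team <ops@a1.local.tld>\n"
    "Subject: =?utf-8?q?Caf=C3=A9_menu?=\n"
    "\n"
    "Hello\n"
)
msg = email.message_from_string(email_string)

# parseaddr() splits a single address into (display name, address)
print(parseaddr(msg["from"]))
# getaddresses() handles comma-separated lists across all To: headers
print(getaddresses(msg.get_all("to", [])))
# decode_header() + make_header() turn encoded words into readable text
print(str(make_header(decode_header(msg["subject"]))))
```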

Read body from raw email

If you simply want to parse the email and access the body, consider using mail-parser, a simple library that takes a raw email as input and produces a parsed object.

import mailparser

# use whichever constructor matches your input:
mail = mailparser.parse_from_file(f)           # path to a file
mail = mailparser.parse_from_file_obj(fp)      # open file object
mail = mailparser.parse_from_string(raw_mail)  # str
mail = mailparser.parse_from_bytes(byte_mail)  # bytes

How to Use:

mail.body  # use this to access the body contents
mail.to    # the recipients
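If you would rather stay in the standard library, Python 3.6+ offers a similar convenience. Parsing with policy=email.policy.default yields an EmailMessage, and its get_body() picks the preferred body part for you. This is an alternative to mail-parser, not part of it. Here is a minimal sketch with a made-up sample message:

```python
import email
from email import policy

raw_mail = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Status\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="B"\n'
    "\n"
    "--B\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "All systems go.\n"
    "--B\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    "<p>All systems go.</p>\n"
    "--B--\n"
)

# policy.default returns an EmailMessage instead of the legacy Message
mail = email.message_from_string(raw_mail, policy=policy.default)

# get_body() returns the first match from preferencelist, in order
body_part = mail.get_body(preferencelist=("plain", "html"))
print(body_part.get_content())  # decoded str for text parts
print(mail["to"])
```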

Python parse a raw email and get the text content of the body

With the BeautifulSoup library, extracting the text is not too hard. If you do not have the library yet, install it first (pip install beautifulsoup4). After that:

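from bs4 import BeautifulSoup

def print_payload(message):
    print('******')
    if message.is_multipart():
        # recurse into each sub-part
        for payload in message.get_payload():
            print_payload(payload)
    else:
        print(message.get_payload())
        # only HTML parts carry the <p> elements we want
        if message.get_content_type() == 'text/html':
            # decode=True undoes base64/quoted-printable; then bytes -> str
            body = message.get_payload(decode=True).decode(
                message.get_content_charset('utf-8'), errors='replace')
            soup = BeautifulSoup(body, 'html.parser')
            paragraphs = soup.find_all('p')
            for paragraph in paragraphs:
                print(paragraph.text)
        print('******')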

What BeautifulSoup does elegantly is build a parse tree from which HTML elements can be selected. If your email contains other HTML elements, you may also have to search for those to get all the data. With this simple email, though, finding all the elements with the tag 'p' is sufficient.
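If installing bs4 is not an option, the standard library's html.parser.HTMLParser can do the same <p> extraction for simple messages. This swaps BeautifulSoup for a hand-rolled parser. The ParagraphExtractor class is hypothetical, ignores nested or unclosed <p> subtleties, and, like the snippet above, only looks at <p> elements.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False  # True while inside a <p> element

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        # text inside nested inline tags such as <b> is still collected
        if self._in_p:
            self.paragraphs[-1] += data


html_body = "<html><body><p>Hello <b>there</b>,</p><div>skip me</div><p>Bye.</p></body></html>"
parser = ParagraphExtractor()
parser.feed(html_body)
parser.close()
for paragraph in parser.paragraphs:
    print(paragraph)
```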

Get body text of an email using python imap and email package

You are assuming that messages have a uniform structure, with one well-defined "main part". That is not the case. A message can have a single part that is not text at all (just an "attachment" of a binary file, and nothing else). It can also be a multipart with several textual parts (or, again, none at all), and even if there is only one, it need not be the first part. Furthermore, multiparts can be nested (one or more parts is itself another MIME message, recursively).

In other words, you must inspect the MIME structure and then decide which part(s) are relevant for your application. If you only receive messages from a fairly static, small set of clients, you may be able to cut some corners (at least until the next upgrade of Microsoft Plague hits). In general, though, there simply isn't a hierarchy of any kind, just a collection of (not necessarily always directly related) equally important parts.
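Inspecting the MIME structure is easier with an outline of it in front of you. Here is a small recursive sketch that lists each part's content type, indented by nesting depth. The function name mime_outline and the sample message (a nested multipart plus a PDF attachment) are made up for illustration.

```python
import email
from email.message import Message


def mime_outline(msg: Message, depth: int = 0) -> list:
    """Return one indented line per MIME part (hypothetical helper)."""
    lines = ["    " * depth + msg.get_content_type()]
    if msg.is_multipart():
        for sub in msg.get_payload():
            lines.extend(mime_outline(sub, depth + 1))
    return lines


raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="OUTER"\n'
    "\n"
    "--OUTER\n"
    'Content-Type: multipart/alternative; boundary="INNER"\n'
    "\n"
    "--INNER\n"
    "Content-Type: text/plain\n"
    "\n"
    "plain text\n"
    "--INNER\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>html</p>\n"
    "--INNER--\n"
    "--OUTER\n"
    "Content-Type: application/pdf\n"
    'Content-Disposition: attachment; filename="report.pdf"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "JVBERi0=\n"
    "--OUTER--\n"
)

print("\n".join(mime_outline(email.message_from_string(raw))))
```

Here the text body sits two levels down inside a multipart/alternative, next to an HTML version and alongside an attachment. None of it is a single "main part".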

How to parse email body without previous conversation?

You can get it like so:

import re

# file: path to the saved email text
with open(file, 'r') as f:
    print(re.findall(r'^.*?(?=On \w{3},)', f.read(), re.DOTALL)[0].strip())

Output:

Dear vaia,

Sale order fail to sync when it contain Generic Product. ....need to little
investigate about it.
This is a issue which is occurred if any product have salePrice of greated
then 2 decimal value like 33.34500 etc.
Now POS only use @ decimal values like 33.34 so please aware about this
about configuring prism to have always 2 decimal digits.

Regex:

^.*?(?=On \w{3},) - matches everything from the start up to the first occurrence of the On \w{3}, pattern (e.g. "On Mon,").

re.DOTALL will make the . match newline characters as well.
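findall(...)[0] raises IndexError when the message has no "On Mon, ..." line. Here is a slightly more defensive sketch of the same idea. The function name strip_quoted_reply is hypothetical, and it still assumes the English "On <Day>," attribution line that the regex above relies on.

```python
import re

# Same pattern as above: everything up to the first "On Xxx," attribution
REPLY_MARKER = re.compile(r'^.*?(?=On \w{3},)', re.DOTALL)


def strip_quoted_reply(text: str) -> str:
    """Return the new part of a reply, or the whole text if no marker is found."""
    match = REPLY_MARKER.match(text)
    return (match.group(0) if match else text).strip()


reply = (
    "Dear vaia,\n\nPlease check the sync.\n\n"
    "On Mon, Jan 1, 2018 at 10:00 AM, Someone wrote:\n> old text\n"
)
print(strip_quoted_reply(reply))
print(strip_quoted_reply("No quoted history here."))
```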

python imap read email body returns None after get_payload

From the documentation,

If the message is a multipart and the decode flag is True, then None is returned.

Moral: don't pass decode=True to get_payload() on a multipart message itself; descend to its leaf parts and decode those instead.

If you are going to parse multipart messages, you might want to become familiar with the relevant RFCs (MIME starts with RFC 2045). Meanwhile, this quick-and-dirty approach might get you the data you need:

import email

# data1 comes from imaplib's fetch(); in Python 3 the raw message is bytes
msg = email.message_from_bytes(data1[0][1])

# If we have a (nested) multipart message, try to get
# past all of the potatoes and straight to the meat
# For production, you might want a more thought-out
# approach, but maybe just fetching the first item
# will be sufficient for your needs
while msg.is_multipart():
    msg = msg.get_payload(0)

# msg is now a leaf part, so decode=True returns bytes instead of None
content = msg.get_payload(decode=True)
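To see the documented behavior directly, here is a short sketch with a made-up message. Calling get_payload(decode=True) on the multipart container returns None. The same call on a leaf part returns the decoded bytes, here undoing quoted-printable.

```python
import email

raw = (
    b"MIME-Version: 1.0\n"
    b'Content-Type: multipart/mixed; boundary="SEP"\n'
    b"\n"
    b"--SEP\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"Caf=C3=A9 is open\n"
    b"--SEP--\n"
)
msg = email.message_from_bytes(raw)

# On the multipart container, decode=True yields None (as documented)
print(msg.get_payload(decode=True))

# On a leaf part, decode=True undoes the quoted-printable encoding
leaf = msg.get_payload(0)
print(leaf.get_payload(decode=True).decode("utf-8"))
```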

