Parse Email Content from Quoted Reply

Get the actual email message that the person just wrote, excluding any quoted text

There are many libraries out there that can help you extract the reply/signature from a message:

  • Ruby: https://github.com/github/email_reply_parser
  • Python: https://github.com/zapier/email-reply-parser or https://github.com/mailgun/talon
  • JavaScript: https://github.com/turt2live/node-email-reply-parser
  • Java: https://github.com/edlio/EmailReplyParser
  • PHP: https://github.com/willdurand/EmailReplyParser

I've also read that Mailgun has a service to parse inbound email and POST its content to a URL of your choice. It will automatically strip quoted text from your emails: https://www.mailgun.com/blog/handle-incoming-emails-like-a-pro-mailgun-api-2-0/

Hope this helps!

Reliable way to only get the email text, excluding previous emails

The formatting of email replies depends on the client. There is no reliable way to extract the newest message without the risk of removing too much or not enough.

However, a common convention is to prefix quoted lines with >, so lines starting with that character are likely to be quotes, especially when there are several of them at the very beginning or end of the email.
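
A minimal Python sketch of this heuristic. It assumes a plain-text body and treats only runs of >-prefixed (or blank) lines at the very start or end of the message as quotes. The function name strip_quote_blocks is hypothetical and does not come from any of the libraries mentioned elsewhere:

```python
def strip_quote_blocks(body: str) -> str:
    """Remove runs of '>'-prefixed lines at the start and end of a body.

    Heuristic sketch: quoted lines in the middle are kept, because they
    may be inline replies interleaved with the original text.
    """
    lines = body.splitlines()

    def is_quote_or_blank(line: str) -> bool:
        # A line counts as quoted if its first non-space character is '>'.
        stripped = line.lstrip()
        return stripped.startswith(">") or stripped == ""

    # Trim quoted (and blank) lines from the end of the message.
    end = len(lines)
    while end > 0 and is_quote_or_blank(lines[end - 1]):
        end -= 1

    # Trim quoted (and blank) lines from the beginning of the message.
    start = 0
    while start < end and is_quote_or_blank(lines[start]):
        start += 1

    return "\n".join(lines[start:end])
```

Leaving quoted lines in the middle untouched is a deliberate choice: an interleaved reply would otherwise lose the context its answers refer to.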

But the attribution line "On Thu, Mar 24, 2011 at 3:51 PM, <test@test.com> wrote:" from your example is harder to detect. A line ending with a colon right before a quote suggests that it belongs to the quote, but you cannot know that for sure: it could also be part of the new message, with the colon simply being a mistyped period (on German keyboards, : is Shift+.).
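
One way to act on this uncertainty, sketched in Python under stated assumptions: strip the line before a trailing quote block only when it matches a narrow "On ... wrote:" pattern (an English, Gmail-style wording assumed here), and otherwise keep a colon-terminated line as part of the new message. The names split_attribution and ATTRIBUTION_RE are hypothetical:

```python
import re
from typing import List, Optional, Tuple

# Assumption: an English, Gmail-style attribution line such as
# "On Thu, Mar 24, 2011 at 3:51 PM, <test@test.com> wrote:".
# Other clients and locales word this differently.
ATTRIBUTION_RE = re.compile(r"^On\b.*\bwrote:$")


def _is_quote(line: str) -> bool:
    """A line counts as quoted if its first non-space character is '>'."""
    return line.lstrip().startswith(">")


def _trim_trailing_blanks(lines: List[str]) -> List[str]:
    while lines and not lines[-1].strip():
        lines = lines[:-1]
    return lines


def split_attribution(body: str) -> Tuple[str, Optional[str]]:
    """Split a reply into (new_text, attribution_line).

    Only a quote block at the very end of the body is considered. The line
    right before it is treated as an attribution only if it matches
    ATTRIBUTION_RE; a line that merely ends with ':' stays in new_text,
    because the colon could be a typo for '.'.
    """
    lines = body.splitlines()

    # Walk back over the trailing quote block and any blank lines in it.
    end = len(lines)
    while end > 0 and (_is_quote(lines[end - 1]) or not lines[end - 1].strip()):
        end -= 1
    has_quote = any(_is_quote(line) for line in lines[end:])

    new_lines = lines[:end]
    attribution = None
    if has_quote and new_lines and ATTRIBUTION_RE.match(new_lines[-1].strip()):
        attribution = new_lines[-1].strip()
        new_lines = _trim_trailing_blanks(new_lines[:-1])

    return "\n".join(new_lines), attribution
```

For the example above, split_attribution returns the new text and the attribution line separately, while a line such as "Here is my answer:" followed by a quote is kept as new text. Attribution lines that the client has wrapped onto two lines, or that are written in another language, are not recognized by this sketch.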


