How to Split Mailbox into Single File Per Message

How to split mailbox into single file per message?

Just use formail. It is a program that can process a mailbox, run an action for each message in it, separate the messages, and so on.

More info: http://www.manpagez.com/man/1/formail/

If you just want to split a mailbox into separate files,
I would suggest this solution:

$ cat $MAIL | formail -ds sh -c 'cat > msg.$FILENO'

From man:

   FILENO
       While splitting, formail assigns the message number currently
       being output to this variable. By presetting FILENO, you can
       change the initial message number being used and the width of the
       zero-padded output. If FILENO is unset it will default to 000.
       If FILENO is non-empty and does not contain a number, FILENO
       generation is disabled.

Note: formail is also included in procmail - https://github.com/BuGlessRB/procmail .
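
Where formail is not available, roughly the same one-file-per-message split can be sketched with Python's standard mailbox module. This is an alternative technique, not what formail does internally. The function name split_mbox and the width parameter are made up for this illustration; width only imitates the zero-padded numbering that FILENO produces.

```python
import mailbox
from pathlib import Path


def split_mbox(mbox_path, out_dir, width=3):
    """Write each message of an mbox file to its own numbered file.

    Hypothetical helper: files are named msg.000, msg.001, ... to
    resemble formail's default FILENO numbering.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # create=False makes a missing mailbox an error instead of creating it
    box = mailbox.mbox(mbox_path, create=False)
    written = []
    try:
        for number, key in enumerate(box.iterkeys()):
            target = out / f"msg.{number:0{width}d}"
            # get_bytes() returns the message without its "From " separator line
            target.write_bytes(box.get_bytes(key))
            written.append(target)
    finally:
        box.close()
    return written
```

Calling split_mbox("in.mbox", "out") would leave out/msg.000, out/msg.001 and so on, one message per file.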

How to split an mbox file into n-MB big chunks using the terminal?

If your mbox is in standard format, each message will begin with From and a space:

From someone@somewhere.com

So, you could COPY YOUR MBOX TO A TEMPORARY DIRECTORY and try using awk to process it, on a message-by-message basis, only splitting at the start of any message. Let's say we went for 1,000 messages per output file:

awk 'BEGIN{chunk=0} /^From /{msgs++;if(msgs==1000){msgs=0;chunk++}}{print > ("chunk_" chunk ".txt")}' mbox

You will then get output files named chunk_0.txt to chunk_n.txt, each containing up to 1,000 messages.

If you are unfortunate enough to be on Windows (where cmd.exe does not treat single quotes as quoting characters), you will need to save the following in a file called awk.txt

BEGIN{chunk=0} /^From /{msgs++;if(msgs==1000){msgs=0;chunk++}}{print > ("chunk_" chunk ".txt")}

and then type

awk -f awk.txt mbox
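
The awk approach limits chunks by message count. If the limit should be a size instead, the same "only split at the start of a message" idea can be sketched in Python. The function names and the chunk_N.txt naming below are assumptions for illustration, and a single message larger than the limit still ends up whole in a chunk of its own.

```python
import os


def iter_messages(lines):
    """Yield each message as bytes, splitting only at 'From ' separator lines."""
    message = []
    for line in lines:
        if line.startswith(b"From ") and message:
            yield b"".join(message)
            message = []
        message.append(line)
    if message:
        yield b"".join(message)


def pack_chunks(messages, max_bytes):
    """Group whole messages into chunks of at most max_bytes each.

    A message that is itself larger than max_bytes gets its own chunk.
    """
    chunk, size = [], 0
    for msg in messages:
        if chunk and size + len(msg) > max_bytes:
            yield b"".join(chunk)
            chunk, size = [], 0
        chunk.append(msg)
        size += len(msg)
    if chunk:
        yield b"".join(chunk)


def split_mbox_by_size(mbox_path, out_dir, max_bytes):
    """Write chunk_0.txt, chunk_1.txt, ... into out_dir and return their paths."""
    os.makedirs(out_dir, exist_ok=True)
    names = []
    with open(mbox_path, "rb") as src:
        for index, data in enumerate(pack_chunks(iter_messages(src), max_bytes)):
            name = os.path.join(out_dir, f"chunk_{index}.txt")
            with open(name, "wb") as dst:
                dst.write(data)
            names.append(name)
    return names
```

For 10 MB chunks you would pass max_bytes=10 * 1024 * 1024. As with the awk version, work on a copy of the mbox.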

How to split single mail with procmail?

You seem to have two or three different questions; proper etiquette on Stack Overflow would be to ask each one separately - this also helps future visitors who have just one of your problems.

First off, to split a Berkeley mbox file containing multiple messages and run Procmail on each separately, try

formail -s procmail -m <file.mbox

You might need to read up on the mailbox formats supported by Procmail. A Berkeley mailbox is a single file which contains multiple messages, simply separated by a line beginning with From (with a space after the four alphabetic characters). This separator has to be unique, and so a message which contains those five characters at the beginning of a line in the body will need to be escaped somehow (typically by writing a > before From).
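
The escaping described above can be illustrated with a small Python sketch. The function names are hypothetical. It follows the simple convention of quoting only lines that start with "From ", which means a body line that originally began with ">From " cannot be told apart after unescaping.

```python
def escape_from_lines(body):
    """Prefix body lines that start with 'From ' with '>'."""
    return "".join(
        ">" + line if line.startswith("From ") else line
        for line in body.splitlines(keepends=True)
    )


def unescape_from_lines(body):
    """Undo escape_from_lines by dropping the '>' before 'From '."""
    return "".join(
        line[1:] if line.startswith(">From ") else line
        for line in body.splitlines(keepends=True)
    )
```

Header lines such as From: are unaffected, because the separator requires a space directly after From.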

To save each message in a separate file, choose a different mailbox format than the single-file Berkeley format. Concretely, if the destination is a directory, Procmail will create a new file in that directory. How exactly the new file is named depends on the contents of the directory (if it contains the Maildir subdirectories new, tmp, and cur, the new file is created in new in accordance with Maildir naming conventions) and on how exactly the directory is specified (trailing slash and dot selects MH format; otherwise, mail directory format).

Saving to one mailbox per recipient has a number of pesky corner cases. What if the message was sent to more than one of your local recipients? What if the recipient address is not visible in the headers? etc (the Procmail Mini-FAQ has a section about this, in the context of virtual hosting of a domain, which this is basically a variation of). But if we simply ignore these, you might be able to pull it off with something like

:0  # whitespace before ] is a literal tab
* ^TO_\/[^ @ ]+@(yourdomain\.example|example\.info)\>
{
  # Trim domain part from captured MATCH
  :0
  * MATCH ?? ^\/[^@]+
  ./$MATCH/
}

This will capture into $MATCH the first address which matches the regex, then perform another regex match on the captured string to capture just the part before the @ sign. This obviously requires that the addresses you want to match are all in a set of specific domains (here, I used yourdomain.example and example.info; obviously replace those with your actual domain names) and that capturing the first matching address is sufficient (so if a message was To: alice@yourdomain.example and Cc: bob@example.info, whichever one of those is closer to the top of the message will be picked out by this recipe, and the other one will be ignored).

In some more detail, the \/ special token causes Procmail to copy the text which matched the regex after this point into the internal variable MATCH. As this recipe demonstrates, you can then perform a regex match on that variable itself to extract a substring of it (or, in other words, discard part of the captured match).

The action ./$MATCH/ uses the captured string in MATCH as the name of the folder to save into. The leading ./ specifies the current directory (which is equal to the value of the Procmail variable MAILDIR) and the trailing / selects Maildir format.
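
To make the two-step capture easier to follow, here is a rough Python analogue. It is only an illustration of the logic, not a replacement for the recipe: it looks at To: and Cc: only, whereas Procmail's ^TO_ covers more header fields, and the names ADDRESS and folder_for are invented for this sketch. The domains are the same placeholders as in the recipe.

```python
import re

# First step: an address in one of the accepted (placeholder) domains
ADDRESS = re.compile(
    r"[^\s@<>,]+@(?:yourdomain\.example|example\.info)\b",
    re.IGNORECASE,
)


def folder_for(headers):
    """Return the local part of the first matching recipient, or None."""
    for line in headers.splitlines():
        if line.lower().startswith(("to:", "cc:")):
            match = ADDRESS.search(line)
            if match:
                # Second step: keep only the part before the @ sign
                return match.group(0).split("@", 1)[0]
    return None
```

As in the recipe, only the first matching address wins; later recipients are ignored.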

If your expected recipients cannot be constrained to be in a specific set of domains or otherwise matched by a single regex, my recommendation would be to ask a new question with more limited scope, and enough details to actually identify what you want to accomplish.

Loop for each message in mail file

Edit: a typical case of "When all you have is a hammer, everything looks like a nail"; check formail from the procmail mail-processing package, as suggested by twalberg.

For something quick and dirty, you should be able to use the following (with GNU sed) to separate the records with NUL bytes, which would make iterating over them easier:

sed 's/^From /\x0&/' /var/mail/targetMailbox

For example, you can combine this command with split to split your mailbox into multiple files of manageable size:

sed 's/^From /\x0&/' /var/mail/targetMailbox | split -l 100 -t'\0' - /tmp/mailbox

This command will split the mailbox into chunks of 100 messages, each written to its own file in /tmp/. Check split's options if you want to split the file differently; it supports a lot of different ways to do so.

A lot of (recent?) GNU tools will have a -0 or -z option to make them handle NUL-separated records, for example:

  • -z for grep, head and tail
  • -0 for xargs

To iterate over them directly from bash, the easiest is to use a while read loop with read's -d option to specify the use of NUL as a separator.

For a more permanent solution, you need to find how to use an existing mbox parser.
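
As one example of an existing mbox parser, Python ships a mailbox module in its standard library. The sketch below loops over every message and collects its envelope sender and subject. The function name list_messages is made up for illustration.

```python
import mailbox


def list_messages(mbox_path):
    """Return (envelope_from, subject) for each message in an mbox file."""
    # create=False makes a missing mailbox an error instead of creating it
    box = mailbox.mbox(mbox_path, create=False)
    try:
        # Iterating the mailbox yields one parsed message at a time
        return [(msg.get_from(), msg["Subject"]) for msg in box]
    finally:
        box.close()
```

Each yielded message behaves like an email.message.Message, so headers, payloads and MIME parts are available without hand-written parsing.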

use formail to get last email of an email file

You can do it like this:

formail -s formail -czx Message-Id: <mailbox | tail -1

This is probably not very efficient. However, more efficient methods are likely to be a lot more complicated.

How to parse mailbox file in Ruby?

The good news is that the mbox format is really dead simple, though its simplicity is why it was eventually replaced. Parsing a large mailbox file to extract a single message is not especially efficient.

If you can split apart the mailbox file into separate strings, you can pass these strings to the Mail library for parsing.

An example starting point:

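require 'mail'

def parse_message(message)
  Mail.new(message)

  # Placeholder for whatever you want to do with the parsed message
  do_other_stuff!
end

message = nil

while (line = STDIN.gets)
  if line.start_with?('From ')
    # Separator line: hand off the message collected so far
    parse_message(message) if message
    message = ''
  elsif message
    # Undo the ">From " escaping used inside message bodies
    message << line.sub(/\A>From /, 'From ')
  end
end

# The last message has no separator after it, so process it here
parse_message(message) if message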

The key is that each message starts with "From ", where the space after it is significant. Header fields are written as From: (with a colon), and any line that starts with ">From" is to be treated as actually being "From". It's things like this that make this encoding method really inadequate, but if Maildir isn't an option, this is what you've got to do.

Splitting a report into separate emails with their individual reports

You're opening a report called "test" and then closing another report called "Unaffirmed Report". You need to open and close the same report, in this case "test":
DoCmd.Close acReport, "test", acSaveNo. This should fix the employee data not updating, since the report currently remains open on the first employee.

To send the message directly, you need to change EditMessage:=True to EditMessage:=False.
Check the docs:
https://docs.microsoft.com/en-us/office/vba/api/access.docmd.sendobject

Also, if you need to test this, set Outlook to Offline mode, run your code, and check the messages in your Outbox to see if they're as expected. You can delete the messages from the Outbox to prevent them from being sent. Once you've finished testing, you can set Outlook back to Online mode.

Regarding the email address issue, this happens automatically when using hyperlinks in your controls. You'll need to strip the extra part out with strTo = Left(![Email],InStr(![Email],"#")-1). Check your data to confirm this will be valid for all email addresses. For a more advanced solution you can look at this post https://codekabinett.com/rdumps.php?Lang=2&targetDoc=access-hyperlink-data-type.

Code provided as reference, please see the post for the explanation.

'copied from https://codekabinett.com/rdumps.php?Lang=2&targetDoc=access-hyperlink-data-type

Public Function GetHyperlinkFullAddress(ByVal hyperlinkData As Variant, Optional ByVal removeMailto As Boolean) As Variant

    Const SEPARATOR As String = "#"

    Dim retVal As Variant
    Dim tmpArr As Variant

    If IsNull(hyperlinkData) Then
        retVal = hyperlinkData
    Else

        If InStr(hyperlinkData, SEPARATOR) > 0 Then
            ' I append 4 separators at the end, so I don't have to worry about the
            ' length of the array returned by Split()
            hyperlinkData = hyperlinkData & String(4, SEPARATOR)
            tmpArr = Split(hyperlinkData, SEPARATOR)

            If Len(tmpArr(1)) > 0 Then
                retVal = tmpArr(1)
                If Len(tmpArr(2)) > 0 Then
                    retVal = retVal & "#" & tmpArr(2)
                End If
            End If
        Else
            retVal = hyperlinkData
        End If

        If Left(retVal, 7) = "mailto:" Then
            retVal = Mid(retVal, 8)
        End If

    End If

    GetHyperlinkFullAddress = retVal

End Function

