Ruby Imap "Changes" Since Last Check

Ruby IMAP changes since last check

There is an IMAP extension for Quick Flag Changes Resynchronization, also known as CONDSTORE (RFC 4551). With this extension it is possible to search for all messages that have changed since the last synchronization (based on a per-message modification sequence value, MODSEQ, which works as a change counter rather than a clock timestamp). However, as far as I know this extension is not widely supported.

There is an informational RFC that describes how IMAP clients should do synchronization (RFC-4549, section 4.3). The text recommends issuing the following two commands:

tag1 UID FETCH <lastseenuid+1>:* <descriptors>
tag2 UID FETCH 1:<lastseenuid> FLAGS

The first command is used to fetch the required information for all unknown mails (without knowing how many mails there are). The second command is used to synchronize the flags for the already seen mails.

AFAIK this method is widely used. Therefore, many IMAP servers contain optimizations in order to provide this information quickly. Typically, the network bandwidth is the limiting factor.
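
As a rough illustration, the two commands could be issued from Ruby roughly like this. It is only a sketch: sync_mailbox is a hypothetical helper, the default descriptors are an assumption, and imap is expected to behave like a Net::IMAP object with a mailbox already selected. In Net::IMAP a range ending in -1 stands for "*".

```ruby
# Hypothetical helper sketching the two-command synchronization.
# `imap` must respond to uid_fetch the way Net::IMAP does.
def sync_mailbox(imap, last_seen_uid, descriptors = ["FLAGS", "ENVELOPE"])
  # Command 1: UID FETCH <lastseenuid+1>:* <descriptors>  (-1 means "*")
  fetched = Array(imap.uid_fetch((last_seen_uid + 1)..-1, descriptors))
  # "*" always matches the highest existing UID, so the server may return
  # an already known message when nothing new arrived; filter it out.
  new_messages = fetched.select { |data| data.attr["UID"] > last_seen_uid }

  # Command 2: UID FETCH 1:<lastseenuid> FLAGS (skipped for an empty cache)
  flags = {}
  if last_seen_uid > 0
    Array(imap.uid_fetch(1..last_seen_uid, "FLAGS")).each do |data|
      flags[data.attr["UID"]] = data.attr["FLAGS"]
    end
  end

  [new_messages, flags]
end
```

Any cached UID that is missing from the returned flags hash no longer exists on the server, so the second command also tells you about deletions.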

Getting only new mail from an IMAP server

You want to use the UniqueId (UID) for the messages. This is specifically why it was created.

You will want to keep track of the last UID requested, and then, to request all new messages you use the message set "[UID]:*", where [UID] is the actual UID value.

For example, let's say the last message fetched had a unique id of "123456". You would fetch

123456:*

Then, discard the first returned message.

UIDs are supposed to be stable across sessions: they never change and always increase in value. To verify this, check the UIDVALIDITY value when you select the folder. If the UIDVALIDITY number hasn't changed, the UIDs are still valid across sessions.
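
Put together, the idea might look something like the sketch below. fetch_new_messages and the state hash are hypothetical names, the "ENVELOPE" default is an assumption, and imap is assumed to be a logged-in Net::IMAP object (only status, examine and uid_fetch are used).

```ruby
# Hypothetical helper: `state` is a plain Hash you persist between runs,
# e.g. { uidvalidity: 1, last_uid: 123456 }.
def fetch_new_messages(imap, mailbox, state, attrs = "ENVELOPE")
  status = imap.status(mailbox, ["UIDVALIDITY"])
  if state[:uidvalidity] != status["UIDVALIDITY"]
    # The stored UIDs can no longer be trusted, so start from scratch.
    state[:uidvalidity] = status["UIDVALIDITY"]
    state[:last_uid] = 0
  end

  imap.examine(mailbox)
  # Fetch "<last_uid>:*" (-1 means "*"); UIDs start at 1.
  first = [state[:last_uid], 1].max
  fetched = Array(imap.uid_fetch(first..-1, attrs))

  # Discard the message(s) we already have, as described above.
  fresh = fetched.select { |data| data.attr["UID"] > state[:last_uid] }
  state[:last_uid] = fresh.map { |data| data.attr["UID"] }.max || state[:last_uid]
  fresh
end
```

Filtering on the UID instead of blindly dropping the first result also covers the case where the server returns the messages in a different order.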

Here are the relevant parts from the RFC:

2.3.1.1. Unique Identifier (UID) Message Attribute

A 32-bit value assigned to each message, which when used with the
unique identifier validity value (see below) forms a 64-bit value
that MUST NOT refer to any other message in the mailbox or any
subsequent mailbox with the same name forever. Unique identifiers
are assigned in a strictly ascending fashion in the mailbox; as each
message is added to the mailbox it is assigned a higher UID than the
message(s) which were added previously. Unlike message sequence
numbers, unique identifiers are not necessarily contiguous.

The unique identifier of a message MUST NOT change during the
session, and SHOULD NOT change between sessions. Any change of
unique identifiers between sessions MUST be detectable using the
UIDVALIDITY mechanism discussed below. Persistent unique identifiers
are required for a client to resynchronize its state from a previous
session with the server (e.g., disconnected or offline access
clients); this is discussed further in [IMAP-DISC].

Note: The next unique identifier value is intended to
provide a means for a client to determine whether any
messages have been delivered to the mailbox since the
previous time it checked this value.

Here is the link with more info:

http://www.faqs.org/rfcs/rfc3501.html

I would also keep track of the InternalDate of the messages downloaded. This way, if you ever lose UID sync, you can at least iterate through the messages and find the last one you downloaded, based on the InternalDate of the message.
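
One possible way to narrow that iteration down is a SEARCH SINCE on the stored date, which matches on the internal date but ignores the time of day. The helper name uids_since is made up for this sketch, and imap is assumed to be a Net::IMAP object with the mailbox selected.

```ruby
# Hypothetical helper: UIDs of messages whose INTERNALDATE falls on or
# after the day of `last_internal_date` (a Time you stored earlier).
def uids_since(imap, last_internal_date)
  # IMAP search dates look like 19-Oct-2018.
  imap_date = last_internal_date.strftime("%d-%b-%Y")
  Array(imap.uid_search(["SINCE", imap_date]))
end
```

Because SINCE works on whole days, the result still includes messages from that day that you already have, so you would compare their InternalDate against the stored one before downloading.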

IMAP Client Sync local messages Server?

To sync, you probably need the UID and flags of every message in each folder.
You can compare the locally cached UIDs with the ones the server returns; this lets you detect new and deleted messages.

You should probably use some kind of hash table for the lookup and comparison, which speeds everything up.
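
A minimal sketch of such a comparison, assuming both sides are available as hashes that map a UID to its list of flags (diff_mailbox is a hypothetical name, not part of any library):

```ruby
require "set"

# Hypothetical helper: `local` and `server` are Hashes of UID => flags.
def diff_mailbox(local, server)
  {
    # On the server but not cached yet.
    new: (server.keys - local.keys).sort,
    # Cached but gone from the server.
    deleted: (local.keys - server.keys).sort,
    # Present on both sides with a different set of flags.
    changed: server.keys.select { |uid|
      local.key?(uid) && local[uid].to_set != server[uid].to_set
    }.sort
  }
end
```

Hash lookups keep the comparison roughly linear in the number of messages instead of scanning the whole cache once per UID.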

How to properly handle the changing of email uids over previous sessions?

You cannot really do anything. Once the IMAP server tells you that the UIDVALIDITY has changed, then the only standard-compliant, reliable and safe action is to discard everything from your local cache.

There are some non-standard extensions which might help you. Gmail, for example, has its own X-GM-MSGID, but it doesn't specify whether these get invalidated upon a UIDVALIDITY change.

There were some efforts among the Courier and Dovecot maintainers to standardize a DIGEST extension for computing cryptographic hashes of individual messages. These would be exactly what you're looking for. However, I don't think these ever got standardized. Also, keep in mind that the MIME standard allows for several equivalent representations of any given message (think various 8bit encoding schemes). Any body digesting breaks once the MIME structure changes.

I would not attempt to use the Message-Id if I were you. Its value is user-controlled.

Ruby IMAP library: How can I show all messages in a folder?

To get the UIDs of all emails in a mailbox, use imap.uid_search(["ALL"]). You can then fetch them all in RFC822 (.eml) format (which includes their attachments) like this:

require "net/imap"
require "mail"
require "fileutils"

# initialize your imap object here
imap = Net::IMAP.new(...)
imap.login(...)

# skip mailboxes that cannot be selected (for example pure folder containers)
mailboxes = imap.list("", "*").reject { |m| m.attr.include?(:Noselect) }

mailboxes.map(&:name).each do |mailbox|
  # open the mailbox read-only
  imap.examine(mailbox)

  # Create directory for mailbox backup (mkdir_p also handles nested names like "Parent/Child")
  FileUtils.mkdir_p(mailbox)

  uids = imap.uid_search(["ALL"])
  uids.each_with_index do |uid, i|
    # fetch the email in RFC822 format
    raw_email = imap.uid_fetch(uid, "RFC822").first.attr["RFC822"]

    # use the "mail" gem to parse the raw email and extract some useful info
    email = Mail.new(raw_email)
    puts "[#{i+1}/#{uids.length}] Saving email #{mailbox}/#{uid} (#{email.subject.inspect} from #{email.from&.first} at #{email.date})"

    # save the email to a file
    File.write(File.join(mailbox, "#{uid}.eml"), raw_email)
  end
end

imap search syntax with multiple OR arguments

OR takes exactly two arguments, no more and no fewer. OR a b works and (OR a b) works, but (OR a) won't work. That would be a single-argument OR inside a single-argument AND. The parser is looking for the second argument to OR when it runs up against the ) that ends the list of arguments to AND. The last part of your query is (OR (FROM a3@b.com SINCE 19-Oct-2018)).

What you mean is probably OR (FROM a1@b.com SINCE 1-Oct-2018) OR (FROM a2@b.com SINCE 10-Oct-2018) (FROM a3@b.com SINCE 19-Oct-2018). In that expression, the first OR takes two arguments, which are an AND and another OR, and the second OR takes two arguments, both of which are ANDs.

(I agree that this difference between OR and AND is a bit strange.)
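
If you build such queries from Ruby, a small helper can do the nesting for you. This is only a sketch: or_chain is a made-up name, and it assumes every term is already a valid, parenthesized search expression.

```ruby
# Hypothetical helper: combine any number of search terms with IMAP's
# two-argument OR by nesting to the right: OR t1 OR t2 t3.
def or_chain(terms)
  raise ArgumentError, "need at least one term" if terms.empty?

  *init, last = terms
  # Wrap from the innermost pair outwards.
  init.reverse.inject(last) { |acc, term| "OR #{term} #{acc}" }
end

query = or_chain([
  "(FROM a1@b.com SINCE 1-Oct-2018)",
  "(FROM a2@b.com SINCE 10-Oct-2018)",
  "(FROM a3@b.com SINCE 19-Oct-2018)"
])
puts query
```

With Net::IMAP the resulting string can be passed to the search as is, for example imap.search(query), since a string argument holds the entire search expression.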
