R – IMAP folder/messages synchronization strategy


I'm about to add IMAP email integration to one of our web applications (ASP.NET / SQL Server). I'm already using a commercial library which exposes the most important IMAP functionality: get folder list, get message headers, get mime message etc.)

Getting email data "live" from the IMAP server works very well. But here comes the difficult task: I have to keep the email/folders caching SQL database synchronized to the IMAP server (I have to show data applying different criteria).

Our database schema essentially contains a "Folders" and an "Emails" table. The "Emails" table contains primarily header information like "FromAddress", "FromName", "IsRead", "IsAnswered", "IsForwarded", "HasAttachments" etc. (without the email content or attachments).

I have to consider two major scenarios:

  1. Getting all messages the first time (or after a user re-organized the folders)
  2. Getting new/recent messages

What would be a good synchronization strategy for keeping the mail server and database server up-to-date, considering that performance is a major design criterion (I can't just query/compare thousands of messages every time I connect, in order to find out if the user moved or deleted some old emails).


Best Solution

From your library's feature list:

Better UniqueId Support: We've added even more options for requesting a message's unique id. You can now return the UniqueId in a message's DataTable for return trips to the IMAP server.


  • Retrieve only New Messages
  • Search Flagged Messages
  • Mark/Unmark Messages as Read

It looks to me as though your library has all the support you need to keep your SQL server synchronized. You can programmatically mark messages as read, and the library supports retrieval of only new messages. That takes care of your second item.

Your strategy will depend partly on how your solution works. If I read your question correclty, your users manage their email on the IMAP server, and your SQL Server is "subscribed" to the IMAP server, from a syncronization perspective.

If this is correct, then synchronization is effectively a background task. My approach would be to synchronize using an event model on a user-by-user basis. If possible, "notify" the synchronization program when there is activity (new/deleted emails) for a user. Add a synchronization "job" to a background process that batches synch jobs together. A notification model will ensure that the synch program only works on users that need a synch.

Small new/deleted email synch jobs go to one "processor" and larger jobs like total resynch and folder reorganization go to another. Really big resynch jobs may have to be split up in order to keep overall throughput high. The "small job" and "big job" processors could be two different services, or possibly two different threads depending on performance and design considerations.

Related Question