I couldn’t find all that much information about IMAP on the web, other than RFC 3501.
The IMAP protocol document is absolutely key to understanding the commands available, but let me skip attempting to explain it and just lead by example, pointing out the common gotchas I ran into.
Logging in to the inbox
import imaplib

mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('myusername@gmail.com', 'mypassword')
mail.list()  # Out: list of "folders" aka labels in gmail.
mail.select("inbox")  # connect to inbox.
Getting all mail and fetching the latest
Let’s start by searching our inbox for all mail with the search function.
Use the built-in keyword “ALL” to get all results (documented in RFC 3501).
We’re going to extract the data we need from the response, then fetch the mail via the ID we just received.
result, data = mail.search(None, "ALL")
ids = data[0]  # data is a list.
id_list = ids.split()  # ids is a space separated string
latest_email_id = id_list[-1]  # get the latest
result, data = mail.fetch(latest_email_id, "(RFC822)")  # fetch the email body (RFC822) for the given ID
raw_email = data[0][1]  # here's the body, which is raw text of the whole email
                        # including headers and alternate payloads
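The snippet above assumes the search succeeded and found at least one message; on an empty inbox, id_list[-1] raises an IndexError. Here is a slightly more defensive sketch. The name latest_id is my own invention, and it assumes the (result, data) pair that imaplib hands back, where data[0] is a space-separated string (bytes on Python 3).

```python
def latest_id(result, data):
    """Return the last id from an imaplib search response, or None.

    Hypothetical helper: `result` and `data` are the two values returned
    by mail.search(...) or mail.uid('search', ...).
    """
    if result != 'OK' or not data or not data[0]:
        return None          # search failed or nothing matched
    return data[0].split()[-1]


# Responses shaped like the ones imaplib returns (values are made up).
print(latest_id('OK', [b'1 2 3 4']))
print(latest_id('OK', [b'']))
```

Returning None keeps the caller's check to a single if statement before the fetch.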
Using UIDs instead of volatile sequential ids
The IMAP search function returns sequence numbers, meaning id 5 is the 5th email in your inbox.
That means if a user deletes email 10, every email after it shifts down by one, and any id above 10 that you stored now points to the wrong email.
This is unacceptable.
Luckily we can ask the IMAP server to return a UID (unique id) instead.
The way this works is pretty simple: use the uid function, and pass in the name of the command as the first argument. The rest behaves exactly the same.
result, data = mail.uid('search', None, "ALL")  # search and return uids instead
latest_email_uid = data[0].split()[-1]
result, data = mail.uid('fetch', latest_email_uid, '(RFC822)')
raw_email = data[0][1]
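To see why sequence ids are risky, here is a toy model in plain Python. It doesn't talk to a server, and the mailbox list and both lookup functions are made up for illustration, but it shows the shift that happens when a message is deleted.

```python
# Toy model of a mailbox: each message keeps the UID it was assigned,
# while its sequence number is just its current position (1-based).
mailbox = [
    {'uid': 101, 'subject': 'first'},
    {'uid': 102, 'subject': 'second'},
    {'uid': 105, 'subject': 'third'},
]


def by_sequence(box, seq):
    """Look a message up by its 1-based position in the mailbox."""
    return box[seq - 1]


def by_uid(box, uid):
    """Look a message up by UID; returns None if it is gone."""
    for message in box:
        if message['uid'] == uid:
            return message
    return None


before = by_sequence(mailbox, 2)['subject']   # the message we care about
del mailbox[0]                                # someone deletes message 1
after = by_sequence(mailbox, 2)['subject']    # same number, different message
still = by_uid(mailbox, 102)['subject']       # the UID still finds the original
print(before, after, still)
```

Sequence number 2 silently moves to a different message after the delete, while UID 102 keeps pointing at the same one.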
Parsing Raw Emails
Raw emails pretty much look like gibberish. Luckily we have a Python library for dealing with emails called… email.
It can convert raw emails into an email.message.Message object, which gives you dictionary-style access to the headers.
import email
import email.utils

# on Python 3, raw_email is bytes: use email.message_from_bytes(raw_email) instead
email_message = email.message_from_string(raw_email)
print(email_message['To'])
print(email.utils.parseaddr(email_message['From']))  # for parsing "Yuji Tomita" <yuji@grovemade.com>
print(email_message.items())  # print all headers

# note that if you want to get text content (body) and the email contains
# multiple payloads (plaintext/ html), you must parse each message separately.
# use something like the following: (taken from a stackoverflow post)
def get_first_text_block(email_message_instance):
    maintype = email_message_instance.get_content_maintype()
    if maintype == 'multipart':
        for part in email_message_instance.get_payload():
            if part.get_content_maintype() == 'text':
                return part.get_payload()
    elif maintype == 'text':
        return email_message_instance.get_payload()
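If you want to try the parsing without a live mailbox, here is a self-contained sketch for Python 3. It builds a small multipart message in memory to stand in for raw_email (the addresses and text are made up), then pulls out the sender and the first text part. The helper first_text_part is my own name; it uses walk() so nested parts are covered too, and decodes the payload, which the function above doesn't do.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import parseaddr

# Build a small multipart/alternative message to stand in for raw_email.
outgoing = MIMEMultipart('alternative')
outgoing['From'] = '"Example Sender" <sender@example.com>'
outgoing['To'] = 'me@example.com'
outgoing['Subject'] = 'Hello'
outgoing.attach(MIMEText('plain body', 'plain'))
outgoing.attach(MIMEText('<p>html body</p>', 'html'))
raw_email = outgoing.as_bytes()      # bytes, like imaplib returns on Python 3

message = email.message_from_bytes(raw_email)
sender_name, sender_address = parseaddr(message['From'])


def first_text_part(msg, subtype='plain'):
    """Return the decoded text of the first text/<subtype> part, or None."""
    for part in msg.walk():          # walk() also descends into nested parts
        if part.get_content_type() == 'text/' + subtype:
            payload = part.get_payload(decode=True)   # undo base64 / quoted-printable
            charset = part.get_content_charset() or 'utf-8'
            return payload.decode(charset, errors='replace')
    return None


print(sender_address)
print(first_text_part(message))
print(first_text_part(message, 'html'))
```

Passing decode=True matters for real mail, where bodies are often base64 or quoted-printable encoded.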
Advanced searches
We’ve only done the basic search for “ALL”.
Let’s try something else such as a combination of searches we want and don’t want.
All available search parameters are listed in the IMAP protocol documentation and you will definitely want to check out the SEARCH Command reference.
Here are just a few searches to get you started.
Search any header
To search any header, such as Subject, Reply-To, or Received, the search key is simply “(HEADER field-name "string")”.
mail.uid('search', None, '(HEADER Subject "My Search Term")')
mail.uid('search', None, '(HEADER Received "localhost")')
Search for emails sent in the past day
Oftentimes the inbox is too large, and IMAP doesn’t specify a way of limiting the number of results, which makes searches extremely slow. One way to limit them is the SENTSINCE keyword.
The SENTSINCE date format is DD-Mon-YYYY, for example 01-Jun-2012. In Python, that would be strftime('%d-%b-%Y').
import datetime

date = (datetime.date.today() - datetime.timedelta(1)).strftime("%d-%b-%Y")
result, data = mail.uid('search', None, '(SENTSINCE {date})'.format(date=date))
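One thing to watch: %b follows the current locale, so on a machine set to a non-English locale it can produce month names the server won't accept. Here is a sketch that sidesteps this by spelling out the month abbreviations. The names imap_date and days_ago are my own, not part of any library.

```python
import datetime

# IMAP wants English month abbreviations, whatever the local language is.
MONTHS = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')


def imap_date(day):
    """Format a datetime.date as DD-Mon-YYYY for IMAP date searches."""
    return '{:02d}-{}-{:04d}'.format(day.day, MONTHS[day.month - 1], day.year)


def days_ago(n, today=None):
    """Return the IMAP date string for n days before today."""
    today = today or datetime.date.today()
    return imap_date(today - datetime.timedelta(days=n))


print(imap_date(datetime.date(2012, 6, 1)))
print('(SENTSINCE {})'.format(days_ago(1)))
```

The optional today argument is only there so the function can be tested with a fixed date.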
Limit by date, search for a subject, and exclude a sender
date = (datetime.date.today() - datetime.timedelta(1)).strftime("%d-%b-%Y")
result, data = mail.uid(
    'search', None,
    '(SENTSINCE {date} HEADER Subject "My Subject" NOT FROM "yuji@grovemade.com")'.format(date=date))
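Once you start combining search keys, building the string by hand gets error-prone, especially if a search term contains a double quote. Here is a rough sketch of a builder. Both quote and build_criteria are hypothetical helpers of mine, and it only handles plain ASCII terms.

```python
def quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace('\\', '\\\\').replace('"', '\\"') + '"'


def build_criteria(sent_since=None, subject=None, not_from=None):
    """Combine optional filters into one IMAP search string.

    Hypothetical helper; sent_since is an already formatted
    DD-Mon-YYYY string.
    """
    parts = []
    if sent_since:
        parts.append('SENTSINCE ' + sent_since)
    if subject:
        parts.append('HEADER Subject ' + quote(subject))
    if not_from:
        parts.append('NOT FROM ' + quote(not_from))
    # With no filters at all, fall back to the ALL keyword.
    return '(' + ' '.join(parts) + ')' if parts else 'ALL'


criteria = build_criteria('01-Jun-2012', 'My Subject', 'yuji@grovemade.com')
print(criteria)
# result, data = mail.uid('search', None, criteria)
```

Search keys placed side by side are ANDed together by the server, which is why a simple join with spaces is enough here.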
Fetches
Get Gmail thread ID
Fetches can include the entire email body, or any combination of results such as email flags (seen/unseen) or Gmail-specific IDs such as thread ids.
result, data = mail.uid('fetch', uid, '(X-GM-THRID X-GM-MSGID)')
Get a header key only
result, data = mail.uid('fetch', uid, '(BODY[HEADER.FIELDS (DATE SUBJECT)])')
Fetch multiple
You can fetch multiple emails at once. I found through experimentation that it expects comma-delimited input.
result, data = mail.uid('fetch', '1938,2398,2487', '(X-GM-THRID X-GM-MSGID)')
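In practice the UIDs usually come straight out of a search, so you need to turn that space-separated result into the comma-delimited form. Here is a small sketch; uid_set is a made-up name, and the sample search result is invented. (RFC 3501 also allows ranges such as 1938:2487, which this helper doesn't try to produce.)

```python
def uid_set(uids):
    """Join UIDs into the comma-delimited form the fetch command accepts.

    Accepts ints, strings, or the bytes that imaplib returns on Python 3.
    """
    cleaned = []
    for uid in uids:
        if isinstance(uid, bytes):
            uid = uid.decode('ascii')
        cleaned.append(str(int(uid)))     # int() rejects anything non-numeric
    return ','.join(cleaned)


# data[0] from a UID search is a space-separated string of UIDs (made-up values).
search_data = [b'1938 2398 2487']
print(uid_set(search_data[0].split()))
# result, data = mail.uid('fetch', uid_set(search_data[0].split()), '(X-GM-THRID X-GM-MSGID)')
```

Running every value through int() is a cheap guard against passing something unexpected to the server.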
Use a regex to parse fetch results
The returned result isn’t very easy to swallow. Each item is a string of space-separated key-value pairs.
Use a simple regex to get the data you need.
import re

result, data = mail.uid('fetch', uid, '(X-GM-THRID X-GM-MSGID)')
# regex group names can't contain hyphens, so use plain identifiers
re.search(r'X-GM-THRID (?P<thrid>\d+) X-GM-MSGID (?P<msgid>\d+)', data[0]).groupdict()
# this becomes an organizational lifesaver once you have many results returned.
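When you fetch several messages at once, you get one such string per message, and I wouldn't rely on the pairs always arriving in the same order. Here is a sketch that matches each key separately and indexes the results by UID. The sample lines are made up in the shape imaplib returns, and parse_fetch_line is my own helper.

```python
import re

# One pattern per key, so the order of the pairs does not matter.
PATTERNS = {
    'uid': re.compile(r'\bUID (\d+)'),
    'thread_id': re.compile(r'X-GM-THRID (\d+)'),
    'message_id': re.compile(r'X-GM-MSGID (\d+)'),
}


def parse_fetch_line(line):
    """Pull the numeric values out of one line of a fetch response."""
    if isinstance(line, bytes):
        line = line.decode('ascii', errors='replace')
    found = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            found[key] = int(match.group(1))
    return found


# Made-up response lines in the shape imaplib returns for this kind of fetch.
sample = [
    b'1 (X-GM-THRID 1266894439832287888 X-GM-MSGID 1278455344230334865 UID 1938)',
    b'2 (UID 2398 X-GM-MSGID 1278455344230334999 X-GM-THRID 1266894439832287888)',
]
by_uid = {row['uid']: row for row in map(parse_fetch_line, sample)}
print(by_uid[2398]['thread_id'])
```

Keying the dictionary on UID makes it easy to match thread ids back to the messages you asked for.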
Conclusion
Well, that should leave you with a much better understanding of the IMAP protocol and using Python to interface with Gmail.
Certainly more than I knew!