I couldn’t find much information about IMAP on the web, other than RFC 3501.
The IMAP protocol document is absolutely key to understanding the commands available, but rather than attempt to explain it, I’ll lead by example and point out the common gotchas I ran into.
Logging in to the inbox
import imaplib
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('myusername@gmail.com', 'mypassword')
mail.list()
# Out: list of "folders" aka labels in gmail.
mail.select("inbox") # connect to inbox.
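The list() call returns one raw line per folder, which is easier to work with once it is split into flags, delimiter and name. Here is a minimal sketch, assuming Python 3 (where imaplib returns bytes) and the common `(flags) "delimiter" name` line shape; the helper name parse_list_line and the sample lines are made up for illustration.

```python
import re

# One LIST response line looks like: (flags) "delimiter" mailbox-name
LIST_LINE = re.compile(rb'\((?P<flags>[^)]*)\) "(?P<delim>[^"]*)" (?P<name>.*)')

def parse_list_line(line):
    """Split one LIST response line into (flags, delimiter, name)."""
    match = LIST_LINE.match(line)
    if match is None:
        raise ValueError("unrecognised LIST line: %r" % (line,))
    flags = match.group("flags").decode().split()
    delim = match.group("delim").decode()
    name = match.group("name").decode().strip('"')
    return flags, delim, name

# Made-up sample lines in the shape imaplib hands back (bytes).
sample = [
    b'(\\HasNoChildren) "/" "INBOX"',
    b'(\\HasChildren \\Noselect) "/" "[Gmail]"',
]
folders = [parse_list_line(line) for line in sample]
```

Against a live connection you would feed it the second element of the tuple returned by mail.list(). Servers that report the delimiter as an unquoted NIL are not handled by this pattern.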
Getting all mail and fetching the latest
Let’s start by searching our inbox for all mail with the search function.
Use the built-in keyword “ALL” to get all results (documented in RFC 3501).
We’re going to extract the data we need from the response, then fetch the mail via the ID we just received.
result, data = mail.search(None, "ALL")
ids = data[0] # data is a list.
id_list = ids.split() # ids is a space separated string
latest_email_id = id_list[-1] # get the latest
result, data = mail.fetch(latest_email_id, "(RFC822)") # fetch the email body (RFC822) for the given ID
raw_email = data[0][1] # here's the body, which is raw text of the whole email
# including headers and alternate payloads
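Two things the snippet above glosses over: the status in result is never checked, and id_list[-1] raises an IndexError on an empty mailbox. A small sketch of a more defensive version follows. The function name latest_message_ids and the FakeMail stand-in are hypothetical, there only so the example runs without a server, and count is assumed to be at least 1.

```python
def latest_message_ids(mail, count=1):
    """Return up to `count` of the newest message ids in the selected mailbox."""
    result, data = mail.search(None, "ALL")
    if result != "OK":
        raise RuntimeError("SEARCH failed: %r" % (data,))
    id_list = data[0].split()  # empty list when the mailbox has no mail
    return id_list[-count:]


class FakeMail:
    """Hypothetical stand-in for an imaplib connection, for illustration only."""

    def __init__(self, ids):
        self._ids = ids

    def search(self, charset, *criteria):
        # imaplib returns the ids as one space separated bytes string
        return "OK", [b" ".join(self._ids)]


newest = latest_message_ids(FakeMail([b"1", b"2", b"3"]), count=2)
empty = latest_message_ids(FakeMail([]))
```

With a real connection you would pass the mail object from the login step instead of FakeMail.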
Using UIDs instead of volatile sequential ids
The IMAP search function returns sequence numbers, meaning id 5 is the 5th email in your inbox.
That means if a user deletes email 10, every id above 10 now points to a different email than it did before.
This is unacceptable.
Luckily we can ask the imap server to return a UID (unique id) instead.
The way this works is pretty simple: use the uid function, and pass in the name of the command as the first argument. The rest behaves exactly the same.
result, data = mail.uid('search', None, "ALL") # search and return uids instead
latest_email_uid = data[0].split()[-1]
result, data = mail.uid('fetch', latest_email_uid, '(RFC822)')
raw_email = data[0][1]
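To see why sequence numbers are fragile, here is a toy model with no server involved. The mailbox list and the two lookup helpers are invented for illustration: the sequence number is just the position in the list, while the UID stays attached to the message.

```python
# Toy mailbox: a list of (uid, subject) pairs. The sequence number is the
# 1-based position in the list; the UID travels with the message.
mailbox = [(101, "first"), (102, "second"), (103, "third")]

def by_sequence_number(box, seq):
    return box[seq - 1][1]

def by_uid(box, uid):
    for message_uid, subject in box:
        if message_uid == uid:
            return subject
    return None

before_seq = by_sequence_number(mailbox, 2)
before_uid = by_uid(mailbox, 103)

# Delete the first message: everything after it shifts down by one.
del mailbox[0]

after_seq = by_sequence_number(mailbox, 2)
after_uid = by_uid(mailbox, 103)
```

Sequence number 2 names a different message after the delete, while UID 103 still names the same one.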
Parsing Raw Emails
Raw emails pretty much look like gibberish. Luckily Python ships with a library for dealing with them called… email.
It can convert a raw email into a Message object (email.message.Message).
import email
email_message = email.message_from_bytes(raw_email) # Python 3: fetch returns bytes. On Python 2, use email.message_from_string
print(email_message['To'])
print(email.utils.parseaddr(email_message['From'])) # for parsing "Yuji Tomita" <yuji@grovemade.com>
print(email_message.items()) # print all headers
# note that if you want to get text content (body) and the email contains
# multiple payloads (plaintext/ html), you must parse each message separately.
# use something like the following: (taken from a stackoverflow post)
def get_first_text_block(email_message_instance):
    maintype = email_message_instance.get_content_maintype()
    if maintype == 'multipart':
        # return the first text part (plaintext or html) of a multipart email
        for part in email_message_instance.get_payload():
            if part.get_content_maintype() == 'text':
                return part.get_payload()
    elif maintype == 'text':
        return email_message_instance.get_payload()
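One catch with get_payload() is that it returns the body still in its transfer encoding (base64 or quoted-printable). Below is a sketch of a variant that walks nested parts and decodes the text, assuming Python 3. The function name first_text_part is my own, and the sample message is built locally to stand in for a fetched raw_email.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def first_text_part(message):
    """Return the first text/plain part as a str, or None if there is none."""
    for part in message.walk():  # walk() also descends into nested multiparts
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # undo base64 / quoted-printable
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return None

# Build a small multipart message to stand in for a fetched raw_email.
sample = MIMEMultipart("alternative")
sample["Subject"] = "Hello"
sample.attach(MIMEText("plain body", "plain", "utf-8"))
sample.attach(MIMEText("<p>html body</p>", "html", "utf-8"))
raw_email = sample.as_bytes()

parsed = email.message_from_bytes(raw_email)
body = first_text_part(parsed)
```

Falling back to utf-8 when no charset is declared is a guess on my part, not something the email package mandates.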
Advanced searches
We’ve only done the basic search for “ALL”.
Let’s try something else such as a combination of searches we want and don’t want.
All available search parameters are listed in the IMAP protocol documentation and you will definitely want to check out the SEARCH Command reference.
Here are just a few searches to get you started.
Search any header
To search any header, such as Subject, Reply-To, Received, etc., the command is simply '(HEADER field-name "search string")'
mail.uid('search', None, '(HEADER Subject "My Search Term")')
mail.uid('search', None, '(HEADER Received "localhost")')
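If the search term comes from user input, a double quote or backslash in it will break the quoted string. A tiny sketch of a helper that escapes both, per the quoted-string rules in RFC 3501; header_criterion is a made-up name, and non-ASCII terms (which need a charset argument) are out of scope here.

```python
def header_criterion(field, value):
    """Build a '(HEADER field "value")' search string with the value escaped."""
    # Backslash first, so the backslashes added for quotes are not doubled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '(HEADER {field} "{value}")'.format(field=field, value=escaped)

subject_query = header_criterion("Subject", "My Search Term")
quoted_query = header_criterion("Subject", 'say "hi"')
```

The result can be passed straight to mail.uid('search', None, subject_query).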
Search for emails from the past day
Often the inbox is too large, and IMAP doesn’t specify a way of limiting the number of results, which makes searches extremely slow. One way to narrow a search is the SENTSINCE keyword.
The SENTSINCE date format is DD-Mon-YYYY, for example 01-Jun-2012. In Python, that would be strftime('%d-%b-%Y'), assuming an English locale, since %b is locale-dependent.
import datetime
date = (datetime.date.today() - datetime.timedelta(1)).strftime("%d-%b-%Y")
result, data = mail.uid('search', None, '(SENTSINCE {date})'.format(date=date))
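Since strftime("%b") follows the current locale, a machine set to a non-English locale would produce month names the server won't understand. A sketch of a locale-independent alternative; imap_date and MONTHS are my own names.

```python
import datetime

# IMAP dates always use English month abbreviations, whatever the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def imap_date(day):
    """Format a datetime.date as DD-Mon-YYYY for IMAP date search keys."""
    return "{:02d}-{}-{:04d}".format(day.day, MONTHS[day.month - 1], day.year)

since = imap_date(datetime.date(2012, 2, 22))
```

You would then format the criteria string with since in place of the strftime result.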
Limit by date, search for a subject, and exclude a sender
date = (datetime.date.today() - datetime.timedelta(1)).strftime("%d-%b-%Y")
result, data = mail.uid('search', None, '(SENTSINCE {date} HEADER Subject "My Subject" NOT FROM "yuji@grovemade.com")'.format(date=date))
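Long criteria strings get hard to read, so it can help to assemble them from optional pieces. A rough sketch follows; build_criteria and its parameters are hypothetical. The UNSEEN key (unread mail) is a standard RFC 3501 search key, and the values are assumed not to contain double quotes.

```python
def build_criteria(since=None, subject=None, not_from=None, unseen=False):
    """Join optional search keys into one parenthesised criteria string."""
    parts = []
    if unseen:
        parts.append("UNSEEN")  # only messages without the \Seen flag
    if since:
        parts.append("SENTSINCE {}".format(since))
    if subject:
        parts.append('HEADER Subject "{}"'.format(subject))
    if not_from:
        parts.append('NOT FROM "{}"'.format(not_from))
    return "({})".format(" ".join(parts)) if parts else "ALL"

combined = build_criteria(since="22-Feb-2012", subject="My Subject",
                          not_from="yuji@grovemade.com")
unread = build_criteria(unseen=True)
```

Either string can be handed to mail.uid('search', None, ...) like the examples above.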
Fetches
Get Gmail thread ID
Fetches can include the entire email body, or any combination of results such as email flags (seen/unseen) or gmail specific IDs such as thread ids.
result, data = mail.uid('fetch', uid, '(X-GM-THRID X-GM-MSGID)')
Get a header key only
result, data = mail.uid('fetch', uid, '(BODY[HEADER.FIELDS (DATE SUBJECT)])')
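The header-only fetch comes back as a small block of raw header text, which the email package can parse just like a full message. A sketch assuming Python 3; sample_data is a made-up response in the shape imaplib typically returns, with the header text in data[0][1].

```python
import email
import email.utils

# Made-up example of what `data` can look like after a header-only fetch.
sample_data = [
    (b'1 (UID 1938 BODY[HEADER.FIELDS (DATE SUBJECT)] {63}',
     b'Date: Wed, 22 Feb 2012 12:00:42 +0530\r\nSubject: Hello there\r\n\r\n'),
    b')',
]

headers = email.message_from_bytes(sample_data[0][1])
subject = headers["Subject"]
date = headers["Date"]

# Turn the Date: header into a timezone-aware datetime (Python 3.3+).
sent_at = email.utils.parsedate_to_datetime(date)
```

Having a real datetime also lets you filter by time of day on the client side.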
Fetch multiple
You can fetch multiple emails at once. I found through experimentation that it’s expecting comma delimited input.
result, data = mail.uid('fetch', '1938,2398,2487', '(X-GM-THRID X-GM-MSGID)')
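The comma delimited form is the sequence-set syntax from RFC 3501, which also allows ranges such as 4:7. Here is a sketch of a helper that turns a list of UIDs into a compact set string; to_sequence_set is a name I made up.

```python
def to_sequence_set(uids):
    """Collapse an iterable of ints into a compact 'a:b,c' sequence-set string."""
    ordered = sorted(set(uids))
    if not ordered:
        raise ValueError("a sequence-set needs at least one number")
    ranges = []
    start = prev = ordered[0]
    for uid in ordered[1:]:
        if uid == prev + 1:  # still inside a consecutive run
            prev = uid
            continue
        ranges.append((start, prev))
        start = prev = uid
    ranges.append((start, prev))
    return ",".join(
        str(lo) if lo == hi else "{}:{}".format(lo, hi) for lo, hi in ranges
    )

uid_set = to_sequence_set([1938, 2398, 2487])
compact = to_sequence_set([4, 5, 6, 7, 9, 2])
```

The result drops in where the literal '1938,2398,2487' is used above.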
Use a regex to parse fetch results
The returned result isn’t very easy to digest: it is a string of space-separated key-value pairs.
Use a simple regex to get the data you need.
import re
result, data = mail.uid('fetch', uid, '(X-GM-THRID X-GM-MSGID)')
re.search(r'X-GM-THRID (?P<thrid>\d+) X-GM-MSGID (?P<msgid>\d+)', data[0].decode()).groupdict() # group names can't contain hyphens
# this becomes an organizational lifesaver once you have many results returned.
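When several messages are fetched at once, each one comes back on its own response line. Here is a sketch that collects them into a dict keyed by UID, without depending on the order of the keys. It assumes Python 3 (bytes responses) and that every line is a plain bytes string; the sample lines and their numbers are invented.

```python
import re

# Match any of the three keys followed by its numeric value.
PAIR = re.compile(rb'(X-GM-THRID|X-GM-MSGID|UID) (\d+)')

def parse_fetch_lines(lines):
    """Map each UID to a dict of the Gmail ids found on its response line."""
    results = {}
    for line in lines:
        fields = {key.decode(): int(value) for key, value in PAIR.findall(line)}
        uid = fields.pop("UID", None)
        if uid is not None:
            results[uid] = fields
    return results

# Made-up response lines; the numbers are illustrative only.
sample = [
    b'1 (X-GM-THRID 1266894439832287888 X-GM-MSGID 1278455344230334865 UID 1938)',
    b'2 (X-GM-THRID 1266894439832287999 X-GM-MSGID 1278455344230334999 UID 2398)',
]
parsed = parse_fetch_lines(sample)
```

With a live connection you would pass in the data list returned by mail.uid('fetch', ...).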
Conclusion
Well, that should leave you with a much better understanding of the IMAP protocol and using python to interface with Gmail.
Certainly more than I knew!
Thanks Yuji
This is soooo helpful!
Hi, is there a way to also use time with date in search?
All references to “time” in RFC 3501 say “disregarding time and timezone” – so I’d say no.
Very helpful; thanks lots.
short, concise and very helpful, many thanks.
Is there a way in gmail to know under how many labels (or directories) an email is ?
I would like to know with which labels my emails have been tagged.
Thanks. Very useful
Perfect – Many Thanks! Why can’t everyone write examples as straight forward as this?
Thanks for the detailed examples. Do you happen to have some code that wraps imaplib in a Django app and parses fetched results as Django EmailMessage objects? I’m just looking for a starting point, not a polished app. Thanks again!
Now that’s what I call useful stuff!
PS: it would have been nice(r) to have quick example on how to fetch “unread” emails.
I am trying to use your examples in web2py DAL. Could you give more info on licensing/credits?
I also would appreciate any advice on unicode and cross service syntax issues (since it appears there is no common implementation of commands on different brands and services).
Hey Alan,
Everything I post should be Beerware : )
What’s this about cross service syntax issues? Across different IMAP services?
The IMAP commands detailed in RFC3501 should be compatible across any service that implements IMAP.
I sent a prototype to web2py issues for adding an IMAP adapter.
http://code.google.com/p/web2py/issues/detail?id=610
I was not aware of the comma-separated sequence specification in the IMAP RFC and I thought it was a Gmail-specific syntax, so I was concerned about using the interface with different servers.
Thanks for the feedback
Regards
Ah ha, you’re right! I only see reference to a sequence-set in the format XX:YY. I’m not sure how widely supported / unsupported the comma syntax is. I’d love to hear about it if you find out!
It depends on the acceptance of the enhancement request and the tests of the users with different server brands, but IMAP RFC does specify the syntax, as you mentioned before:
http://tools.ietf.org/html/rfc3501#section-9
sequence-set = (seq-number / seq-range) *("," sequence-set)
; set of seq-number values, regardless of order.
; Servers MAY coalesce overlaps and/or execute the
; sequence in any order.
; Example: a message sequence number set of
; 2,4:7,9,12:* for a mailbox with 15 messages is
; equivalent to 2,4,5,6,7,9,12,13,14,15
; Example: a message sequence number set of *:4,5:7
; for a mailbox with 10 messages is equivalent to
; 10,9,8,7,6,5,4,5,6,7 and MAY be reordered and
; overlap coalesced to be 4,5,6,7,8,9,10.
Thanks again
wow, awesome, ive been trying to figure this stuff out for days!!! (complete newbie)… thanks again Yuji, beautiful work!… just out of curiosity, do you know of a reference spot for info on python interacting with google voice (for sms reasons)? thanks!
Hi Yuji,
thanks for this article, i have some doubts i would be thankful if you could help me
IMAP Search Keys are as follows:
BEFORE
Messages whose internal date (disregarding time and timezone)
is earlier than the specified date.
ON
Messages whose internal date (disregarding time and timezone)
is within the specified date.
SENTBEFORE
Messages whose [RFC-2822] Date: header (disregarding time and
timezone) is earlier than the specified date.
SENTON
Messages whose [RFC-2822] Date: header (disregarding time and
timezone) is within the specified date.
SENTSINCE
Messages whose [RFC-2822] Date: header (disregarding time and
timezone) is within or later than the specified date.
SINCE
Messages whose internal date (disregarding time and timezone)
is within or later than the specified date.
The descriptions above mention an “internal date”. What is it?
I did not find any header with this name in the original mail.
Is the internal date different from the Date: header?
Can you say which header SENTON uses?
Received: by 10.112.63.19 with SMTP id c19csp82292lbs;
Tue, 21 Feb 2012 22:30:44 -0800 (PST)
Date: Wed, 22 Feb 2012 12:00:42 +0530
date = '"22 Feb 2012"'
when i search for the above like this mail.search(None, 'SENTON', date)
it does gives empty result. do you have any idea?
hi..
when i’m executing the first few lines i.e.
******************************************
import imaplib
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('myusername@gmail.com', 'mypassword')
mail.list()
# Out: list of "folders" aka labels in gmail.
mail.select("inbox") # connect to inbox.
**************************************************
following error is coming..
++++++++++++++++++++++++++++++++++++++
python sample2.py
Traceback (most recent call last):
  File "sample2.py", line 2, in <module>
    mail = imaplib.IMAP4_SSL('imap.gmail.com')
  File "/usr/lib/python2.6/imaplib.py", line 1138, in __init__
    IMAP4.__init__(self, host, port)
  File "/usr/lib/python2.6/imaplib.py", line 163, in __init__
    self.open(host, port)
  File "/usr/lib/python2.6/imaplib.py", line 1149, in open
    self.sock = socket.create_connection((host, port))
  File "/usr/lib/python2.6/socket.py", line 547, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno -2] Name or service not known
++++++++++++++++++++++++++++++++
please please reply and tell me what the problem is.. thanks in advance…
Sounds like your system can’t resolve domains.. Can you even `ping google.com` from a terminal?
thnxxxx fr da code..it was helpful..!
bt if m nt wrong, dis only extracts da latest email..is der any way 2 extract all unread msgs..??
der wer sum network-related issues….bt da codez workin nw..!
Thanks . useful code
GET A HEADER KEY ONLY has extra Square Bracket in It ..
Awesome work! really helpful..thanks yunji
I have made a post about using imaplib with oauth2..if anyone is interested its available in http://rakeshmukundan.in/2013/01/23/access-gmail-using-imaplib-and-python-with-oauth2/
This is seriously cool !!
Helped me move away the tedious java coding involved.
I’m new to python.when I copied and run your code ( changing user,password), I got a restart in the python shell but not results. I was able to ping gmail so i don’t think it’s a network issue. Please advise.
Thanks
You’ll have to read the error and follow through on it. Typical programming debugging applies… Python tends to spit a traceback.
Open the Debug control on the Python shell. I got the following message;
‘dbd’.run(),line392:exec(cmd,globals,locals)
‘_main_’.(),line 1:Import impalib
> ‘imaplib’.(),line11:”"
Under Globals Section
_builtins_
_doc_ None
_ Name_ ‘_main_’
_Package_None
My Comment: What is it missing? If something is missing how can a load into python.
Thank you so much for your assistance.
Thank you for this excellent article. It was exactly what I was looking for. Nicely done.
Same here! Tried this out in the REPL and it went off without a hitch!
Excellent little tutorial. Thank you!