Python — imaplib IMAP example with Gmail

I couldn’t find much information about IMAP on the web, other than RFC 3501.

The IMAP protocol document is absolutely key to understanding the available commands, but I’ll skip the explanation and lead by example, pointing out the common gotchas I ran into.

Logging in to the inbox

import imaplib
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('myusername@gmail.com', 'mypassword')
mail.list()
# Out: list of "folders" aka labels in gmail.
mail.select("inbox") # connect to inbox.
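
The lines returned by mail.list() are raw strings rather than parsed folder objects. Here is a minimal sketch of splitting one such line into flags, delimiter, and name. The helper name parse_list_line and the sample lines are illustrative assumptions, not output captured from a real account, and folder names containing non-ASCII characters (which IMAP sends in modified UTF-7) are not decoded here.

```python
import re

# One LIST response line has the shape: (flags) "delimiter" name
LIST_LINE = re.compile(rb'\((?P<flags>[^)]*)\) "(?P<delimiter>[^"]*)" (?P<name>.*)')

def parse_list_line(line):
    """Split one LIST response line into (flags, delimiter, name)."""
    match = LIST_LINE.match(line)
    if match is None:
        raise ValueError("unrecognised LIST line: %r" % (line,))
    flags = match.group('flags').decode('ascii').split()
    delimiter = match.group('delimiter').decode('ascii')
    name = match.group('name').decode('ascii').strip('"')
    return flags, delimiter, name

# Illustrative sample lines (hypothetical, shaped like a Gmail response).
sample = [
    b'(\\HasNoChildren) "/" "INBOX"',
    b'(\\HasChildren \\Noselect) "/" "[Gmail]"',
]
folders = [parse_list_line(line) for line in sample]
```

With a live connection you would pass the second element of the (result, data) tuple returned by mail.list() instead of sample.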

Getting all mail and fetching the latest

Let’s start by searching our inbox for all mail with the search function.
Use the built-in keyword “ALL” to get all results (documented in RFC 3501).

We’re going to extract the data we need from the response, then fetch the mail via the ID we just received.

result, data = mail.search(None, "ALL")

ids = data[0] # data is a list.
id_list = ids.split() # ids is a space separated string
latest_email_id = id_list[-1] # get the latest

result, data = mail.fetch(latest_email_id, "(RFC822)") # fetch the email body (RFC822) for the given ID

raw_email = data[0][1] # here's the body, which is raw text of the whole email
# including headers and alternate payloads
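
One gotcha with the snippet above: if the search matches nothing, the id list is empty and id_list[-1] raises an IndexError. A small sketch of a more defensive version follows. The helper names parse_search_ids and latest_id are my own, and the sample responses are assumptions about what imaplib hands back on Python 3 (a one-element list of bytes).

```python
def parse_search_ids(data):
    """Return the ids from an imaplib search response as a list of bytes."""
    if not data or data[0] is None:
        return []
    return data[0].split()  # the ids are space separated

def latest_id(data):
    """Return the last id in the response, or None when nothing matched."""
    ids = parse_search_ids(data)
    return ids[-1] if ids else None

# Assumed response shapes: one match list, one empty result.
some_mail = [b'1 2 3']
no_mail = [b'']
```

Calling latest_id(some_mail) gives the last id, while latest_id(no_mail) gives None instead of raising.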

Using UIDs instead of volatile sequential ids

The IMAP search function returns sequence numbers, meaning id 5 is the 5th email in your inbox.
That means if a user deletes email 10, every id above 10 now points to a different email than it did before.

This is unacceptable.

Luckily we can ask the IMAP server to return a UID (unique id) instead.

The way this works is pretty simple: use the uid function, and pass the name of the command as a string in the first argument. The rest behaves exactly the same.

result, data = mail.uid('search', None, "ALL") # search and return uids instead
latest_email_uid = data[0].split()[-1]
result, data = mail.uid('fetch', latest_email_uid, '(RFC822)')
raw_email = data[0][1]
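
To make the difference concrete, here is a toy model of a mailbox with no server involved. The UID values and the sequence_number helper are made up for illustration; the point is only that a sequence number is a position, while a UID stays attached to its message.

```python
# Toy mailbox: each message keeps its UID, while its sequence
# number is simply its current 1-based position in the list.
mailbox_uids = [101, 102, 103, 104, 105]

def sequence_number(uids, uid):
    """Return the current 1-based position of the message with this UID."""
    return uids.index(uid) + 1

before = sequence_number(mailbox_uids, 104)  # 4th message

# Delete the message with UID 102.
mailbox_uids.remove(102)

after = sequence_number(mailbox_uids, 104)   # now the 3rd message
```

The message with UID 104 moved from position 4 to position 3, so anything that stored the sequence number 4 would now be looking at a different email.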

Parsing Raw Emails

Raw emails pretty much look like gibberish. Luckily we have a Python library for dealing with emails called… email.

It can convert raw emails into the familiar Message object.

import email
import email.utils
email_message = email.message_from_bytes(raw_email) # raw_email is bytes on Python 3; on Python 2 use email.message_from_string

print(email_message['To'])

print(email.utils.parseaddr(email_message['From'])) # for parsing "Yuji Tomita" <yuji@grovemade.com>

print(email_message.items()) # print all headers

# note that if you want to get text content (body) and the email contains
# multiple payloads (plaintext / html), you must parse each part separately.
# use something like the following: (taken from a stackoverflow post)
def get_first_text_block(email_message_instance):
    maintype = email_message_instance.get_content_maintype()
    if maintype == 'multipart':
        for part in email_message_instance.get_payload():
            if part.get_content_maintype() == 'text':
                return part.get_payload()
    elif maintype == 'text':
        return email_message_instance.get_payload()
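
On Python 3 the email package can pick the body part for you. The following is a sketch under the assumption that you are on Python 3.6 or later and parse with policy.default; the function name first_text_body and the sample message are made up for illustration.

```python
import email
from email import policy

def first_text_body(raw_bytes):
    """Return the preferred text body (plain first, then HTML) as str, or None."""
    message = email.message_from_bytes(raw_bytes, policy=policy.default)
    part = message.get_body(preferencelist=('plain', 'html'))
    return part.get_content() if part is not None else None

# A hand-written multipart message standing in for raw_email.
raw = (
    b"From: Alice <alice@example.com>\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Hello\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"\r\n"
    b"plain body\r\n"
    b"--XYZ\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"\r\n"
    b"<p>html body</p>\r\n"
    b"--XYZ--\r\n"
)
body = first_text_body(raw)
```

Unlike get_payload(), get_content() also undoes the transfer encoding and charset, so the result is ordinary text.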

Advanced searches

We’ve only done the basic search for “ALL”.

Let’s try something else such as a combination of searches we want and don’t want.

All available search parameters are listed in the IMAP protocol documentation and you will definitely want to check out the SEARCH Command reference.

Here are just a few searches to get you started.

Search any header

For searching any header, such as Subject, Reply-To, Received, etc., the command is simply “(HEADER field-name "search string")”.

mail.uid('search', None, '(HEADER Subject "My Search Term")')
mail.uid('search', None, '(HEADER Received "localhost")')
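
If the search term comes from user input, it may itself contain double quotes or backslashes, which would break the quoted string. Below is a hedged sketch of a helper that escapes them before building the criterion. The names imap_quote and header_criterion are my own, and this only covers ASCII search terms.

```python
def imap_quote(value):
    """Wrap a string in double quotes, escaping backslashes and quotes."""
    escaped = value.replace('\\', '\\\\').replace('"', '\\"')
    return '"' + escaped + '"'

def header_criterion(field, value):
    """Build a '(HEADER field "value")' search criterion."""
    return '(HEADER {0} {1})'.format(field, imap_quote(value))

criterion = header_criterion('Subject', 'Say "hello"')
# criterion can then be passed as the last argument to mail.uid('search', None, ...)
```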

Search for emails sent in the past day

The inbox is often too large, and IMAP doesn’t specify a way of limiting the number of results, so searches can be extremely slow. One way to narrow a search is the SENTSINCE keyword.

The SENTSINCE date format is DD-Mon-YYYY (for example 01-Jun-2012). In Python, that would be strftime('%d-%b-%Y'). Note that %b follows the current locale, so this assumes English month names.

import datetime
date = (datetime.date.today() - datetime.timedelta(1)).strftime("%d-%b-%Y")
result, data = mail.uid('search', None, '(SENTSINCE {date})'.format(date=date))
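
Because strftime('%b') follows the process locale, the month abbreviation may come out in another language on some systems. A small locale-independent sketch, with imap_date as a hypothetical helper name:

```python
import datetime

# IMAP dates use English three-letter month names.
MONTHS = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')

def imap_date(day):
    """Format a date as DD-Mon-YYYY regardless of the current locale."""
    return '{0:02d}-{1}-{2:04d}'.format(day.day, MONTHS[day.month - 1], day.year)

yesterday = datetime.date.today() - datetime.timedelta(days=1)
criterion = '(SENTSINCE {0})'.format(imap_date(yesterday))
```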

Limit by date, search for a subject, and exclude a sender

date = (datetime.date.today() - datetime.timedelta(1)).strftime("%d-%b-%Y")

result, data = mail.uid('search', None, '(SENTSINCE {date} HEADER Subject "My Subject" NOT FROM "yuji@grovemade.com")'.format(date=date))

Fetches

Get Gmail thread ID

Fetches can include the entire email body, or any combination of items such as email flags (seen/unseen) or Gmail-specific IDs such as thread ids.

result, data = mail.uid('fetch', uid, '(X-GM-THRID X-GM-MSGID)')

Get a header key only

result, data = mail.uid('fetch', uid, '(BODY[HEADER.FIELDS (DATE SUBJECT)])')
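
The header block comes back as raw bytes, which the email package can parse. This sketch assumes the block sits at data[0][1], as with the RFC822 fetch earlier. The sample bytes are hand-written and parse_header_block is a hypothetical helper name.

```python
from email.parser import BytesHeaderParser

def parse_header_block(header_bytes):
    """Parse a raw header block into a dict of header name -> value."""
    message = BytesHeaderParser().parsebytes(header_bytes)
    return dict(message.items())

# Hand-written stand-in for data[0][1] from the fetch above.
sample = (
    b"Date: Wed, 22 Feb 2012 12:00:42 +0530\r\n"
    b"Subject: Hello there\r\n"
    b"\r\n"
)
headers = parse_header_block(sample)
```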

Fetch multiple

You can fetch multiple emails at once by passing a comma-delimited list of ids. RFC 3501 calls this a sequence-set, and it also allows ranges such as 4:7.

result, data = mail.uid('fetch', '1938,2398,2487', '(X-GM-THRID X-GM-MSGID)')
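
Since a search returns the ids as a list of bytes, you need to join them before fetching, and for a big mailbox you may want to fetch in chunks. A rough sketch follows; uid_set and batches are made-up helper names, and the batch size is an arbitrary choice rather than a limit I know the server to impose.

```python
def uid_set(uids):
    """Join UIDs (bytes, str or int) into a comma-separated string."""
    return ','.join(str(int(uid)) for uid in uids)

def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

uids = [b'1938', b'2398', b'2487']  # e.g. data[0].split() from a uid search
fetch_sets = [uid_set(chunk) for chunk in batches(uids, 2)]
```

Each entry of fetch_sets can then be passed as the second argument to mail.uid('fetch', ...).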

Use a regex to parse fetch results

The returned result isn’t very easy to digest: it is a string of space-separated key-value pairs.

Use a simple regex to pull out the data you need.

import re

result, data = mail.uid('fetch', uid, '(X-GM-THRID X-GM-MSGID)')
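re.search(rb'X-GM-THRID (?P<thrid>\d+) X-GM-MSGID (?P<msgid>\d+)', data[0]).groupdict()
# group names can't contain hyphens, hence thrid / msgid; the pattern is bytes because imaplib returns bytes on Python 3.
# this becomes an organizational lifesaver once you have many results returned.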
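
When several emails are fetched at once, each one comes back as its own line. Here is a sketch that collects them into a dict keyed by UID, searching for each key separately so the order of the pairs doesn't matter. The sample lines are invented to resemble the response, not captured from Gmail, and parse_gmail_ids is a hypothetical name.

```python
import re

THRID = re.compile(rb'X-GM-THRID (\d+)')
MSGID = re.compile(rb'X-GM-MSGID (\d+)')
UID = re.compile(rb'\bUID (\d+)')

def parse_gmail_ids(lines):
    """Map each UID to its Gmail thread id and message id."""
    results = {}
    for line in lines:
        if not isinstance(line, bytes):
            continue  # skip anything that isn't a plain response line
        uid, thrid, msgid = UID.search(line), THRID.search(line), MSGID.search(line)
        if uid and thrid and msgid:
            results[int(uid.group(1))] = {
                'thrid': int(thrid.group(1)),
                'msgid': int(msgid.group(1)),
            }
    return results

# Invented sample lines standing in for `data` from a multi-UID fetch.
sample = [
    b'1 (X-GM-THRID 1111 X-GM-MSGID 2222 UID 1938)',
    b'2 (X-GM-THRID 3333 X-GM-MSGID 4444 UID 2398)',
]
ids_by_uid = parse_gmail_ids(sample)
```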

Conclusion

Well, that should leave you with a much better understanding of the IMAP protocol and of using Python to interface with Gmail.

Certainly more than I knew!

43 comments on “Python — imaplib IMAP example with Gmail”

  2. Is there a way in gmail to know under how many labels (or directories) an email is ?
    I would like to know with which labels my emails have been tagged.

    Thanks. Very useful

  4. I am trying to use your examples in web2py DAL. Could you give more info on licensing/credits?

    I also would appreciate any advice on unicode and cross service syntax issues (since it appears there is no common implementation of commands on different brands and services).

    • Hey Alan,

      Everything I post should be Beerware : )

      What’s this about cross service syntax issues? Across different IMAP services?

      The IMAP commands detailed in RFC3501 should be compatible across any service that implements IMAP.

    • Ah ha, you’re right! I only see reference to a sequence-set in the format XX:YY. I’m not sure how widely supported / unsupported the comma syntax is. I’d love to hear about it if you find out!

      • It depends on the acceptance of the enhancement request and the tests of the users with different server brands, but IMAP RFC does specify the syntax, as you mentioned before:

        http://tools.ietf.org/html/rfc3501#section-9

        sequence-set = (seq-number / seq-range) *(“,” sequence-set)
        ; set of seq-number values, regardless of order.
        ; Servers MAY coalesce overlaps and/or execute the
        ; sequence in any order.
        ; Example: a message sequence number set of
        ; 2,4:7,9,12:* for a mailbox with 15 messages is
        ; equivalent to 2,4,5,6,7,9,12,13,14,15
        ; Example: a message sequence number set of *:4,5:7
        ; for a mailbox with 10 messages is equivalent to
        ; 10,9,8,7,6,5,4,5,6,7 and MAY be reordered and
        ; overlap coalesced to be 4,5,6,7,8,9,10.

        Thanks again

  6. wow, awesome, ive been trying to figure this stuff out for days!!! (complete newbie)… thanks again Yuji, beautiful work!… just out of curiosity, do you know of a reference spot for info on python interacting with google voice (for sms reasons)? thanks!

  7. Hi Yuji,
    thanks for this article, i have some doubts i would be thankful if you could help me
    IMAP Search Keys are as follows:
    BEFORE
    Messages whose internal date (disregarding time and timezone)
    is earlier than the specified date.
    ON
    Messages whose internal date (disregarding time and timezone)
    is within the specified date.
    SENTBEFORE
    Messages whose [RFC-2822] Date: header (disregarding time and
    timezone) is earlier than the specified date.
    SENTON
    Messages whose [RFC-2822] Date: header (disregarding time and
    timezone) is within the specified date.
    SENTSINCE
    Messages whose [RFC-2822] Date: header (disregarding time and
    timezone) is within or later than the specified date.
    SINCE
    Messages whose internal date (disregarding time and timezone)
    is within or later than the specified date.

    in the above they are saying about “internal date” what it is?
    becoz i did not find any header in the original mail with this name
    is internal date different from Date: header?
    can you say if i you SENTON which header does it use?

    Received: by 10.112.63.19 with SMTP id c19csp82292lbs;
    Tue, 21 Feb 2012 22:30:44 -0800 (PST)
    Date: Wed, 22 Feb 2012 12:00:42 +0530
    date = ‘”22 Feb 2012″‘
    when i search for the above like this mail.search(None, ‘SENTON’, date)
    it does gives empty result. do you have any idea?

  8. hi..
    when i’m executing the first few lines i.e.

    ******************************************
    import imaplib
    mail = imaplib.IMAP4_SSL('imap.gmail.com')
    mail.login('myusername@gmail.com', 'mypassword')
    mail.list()
    # Out: list of "folders" aka labels in gmail.
    mail.select("inbox") # connect to inbox.
    **************************************************

    following error is coming..

    ++++++++++++++++++++++++++++++++++++++

    python sample2.py
    Traceback (most recent call last):
    File "sample2.py", line 2, in <module>
    mail = imaplib.IMAP4_SSL('imap.gmail.com')
    File “/usr/lib/python2.6/imaplib.py”, line 1138, in __init__
    IMAP4.__init__(self, host, port)
    File “/usr/lib/python2.6/imaplib.py”, line 163, in __init__
    self.open(host, port)
    File “/usr/lib/python2.6/imaplib.py”, line 1149, in open
    self.sock = socket.create_connection((host, port))
    File “/usr/lib/python2.6/socket.py”, line 547, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
    socket.gaierror: [Errno -2] Name or service not known
    ++++++++++++++++++++++++++++++++

    please please reply and tell me what the problem is.. thanks in advance…

  9. thnxxxx fr da code..it was helpful..!
    bt if m nt wrong, dis only extracts da latest email..is der any way 2 extract all unread msgs..??

  14. I’m new to python.when I copied and run your code ( changing user,password), I got a restart in the python shell but not results. I was able to ping gmail so i don’t think it’s a network issue. Please advise.
    Thanks

      • Open the Debug control on the Python shell. I got the following message;
        ‘dbd’.run(),line392:exec(cmd,globals,locals)
        ‘_main_’.(),line 1:Import impalib
        > ‘imaplib’.(),line11:”"

        Under Globals Section
        _builtins_
        _doc_ None
        _ Name_ ‘_main_’
        _Package_None

        My Comment: What is it missing? If something is missing how can a load into python.
        Thank you so much for your assistance.
