Python — imaplib IMAP example with Gmail

Life

I couldn’t find all that much information about IMAP on the web, other than the RFC3501.

The IMAP protocol document is absoutely key to understanding the commands available, but let me skip attempting to explain and just lead by example where I can point out the common gotchas I ran into.

Logging in to the inbox

import imaplib
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('myusername@gmail.com', 'mypassword')
mail.list()
# Out: list of "folders" aka labels in gmail.
mail.select("inbox") # connect to inbox.

Getting all mail and fetching the latest

Let’s start by searching our inbox for all mail with the search function.
Use the built in keyword “ALL” to get all results (documented in RFC3501).

We’re going to extract the data we need from the response, then fetch the mail via the ID we just received.

result, data = mail.search(None, "ALL")

ids = data[0] # data is a list.
id_list = ids.split() # ids is a space separated string
latest_email_id = id_list[-1] # get the latest

result, data = mail.fetch(latest_email_id, "(RFC822)") # fetch the email body (RFC822) for the given ID

raw_email = data[0][1] # here's the body, which is raw text of the whole email
# including headers and alternate payloads

Using UIDs instead of volatile sequential ids

The imap search function returns a sequential id, meaning id 5 is the 5th email in your inbox.
That means if a user deletes email 10, all emails above email 10 are now pointing to the wrong email.

This is unacceptable.

Luckily we can ask the imap server to return a UID (unique id) instead.

The way this works is pretty simple: use the uid function, and pass in the string of the command in as the first argument. The rest behaves exactly the same.

result, data = mail.uid('search', None, "ALL") # search and return uids instead
latest_email_uid = data[0].split()[-1]
result, data = mail.uid('fetch', latest_email_uid, '(RFC822)')
raw_email = data[0][1]

Parsing Raw Emails

Emails pretty much look like gibberish. Luckily we have a python library for dealing with emails called… email.

It can convert raw emails into the familiar EmailMessage object.

import email
email_message = email.message_from_string(raw_email)

print email_message['To']

print email.utils.parseaddr(email_message['From']) # for parsing "Yuji Tomita" <yuji@grovemade.com>

print email_message.items() # print all headers

# note that if you want to get text content (body) and the email contains
# multiple payloads (plaintext/ html), you must parse each message separately.
# use something like the following: (taken from a stackoverflow post)
def get_first_text_block(self, email_message_instance):
    maintype = email_message_instance.get_content_maintype()
    if maintype == 'multipart':
        for part in email_message_instance.get_payload():
            if part.get_content_maintype() == 'text':
                return part.get_payload()
    elif maintype == 'text':
        return email_message_instance.get_payload()

Advanced searches

We’ve only done the basic search for “ALL”.

Let’s try something else such as a combination of searches we want and don’t want.

All available search parameters are listed in the IMAP protocol documentation and you will definitely want to check out the SEARCH Command reference.

Here are just a few searches to get you started.

Search any header

For searching any headers, such as the subject, Reply-To, Received, etc., the command is simply “(HEADER “”)”

mail.uid('search', None, '(HEADER Subject "My Search Term")')
mail.uid('search', None, '(HEADER Received "localhost")')

Search for emails since in the past day

Often times the inbox is too large and IMAP doesn’t specify a way of limiting results, resulting in extremely slow searches. One way to limit is to use the SENTSINCE keyword.

The SENTSINCE date format is DD-Jun-YYYY. In python, that would be strftime(‘%d-%b-%Y’).

import datetime
date = (datetime.date.today() - datetime.timedelta(1)).strftime("%d-%b-%Y")
result, data = mail.uid('search', None, '(SENTSINCE {date})'.format(date=date))

Limit by date, search for a subject, and exclude a sender

date = (datetime.date.today() - datetime.timedelta(1)).strftime("%d-%b-%Y")

result, data = mail.uid('search', None, '(SENTSINCE {date} HEADER Subject "My Subject" NOT FROM "yuji@grovemade.com")'.format(date=date))

Fetches

Get Gmail thread ID

Fetches can include the entire email body, or any combination of results such as email flags (seen/unseen) or gmail specific IDs such as thread ids.

result, data = mail.uid('fetch', uid, '(X-GM-THRID X-GM-MSGID)')

Get a header key only

result, data = mail.uid('fetch', uid, '(BODY[HEADER.FIELDS (DATE SUBJECT)]])')

Fetch multiple

You can fetch multiple emails at once. I found through experimentation that it’s expecting comma delimited input.

result, data = mail.uid('fetch', '1938,2398,2487', '(X-GM-THRID X-GM-MSGID)')

Use a regex to parse fetch results

The returned result isn’t very easy to swallow. They are space separated key-value pairs.

Use a simple regex to get the data you need.

import re

result, data = mail.uid('fetch', uid, '(X-GM-THRID X-GM-MSGID)')
re.search('X-GM-THRID (?P<X-GM-THRID>\d+) X-GM-MSGID (?P<X-GM-MSGID>\d+)', data[0]).groupdict()
# this becomes an organizational lifesaver once you have many results returned.

Conclusion

Well, that should leave you with a much better understanding of the IMAP protocol and using python to interface with Gmail.

Cerntainly more than I knew!

Python — Basics of Python Dictionary: Looping & Sorting

Python

Here some bits of info about python dictionaries & looping through them.

Extra special beginner stuff.

What is a dictionary?

A python dictionary is an extremely useful data storage construct for storing and retreiving key:value pairs.

Many languages implement simple arrays (lists in python) that are keyed by a integer. For example, if you made a list [1, 2] – the first value would be retrieved via [1, 2][0].

my_list = [1, 2, 3]
my_list[0]
# Out: 1
my_list[1]
# Out: 2

A dictionary is a little more advanced in that the keys can be many other things than integers. Often times the key will be a string merely for the fact that it’s easy for a human to recall.

Will I remember that my_list[3] is my phone number? Not nearly as well as if I have a key named “phone_number”.

my_dict = {
	'key1': 'value1',
	'key2': 'value2',
	'key3': 'value3'
	}
my_dict['key1']
# Out: 'value1'

Major differences vs lists

– Keys are any hashable object (say strings for simplicity)
– Are NOT ordered (a list is by definition ordered)

One way I like to think about them are as little variable containers. The fact that they are wrapped in a container makes them quite useful and versatile since you can easily move the “container” around.

In fact, variables are very much related to dictionaries! Whenever you declare a variable (x=3), that variable is accessible by its string name (‘x’) via the dictionary returned by the builtin function ‘locals()’ or ‘vars()’.

Watch and see:

var1 = 'val1' # local variable definition
var2 = 'val2'

locals_dict = locals()
print locals_dict['var1']
# Out: val1

container = {}
container['var1'] = 'val1'

print container['var1']
# Out: val1

Looping through dictionaries

Now, if you did a little experimenting, you would see that if you loop through a dictionary you loop through its keys.

>>> my_dict = {
...     'key1': 'value1',
...     'key2': 'value2',
...     'key3': 'value3'
...     }
>>> for item in my_dict:
...     print item
... 
key3
key2
key1

Note that this is the equivalent of looping through “my_dict.keys()”

What about getting the values?

Based on what we’ve learned, you could always use the keys you are iterating through to pull the value from the dictionary.

for item in my_dict:
    print my_dict[item]

But there are better ways to get the values. Enter the “items” function.

We can ask the dictionary to return key, value pairs via the “items()” function.

for key, value in my_dict.items(): # returns the dictionary as a list of value pairs -- a tuple.
    print key, value

More efficient dictionary loops

Calling “my_dict.items()” requires generating the entire list of key-value pairs in-memory. Sometimes, if your dictionary is too large, this can be a severe performance bottleneck. To get around this problem we can create a generator via the “iteritems()” method.

A generator allows you to iterate one item at a time. Only the key and value are pulled into memory for every iteration and immediately discarded. There are methods for returning a key generator and value generator as well.

for key, value in my_dict.iteritems():
    print key, value

for key in my_dict.iterkeys():
    print key
    
for value in my_dict.itervalues():
    print value

Note that one thing you can’t do with a generator is to delete a key during the generator loop.

for x, y in mydictionary:
    del mydictionary[x] 
    # Out: RuntimeError: dictionary changed size during iteration

Sorting a Python Dictionary

Actually, python dictionaries can’t be sorted. We will have to turn them into lists to sort them.

    
# to my dictionary...
sorted_list = [x for x in my_dictionary.iteritems()] 

sorted_list.sort(key=lambda x: x[0]) # sort by key
sorted_list.sort(key=lambda x: x[1]) # sort by value

# to reverse the sort
sorted_list.reverse()

Useful dictionary tips

Accessing a dictionary key that might not exist

Sometimes we know that a dictionary key might not exist. This happens a lot in loops where we don’t want to use a try/except block just to capture the exception.

For these situations we have the “get” method. Pass in the key as the first argument, and it will either return the value or None.

Pass in a second argument, and if the key doesn’t exist, it will return the second argument.

dict_ = {'key1':'value1'}
dict_.get('key1')
# out: 'value1'

dict_.get('key2')
# out: None

dict_.get('key2', "Key doesn't exist")
# out: Key doesn't exist

Get or insert into a dictionary if key doesn’t exist

Sometimes we need to insert a value if the key doesn’t exist in a dictionary. The previous “get” method only returns a value – the dictionary is unchanged.

dict_ = {}
dict_.setdefault('key1', 'value1')
# Out: value1

print dict_
# Out: { 'key1': 'value1' } 

This can be extremely useful if you need to append to a list if it exists or otherwise create a blank list.

key_value_pairs = [
    ('key1', 'value'),
    ('key1', 'value2'),
    ('key1', 'value3'),
    ]
dict_ = {}

for key, value in key_value_pairs:
    dict_.setdefault(key, []).append(value)

print dict_
# Out: { 'key1': ['value','value2','value3'] }

Generate a dictionary from tuples

It’s often useful to generate a dictionary from a list of tuples. For example, you could use a list comprehension to create a dictionary!

key_value_pairs = [
    ('key1', 'value'),
    ('key2', 'value2'),
    ('key3', 'value3'),
    ]

# now let's generate the same list of tuples via list comprehension 
key_value_pairs = [('key{0}'.format(x), 'value{0}'.format(x)) for x in range(1, 4)]

dict_ = dict(key_value_pairs)
print dict_
# Out: {'key3': 'value3', 'key2': 'value2', 'key1': 'value1'}

Conclusion

I’m a little surprised that this is one of my most visited blog posts.
It’s not written well, and it was written years ago!

Let me know if I can improve it… 😉

How to use pyExcelerator

Life, Python

I found this blog after hours of searching, and I keep referring back to it. I forgot what it was and started searching for it again and had to use terms like pyExcelerator + blog to find the guide.

Anyways, this is a great help for the not so documented pyExcelerator.

http://ntalikeris.blogspot.com/2007/10/create-excel-file-with-python-my-sort.html

 

Note that what comes after here is mainly for me to read back on in the future. I don’t think it makes a whole lotta sense.

I just created something to handle our orders. Currently we have to manually take orders and type out an excel file to do that.

Its the classic programming idea: Spend an hours to finish something in 5 minutes, instead of spending an hour to do it by hand.

Our platform gives us orders in CSV format. Our logistics people take it in a different format. I don’t want to do each by hand.

I wrote something to take the CSV with data, and then a CSV that has a template.

I used the python csv library to split the CSV into keys (the first row) and filled a dictionary with each key having its corresponding value. I then split the template CSV’s data fields into dict keys containing a pair of numbers (row, column).
Therefore, if the original CSV had a field called “BillAddress1”, I would make a template that has the same field and the script would spit back the coordinates.

That way I can write the data from this list into the right spots in the template.

Template CSV to Keys and Coordinates:
reader = csv.reader(file(filename))
 self.d = {}
  reader_list = []
  for item in reader:
  %nbsp;reader_list.append(item)

  for item in reader_list:
    for subitem in item:
      if subitem:
        self.d[subitem] = (reader_list.index(item), item.index(subitem))
    return self.d

This spits out a dictionary of whatever items are in the CSV that correspond to their positions.
If you filled one field in the entire sheet with “Hello” in position (10, 5), it would return a dictionary with key: “Hello” and value (10, 5)

I used those values to position my data in the actual sheet.
For example, say the real CSV has 3 rows.
| Name | Sex | Age |
| Yuji | M | 21 |
| Bob | M | 50 |

My script converts the CSV into a list, so that it can be iterated over.
Simply declare a new list and loop through the object returned by csv.reader() and append each row to the blank list.

I then make a copy of the first item (the keys), and remove that from the list. List.pop(0)
Then, I convert each item in the list to dictionary keys.

So List[0][‘KEY’] = value

Now I just need to create the writing mechanism.

I convert a CSV template to decide where our data goes.

Say I have this template:
| Name | | |
| Age | | |
| | | Sex |

My script converts those into dict keys w/ location pairs. So dict[‘Name’] = 0,0

Therefore, when I write using pyExcelerator:
self.keys = the dictionary with location pairs (the template)
self.orders = list of dictionaries like so [{a:b,c:d}, {a:b, c:d}]


for order in self.orders:
 sheet = self.workbook.add_sheet("Order")
 for item in self.keys:
  a, b = self.keys[item]
  sheet.write(a, b, item)
  sheet.write(a, b+1, order[item])

Done!