Django Storages, Boto, and Amazon S3 Slowness on collectstatic command fixed

I’ve had a few issues moving everything to Amazon S3.

First, Django’s collectstatic management command wasn’t detecting modified files at all, so every invocation re-uploaded every file to Amazon.

Django-Storages v1.1.3 fixed that problem, but exposed a new one: modified files were being detected faster, yet still far too slowly given that a single call returns the metadata from Amazon S3.

After some digging, I found the problem in the modified_time method: the fallback value is computed even when it isn’t used, because Python evaluates the default argument to dict.get eagerly. I moved the fallback into an if block so it executes only when get returns None.

    entry = self.entries.get(name, self.bucket.get_key(self._encode_name(name)))
    # Notice the function is called to populate the default value,
    # regardless of whether or not a default is required.

That code should be wrapped in an if statement so the expensive call fires only when the get lookup fails:

    entry = self.entries.get(name)
    if entry is None:
        entry = self.bucket.get_key(self._encode_name(name))

This change spurred me to create a Bitbucket account and learn some basic Mercurial commands :)

Hope it helps somebody else out there with many more than 1k files to sync.

The results

I benchmarked the difference on my local machine between the two functions.

For 1,000 files, the new version took under 0.1 seconds, versus 11.5 seconds for the old one.
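The eager-default pitfall is easy to reproduce outside of django-storages. A minimal sketch, assuming an illustrative expensive_lookup function standing in for bucket.get_key (the names and counts here are hypothetical, not from the library):

```python
call_count = 0

def expensive_lookup(name):
    # Stand-in for bucket.get_key(): imagine an S3 round trip per call.
    global call_count
    call_count += 1
    return "key:" + name

# Simulated cache where every file is already present (a cache hit).
entries = {"file%d" % i: "cached:%d" % i for i in range(1000)}

# Eager default: expensive_lookup runs for every file, hit or miss,
# because Python evaluates the default argument before calling get.
for name in entries:
    entry = entries.get(name, expensive_lookup(name))
eager_calls = call_count

# Lazy fallback: expensive_lookup runs only on a cache miss.
call_count = 0
for name in entries:
    entry = entries.get(name)
    if entry is None:
        entry = expensive_lookup(name)
lazy_calls = call_count

print(eager_calls, lazy_calls)  # 1000 eager calls vs. 0 lazy calls
```

With every entry cached, the eager version still makes one expensive call per file, while the lazy version makes none, which mirrors the collectstatic slowdown above.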
