12 Years of Gmail, Part 1: Google Takeout

Posted on 28 October 2016 in Technology

This post is part of my series, 12 Years of Gmail, taking a look at the data Google has accumulated on me over the past 12 years of using various Google services and documenting the learning experience of developing an open source Python project (Takeout Inspector) to analyze that data.

I have been slowly migrating off of a Gmail email address for a couple of months now - I established this domain, selected an email provider, set up SPF, DMARC, etc. and finally created myself a new email address. I updated the address in all of the obvious places, but still found myself using Gmail frequently to keep up. At some point I realized that the only way to finish the migration would be to do something with all the email I had hoarded away in Gmail.

When I made the transition to Gmail (from a mail server in my basement) back in 2004, I found some tool that pulled all my existing email into Gmail using POP. So, I thought to myself in 2016, I'll just do that again! I fired up Thunderbird, set up Gmail POP access and started downloading. At some point, thousands of emails in, I decided to check just how many emails I had in Gmail - a little over 30,000. Instead of downloading all of that, I went hunting through the emails and deleted stuff I really didn't need (newsletters, test emails, spam, etc.). It was a nice little trip down memory lane! I managed to cut the total in half, but still wanted a better way to get it all downloaded.

Enter Google Takeout. When signed in to Google, this service allows a plethora of data to be exported from an account - mail, contacts, Chrome settings, Hangouts chat history, location history, etc. I ticked the Mail option and was told I would receive an email when my archive was ready for download.

[Image: Google Takeout data selection screen.]

A few hours later I received the email alert, downloaded the zipped file and uncompressed it to find a 3.5GB mbox mail file. I could import this into something like Thunderbird (or any other mail client) and keep it for reference, or find some way to import it into my new email provider.

However, what ultimately interested me was just how much data is in this little file - 12 years of my communications on the Internet.

I started the account back when I was 20 years old and imported email going back even further into my teens!

There could be some interesting aggregate information in all this email, such as which email addresses and domains I communicate with most frequently, when I send and receive emails, which words I use most often, etc. I spent a bit of time poking around GitHub looking for projects that may already do this sort of thing, but didn't find anything of particular interest. So I decided this would be a great excuse to dip my toes back into Python!
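To give a rough idea of what that kind of aggregate looks like, here is a minimal sketch that counts sender domains in an mbox file using only Python's standard library. It is not the Takeout Inspector code; the function name `count_sender_domains` is made up for illustration, and it assumes each message carries an ordinary `From:` header.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def count_sender_domains(mbox_path):
    """Count how many messages were sent from each domain in an mbox file.

    Hypothetical helper for illustration only, not part of Takeout Inspector.
    """
    domains = Counter()
    # create=False makes a wrong path raise an error instead of silently
    # creating an empty mailbox file.
    mbox = mailbox.mbox(mbox_path, create=False)
    try:
        for message in mbox:
            # parseaddr splits 'Name <user@host>' into (name, address).
            _, address = parseaddr(str(message.get("From", "")))
            if "@" in address:
                # Lowercase the domain so Example.com and example.com match.
                domains[address.rsplit("@", 1)[1].lower()] += 1
    finally:
        mbox.close()
    return domains
```

Calling something like `count_sender_domains("mail.mbox").most_common(10)` (the file name is a placeholder) would list the ten most frequent sender domains. The same loop could be pointed at the `To` or `Date` headers to answer the other questions above.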

I worked through Learn Python the Hard Way a year or so ago and found that I really enjoyed the language. The code structure and syntax are much less cluttered than PHP (where most of my experience is), there is great documentation, there are lots of easy-to-use packages and it just seemed more fun to write. This experience felt much more like my earlier years learning about HTML and PHP development on DevShed. Ultimately, I didn't have a good project or the time to devote to keeping it up, so most of that knowledge has slipped away.

As I began working my way around the Mail export, I realized that other elements of Google Takeout - Chrome data, Hangouts and Location History, for example - may also be very interesting to poke around in (and I also greatly regretted my decision to delete half of my email before doing the Mail export!).

Takeout Inspector will be where I keep the code I write to this end, and hopefully I can come up with some fairly interesting information and insights along the way to share in this series.