Fun with Ruby: Get All Nancy Drew on Chrome

I use the Chrome history tab when I forget about something I've looked up in the past. I initially assumed the data would be stored in a CSV or XML file and that I could do some string munging for kicks and giggles. To my delight, when I looked in Chrome's "Application Support" directory, I found several data-rich SQLite databases ready for mining. With a few Ruby tricks, I pulled some interesting data out of them. All the code this article covers is available in the chrome_spy project.

With chrome_spy, you can answer questions like these:

  • what have I searched for recently?
  • which sites do I visit most frequently?
  • what URLs do I type into the address bar frequently?
  • what have I downloaded and how big are the files?

And that only scratches the surface. Let's dive straight into the technical details.

Since the data is stored in SQLite, I decided to use trusty old ActiveRecord to wrap the tables.

I've been using Chrome for a while now, so there were multiple archived 'History' databases (e.g. History Index 2010-07), but the most recent database is named 'History'. Rather than inspecting the schema of each table by hand, I used ActiveRecord's SchemaDumper module, which generates a very readable schema dump.

From here, it's pretty straightforward to map the tables to ActiveRecord models: for each 'create_table' declaration, I declared a new model. The urls table, for example, becomes a Url model.

At this point, you should be able to query Url records through the models.

Everything was looking peachy until I got to last_visit_time. At first I thought it was an epoch timestamp or a JS timestamp, but after looking at a few other timestamps, I noticed that it was 17 digits long rather than the usual 10. The frustrating part was that some fields used epoch timestamps while others used this 17-digit format. I wrote a little helper module to clean up the typecasting for these columns.
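For reference, Chrome's 17-digit values count microseconds since January 1, 1601 (UTC), the same epoch Windows uses for FILETIME. The plain-Ruby sketch below shows just that arithmetic; the ChromeTime name is hypothetical and not necessarily what chrome_spy uses, and treating 0 as "no timestamp" is an assumption.

```ruby
# Hypothetical helper: converts Chrome's 17-digit timestamps
# (microseconds since 1601-01-01 UTC) to and from Ruby Time objects.
module ChromeTime
  # Seconds between the 1601 epoch and the Unix epoch (1970-01-01 UTC).
  EPOCH_OFFSET = 11_644_473_600

  # Returns a UTC Time, or nil for nil/0 (0 is treated as "never").
  def self.to_time(microseconds)
    return nil if microseconds.nil? || microseconds.zero?

    seconds, usec = microseconds.divmod(1_000_000)
    Time.at(seconds - EPOCH_OFFSET, usec).utc
  end

  # Inverse conversion: Time -> Chrome microsecond count.
  def self.from_time(time)
    (time.to_i + EPOCH_OFFSET) * 1_000_000 + time.usec
  end
end

puts ChromeTime.to_time(13_000_000_000_000_000) # prints 2012-12-14 23:06:40 UTC
```

The 17 digits fall out of counting microseconds from 1601: about 1.3 × 10^10 seconds times 10^6.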

Then, for every table that has timestamp columns, I declare those columns in the model.

Let's retry the same query from before.

That's much better.

Some table and column names don't follow Rails conventions, so a little extra work is needed to specify some table names and associations. For example, the SegmentUsage model is backed by the 'segment_usage' table rather than 'segment_usages'.

Another example is Visit, which uses 'url' rather than 'url_id' as its foreign key to Url.

With just these thin wrappers, we can easily come up with queries that answer the questions from the beginning of this article. Finding the most recent searches, for example, takes a single method call.

The method itself is just a simple ActiveRecord query.

At this point, you should take a break and share your search history with a friend. Context-free search terms can be hilarious and embarrassing. A few that my girlfriend couldn't stop laughing at:

  • super meat boy
  • exercise mix tapes
  • huntsville bed intruder
  • factory girl
  • haml (she thought I repeatedly misspelled 'ham')

To answer the other questions at the beginning of the article:

Which sites do I visit most frequently?

What URLs do I type into the address bar frequently?

What have I downloaded and how big are the files?

While ChromeSpy may not be the most useful example, it shows how ActiveRecord can be applied outside of Rails. The same thought process works for other problems that call for quick data manipulation or reporting. Whether those reports are useful or just plain silly is entirely up to you. Now go forth and find some funny recent searches of your own!