Smart Timestamps with MongoDB

I really like using MongoDB and Mongoid, but a while back I ran into some shortcomings with querying timestamps. The problem was that I wanted to query only part of a timestamp, such as the day, week or year. So for example, let's say we need to find all users that signed up on a Wednesday.

In SQL there are date functions that let you to parse dates inside your query (although they seem to vary between engines). So in Postgres, you could do something like this:

select * from users where extract(dow from created_at) = 3; 

Note: Wednesday is the 3rd day of the week.

But MongoDB doesn’t have any native support for parsing a date/time inside the query. The best you can do is compare ranges, like this example using Mongoid:

User.where(:created_at.gte => "2012-05-30", :created_at.lt => "2012-05-31") 

Great, that finds us all users created last Wednesday. But what about all users created on any Wednesday, say in 2012? That would typically require building a query with different ranges for every Wednesday in 2012. Talk about tedious and repetitive. I think it’s safe to say that when faced with such a task most developers will end up just looping over each user, comparing the dates in Ruby.

User.scoped.select { |u| u.created_at.wday == 3 && u.created_at.year == 2012 } 

Eeek! This might work with small collections, but once you have a bunch of users it’s sub-optimal.

So I know I just said there were no native date functions in Mongo. But recently I was excited to find a solution that kind of works. It turns out that date/time types in Mongo get stored as UTC datetimes, which are basically just javascript dates stored in BSON. So it’s possible to drop down into javascript in your query using $where. With Mongoid it might look something like this:

User.where("return this.created_at.getDay() == 2 && this.created_at.getFullYear() == 2012") 

Note: in Javascript the day of week starts with 0 instead of 1. So Wednesdays are 2.

Now things seem to be looking up for us. But alas, the MongoDB documentation for $where warns of major performance issues. This makes sense because what’s really happening here is each user record is still getting accessed and each date is still getting parsed with javascript. Furthermore, we can’t index our search. So this solution is probably only marginally better than looping over each record in Ruby.

What I really wanted was a way to query by just day of week, or month, or hour, minute, second, etc. And I decided the best way to accomplish that would be to parse each timestamp before it gets saved, and then store all the additional timestamp metadata along with the record. That way I could query timestamp parts just like any other field, with no parsing. And as an added bonus, it should be even faster than using the native date functions with SQL!

So I started thinking of all the fields I would want to store, and I came up with the following list:

  • year
  • month
  • day
  • wday
  • hour
  • min
  • sec
  • zone
  • offset

But that’s a lot of fields cluttering up our model, especially if we’re storing two different timestamps like a created_at and updated_at. Well fortunately this is one area where MongoDB really shines. We can simply nest all this metadata under each timestamp field as BSON. And since we’re using Mongoid, we can also override the serialize and deserialize methods to make the interface behave just like a regular time field. So this is where the idea for the mongoid-metastamp gem came from. Here’s a simple usage example:

class MyEvent   include Mongoid::Document   field :timestamp, type: Mongoid::Metastamp::Time end  event = MyEvent.new event.timestamp = "2012-05-30 10:00" 

Now, calling a timestamp field returns a regular time:

event.timestamp => Wed, 30 May 2012 10:00:00 UTC +00:00 

But you can also access all the other timestamp metadata like this:

event['timestamp'] => {"time"=>2012-05-30 10:00:00 UTC, "year"=>2012, "month"=>5, "day"=>30, "wday"=>3, "hour"=>10, "min"=>0, "sec"=>0, "zone"=>"UTC", "offset"=>0} 

Now at last, we can performantly search for all Wednesday events in 2012:

hump_days = MyEvent.where("timestamp.wday" => 3, "timestamp.year" => 2012) 

If you were paying close attention you may have also noticed that zone is included in the metadata. That's because Mongoid Metastamp has some powerful features that allow you to store and query timestamps relative to the local time they were created in. But I’ll have to write more about that in a follow up post.