Built For Speed: Prevent Amazon CloudFront From Serving Stale Assets

In the last Built For Speed post, I demonstrated how you can use the Amazon CloudFront Content Delivery Network (CDN) for images on your site. However, ideally we should be using CloudFront for all static assets, not just images.

Before we jump into that, though, let’s do a quick review of how CloudFront works. Like all CDNs, CloudFront consists of a number of edge servers all around the world, each of which has a connection back to a central asset server. When CloudFront receives a request for an asset, it calculates which edge server is geographically closest to the request location. For example, a user in England may request the asset ‘dog.jpg’. CloudFront will route that request to the London server, which will check if it has a cached version of ‘dog.jpg’. If it does, the edge server will return that cached version. If not, it will retrieve the image from the central asset server, cache it locally and return it to the user. All subsequent requests in England for ‘dog.jpg’ will get the cached version on the London edge server. This approach minimizes network latency.

There is one big gotcha with this approach: If the ‘dog.jpg’ image changes from a poodle to a beagle, but keeps the same name, the edge server will keep serving the poodle image (assuming the expires headers are set far in the future as they should be). The edge server will not pick up the latest asset unless the name of the asset changes.
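To make the gotcha concrete, here is a toy Ruby model of a single edge server. It is only a sketch: the `EdgeCache` class and the asset names are made up for illustration, and real CloudFront behavior (expiry, invalidation and so on) is more involved than a hash lookup.

```ruby
# Toy model of one edge server; not CloudFront's actual implementation.
class EdgeCache
  def initialize(origin)
    @origin = origin  # Hash of asset name => content, standing in for the central asset server
    @cache  = {}
  end

  # Return the cached copy if there is one; otherwise fetch from the origin and cache it.
  def fetch(name)
    @cache[name] ||= @origin.fetch(name)
  end
end

origin = { 'dog.jpg' => 'poodle' }
edge   = EdgeCache.new(origin)

edge.fetch('dog.jpg')             # first request: cache miss, pulled from the origin
origin['dog.jpg'] = 'beagle'      # the image changes but keeps its name
stale = edge.fetch('dog.jpg')     # the edge still answers from its cache

origin['dog_v2.jpg'] = 'beagle'   # a new name has no cached copy
fresh = edge.fetch('dog_v2.jpg')  # so the edge goes back to the origin
```

The second lookup of ‘dog.jpg’ never reaches the origin, which is exactly why the rest of this post is about changing file names.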

Okay, with that background out of the way, let’s take a look at how we can get our CSS and JavaScripts served through CloudFront. The approach I’ve taken is to create an initializer file that sets a REVISION constant. This could easily be created as part of a deployment process, copying the latest Git or Subversion revision into the initializer file, but for now I just created it manually. We’ll append the REVISION constant to the names of your packaged CSS and JavaScripts, so that on each deploy, the files have a different name, thereby preventing CloudFront from serving stale assets.
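If you do want to generate the initializer at deploy time, something along these lines could work. This is a minimal sketch under assumptions: `revision_initializer` is a hypothetical helper, and the file name `config/initializers/revision.rb` mentioned in the comment is just a suggestion.

```ruby
# Hypothetical helper: builds the contents of an initializer that defines REVISION.
def revision_initializer(revision)
  "REVISION = #{revision.to_s.strip.inspect}\n"
end

# In a deploy script you might obtain the revision with something like
#   revision = `git rev-parse --short HEAD`
# and write the result to config/initializers/revision.rb (an assumed file name).
content = revision_initializer("a1b2c3d\n")
puts content
```

Stripping the value matters because shelling out to Git returns the revision with a trailing newline.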

I have also moved the S3 configuration parsing out of the Post model and into another initializer, which sets the S3_CONFIG hash constant. In addition, I added the bucket name to my amazon_s3.yml config file. (Remember, if you have any questions, you can always refer to the source code.)

    # /config/initializers/s3_config.rb
    S3_CONFIG = YAML.load_file("#{RAILS_ROOT}/config/amazon_s3.yml")[RAILS_ENV]
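For reference, here is roughly the shape the config file needs to have for that lookup to work. The key names are the ones the upload task reads from S3_CONFIG; the values are placeholders, and I’m parsing an inline string instead of the real file so the snippet runs on its own.

```ruby
require 'yaml'

# Placeholder values; only the key names are taken from the upload task.
sample_yaml = <<-EOS
production:
  access_key_id: YOUR_ACCESS_KEY_ID
  secret_access_key: YOUR_SECRET_ACCESS_KEY
  bucket: your-bucket-name
EOS

# Same idea as the initializer: parse the YAML, then pick the current environment.
sample_config = YAML.load(sample_yaml)['production']
puts sample_config['bucket']
```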

See below for the Rake task I wrote to copy the packaged files to S3. Note that this should be run after you run the ‘rake assets:packager:build_all’ task from AssetPackager (see the first Built For Speed post).

    require 'right_aws'

    namespace :s3 do
      namespace :assets do
        desc "Upload static assets to S3"
        task :upload => :environment do
          s3 = RightAws::S3.new(
            S3_CONFIG['access_key_id'],
            S3_CONFIG['secret_access_key']
          )
          bucket = s3.bucket(S3_CONFIG['bucket'], true, 'public-read')

          files = Dir.glob(File.join(RAILS_ROOT, "public/**/*_packaged.{css,js}"))

          files.each do |file|
            # Strip the local path up to public/ and add the revision to the name
            filekey = file.gsub(/.*public\//, "").gsub(/_packaged/, "_packaged_#{REVISION}")
            key = bucket.key(filekey)
            begin
              File.open(file) do |f|
                key.data = f
                # Expires must be an HTTP date string, hence httpdate
                key.put(nil, 'public-read', {'Expires' => 1.year.from_now.httpdate})
              end
            rescue RightAws::AwsError => e
              puts "Couldn't save #{key}"
              puts e.message
              puts e.backtrace.join("\n")
            end
          end
        end
      end
    end

Again, ideally this should be part of the deployment process: first, run the AssetPackager task to create the packaged assets, then run the S3 upload task to store them on S3. Notice that I’m appending the REVISION string to the end of file names for each of the packaged CSS and JavaScript files before uploading to S3. Also notice that I’m setting the Expires header to one year from now.
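To see what the renaming does to a single file, here is the same pair of substitutions pulled out into a standalone method. The method name, the local path and the revision value are all made up for the example.

```ruby
# Mirrors the two gsub calls in the upload task: strip everything up to
# "public/", then append the revision after "_packaged".
def revisioned_key(file, revision)
  file.gsub(/.*public\//, "").gsub(/_packaged/, "_packaged_#{revision}")
end

key = revisioned_key("/var/www/app/public/stylesheets/base_packaged.css", "42")
puts key  # stylesheets/base_packaged_42.css
```

The resulting key keeps the ‘stylesheets/’ prefix, so the layout on S3 matches the layout under /public.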

Hmm, we may have a couple of problems here. First, by default, Rails expects CSS and JavaScript files to be in their proper places in the /public directory at the root of the application, so it won’t look for them on CloudFront. That’s easily fixed by adding the following line to the bottom of /config/environments/production.rb:

 ActionController::Base.asset_host = Proc.new { CLOUDFRONT_DISTRIBUTION } 

The second problem is that the helpers provided by the AssetPackager plugin (‘stylesheet_link_merged’ and ‘javascript_include_merged’) don’t know that you’ve added a revision number to the end of the filenames. Not to worry – we just need to update a couple lines in /vendor/plugins/asset_packager/lib/synthesis/asset_package.rb. Update the ‘current_file’ method to look like this:

    def current_file
      build unless package_exists?

      path = @target_dir.gsub(/^(.+)$/, '\1/')
      name = "#{path}#{@target}_packaged"
      name += "_#{REVISION}" if defined? REVISION
      name  # return the name even when REVISION is not defined
    end

Try making those updates, then running ‘rake s3:assets:upload RAILS_ENV=production’ (remember we’re running in production mode for all the Built For Speed examples). After restarting your application, inspect the source and you should see that your stylesheets and scripts are being served by CloudFront, with the revision number at the end of the file names.

Now let’s return to our images. After the last post, we already have them delivered by CloudFront. The problem is, if you decide to update your image, Paperclip will give it the same style names as before (‘original’, ‘large’, ‘medium’, ‘thumb’). Uh-oh. Because the files have the same names, the CloudFront edge servers won’t update from the central asset server to use the latest image, and your users will continue to see the stale, old image.

Go ahead and give it a try by updating an image for an existing post. Whoops! The old image is still displayed.

Here’s how we solve that problem. First of all, let’s update the Post model to use the new S3_CONFIG constant. While we’re at it, let’s add a timestamp to our image path so that each time you update the image, it will have a different name.

    # in /app/models/post.rb
    has_attached_file :image,
                      :styles => {:large => "500x500", :medium => "250x250", :thumb => "100x100"},
                      :storage => 's3',
                      :s3_credentials => S3_CONFIG,
                      :bucket => S3_CONFIG['bucket'],
                      :path => ":class/:id/:style_:timestamp.:extension"

One small issue: For some reason the Paperclip plugin uses a string representation for its ‘timestamp’ method, so you end up with values like “2009-06-26 15:25:44 UTC”. This isn’t very practical for timestamping file names, so I’ve changed it:

    # /vendor/plugins/paperclip/lib/paperclip/interpolations.rb
    def timestamp attachment, style
      attachment.instance_read(:updated_at).to_i
    end

With that change, now each time we store an attached image, it will have a timestamp affixed to the end of the filename. Thus, the CloudFront edge server will go back to the central asset server and retrieve the new image rather than serving up the old image. Don’t worry – for each subsequent request, you’ll get the benefit of having the new image on the edge server.
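Here is a rough illustration of why the integer timestamp does the trick. The `image_path` method is a simplified stand-in that I wrote for this example, not Paperclip’s own interpolation code, and the record id and dates are invented.

```ruby
# Simplified stand-in for the ":class/:id/:style_:timestamp.:extension" path.
def image_path(klass, id, style, updated_at, extension)
  "#{klass}/#{id}/#{style}_#{updated_at.to_i}.#{extension}"
end

# Same post and style, uploaded at two different times.
first  = image_path('posts', 1, 'thumb', Time.utc(2009, 6, 26, 15, 25, 44), 'jpg')
second = image_path('posts', 1, 'thumb', Time.utc(2009, 6, 27, 9, 0, 0), 'jpg')

puts first
puts second
```

Because the two paths differ, the edge server has no cached copy for the second one and has to fetch it.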

Restart your application, and try updating an image again. This time around, you’ll see the image is updated correctly.

That wraps it up for our CloudFront review. Now go forth and speed up your sites!

Update

I realized I promised in my last post to show you how to make YSlow recognize that you were now using a CDN. Here’s what you do:

  1. Go to “about:config” in Firefox
  2. Right-click on the page, and select “New” > “String”
  3. Enter “extensions.yslow.cdnHostnames” as the preference name
  4. Enter “cloudfront.net” as your CDN host name
  5. Restart Firefox and run YSlow on your application again – you should now see that you get an “A” for using a CDN

RESOURCES

Built For Speed source code