More on ETags

In a previous post, I mentioned how including the ETag and Last-Modified headers in an API can be a real benefit to third-party clients. What I didn’t mention is how ETags can provide a large benefit inside your API as well. This post is about implementing ETags in your application and using them to make your application more efficient.

ETags, or entity tags, are identifiers for a resource. In short, they work like version identifiers: when a resource changes, its ETag changes. A client that GETs a resource can store the associated ETag; the next time it GETs the resource, it can ask for the body only if the resource has changed, by sending the stored ETag in an If-None-Match header. If it fancies, the client can even make a request that should only be executed if it still holds the latest ETag for the resource, by sending that ETag in an If-Match header!
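To make the two kinds of conditional request concrete, here is a minimal sketch of how a server might evaluate them. The function names are my own for illustration, and the header parsing is simplified (it assumes no tag contains a comma). Weak tags, covered in the next section, carry a W/ prefix: If-None-Match ignores that prefix when comparing, while If-Match requires both tags to be strong.

```python
def parse_etags(header_value):
    """Split a comma-separated If-Match / If-None-Match value into tags.

    Simplification: assumes no tag contains a comma.
    """
    return [tag.strip() for tag in header_value.split(",") if tag.strip()]


def opaque_part(etag):
    """Drop the W/ weakness marker, leaving the quoted opaque tag."""
    return etag[2:] if etag.startswith("W/") else etag


def conditional_get_status(current_etag, if_none_match):
    """Return 304 if the client's stored copy is current, else 200."""
    if if_none_match is None:
        return 200
    if if_none_match.strip() == "*":
        return 304  # assumes the resource exists
    current = opaque_part(current_etag)
    # Weak comparison: the W/ prefix is ignored on both sides.
    if any(opaque_part(tag) == current for tag in parse_etags(if_none_match)):
        return 304
    return 200


def conditional_write_status(current_etag, if_match):
    """Return 412 if the client's ETag is stale, else 200 (go ahead)."""
    if if_match is None or if_match.strip() == "*":
        return 200  # assumes the resource exists
    # Strong comparison: a weak current tag can never match.
    if current_etag.startswith("W/"):
        return 412
    if current_etag in parse_etags(if_match):
        return 200
    return 412
```

A 304 tells the client to reuse its stored copy; a 412 tells it that the resource changed since it last looked, so the write was not applied.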

Strong and weak

One thing I didn’t mention in my original post is that there are two different kinds of ETags: strong and weak ETags. A strong ETag changes whenever a resource changes at all; if any bit in the resource changes, a new ETag is required. Weak ETags, on the other hand, change whenever the resource semantically changes; a resource won’t necessarily have a new ETag if it means the same thing as its previous version.

So how should you generate your ETags? Strong ETags are probably best generated right after the message body is generated, or from the stored bodies if your API stores entire message bodies. Alternatively, for a more effective use of ETags, you could regenerate them whenever the resource changes, although that might be an expensive solution. [In practice, what Iron Money does is create a hash from the data and then store the hash with the data in MySQL. Your mileage may vary.]

Weak ETags, on the other hand, could be generated from the internal representation of resources in your API. Whenever the resource is changed, simply generate the weak ETag and store it along with the resource’s data.
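As a rough sketch of the difference, the snippet below derives a strong ETag from the exact bytes of a message body and a weak ETag from a canonical form of the internal data. The function names, the use of SHA-256, and the choice of JSON as the canonical form are all assumptions made for illustration; any stable hash and any deterministic serialization would do.

```python
import hashlib
import json


def strong_etag(body: bytes) -> str:
    """Hash the exact response bytes: any changed byte yields a new tag."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def weak_etag(resource: dict) -> str:
    """Hash a canonical form of the internal data.

    Sorting keys and fixing separators means that two representations
    with the same content produce the same tag.
    """
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return 'W/"' + digest + '"'
```

Note that the strong tag changes when only whitespace in the body changes, while the weak tag is unaffected by key order or formatting.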

If you provide a resource that lists a bunch of child resources (e.g. a /books/ resource that lists all of the book resources), I highly recommend creating an ETag for the resource from all of the ETags of the child resources; this way, you don’t need to regenerate the entire list just to compute a new ETag when one of the child resources changes.
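One way to do this, sketched under the assumption that you can fetch the stored child ETags in the same order the list is rendered, is to hash them together. The function name and the choice of SHA-256 are mine, not something prescribed by HTTP.

```python
import hashlib


def collection_etag(child_etags):
    """Combine child ETags, in listing order, into one ETag for the list."""
    hasher = hashlib.sha256()
    for etag in child_etags:
        hasher.update(etag.encode("utf-8"))
        # Newline separator keeps ['"12"'] distinct from ['"1"', '"2"'].
        hasher.update(b"\n")
    return '"' + hasher.hexdigest() + '"'
```

The combined tag changes whenever a child is added, removed, reordered, or modified, and computing it only needs the stored ETags rather than the rendered list.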

Generating ETags for resources that are searchable can be problematic. You could generate the list of child resource ETags and create the ETag from that; however, this requires you to perform the search even when the request is conditional. Another way of handling the problem is to use the same ETag for the search as you would use for the entire resource (e.g. the ETags for /books/ and /books/?author=Steinbeck would be the same). I recommend doing the latter; it saves you time and shouldn’t hinder clients’ use of your API.
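Here is a small sketch of the second approach: the query string is dropped when looking up the ETag, so a conditional search request can be answered without running the search at all. The names `stored_etags` and `run_search` are hypothetical stand-ins for your own ETag storage and search code, and the exact string match is a simplification of real If-None-Match handling.

```python
from urllib.parse import urlsplit


def answer_conditional_search(url, if_none_match, stored_etags, run_search):
    """Answer a search request using the ETag of the whole collection.

    Returns a (status, results) pair; results is None for a 304.
    """
    # /books/?author=Steinbeck shares its ETag with /books/.
    etag = stored_etags.get(urlsplit(url).path)
    if etag is not None and if_none_match == etag:
        return 304, None  # the search is never executed
    return 200, run_search(url)
```

The trade-off is that a client searching for one author gets a fresh response whenever any book changes, but it never gets a stale one.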

Using ETags internally for caching

The benefits of ETags to third-party clients are rather obvious, but how can ETags be used within your application?

If you’re using a caching system (like Memcached), you’re probably storing your resources by their UUID or by some sort of resource and primary key identifier. Whenever a resource is created, you put it in the cache; whenever a resource is updated, you delete it from the cache or update the cache, and whenever a resource is deleted, you delete it from the cache.

This might work well for your application. However, cache invalidation can be difficult, especially if your application requires ACID capabilities.

ETags can help your application with both invalidation and consistency. If you store each resource by its primary identifier and its ETag, you can always be sure that each cache lookup will only return the version of the resource you want. For example, if you’re simply getting a resource, you can look up its ETag in your database, then retrieve the entire resource from the cache and be sure that, if it’s in the cache, it’s the version you want. Additionally, you don’t have to worry about cache invalidation, because there is no invalidation; you simply let your cache system remove the old versions of resources as the cache fills up.
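The lookup described above might look something like the following sketch. A plain dict stands in for Memcached, another dict stands in for the ETag column in the database, and `load_resource` is a hypothetical function that fetches the full resource on a cache miss; none of these names come from a real library.

```python
def cache_key(resource_id, etag):
    """Build a cache key that is unique per version of a resource."""
    return f"{resource_id}:{etag}"


def get_resource(resource_id, cache, etags, load_resource):
    """Fetch a resource through a cache keyed by identifier and ETag."""
    etag = etags.get(resource_id)  # stand-in for a cheap database lookup
    if etag is None:
        return None  # unknown resource
    key = cache_key(resource_id, etag)
    value = cache.get(key)
    if value is None:
        # Cache miss: load the current version and store it under its key.
        # A real system should read the ETag and the data consistently,
        # so that a concurrent update cannot pair new data with an old tag.
        value = load_resource(resource_id)
        cache[key] = value
    return value
```

After an update, the old entry is never deleted; it simply stops being asked for, because the database now hands out a different ETag, and the cache evicts it in its own time.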

ETags: because it’s good for the Internet

ETags are surprisingly powerful once you start to make the most of them. They help HTTP clients cache resources and make conditional requests, and they can help your application with its caching needs as well.
