composites, caching, and freshness

2009-02-20 @ 18:41

as part of another project i am working on, i had a chance to review some basic Web-caching challenges.

typical resource definition

typically, my REST-ful implementations for public (non-authenticated) resources support a POST factory along with PUT, DELETE, GET. something like this:

Method   URI               Response                   Comments
POST     /customers/       302 Found                  creates a new resource and
                           Location:/customers/{id}   generates a new {id} value
PUT      /customers/{id}   200 OK                     updates an existing resource
DELETE   /customers/{id}   204 No Content             removes an existing resource
GET      /customers/       200 OK                     returns a list of customer resources
GET      /customers/{id}   200 OK                     returns a single existing resource
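as a quick sketch, the POST factory row above might look like this in Python. the in-memory store and the handler names are hypothetical, just to illustrate the status codes and the Location header:

```python
import itertools

# hypothetical in-memory store standing in for the real backend
customers = {}
_next_id = itertools.count(1)

def post_customers(body):
    """POST /customers/ -- create a new resource, mint a new {id},
    and answer 302 Found with a Location header pointing at it."""
    new_id = next(_next_id)
    customers[new_id] = body
    return 302, {"Location": f"/customers/{new_id}"}

def delete_customer(cid):
    """DELETE /customers/{id} -- remove an existing resource."""
    del customers[cid]
    return 204, {}
```

calling `post_customers({"name": "mickey"})` the first time returns `(302, {"Location": "/customers/1"})`; the client then follows the Location to GET the new resource.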

this is nothing too fancy or amazing. but it does expose a rather common challenge regarding caching:

"If you are using caching intermediaries, what happens to the cached list of customers (GET /customers/) after a PUT, POST, or DELETE?"

the problem is that the /customers/ resource might now have stale data. maybe the list has too many resources (due to deletes), or one of its entries has been edited, etc. this is especially true for widely-used sites with dynamic data. basically, this problem occurs because the resource returned by GET /customers/ is a composite resource: it's made up of one or more existing resources in the system. if you're not careful, composites will give you headaches.
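here's a tiny simulation of the staleness problem (all names hypothetical): a naive cache keyed by URI happily serves the old composite after a DELETE at the origin:

```python
# hypothetical origin data and a naive URI-keyed intermediary cache
customers = {1: "mickey", 2: "pluto"}
cache = {}

def get(uri):
    """serve from cache if present, otherwise hit the 'origin'."""
    if uri not in cache:
        if uri == "/customers/":
            cache[uri] = sorted(customers.values())   # the composite list
        else:
            cache[uri] = customers[int(uri.rsplit("/", 1)[-1])]
    return cache[uri]

get("/customers/")          # primes the cache with the full list
del customers[2]            # DELETE /customers/2 at the origin
stale = get("/customers/")  # the cached composite still lists both customers
```

after the delete, `stale` is still `["mickey", "pluto"]` even though the origin only knows about one customer. that gap between origin and intermediary is exactly what the caching headers below are for.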

fun w/ caching headers

there are a number of ways to keep your composites fresh using the Cache-Control HTTP header. here are some common examples:

ETag: {#etag}
Cache-Control: public, max-age=3600
this is the minimum caching set. provide a (strong) ETag and tell any intermediaries that this is a publicly cache-able resource (note that max-age is a directive inside Cache-Control, not a header of its own). the problem here is that the composite resource (the customer list) can fall out of sync, since intermediaries have been told they may keep it for up to one hour.
ETag: {#etag}
Cache-Control: no-cache
in this case intermediaries must re-validate with the origin server on every request before serving a stored copy. not ideal, but acceptable for small-ish implementations.
ETag: {#etag}
Cache-Control: public, must-revalidate
this is better. intermediaries may serve the resource while it is fresh, but once it goes stale they are instructed to re-validate it w/ the origin server before continuing.
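to see what that re-validation looks like, here's a sketch of the conditional GET an intermediary performs with the stored ETag (the hashing scheme and function names are made up for illustration):

```python
import hashlib

customers = {1: "mickey"}

def current_etag():
    """a (strong) ETag derived from the current list contents."""
    return hashlib.sha1(repr(sorted(customers.items())).encode()).hexdigest()

def conditional_get(if_none_match=None):
    """GET /customers/ with optional If-None-Match re-validation."""
    etag = current_etag()
    if if_none_match == etag:
        return 304, etag, None                       # still fresh: no body sent
    return 200, etag, list(customers.values())       # changed: full body sent

status, etag, body = conditional_get()               # first fetch: 200 + body
status2, _, _ = conditional_get(if_none_match=etag)  # re-validate: 304, no body
customers[2] = "pluto"                               # a POST changes the composite
status3, _, _ = conditional_get(if_none_match=etag)  # old etag no longer matches: 200
```

the 304 path is the payoff: the intermediary keeps serving its cached copy without re-downloading the body, yet a write to any underlying customer changes the ETag and forces a fresh 200 on the next re-validation.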

there are a number of other caching directives available (see the link above for details). the point here is that you have lots of options when it comes to aligning your resources' 'freshness' requirements w/ your bandwidth and request-load requirements.

code