is content-negotiation doomed?

2007-11-21 @ 21:46

been reading lots on the topic of content negotiation recently. this is all because i am trying to finish up my study of REST and want to fulfill the notion of multiple representations for a single URL. my reading has not been all that encouraging.

first, i see lots of stuff on how conneg is broken. how browsers don't support it properly, how servers usually get it wrong, and how caching is complicated by the whole thing. not at all encouraging.

but i'd like to press on. i'd like to be able to serve up multiple representations (HTML, XML, RDF, PDF, SVG) of the same data without being *forced* to use file extensions to do it. it should not be impossible. it should not even be difficult. in fact, i'm thinking that it is *not* all that hard to get right (not considering the proxy caching issue for now).

it seems that the pattern should work like this (a rough sketch in code follows the list):

  1. define the URL for the resource
  2. define the default media type for the resource
  3. define the alternate media types for the resource
  4. when a request comes in, determine the best match media type using the Accept header
  5. if one fits, return the resource using the 'best-match' media type
  6. if no clear preference is found (*/*), use the default media type
  7. if there is no match *at all* between the Accept header and the supported media types for the resource, return 406 - Not Acceptable
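
here's a minimal sketch of that decision flow in C#. the names (Negotiate, supported, defaultType) are mine for illustration only, and this version does plain string matching - no q-values yet, since that is the 'magic' part i get to below:

    // rough sketch only - exact matching, no q-values, no text/* wildcards
    using System.Collections.Generic;
    using System.Linq;

    static class Conneg
    {
        // returns the media type to respond with, or null to signal 406 Not Acceptable
        public static string Negotiate(string acceptHeader,
                                       IList<string> supported,
                                       string defaultType)
        {
            if (string.IsNullOrEmpty(acceptHeader))
                return defaultType;                  // no Accept header at all

            // strip any parameters (";q=0.8" and friends) and normalize
            var accepted = acceptHeader.Split(',')
                                       .Select(a => a.Split(';')[0].Trim().ToLowerInvariant())
                                       .ToList();

            // step 5: return the first supported type the client actually asked for
            var match = supported.FirstOrDefault(s => accepted.Contains(s));
            if (match != null)
                return match;

            // step 6: client has no clear preference, so use the default media type
            if (accepted.Contains("*/*"))
                return defaultType;

            // step 7: no overlap at all - the caller should return 406 Not Acceptable
            return null;
        }
    }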

this seems pretty basic. the only magic here is the "determine the best match media type" step. i've seen a couple of examples of handling this. i like mimeparse (per this article) from joe gregorio. i plan on adapting his code for my C#-based engine. that should handle the 'magic' part. that leaves the details of supporting multiple representations (meaning generating the various outputs for that resource) and that's just a matter of brute-force work, right?
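
until i've actually ported mimeparse, here's a rough idea of what that best-match step has to do - parse the Accept header, honor q-values and wildcards, and pick the supported type the client scores highest. this is my own simplified stand-in, not joe's code, and it skips media type parameters and the specificity tie-breaking a real port would handle:

    using System.Collections.Generic;
    using System.Globalization;
    using System.Linq;

    static class MediaTypeMatcher
    {
        // picks the supported type the client likes best; null means nothing is acceptable
        public static string BestMatch(IList<string> supported, string acceptHeader)
        {
            var ranges = (string.IsNullOrEmpty(acceptHeader) ? "*/*" : acceptHeader)
                .Split(',')
                .Select(ParseRange)
                .Where(r => r.Quality > 0)            // q=0 means "never send me this"
                .ToList();

            string best = null;
            double bestQ = 0.0;

            foreach (var type in supported)           // earlier entries win ties
            {
                var parts = type.Split('/');
                foreach (var r in ranges)
                {
                    bool typeOk = r.Type == "*" || r.Type == parts[0];
                    bool subOk  = r.Subtype == "*" || r.Subtype == parts[1];
                    if (typeOk && subOk && r.Quality > bestQ)
                    {
                        bestQ = r.Quality;
                        best = type;
                    }
                }
            }
            return best;
        }

        private static (string Type, string Subtype, double Quality) ParseRange(string range)
        {
            var pieces = range.Split(';');
            var typeParts = pieces[0].Trim().ToLowerInvariant().Split('/');
            double q = 1.0;                           // quality defaults to 1.0
            foreach (var p in pieces.Skip(1))
            {
                var kv = p.Split('=');
                if (kv.Length == 2 && kv[0].Trim() == "q" &&
                    double.TryParse(kv[1].Trim(), NumberStyles.Float,
                                    CultureInfo.InvariantCulture, out var parsed))
                    q = parsed;
            }
            var subtype = typeParts.Length > 1 ? typeParts[1] : "*";
            return (typeParts[0], subtype, q);
        }
    }

so, for example, BestMatch(new[] { "application/xml", "text/html" }, "text/html,application/xhtml+xml,*/*;q=0.8") comes back with "text/html".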

the last step is to sort out the caching of all this. and it seems to me that - at the origin server - it amounts to the following work (sketched in code after the list):

  1. get the resource request
  2. complete the conneg work for this resource/URL
  3. check the local cache (using URL+mediatype as the key)
  4. if found and fresh, return that item
  5. if not found (or stale), generate the representation
  6. place the results in the local cache (using URL+mediatype as the key)
  7. return the representation to the caller
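
something like this, as an in-memory sketch - the generate callback stands in for whatever actually builds the representation, and a real engine would also need to think about locking and size limits:

    using System;
    using System.Collections.Generic;

    class RepresentationCache
    {
        private readonly Dictionary<string, (string Body, DateTime Expires)> _entries
            = new Dictionary<string, (string, DateTime)>();

        // the cache key is URL plus negotiated media type, not just the URL
        private static string Key(string url, string mediaType) => url + "|" + mediaType;

        public string GetOrAdd(string url, string mediaType,
                               Func<string> generate, TimeSpan ttl)
        {
            var key = Key(url, mediaType);
            if (_entries.TryGetValue(key, out var entry) && entry.Expires > DateTime.UtcNow)
                return entry.Body;                         // found and fresh

            var body = generate();                         // not found (or stale): generate it
            _entries[key] = (body, DateTime.UtcNow + ttl); // store under URL+mediatype
            return body;
        }

        public void Evict(string url, string mediaType)
            => _entries.Remove(Key(url, mediaType));
    }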

not too bad. an adjustment of the caching keys is all that's needed here. but then there are the details of refreshing the cache on edits, etc. now, a single URL is no longer sufficient to know how to clear the cache of stale items. now you need to plan for clearing the cache of all the supported media types for that URL. but that is not complex, either (a sketch in code follows the list):

  1. get the URL to clear
  2. get the list of media types supported for the resource at that URL
  3. execute HEAD requests for the URL, one *for each media type*
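
in code, that clearing pass might look like the sketch below. it assumes the origin treats a HEAD with a given Accept header as its cue to re-run conneg and refresh (or drop) the cached entry for that URL+mediatype pair:

    using System.Collections.Generic;
    using System.Net;

    static class CacheRefresher
    {
        // supportedMediaTypes comes from step 2 - the list for the resource at this URL
        public static void Refresh(string url, IEnumerable<string> supportedMediaTypes)
        {
            foreach (var mediaType in supportedMediaTypes)
            {
                var request = (HttpWebRequest)WebRequest.Create(url);
                request.Method = "HEAD";
                request.Accept = mediaType;           // one HEAD per supported representation

                // nothing to read from the body; the request itself does the work
                using (var response = (HttpWebResponse)request.GetResponse())
                {
                    // could check response.StatusCode here if we cared
                }
            }
        }
    }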

and that's not all that tough, either, right?

i guess the last item is whether this pattern is going to work with public proxy cache servers. i suspect it will, as long as each representation is sent with the Vary: Accept header. i'm not sure how this will go down with caching servers in practice. also, i've heard some rumblings that IE might not locally cache anything that comes back with a Vary: header. kind of a bummer, but not a show-stopper, i think.
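
on the origin side, emitting that header is about the only extra work. here's a tiny ASP.NET-style sketch - the handler name and the hard-coded media type are just placeholders for whatever the conneg step picked:

    using System.Web;

    public class ResourceHandler : IHttpHandler
    {
        public bool IsReusable => true;

        public void ProcessRequest(HttpContext context)
        {
            // stand-ins for the output of the conneg + generation steps above
            string negotiatedMediaType = "application/xml";
            string body = "<data/>";

            context.Response.ContentType = negotiatedMediaType;
            context.Response.AppendHeader("Vary", "Accept");  // tell caches the reply varies by Accept
            context.Response.Write(body);
        }
    }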

so that's it. a relatively clear process. it all hinges on some 'roll-up-the-sleeves' coding and knuckle-down testing. i hope to get it down in a weekend!
