Bill de hÓra: Content vs. Semantics

2008-11-29 @ 15:30

i've been following an interesting REST-Discuss thread covering REST/WOA issues that brought a number of things to light. subbu posted a response to the thread that's worth checking out.

however, a response from Bill de hÓra caught my attention. he points out that, going forward, it's not enough to stress the importance of using meaningful media types in the Accept and Content-Type HTTP headers. it's also important to sort out what these headers are really 'about.' in his mind, it comes down to the difference between Content and Semantics:

[I]f I serve an Atom feed as application/xml, are you allowed to consume it as Atom? And if you are, in what can clients reasonably expect servers to stand over? What about those DRM and other extensions people are going to add as AtomPub sees wider adoption for integrations?

As far as I can tell, this is big, big gap in the Web's as-deployed architecture. XML formats can be mixed up without any guidelines as to what downstream clients can assume of the data. Semantic web formats have significant variation in what they can deduce. JSON is arguably in an even worse place as it has no technical basis for anything other than code on demand. I don't know where to start with RDFa - how is lexical tunneling any better than transport tunneling?

This isn't a problem now as we don't ship around much content that needs reasoners. We deal with low level bugs like encoding screwups and malformed, but technically Content-Type supersedes any claims made inside the content itself. Worse, Content-Type is optional in both HTTP and XML. yes, there's RFC3023, but it's a technical description of a switch block that only deals with encoding issues, not semantics. Give it enough time and enough automated agents, and there'll be a lawsuit filed over a discrepancy between what a client assumed and what a server expected.
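to make the gap concrete, here's a minimal sketch (mine, not Bill's) of the choice a client faces when the declared media type and the payload disagree. the function names and the labels returned by `classify` are hypothetical, made up for illustration. the only facts it leans on are the registered `application/atom+xml` media type and the Atom namespace URI. it only reports what the header declares versus what the payload looks like; it doesn't settle which one a client is entitled to act on.

```python
import xml.etree.ElementTree as ET

# The Atom namespace URI from RFC 4287.
ATOM_NS = "http://www.w3.org/2005/Atom"


def declared_type(content_type_header):
    """Return the bare media type from a Content-Type value, or None if absent."""
    if not content_type_header:
        return None
    # Drop parameters such as "; charset=utf-8".
    return content_type_header.split(";", 1)[0].strip().lower()


def sniffed_root(body):
    """Return the namespace-qualified root element name, or None if not well-formed XML."""
    try:
        return ET.fromstring(body).tag
    except ET.ParseError:
        return None


def classify(content_type_header, body):
    """Hypothetical labels: what the server declared vs. what the client would be guessing."""
    declared = declared_type(content_type_header)
    looks_like_atom = sniffed_root(body) == "{%s}feed" % ATOM_NS

    if declared == "application/atom+xml":
        return "atom (declared)"
    if declared in ("application/xml", "text/xml") and looks_like_atom:
        # The server only promised XML; treating it as Atom is the client's guess.
        return "generic xml (atom only by guessing)"
    if declared is None and looks_like_atom:
        return "undeclared (atom only by guessing)"
    return "unknown"


feed = b'<feed xmlns="http://www.w3.org/2005/Atom"><title>example</title></feed>'
print(classify("application/atom+xml; charset=utf-8", feed))
print(classify("application/xml", feed))
print(classify(None, feed))
```

the second and third cases are the ones Bill is asking about: the bytes are identical, but nothing in the exchange says the server stands behind an Atom reading of them.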

there's more in his post and it's worth reviewing the thread that leads up to his response.
