Coming out of Drupalcon in Boston, there's a lot of buzz in the community about RDF and Drupal's future. RDF was a prominent theme in Dries' keynote address. Boris Mann and I were chatting at Drupalcon about RDF and Drupal and I thought I'd jot down these notes while they're fresh in my mind.
From the start, Drupal has focused on enabling collaboration. The quintessential Drupal website is not based on a single central authority; it's a collective enterprise.
Take drupal.org. Hundreds of thousands of people come together to pool knowledge, just as thousands of individuals contribute to the software itself. There is authority, but not a sort that's claimed or granted--it's earned. We each select where to place our trust, based on others' reputations and track records. It's all about collaboration.
It's right there in the Drupal mission:
By building on relevant standards and open source technologies, Drupal supports and enhances the potential of the Internet as a medium where diverse and geographically-separated individuals and groups can collectively produce, discuss, and share information and ideas.
And in the somewhat cryptic Drupal slogan: "community plumbing".
If this is what Drupal looks like on an individual website, what would it look like as a tool for freely pooling and sharing information among many sites across the internet? What data pooling technology would best fit Drupal's "community plumbing" culture and approach?
It would be a decentralized technology that doesn't rely on one central hub or site but rather enables all sites to produce, share, and enhance information. It would enable any site to choose selectively where to invest its trust. It would be standards-based and well supported in the open source community. It would be built from the ground up around the principle of pooling information.
In short, it would look a lot like RDF.
Resource Description Framework (RDF) is a standard developed by the World Wide Web Consortium. RDF and its sister technologies provide standard ways to describe, encode, and request information. Then again, there are lots of other approaches out there that can also do this--isn't this what web services in general are all about? What's the special sauce?
On the surface, RDF's terminology can be daunting--at least, it was for me. Read just a few sentences and you're already up against concepts like "triples" and terms that seem to have more to do with grammar or syntax than technology: subjects, predicates, objects. The good news, though, is that it boils down to a very simple idea: any information, no matter how complex, can be communicated through a series of individual statements.
Say we're talking about, I don't know, folk music CDs, and we want to offer musicians' reviews of them. That's a lot of information, but it can be broken down. If we started at the absolute beginning, we might begin by making statements about things that exist and their characteristics:
- There is such a thing as a person.
- A person can have a name.
- There is such a thing as a CD.
- A CD can have a title.
- A musician is a person.
- There is such a thing as a review.
- A review can be of a CD.
- A musician can make a review.
- and so on....
In an SQL database, for example, this sort of information might live in a table schema. So, appropriately, in RDF it goes in RDF Schema.
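To make that a bit more concrete, here's a rough sketch of how the statements above might look as subject-predicate-object triples, written as plain Python tuples rather than any real RDF serialization. The `rdf:` and `rdfs:` terms are from the W3C vocabularies; every `ex:` name (and the `classes` helper) is something I made up for illustration.

```python
# Each statement is one (subject, predicate, object) triple.
# "rdf:" / "rdfs:" terms are standard; all "ex:" names are hypothetical.
SCHEMA = [
    ("ex:Person",   "rdf:type",        "rdfs:Class"),    # There is such a thing as a person.
    ("ex:name",     "rdf:type",        "rdf:Property"),
    ("ex:name",     "rdfs:domain",     "ex:Person"),     # A person can have a name.
    ("ex:CD",       "rdf:type",        "rdfs:Class"),    # There is such a thing as a CD.
    ("ex:title",    "rdf:type",        "rdf:Property"),
    ("ex:title",    "rdfs:domain",     "ex:CD"),         # A CD can have a title.
    ("ex:Musician", "rdf:type",        "rdfs:Class"),
    ("ex:Musician", "rdfs:subClassOf", "ex:Person"),     # A musician is a person.
    ("ex:Review",   "rdf:type",        "rdfs:Class"),    # There is such a thing as a review.
    ("ex:reviewOf", "rdfs:domain",     "ex:Review"),
    ("ex:reviewOf", "rdfs:range",      "ex:CD"),         # A review can be of a CD.
    ("ex:created",  "rdfs:domain",     "ex:Musician"),
    ("ex:created",  "rdfs:range",      "ex:Review"),     # A musician can make a review.
]


def classes(triples):
    """Return the sorted names of everything declared to be a class."""
    return sorted(s for s, p, o in triples
                  if p == "rdf:type" and o == "rdfs:Class")


print(classes(SCHEMA))
```

Nothing here is specific to CDs; the same three-part shape would describe any other kind of thing.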
Then there's a set of more specific statements; this is the sort of thing that goes into RDF proper.
- There is a person with the name "Janet".
- Janet is a musician.
- There is a CD.
- This CD has the title "Himalayan Folk Melodies".
- Janet created a review of "Himalayan Folk Melodies".
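As a sketch only, those specific statements could be written as triples too, and a few lines of pattern matching are enough to ask questions of them. The identifiers (`ex:janet`, `ex:cd1`, `ex:review1`), the `ex:` predicates, and both helper functions are my own inventions for the example, not part of any RDF library.

```python
# Instance data as (subject, predicate, object) triples.
# All "ex:" identifiers are hypothetical names chosen for this example.
DATA = [
    ("ex:janet",   "ex:name",     "Janet"),                    # There is a person with the name "Janet".
    ("ex:janet",   "rdf:type",    "ex:Musician"),              # Janet is a musician.
    ("ex:cd1",     "rdf:type",    "ex:CD"),                    # There is a CD.
    ("ex:cd1",     "ex:title",    "Himalayan Folk Melodies"),  # This CD has a title.
    ("ex:review1", "rdf:type",    "ex:Review"),
    ("ex:janet",   "ex:created",  "ex:review1"),               # Janet created a review...
    ("ex:review1", "ex:reviewOf", "ex:cd1"),                   # ...of that CD.
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def titles_reviewed_by(triples, name):
    """Follow name -> person -> review -> CD -> title, one statement at a time."""
    titles = []
    for person, _, _ in match(triples, p="ex:name", o=name):
        for _, _, review in match(triples, s=person, p="ex:created"):
            for _, _, cd in match(triples, s=review, p="ex:reviewOf"):
                for _, _, title in match(triples, s=cd, p="ex:title"):
                    titles.append(title)
    return titles


print(titles_reviewed_by(DATA, "Janet"))
```

The answer is assembled by chaining four small statements together; no single statement holds it.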
Finally, having covered the basics, we're on to more interesting information, like statements on the actual reviews Janet made.
We end up with knowledge encoded in discrete statements, ones that can be assembled and combined in many ways to produce different sets of information.
RDF technologies are fundamentally organized around the principle of pulling together rich sets of information from diverse statements. Crucially, those statements can come from anywhere. RDF doesn't assume a single source or any central authority. Anyone can offer up information. And anyone can selectively add to and enhance what someone else has put up, choosing which sources to trust and build on. Information is the sum of many individual statements or contributions. Statements could come from a single source or could just as well be knit together from hundreds of sources, each of which is adding just what it knows best.
Is this picture starting to sound familiar?
We have some brilliant minds in the Drupal community--you only have to spend an afternoon with someone like chx to realize that. But in important ways, no one on their own is as intelligent as we can be together. The best work takes everyone's expertise and energy.
If that principle's true of the contributors to a single project or site, doesn't it hold as well for larger networks--like the world?
Let's return to our CD review example. Our musicians' site is offering album reviews. Sure, the site could invest all the time and energy needed to build up and maintain a database of CDs and their publication data. But why? Other sites specialize in such information. Better to choose a dependable source for the base data and focus on adding just your own core expertise.
Now a third site--say, for music fans in general--can take both the album data and the review information and add something new--fan recommendations, whatever. And the internet as a whole starts to look more and more like our quintessential Drupal site--a collective production drawing on everyone's contributions.
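Here's a toy sketch of that pooling idea, assuming each site simply publishes a set of triples and the consuming site decides which publishers to trust. The site names, the `ex:` terms, and the `merge` function are all hypothetical; real RDF tooling would fetch and parse the data rather than hard-code it.

```python
# Statements published by different (made-up) sites, keyed by site name.
SOURCES = {
    "albums.example": {
        ("ex:cd1", "ex:title", "Himalayan Folk Melodies"),
    },
    "musicians.example": {
        ("ex:janet", "ex:created", "ex:review1"),
        ("ex:review1", "ex:reviewOf", "ex:cd1"),
    },
    "fans.example": {
        ("ex:cd1", "ex:recommendedBy", "ex:fan42"),
    },
    "spam.example": {
        ("ex:cd1", "ex:title", "Buy Cheap Watches"),
    },
}


def merge(sources, trusted):
    """Union the statements from only the sites we have chosen to trust."""
    graph = set()
    for site in trusted:
        graph |= sources[site]
    return graph


# The fan site trusts the album and review sites, plus its own data.
pooled = merge(SOURCES, ["albums.example", "musicians.example", "fans.example"])
for triple in sorted(pooled):
    print(triple)
```

Because every statement has the same shape, combining sources is just a set union, and leaving a source out is as easy as not listing it.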
Thanks to arto and others working on the RDF project (http://drupal.org/project/rdf), we're starting to see a set of tools that promises to make inter-site collaboration as central a part of Drupal as single-site collaboration is today.
So I'm excited by the vistas RDF is opening up. So far I can only catch glimpses, but they're enough to make me want to see much more.