Collapsing the Multiverse

Posted on: May 18th, 2005 5:26 PM GMT

By: Greg Reimer (Code Monkey Extraordinaire)

Topic: tech, programming, software paradigms

In the web application I work on, the data in question exists in one of about four phases at any given time, depending on how you draw the distinctions. Most of the work I do is in trying to herd the data through these phases, expose it to some interface for consumption and/or manipulation, then herd it back across again. Yah!

Phase is actually the wrong word. Universe is better. The data has to shift in and out of different universes, each defining its own view of reality. RDBs and OOP come readily to mind. Both are built around basic concerns necessitated by fundamental paradigms in modern computer architecture: the data needs to 1) be stored on disk (RDB) and 2) be pulled into RAM and run through the CPU (OOP).

Since both of these universes have their own internal data model empire (object model, schema, whatever), you have adapters like JDBC to convert between them. Dancing around the adapter, I believe, is where much of the pull-your-hair-out complexity comes from in writing software. The adapter's functionality is simple: acquire a connection, execute a statement, release the connection. The adapter's strengths and weaknesses are easy to understand: connections are expensive, network calls introduce latency, certain types of statements hang the DB server. Surrounding it all you have the ever-tightening noose of increasing drain on the computer's resources as the application gets used.
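To make the acquire/execute/release cycle concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for JDBC, and the posts table and the run_statement and demo names are made up purely for illustration.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def run_statement(db_path, sql, params=()):
    """Acquire a connection, execute one statement, release the connection."""
    with closing(sqlite3.connect(db_path)) as conn:  # acquire; closed on exit
        with conn:  # commit on success, roll back on error
            return conn.execute(sql, params).fetchall()  # execute


def demo():
    # Hypothetical schema, only here so the sketch has something to talk to.
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        run_statement(path, "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
        run_statement(path, "INSERT INTO posts (title) VALUES (?)",
                      ("Collapsing the Multiverse",))
        return run_statement(path, "SELECT id, title FROM posts")
    finally:
        os.remove(path)
```

Note that every call pays for a fresh connection, which is exactly the kind of cost that pushes real code toward pooling, caching and the other artifacts that pile up around the adapter.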

My observation is this: coding to this basic set of conditions while navigating the paradigmatic rift between the universes quickly results in Enormous Complexity, similar to how a cellular automaton with simple rules and preconditions propagates into a complex set of arrangements (e.g. chess or the Game of Life). Enormous Complexity necessitates clever exception handling, patterns and antipatterns, performance tuning, persistence frameworks or baking as much logic into the DB layer as possible. These techniques help, but they're essentially artifacts built around complexity, and they don't inform the task at hand.

Therefore anything that eliminates the need for these adapters in the first place is a good thing. Case in point: XForms. Besides RDBs and OOP there are the universes of XML and web forms as name/value pairs. XForms replaced the name/value universe with basic XML, which means that, if I used XForms, I could cut out reams of tedious parameter mapping code and let the duty fall to my already-existing XML functions. XForms collapsed the multiverse.
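A rough sketch of the difference, using Python's standard library rather than any real XForms processor. The payloads, the field names and both function names are assumptions invented for the example: the first function is the hand-written parameter mapping that name/value pairs demand, the second is what is left when the form already submits XML.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

# Hypothetical submissions of the same data in the two universes.
FORM_BODY = "title=Hello&author=greg&tag=tech&tag=xml"
XML_BODY = ("<post><title>Hello</title><author>greg</author>"
            "<tags><tag>tech</tag><tag>xml</tag></tags></post>")


def post_from_form(body):
    """Name/value universe: every field is mapped into the tree by hand."""
    fields = parse_qs(body)
    post = ET.Element("post")
    ET.SubElement(post, "title").text = fields["title"][0]
    ET.SubElement(post, "author").text = fields["author"][0]
    tags = ET.SubElement(post, "tags")
    for value in fields.get("tag", []):
        ET.SubElement(tags, "tag").text = value
    return post


def post_from_xml(body):
    """XML universe: the submission is already the tree."""
    return ET.fromstring(body)
```

Both functions end up with the same tree. The mapping function grows with every field added to the form, while the XML one stays a single line and hands off to whatever XML functions already exist.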

The question on my mind is, will the RDB/OOP multiverse ever collapse? Is it possible to store data in a form that can be exposed directly as a DOM-like construct, without any under-the-sheets translation between disparate RDB and OOP systems? If so, perhaps the database can be thought of as a virtualized instance of a Big Linked List whose links are more akin to URIs than object references, in that they point to resources within an abstract space that doesn't know about RAM or disk memory. A Container would map these links to real data behind the virtual layer, where this Big Linked List (BLL) would be backed by disk data, while different parts of it are instantiated in physical memory at any given moment according to some intelligent algorithm managed by the Container. (Imagine something remotely akin to swap memory.) Mutations against the BLL's nodes would be backed directly by changes to disk data, or queued in a transaction space. In front of the virtual layer, the BLL would be treated as fully instantiated, so that you could expose pieces of it with XPath- or XQuery-like statements and do work on them, and various nodes throughout the BLL could listen to (or be observed by) other nodes, reacting to conditions and events in useful ways.
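Here is a toy sketch of the Container idea, just to show the shape of it. Everything in it is an assumption for illustration: one JSON file per node stands in for "disk data", a least-recently-used cache stands in for the "intelligent algorithm", and the link names like bll/a are invented. It leaves out transactions, queries and observers entirely.

```python
import json
import os
import tempfile
from collections import OrderedDict


class Container:
    """Maps URI-like links to nodes, keeping at most `capacity` nodes in RAM."""

    def __init__(self, directory, capacity=2):
        self.directory = directory
        self.capacity = capacity
        self.resident = OrderedDict()  # uri -> node, least recently used first

    def _path(self, uri):
        # Hypothetical storage scheme: one JSON file per node.
        return os.path.join(self.directory, uri.replace("/", "_") + ".json")

    def _remember(self, uri, node):
        self.resident[uri] = node
        self.resident.move_to_end(uri)
        while len(self.resident) > self.capacity:
            self.resident.popitem(last=False)  # evict the least recently used

    def put(self, uri, value, next_uri=None):
        node = {"value": value, "next": next_uri}
        with open(self._path(uri), "w") as f:
            json.dump(node, f)  # mutation is written straight through to disk
        self._remember(uri, node)

    def get(self, uri):
        if uri in self.resident:
            self.resident.move_to_end(uri)
            return self.resident[uri]
        with open(self._path(uri)) as f:  # not resident: pull it in from disk
            node = json.load(f)
        self._remember(uri, node)
        return node

    def walk(self, uri):
        """Follow links as if the whole list were in memory."""
        while uri is not None:
            node = self.get(uri)
            yield node["value"]
            uri = node["next"]


def demo():
    with tempfile.TemporaryDirectory() as directory:
        bll = Container(directory, capacity=2)
        bll.put("bll/c", "third")
        bll.put("bll/b", "second", next_uri="bll/c")
        bll.put("bll/a", "first", next_uri="bll/b")
        return list(bll.walk("bll/a")), len(bll.resident)
```

The caller of walk never sees whether a node came from RAM or from disk, which is the point: the links refer to resources, and the Container decides where those resources physically live.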

As an example I'm thinking of a CMS. The either/or distinction between an XML document tree and a collection of relational data (either of which could be considered "content") is one that gets everybody in my org sufficiently jumbled as to cause me major grief; more casualties of the multiverse problem. In my theoretical system it's not either/or, it's both/and. The BLL is directly analogous to a DOM tree that can be serialized as XML or transformed, but its individual nodes—being resources unto themselves—can also be shared, cross-linked and made relational via an RDF-like meta-framework. CMS content is thus stored in a super-normal state that is inherently both document-centric and relational, therefore collapsing the multiverse and avoiding Enormous Complexity and subsequent artifacts.
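A small sketch of what both/and could look like, under loud assumptions: the id attributes, the "author" predicate and the helper names are all invented, a plain Python set of (subject, predicate, object) tuples stands in for an RDF-like framework, and ElementTree stands in for the BLL.

```python
import xml.etree.ElementTree as ET

# Document-centric view: ordinary trees that serialize to XML.
first = ET.fromstring(
    '<article id="a1"><title id="t1">Collapsing the Multiverse</title>'
    '<byline id="p1">Greg</byline></article>')
second = ET.fromstring(
    '<article id="a2"><title id="t2">Another Post</title></article>')

# Relational view: RDF-like triples that cross-link nodes by id,
# so one byline node is shared by both articles.
triples = {("a1", "author", "p1"), ("a2", "author", "p1")}


def index(*roots):
    """Make every node with an id addressable as a resource in its own right."""
    return {el.get("id"): el
            for root in roots for el in root.iter() if el.get("id")}


nodes = index(first, second)


def subjects(predicate, obj):
    """All subjects linked to `obj` by `predicate`, in sorted order."""
    return sorted(s for (s, p, o) in triples if p == predicate and o == obj)


def titles_by(author_id):
    """A relational query answered with pieces of the document trees."""
    return [nodes[s].find("title").text for s in subjects("author", author_id)]
```

The same nodes answer both kinds of question: ET.tostring(first) gives the document, and titles_by("p1") gives the relation.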

Now I must disclaim that I'm a speculative nut with these kinds of things. Maybe it's a pipe dream, or maybe it's been done. Maybe it's a dumb idea for reasons I haven't thought of. It just seems that if such a framework could be built, data-driven applications would be an order of magnitude or two easier to write and maintain, and web-based applications would almost fall out of it, especially with XForms in the mix. I've heard interesting things about DOM databases that seem to hold some promise, but I must say I don't know a heck of a lot about them. I'll keep researching. Meanwhile I'd welcome any comments, corrections, hints or pointers about this stuff if anybody feels so inclined.

Well, I suppose I'd better get back to navigating rifts in the multiverse and dealing with Enormous Complexity and said artifacts, meanwhile attempting to mitigate the confusion created by my app's plurality of data representation. Thanks for reading my screed.

(Originally posted on 14 Jul 2004 at my work blog)
