Elegant Data Representation
The concept of Elegant Data Representation (EDR) is to provide a format for representing relational data in a transportable manner without sacrificing the normalization provided by its original relational structure. The goal of the standard is to be defined as simply as possible.
It is largely a replacement for XML, so where possible, this document will note where XML has a term corresponding to one of EDR’s.
(This is currently a draft)
EDR data is organized into packages. It is the EDR package which is transmitted back and forth (in its serialized form). This package is analogous to an XML document. It can be thought of as a file if necessary, but it can exist as more than a file just as XML can be represented in different ways in memory depending on different DOM implementations. The structure of the EDR package will have a standardized serialization protocol for transport, and a standardized access model. But other than that, it may exist in any physical format which makes it possible to access the data using that standardized access model. EDR’s serialization protocol is analogous to (but far different from) XML’s syntax and well-formedness rules. The standardized access model is analogous to XML’s DOM, XPath, and XQuery.
The EDR package is a collection of nodes and axes.
Nodes in EDR are analogous to XML nodes except that there is no distinction made between element and attribute. The purpose for the distinction in XML is not clear. So it is absent from EDR because it would introduce superfluous complexity. Every EDR package has a root node. This is analogous to XML’s document element and just provides an entry point or a frame of reference for queries.
Nodes, like XML nodes, can have values. Currently the only type of value supported is text, as in XML, but support for other value types is expected before this draft is finalized. Nodes are also context points that lead to other nodes. In XML, an axis (the mechanism for traversing from one node to another or, in other words, for relating one node to another) is either parent-child or sibling-sibling in nature. There is no such limitation in EDR: axes are user-defined.
Axes are technically named relationships between nodes. This is difficult to explain from an XML perspective because XML handles the concept so poorly. Instead, it is easier to understand from a relational database perspective. In a relational database, you might have, say, customer, order, and product entities (tables). Each record in the customer table represents a customer. The customers have orders, and orders reference products. In a relational database, these are called relationships. In EDR, the concept is called an “axis” and goes a bit further. In an EDR package, you can traverse from a node representing a customer, down the “placed orders” axis to find the orders that the customer has placed. Then you could follow that down the “for products” axis to find the products. Following axes to related node sets is much like a “JOIN” in relational databases. However, in an EDR package, even the fields are accessed via axes.
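The customer, order, and product traversal above can be sketched in a few lines of Python. This is only an illustration of the idea, not the EDR access model itself: the Node and Package classes, the link and follow methods, the “name” axis, and the sample data are all assumptions made up for this example.

```python
from collections import defaultdict


class Node:
    """A node with an optional text value. Nodes carry no name of their own."""

    def __init__(self, value=None):
        self.value = value


class Package:
    """Hypothetical in-memory package: a root node plus named axes."""

    def __init__(self):
        self.root = Node()
        # (source node, axis name) -> list of target nodes
        self._axes = defaultdict(list)

    def link(self, source, axis, target):
        """Add a named axis from source to target and return the target."""
        self._axes[(source, axis)].append(target)
        return target

    def follow(self, sources, axis):
        """Follow one named axis from every source node, without duplicates."""
        seen, result = set(), []
        for source in sources:
            for target in self._axes.get((source, axis), []):
                if target not in seen:
                    seen.add(target)
                    result.append(target)
        return result


pkg = Package()
customer = pkg.link(pkg.root, "customer", Node())
order1 = pkg.link(customer, "placed orders", Node())
order2 = pkg.link(customer, "placed orders", Node())

widget, gadget = Node(), Node()
pkg.link(widget, "name", Node("widget"))  # even fields are reached via axes
pkg.link(gadget, "name", Node("gadget"))

pkg.link(order1, "for products", widget)
pkg.link(order2, "for products", widget)  # same node, not a copy
pkg.link(order2, "for products", gadget)

orders = pkg.follow([customer], "placed orders")
products = pkg.follow(orders, "for products")
product_names = [n.value for n in pkg.follow(products, "name")]
print(product_names)  # ['widget', 'gadget']
```

Note that both orders point at the same widget node, so the traversal returns it once, much as a JOIN followed by DISTINCT would.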
EDR package data is accessed using a query which is very much like XPath or XQuery. The nodes that are returned are based on the nodes encountered by following named axes. The concept is quite simple but it might be difficult to grasp or apply at first glance. Axes do not correspond to a single XML construct, but rather to the nature of the relationships between nodes in XML.
Imagine that an EDR package is just a huge pool of nodes. The nodes may be values, such as “John,” “Smith,” “47,” etc. Some nodes may not have values associated with them*, but are rather used as points to tie other nodes together (or, in other words, to relate them). The job of relating the nodes falls to axes.
In XML, there might be an employee element which contains firstname, lastname, and age elements. The employee element doesn’t have any value associated with it. It is just used as a node to “parent” the other three nodes or to “tie them together.” The relationship of the employee element to the firstname element could be described as “child named firstname,” and this is how the element is found by DOM implementations for XML. The EDR equivalent is to have an employee node with no value and an axis called “first name” which, when followed from the employee node, leads to the node containing the first name.
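To make the comparison concrete, here is a small Python sketch that reads the XML form with the standard xml.etree.ElementTree module and then rebuilds the same data as value-less and valued nodes tied together by named axes. The layout used for the EDR side (a dict of node values plus a list of source/axis/target triples) and the follow helper are assumptions chosen for illustration, and the axis names simply reuse the XML tag names.

```python
import xml.etree.ElementTree as ET

xml_text = (
    "<employee>"
    "<firstname>John</firstname>"
    "<lastname>Smith</lastname>"
    "<age>47</age>"
    "</employee>"
)
employee_element = ET.fromstring(xml_text)

# XML: the first name is found as the "child named firstname".
xml_first_name = employee_element.find("firstname").text

# EDR-style equivalent (hypothetical layout): the employee node has no
# value, and each field is a separate node reached through a named axis.
values = {}  # node id -> text value, or None for a value-less node
axes = []    # (source node id, axis name, target node id)

EMPLOYEE = 0
values[EMPLOYEE] = None
for node_id, child in enumerate(employee_element, start=1):
    values[node_id] = child.text
    axes.append((EMPLOYEE, child.tag, node_id))


def follow(source, axis):
    """Return the values of the nodes reached from source along axis."""
    return [values[t] for s, a, t in axes if s == source and a == axis]


print(xml_first_name)                 # John
print(follow(EMPLOYEE, "firstname"))  # ['John']
```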
So now consider that you have an axis named “employee,” one named “first name,” one named “last name,” one named “age,” and one named “manages.” Don’t worry about the syntax of queries right now; just consider that you query nodes by chaining axes together into a path. To get the names of all employees, you first follow the “employee” axis from the root node to all the employee nodes in the package. This changes your “context” to be the set of employee nodes. From this context, you can continue along the “first name” axis and the “last name” axis to get the names of the employees. You can also follow the “manages” axis from an employee node to reach the employees that person manages (which are themselves employee nodes).
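Since the query syntax is not defined yet, the following Python sketch simply treats a query as a list of axis names that are followed one after another, with the context changing at each step. The node ids, the triple layout, the query and values_of helpers, and the second employee (“Mary Jones”) are assumptions for illustration. It also assumes that the “manages” axis points from a manager to the employees he or she manages.

```python
# Hypothetical package: a pool of nodes (id -> value) plus axes as triples.
nodes = {
    0: None,  # root
    1: None,  # an employee (no value of its own)
    2: None,  # another employee
    3: "John", 4: "Smith", 5: "47",
    6: "Mary", 7: "Jones", 8: "52",
}
ROOT = 0
axes = [
    (0, "employee", 1), (0, "employee", 2),
    (1, "first name", 3), (1, "last name", 4), (1, "age", 5),
    (2, "first name", 6), (2, "last name", 7), (2, "age", 8),
    (2, "manages", 1),  # assumption: manager -> managed employee
]


def query(path, context=(ROOT,)):
    """Follow each axis name in path in turn, starting from context."""
    for axis in path:
        next_context = []
        for source in context:
            for src, name, target in axes:
                if src == source and name == axis and target not in next_context:
                    next_context.append(target)
        context = next_context
    return list(context)


def values_of(node_ids):
    """Look up the text values of the given nodes."""
    return [nodes[n] for n in node_ids]


print(values_of(query(["employee", "first name"])))             # ['John', 'Mary']
print(values_of(query(["employee", "manages", "first name"])))  # ['John']
```

The nodes themselves carry no names here. The only way to tell that node 3 is a first name is that the “first name” axis led to it.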
A few principles fall out of this.
- Unlike XML, none of your nodes actually have names. Their purpose, or class, if they have one, is inferred from the axis that got you there. Note that this doesn’t present a problem for a client that queried the information, because the axes returned are a result of the query the client sent.
- EDR data remains normalized. There is no reason to repeat any data in an EDR package. If one node is related to several other nodes, axes are simply added for this purpose.
- A single EDR package can contain the results of several normal SQL-like queries without redundant data. So if a node is returned by more than one query, it is simply the target of more axes, but its data is not duplicated. This means that only the necessary data is sent from the database to the application. The application can then, with its original query, requery the returned EDR package, locally, for each individual part it needed, but the elements remain joined (matched up) within the package by axes. This also solves problems with recursive “joins.”
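The last two points can be illustrated with a short Python sketch in which the results of two separate queries are merged into one package. A record that appears in both results becomes a single node that is the target of two axes. The PackageBuilder class, the record keys such as “product:2”, the axis names, and the sample rows are all hypothetical. For brevity the product name is stored as the node’s own value rather than behind a further axis.

```python
class PackageBuilder:
    """Hypothetical helper that merges query results into one package."""

    def __init__(self):
        self.values = {}    # node id -> text value, or None
        self.axes = set()   # (source node id, axis name, target node id)
        self._by_key = {}   # record key -> node id
        self.root = self._new_node(None)

    def _new_node(self, value):
        node_id = len(self.values)
        self.values[node_id] = value
        return node_id

    def node_for(self, key, value=None):
        """Return the single node for a record key, creating it on first use."""
        if key not in self._by_key:
            self._by_key[key] = self._new_node(value)
        return self._by_key[key]

    def link(self, source, axis, target):
        """Relate two existing nodes; no data is copied."""
        self.axes.add((source, axis, target))


builder = PackageBuilder()

# Result rows of a first query (made-up data): products a customer ordered.
for key, name in [("product:1", "widget"), ("product:2", "gadget")]:
    builder.link(builder.root, "ordered products", builder.node_for(key, name))

# Result rows of a second query: products currently on sale.
for key, name in [("product:2", "gadget"), ("product:3", "gizmo")]:
    builder.link(builder.root, "products on sale", builder.node_for(key, name))

gadget = builder.node_for("product:2")
incoming = sorted(axis for _, axis, target in builder.axes if target == gadget)

print(len(builder.values))  # 4 (root plus three products; "gadget" stored once)
print(incoming)             # ['ordered products', 'products on sale']
```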
* Note: currently it bothers me that some nodes may or may not have values associated with them. Perhaps that’s not a problem, but to me it adds an extra layer of complexity that I’m not sure needs to be there.