Transformation Basics

Will Holcomb

26 January 2010

I previously mentioned the idea of extending a content tree with template trees. Two kinds of systems manipulate such trees: in-memory and streaming. An in-memory system builds a full representation of the tree in the computer's memory, whereas a streaming system deals with the tree as a series of messages generated by a traversal.
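To make the distinction concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The sample document is an assumption chosen for illustration. Note that `iterparse` only exposes the event stream; it still builds the tree behind the scenes unless elements are cleared, so treat it as a demonstration of the event view rather than a memory-efficient streaming system.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, chosen only for illustration.
doc = b"<html><head><title>Hi</title></head><body><p>Hello</p></body></html>"

# In-memory: the whole tree is built first, then inspected.
root = ET.fromstring(doc)
in_memory_tags = [el.tag for el in root.iter()]

# Streaming view: elements are reported as start/end events while parsing.
stream_events = [
    (event, el.tag)
    for event, el in ET.iterparse(io.BytesIO(doc), events=("start", "end"))
]

print(in_memory_tags)
print(stream_events)
```

The in-memory form lets you walk the tree in any direction at any time; the event form only ever shows the current position in the traversal.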

Consider this HTML tree:

Traversing the tree means visiting the nodes in order. For stream processing, the traversal is generally "depth-first", meaning it goes as deep as possible into each node before moving across to the next sibling:
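A depth-first traversal can be sketched in a few lines. The `Node` class and the sample tree below are hypothetical, made up for illustration; they only show the order in which nodes are visited.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical tree node: a tag name plus a list of child nodes."""
    name: str
    children: list = field(default_factory=list)


def depth_first(node):
    """Yield node names in depth-first order: a node, then each subtree in turn."""
    yield node.name
    for child in node.children:
        yield from depth_first(child)


# Illustrative tree (an assumption, not taken from the article).
tree = Node("html", [
    Node("head", [Node("title")]),
    Node("body", [Node("h1"), Node("p")]),
])

order = list(depth_first(tree))
print(order)  # ['html', 'head', 'title', 'body', 'h1', 'p']
```

The whole `head` subtree is exhausted before `body` is visited, which is what "as deep as possible" means here.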

There are three types of events:

The stream of events is "isomorphic" to the tree, meaning that the essential information about the relationships in the data is preserved. You can construct the messages from the tree and vice versa:
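The round trip can be sketched as follows. This assumes the three event kinds are the SAX-style start, end, and text events, and it represents an element as a hypothetical `(tag, children)` tuple with text stored as plain strings; both choices are illustrative, not the article's own format.

```python
def to_events(node):
    """Flatten a tree into a depth-first stream of (kind, value) events."""
    if isinstance(node, str):
        yield ("text", node)
        return
    tag, children = node
    yield ("start", tag)
    for child in children:
        yield from to_events(child)
    yield ("end", tag)


def from_events(events):
    """Rebuild the tree from an event stream using a stack of open elements."""
    stack = [("#root", [])]  # sentinel that collects the top-level element
    for kind, value in events:
        if kind == "start":
            stack.append((value, []))
        elif kind == "text":
            stack[-1][1].append(value)
        elif kind == "end":
            tag, children = stack.pop()
            if tag != value:
                raise ValueError(f"mismatched end event: {value!r} closes {tag!r}")
            stack[-1][1].append((tag, children))
    (root,) = stack[0][1]  # expect exactly one top-level element
    return root


# Illustrative tree (an assumption, not taken from the article).
tree = ("html", [
    ("head", [("title", ["Hi"])]),
    ("body", [("p", ["Hello"])]),
])

events = list(to_events(tree))
rebuilt = from_events(events)
print(rebuilt == tree)  # True
```

The stack in `from_events` is what recovers the nesting: every start event opens a level, and the matching end event closes it and attaches the finished element to its parent.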