Thesis Possibilities

Will Holcomb

16 January 2010

For my master's thesis, I am working on a design for a large-scale versioning system. The existing system that most closely resembles my idea is git, which was designed for maintaining the Linux kernel.

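The basic building blocks of a git repository are "blobs." In git, a blob is simply the contents of a file; the file's name is not part of the blob but is recorded in the tree that contains it.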
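Because a blob is identified purely by its contents, git names it with a hash of those contents. The Python sketch below reproduces git's blob object ID: a SHA-1 over a short header (`blob`, the byte length, a NUL byte) followed by the raw bytes. The function name `hash_blob` is chosen for illustration and is not part of git itself.

```python
import hashlib

def hash_blob(data: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    # git hashes the header "blob <length>\0" followed by the raw file bytes.
    # The file name is not included, so identical contents share one blob.
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

print(hash_blob(b"hello\n"))  # → ce013625030ba8dba906f756967f9e9ca394464a
```

The printed value matches what `git hash-object` reports for a file containing `hello` and a newline. Since the name plays no part in the hash, two files with identical contents anywhere in a repository are stored as a single blob.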

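Blobs are organized into trees, which record file names and encode the hierarchical relationships between blobs. A tree can also contain other trees, much as a directory contains subdirectories.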
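A tree is itself an object: a sorted list of entries, each naming a blob or a subtree by its ID. The sketch below follows the layout of git's tree objects (`<mode> <name>\0` plus the 20 raw ID bytes per entry, with modes `100644` for files and `40000` for subtrees) and hashes the result the same way blobs are hashed. The helper names and the example files are assumptions for illustration.

```python
import hashlib

def hash_object(kind: str, body: bytes) -> str:
    # Every git object ID is SHA-1 over "<kind> <length>\0" followed by the body.
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def hash_blob(data: bytes) -> str:
    return hash_object("blob", data)

def hash_tree(entries: list[tuple[str, str, str]]) -> str:
    """entries: (mode, name, hex object ID); "100644" = file, "40000" = subtree."""
    def sort_key(entry):
        mode, name, _ = entry
        # git orders subtree entries as though their names ended in "/".
        return name + "/" if mode == "40000" else name
    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Each entry: "<mode> <name>\0" followed by the 20 raw bytes of the ID.
        body += f"{mode} {name}\0".encode() + bytes.fromhex(hex_id)
    return hash_object("tree", body)

# A root tree holding a README blob and a subtree "src" that holds main.c.
src = hash_tree([("100644", "main.c", hash_blob(b"int main(void) { return 0; }\n"))])
root = hash_tree([
    ("100644", "README", hash_blob(b"hello\n")),
    ("40000", "src", src),
])
print(root)
```

Because a tree's ID depends on the IDs of everything beneath it, editing `main.c` changes the ID of `src` and, in turn, of `root`, while unchanged blobs and subtrees keep their IDs and can be shared between snapshots.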

Versioning happens through "commits," each of which captures a snapshot of a tree (and therefore of its blobs) along with a reference to the commit or commits that came before it.
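A commit is a small text object that names a root tree and its parent commits. The sketch below mirrors the basic layout of git's commit objects (`tree`, `parent`, `author`, and `committer` headers, a blank line, then the message) and omits optional headers such as signatures. `make_commit`, the placeholder identity, and the timestamps are assumptions made for illustration.

```python
import hashlib

def hash_object(kind: str, body: bytes) -> str:
    # git object IDs are SHA-1 over "<kind> <length>\0" followed by the body.
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def make_commit(tree_id: str, parent_ids: list[str], author: str,
                timestamp: int, message: str) -> tuple[str, str]:
    """Serialize a commit in git's text layout; return (commit ID, body)."""
    lines = [f"tree {tree_id}"]
    # A first commit has no parents; a merge commit has two or more.
    lines += [f"parent {p}" for p in parent_ids]
    # git stores an author and a committer; this sketch uses the same
    # identity and a UTC offset of +0000 for both.
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return hash_object("commit", body.encode()), body

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's ID for an empty tree
AUTHOR = "A. U. Thor <author@example.com>"  # placeholder identity

first_id, first_body = make_commit(EMPTY_TREE, [], AUTHOR, 1263600000, "Initial commit")
second_id, second_body = make_commit(EMPTY_TREE, [first_id], AUTHOR, 1263603600,
                                     "Second commit")
print(second_body)
```

Since each commit embeds its parent's ID, the ID of the latest commit transitively fixes the entire history behind it.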

As the user adds, removes and edits blobs and subtrees, these changes are captured in the progression of commits.
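Conceptually, each commit stores a complete snapshot rather than a list of edits, so the changes between two commits are recovered by comparing their snapshots. The sketch below assumes each snapshot has been flattened to a mapping from path to blob ID; the short strings such as `"blob-a"` stand in for real hashes, and `diff_snapshots` is a hypothetical helper.

```python
def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots that map each path to the ID of its blob."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    # Same path, different blob ID: the file's contents were edited.
    edited = sorted(p for p in new if p in old and new[p] != old[p])
    return {"added": added, "removed": removed, "edited": edited}

v1 = {"README": "blob-a", "src/main.c": "blob-b", "notes.txt": "blob-c"}
v2 = {"README": "blob-a", "src/main.c": "blob-d", "src/util.c": "blob-e"}

changes = diff_snapshots(v1, v2)
print(changes)
```

Comparing blob IDs is enough to detect an edit, and `README`, which is unchanged, points at the same blob in both snapshots, so it is stored only once.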

Commits don't have to progress linearly. When multiple sets of changes originate from a common ancestor, the history forks, and each resulting line of development is known as a "branch."

For my thesis, I want to track the relationships between blobs at a larger scale. In essence, individual servers export lists of their resources, and those individual trees are stitched together into a metagraph that tracks the relationships between blobs across servers. Creating edges within that metagraph can then enable interactions that would not otherwise be possible.
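One way to picture the stitching step is the rough sketch below. It is not the design proposed here, only an illustration under stated assumptions: each server exports a flat mapping from path to blob ID, every exported resource becomes a node keyed by `(server, path)`, resources with identical blob IDs are linked automatically, and further edges are added explicitly. The names `build_metagraph` and `link`, the server names, and the edge labels are all hypothetical.

```python
from collections import defaultdict

Node = tuple[str, str]  # (server, path) of one exported resource

def build_metagraph(exports: dict[str, dict[str, str]]):
    """exports: server -> {path: blob ID}. Returns (nodes, edges)."""
    nodes: set[Node] = set()
    by_blob: dict[str, list[Node]] = defaultdict(list)
    for server, listing in exports.items():
        for path, blob_id in listing.items():
            node = (server, path)
            nodes.add(node)
            by_blob[blob_id].append(node)
    edges: set[tuple[Node, Node, str]] = set()
    # Stitch the per-server trees together wherever they hold identical content.
    for holders in by_blob.values():
        for i, a in enumerate(holders):
            for b in holders[i + 1:]:
                edges.add((min(a, b), max(a, b), "same-content"))
    return nodes, edges

def link(edges, nodes, a: Node, b: Node, label: str) -> None:
    """Add a user-created edge between two exported resources."""
    if a not in nodes or b not in nodes:
        raise KeyError("both endpoints must be exported resources")
    edges.add((a, b, label))

exports = {
    "server-a.example": {"/papers/thesis.pdf": "blob-1", "/data/results.csv": "blob-2"},
    "server-b.example": {"/mirror/thesis.pdf": "blob-1", "/notes/review.txt": "blob-3"},
}
nodes, edges = build_metagraph(exports)
link(edges, nodes, ("server-b.example", "/notes/review.txt"),
     ("server-a.example", "/papers/thesis.pdf"), "comments-on")
for edge in sorted(edges):
    print(edge)
```

Keying nodes by server and path while linking on blob ID keeps each server's own tree intact and records shared content as an explicit edge, so a relationship attached to one copy (such as the `comments-on` edge) can be followed to its mirror on another server.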

The specific