Mïmis is a P2P storage system. It intelligently combines contributions from all the nodes into the network into a navigable system. This is accomplished in part through an abstract representation of the local reseources as communicated to the network. This paper discusses the abstraction of the filesystem, specifically the efficient distributed invalidation of hash trees.
A java program running in an applet is monitoring the filesystem and mirroring its structure to a neo4j instance. So, the filesystem:
Becomes the graph:
A hash function is a mathmatical operation that transforms a series of bits into a fixed-width hash code. So, for example, I can run a program that takes any file and performs a sha-256 hash of it. This will be a 256-bit likely unique identifier for that file.
Overlay graphs may now be built in Neo4j to align with the filesystem tree.
The first nodes to add are the sha-256 hashes for each of the files:
Order the hash nodes within each directory:
The bytes of the hash values for the files are then hashed to produce a new hash:
The higher-level directory can then be ordered according to those entries:
Repeat the process of ordering and hashing hashes recursively to produce a tree:
This produces an isomorphic tree with a unique identifier for the state of each position in the directory tree:
Filenames are decomposed into directories containing files whose names are what were previously the file extensions. For example, consider the contents of my image collection. When I started I had files like:
Converting them to stub form they become:
The directories leading up to the file are a list of increasingly specific tags. Part of the point of stub links is to ease comprehensible of alternate paths. For example, the path: book/by/Nancy Kress/Beggars in Spain/
.
The directory Beggars in Spain
is also linked to from:
Each directory contains a ...
subdirectory which is a symlink to ../...
. The one exception is the root of the filesystem where ...
is a link to .
.
Structurally this creates a tree that looks like:
At some depth of parsing, information can be thought of as coherent
, meaning that there is an inter-relationship between the constituent parts ∋ removing a part will disrupt the gestalt.
For example, a Huffman block in a JPEG where changing any of the bits affects all of the decoded bits.
In the filesystem model, these are represented as files and stored as links to .../hashes/