Compression Trees
Abstract

Mïmis is a P2P storage system. It intelligently combines contributions from all the nodes into the network into a navigable system. This is accomplished in part through an abstract representation of the local reseources as communicated to the network. This paper discusses the abstraction of the filesystem, specifically the efficient distributed invalidation of hash trees.

Mïmis URIs
The Structure of Mïmis 1.0
Filesystem Tree

A java program running in an applet is monitoring the filesystem and mirroring its structure to a neo4j instance. So, the filesystem:

Becomes the graph:

Hash Function

A hash function is a mathmatical operation that transforms a series of bits into a fixed-width hash code. So, for example, I can run a program that takes any file and performs a sha-256 hash of it. This will be a 256-bit likely unique identifier for that file.

Converting Filesystems to Hash Trees

Overlay graphs may now be built in Neo4j to align with the filesystem tree.

  1. Hash Files

    The first nodes to add are the sha-256 hashes for each of the files:

  2. Order Contents

    Order the hash nodes within each directory:

  3. Hash Hashes

    The bytes of the hash values for the files are then hashed to produce a new hash:

  4. Order Contents

    The higher-level directory can then be ordered according to those entries:

  5. Hash Hashes

    Repeat the process of ordering and hashing hashes recursively to produce a tree:

  6. Data-Presence Trees

    This produces an isomorphic tree with a unique identifier for the state of each position in the directory tree:

    Stub Names

    Filenames are decomposed into directories containing files whose names are what were previously the file extensions. For example, consider the contents of my image collection. When I started I had files like:

    Converting them to stub form they become:

    The directories leading up to the file are a list of increasingly specific tags. Part of the point of stub links is to ease comprehensible of alternate paths. For example, the path: book/by/Nancy Kress/Beggars in Spain/.

    The directory Beggars in Spain is also linked to from:

    Relative Roots

    Each directory contains a ... subdirectory which is a symlink to ../.... The one exception is the root of the filesystem where ... is a link to ..

    Structurally this creates a tree that looks like:

    Indirected Binaries

    At some depth of parsing, information can be thought of as coherent, meaning that there is an inter-relationship between the constituent parts ∋ removing a part will disrupt the gestalt.

    For example, a Huffman block in a JPEG where changing any of the bits affects all of the decoded bits.

    In the filesystem model, these are represented as files and stored as links to .../hashes/