Mïmis is a P2P storage system. Its structure draws from a variety of sources and I want to try to describe it so I can enlist help in developing it.
I'll build the concept up from a traditional filesystem tree:
A hash function is a mathematical operation that takes a value and transforms it into a fixed-width hash code. These are used in computer science to generate effectively unique identifiers for files. It is possible for two files to have the same hash, but the probability is vanishingly small.
For example, the function that I am using for the initial version of Mïmis is SHA-256. It converts the bytes in a file into a 256-bit hash. The number of possible values representable by 256 bits is:
2²⁵⁶ ≈ 1.1579209 × 10⁷⁷
For reference, estimates found online put the number of atoms in the solar system at about 1.2 × 10⁵⁷.
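To make those numbers concrete, here is a minimal Python sketch using the standard hashlib module. The helper name file_hash and the sample bytes are illustrative assumptions, not part of Mïmis itself.

```python
import hashlib

def file_hash(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical sample content; any bytes produce a digest of the same width.
digest = file_hash(b"hello world\n")

# Each hex character encodes 4 bits, so 64 characters cover 256 bits.
print(len(digest) * 4)   # 256

# The number of distinct 256-bit values, in scientific notation.
print(f"{2**256:.7e}")   # 1.1579209e+77
```

However large or small the input is, the digest is always the same width, which is what makes it usable as a filename.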
The first transformation is to take each file and rename it to its hash value:
These files can now be ordered by their names:
Now the files in each directory can be concatenated and hashed to produce a hash name for the directory: (It may be permissible to hash just the bytes of the names rather than the bytes of the files. That would also be unique, and though it would collide with a file containing that same series of bytes, I'm not sure that matters.)
Those directories can now be ordered and the process repeated to generate a hash for higher level directories:
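The hashing steps so far can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: it uses the variant from the parenthetical above (hashing the sorted hash names rather than the concatenated file bytes), and the function names sha256_hex and hash_tree are made up for this example.

```python
import hashlib
from pathlib import Path

def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 of the given bytes."""
    return hashlib.sha256(data).hexdigest()

def hash_tree(path: Path) -> str:
    """Return a hash name for a file or a directory.

    A file's hash name is the hash of its bytes. A directory's hash name is
    the hash of its children's hash names, sorted and concatenated.
    (Assumption: names are hashed, not the concatenated file contents.)
    """
    if path.is_file():
        return sha256_hex(path.read_bytes())
    # Sorting makes the result independent of the original filenames
    # and of the order in which the filesystem lists the entries.
    child_hashes = sorted(hash_tree(child) for child in path.iterdir())
    return sha256_hex("".join(child_hashes).encode("ascii"))
```

Because each directory's name depends only on the hashes beneath it, two directories holding the same content get the same hash name no matter what the files were originally called.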
Drawn in a more conventional tree form, this structure would be:
There are three main changes to the system:
Filenames are decomposed into directories containing files whose names are what were previously the file extensions. For example, consider the contents of my image collection. When I started I had files like:
Converted to stub form, they become:
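As a rough sketch of that conversion, the following Python function turns a conventional filename into stub form. The function name to_stub and the sample paths are hypothetical, and I am assuming that only the final extension is split off.

```python
from pathlib import PurePosixPath

def to_stub(filename: str) -> str:
    """Turn 'dir/name.ext' into 'dir/name/ext'; leave extensionless names alone."""
    path = PurePosixPath(filename)
    if not path.suffix:
        return filename
    # The old base name becomes a directory; the old extension becomes the file name.
    return str(path.with_suffix("") / path.suffix[1:])

print(to_stub("images/cover.jpg"))  # images/cover/jpg
print(to_stub("README"))            # README
```

One consequence is that several formats of the same item (for example a jpg and a png) end up as sibling files in a single directory.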
The directories leading up to the file are a list of increasingly specific tags. Part of the point of stub links is to ease comprehension of alternate paths. For example, consider the path book/by/Nancy Kress/Beggars in Spain/. The directory Beggars in Spain is also linked to from:
Each directory contains a "..." subdirectory which is a symlink to "../...". The one exception is the root of the filesystem, where "..." is a link to "." (the root itself).
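To show how that chain of links behaves, here is a small Python sketch that adds the "..." links to an existing directory tree. The function name add_root_links and the sample directories are assumptions for illustration, and it relies on a platform where os.symlink is permitted.

```python
import os
import tempfile
from pathlib import Path

def add_root_links(root: Path) -> None:
    """Give every directory under root a "..." symlink that leads toward root."""
    for dirpath, _dirnames, _filenames in os.walk(root):
        link = Path(dirpath) / "..."
        if link.is_symlink():
            continue  # already linked, so the function can be re-run safely
        # The root points at itself; every other directory defers to its parent.
        target = "." if Path(dirpath) == root else os.path.join("..", "...")
        os.symlink(target, link, target_is_directory=True)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    deep = root / "book" / "by" / "Nancy Kress"
    deep.mkdir(parents=True)
    add_root_links(root)
    # Following "..." from any depth hops up one level at a time to the root.
    print((deep / "...").resolve() == root)  # True
```

The effect is that "..." works as a name for the filesystem root that is valid from any directory, without using an absolute path.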
Structurally this creates a tree that looks like:
All binary files are stored as links to .../hashes/
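Here is a hedged Python sketch of what storing a file this way could look like. It assumes the path .../hashes/ is meant literally, so that it resolves through the "..." links to a hashes directory at the root; the function name store_by_hash and the layout in the demo are hypothetical.

```python
import hashlib
import os
import tempfile
from pathlib import Path

def store_by_hash(root: Path, file_path: Path) -> Path:
    """Move file_path to root/hashes/<sha256> and leave a symlink in its place."""
    digest = hashlib.sha256(file_path.read_bytes()).hexdigest()
    store = root / "hashes"
    store.mkdir(exist_ok=True)
    stored = store / digest
    if stored.exists():
        file_path.unlink()        # identical content is already stored once
    else:
        file_path.rename(stored)
    # Assumption: the link is relative and goes through the "..." directory.
    os.symlink(os.path.join("...", "hashes", digest), file_path)
    return stored

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    cover = root / "cover"
    cover.mkdir()
    os.symlink(".", root / "...")        # the root's "..." points at itself
    os.symlink("../...", cover / "...")  # children defer to their parent
    (cover / "jpg").write_bytes(b"fake image bytes")
    stored = store_by_hash(root, cover / "jpg")
    print((cover / "jpg").resolve() == stored)
    print((cover / "jpg").read_bytes() == b"fake image bytes")
```

Since the stored name is derived from the content, two paths that hold identical bytes end up as links to the same stored file.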