Git Storage

Git storage relies extensively on the SHA-1 hash algorithm, which produces a 160-bit (20-byte) key for each blob, tree, and commit object.

A blob's hash is not computed on its contents alone, but on a composite of:

  1. The utf-8 string "blob "
  2. The utf-8 string for the file length in bytes, written as decimal digits
  3. A null byte
  4. The contents of the blob
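The blob recipe above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library's hashlib; the function name blob_hash is hypothetical, not part of Git or of any library.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Return the hex SHA-1 that Git assigns to a blob with these contents."""
    # Header: "blob ", the byte length as decimal digits, then a null byte.
    header = b"blob " + str(len(content)).encode("utf-8") + b"\x00"
    # The digest covers the header followed by the raw contents.
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Contents of a hypothetical one-line file.
    print(blob_hash(b"test content\n"))
```

Running `git hash-object` on a file holding exactly the same bytes should report the same 40-character value. Note that len() is applied to bytes, so the length counts bytes rather than characters, which matters for non-ASCII text.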

So, these two commands produce equivalent results:

Tree hashes are computed from:

  1. The utf-8 string "tree "
  2. The utf-8 string for the content length in bytes
  3. A null byte
  4. Zero or more entries, each consisting of:
    1. The file mode of the item (for example 100644 for a regular file or 40000 for a subdirectory)
    2. A space character
    3. The name of the item
    4. A null byte
    5. The 20-byte hash of the item, in raw binary form rather than hex
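A tree can be hashed the same way once its entries are serialized. The sketch below is hypothetical Python (3.9 or later, standard library only); tree_body, tree_hash, and the (mode, name, hex_sha) tuple layout are illustrative choices, not a Git API. It also applies Git's entry ordering, in which a subdirectory sorts as if its name ended with "/".

```python
import hashlib

# (mode, name, hex_sha) triples, e.g. ("100644", "README", "e69de29b...").
Entry = tuple[str, str, str]


def tree_body(entries: list[Entry]) -> bytes:
    """Serialize tree entries in the order Git expects."""

    def sort_key(entry: Entry) -> bytes:
        mode, name, _ = entry
        # Subdirectories sort as if their names ended with "/".
        return (name + ("/" if mode == "40000" else "")).encode("utf-8")

    body = b""
    for mode, name, hex_sha in sorted(entries, key=sort_key):
        # Mode, space, name, null byte, then the raw 20-byte hash (not hex).
        body += mode.encode("utf-8") + b" " + name.encode("utf-8") + b"\x00"
        body += bytes.fromhex(hex_sha)
    return body


def tree_hash(entries: list[Entry]) -> str:
    """Prefix the serialized entries with the "tree <length>" header and hash."""
    body = tree_body(entries)
    header = b"tree " + str(len(body)).encode("utf-8") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()
```

With no entries at all, the body is empty and the header is just "tree 0" plus a null byte, which yields Git's well-known empty-tree hash, 4b825dc642cb6eb9a060e54bf8d69288fbee4904.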

Commit hashes are computed from:

  1. The utf-8 string "commit "
  2. The utf-8 string for the content length in bytes
  3. A null byte
  4. The utf-8 string "tree "
  5. The tree's hash as a 40-character hexadecimal utf-8 string
  6. A linefeed character
  7. The utf-8 string "author "
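  8. The author's name, email address in angle brackets, Unix timestamp, and time-zone offset, separated by spaces, as in "NAME <EMAIL> 1234567890 -0800"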
  9. A linefeed character
  10. The utf-8 string "committer "
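  11. The committer's name, email address in angle brackets, Unix timestamp, and time-zone offset, in the same "NAME <EMAIL> 1234567890 -0800" form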
  12. Two linefeed characters
  13. The commit message
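  14. A linefeed character

For a commit with parents, one line per parent, made of the utf-8 string "parent ", the parent's hash as 40 hexadecimal characters, and a linefeed, sits between items 6 and 7.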
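A commit follows the same header-plus-content pattern. The sketch below is hypothetical Python (3.9 or later, standard library only); commit_body and commit_hash are made-up names, the identity strings are placeholders the caller must supply in the "NAME <EMAIL> 1234567890 -0800" form, and the optional parents argument reflects that Git writes a "parent" line for each parent of a non-root commit.

```python
import hashlib


def commit_body(tree_hex: str, author: str, committer: str, message: str,
                parents: tuple[str, ...] = ()) -> bytes:
    """Build the commit content that follows the "commit <length>" header.

    author and committer are full identity strings such as
    "NAME <EMAIL> 1234567890 -0800"; message has no trailing newline.
    """
    lines = [f"tree {tree_hex}"]
    # One "parent" line per parent commit; a root commit has none.
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # "\n\n" ends the committer line and leaves a blank separator line;
    # the message itself is followed by a single linefeed.
    text = "\n".join(lines) + "\n\n" + message + "\n"
    return text.encode("utf-8")


def commit_hash(tree_hex: str, author: str, committer: str, message: str,
                parents: tuple[str, ...] = ()) -> str:
    """Prefix the commit content with its "commit <length>" header and hash."""
    body = commit_body(tree_hex, author, committer, message, parents)
    header = b"commit " + str(len(body)).encode("utf-8") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()
```

A hash computed this way only matches a real commit if every byte agrees, including the time-zone offset and the message's line endings. `git cat-file commit HEAD`, which prints a commit's raw content without the header, is one way to inspect the exact layout Git wrote.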