Git internals

May 5, 2021

Git, first released by Linus Torvalds in 2005, rapidly became the most widely used source code management (SCM) tool in the world, especially for open source projects.
This was due to its design. Torvalds developed the system to be very stupid, fast and simple, because it had to support Linux kernel development. Other design goals were strong support for non-linear development and for large projects.

Git, as we know, is a fully distributed SCM: each developer who works on a project has a full copy of the repository on their own machine, stored in the hidden directory .git.

But how is data stored internally?

Git thinks of its data as a set of snapshots of a miniature filesystem. If a file has not changed, Git does not store it again but creates a link to the previous version. Everything is checksummed before it is stored, and it is then referred to by that checksum.
Each snapshot (commit) refers to zero or more parent snapshots (none for the first commit, one for an ordinary commit, two or more for a merge), creating a directed acyclic graph (DAG).
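
To make the checksum idea concrete, here is a minimal Python sketch of how a file's content can be turned into an object ID. It assumes the default SHA-1 object format, where the hash is taken over a small header (`blob`, the content length, a NUL byte) followed by the content. The function name `blob_id` is my own and is not part of Git.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the ID Git would give to a blob with this content (SHA-1 format)."""
    # Header: object type, a space, the content length in bytes, then a NUL byte.
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


first = blob_id(b"hello world\n")
same = blob_id(b"hello world\n")      # identical content in a later snapshot
changed = blob_id(b"hello world!\n")  # the file was edited

print(first)             # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(first == same)     # True: same content, same ID, nothing new to store
print(first == changed)  # False: a new object is needed
```

Because the ID depends only on the content, an unchanged file maps to an object that already exists, which is why a new snapshot can simply point at it. Running `git hash-object` on a file with the same content should print the same ID.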

This DAG is immutable and append-only, and can be viewed as an object store, in which we have:

blob (binary large object): the content of a file
tree: a list of file names, each with some type bits and a reference to the blob or tree object that holds that file's, symbolic link's, or directory's contents
commit: a snapshot that points to one top-level tree and to its parent commits, linking tree objects together into a history
tag: a container that holds a reference to another object, together with additional metadata about that object
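
The object types above can be sketched as a tiny in-memory object store in Python. This is only an illustration under a few assumptions: it uses the SHA-1 object format, it sorts tree entries by plain name (real Git sorts directories slightly differently), and the names `ObjectStore`, `tree_body`, `commit_body` and `parents_of`, as well as the author identity, are made up for the example.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    # Every object is hashed as "<type> <size>\0<body>".
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


class ObjectStore:
    """Hypothetical append-only store: objects are added, never modified."""

    def __init__(self):
        self.objects = {}

    def put(self, obj_type: str, body: bytes) -> str:
        oid = hash_object(obj_type, body)
        # Identical content maps to the same ID, so it is stored only once.
        self.objects.setdefault(oid, (obj_type, body))
        return oid


def tree_body(entries):
    # entries: (mode, name, object ID as hex); the ID is stored as 20 raw bytes.
    out = b""
    for mode, name, oid in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
    return out


def commit_body(tree, parents, message,
                who="A U Thor <author@example.com> 0 +0000"):
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return ("\n".join(lines) + "\n").encode()


def parents_of(store, commit_oid):
    # Read the "parent" header lines back out of a stored commit.
    _, body = store.objects[commit_oid]
    header = body.split(b"\n\n", 1)[0].decode()
    return [l.split(" ", 1)[1] for l in header.splitlines()
            if l.startswith("parent ")]


store = ObjectStore()
blob = store.put("blob", b"hello world\n")
tree = store.put("tree", tree_body([("100644", "hello.txt", blob)]))
c1 = store.put("commit", commit_body(tree, [], "first commit"))
c2 = store.put("commit", commit_body(tree, [c1], "second commit"))

print(parents_of(store, c1))  # [] for the first commit
print(parents_of(store, c2) == [c1])
print(len(store.objects))     # 4: one blob, one tree, two commits
```

Both commits point at the same tree, so the second snapshot adds only one new object to the store. In a real repository, `git cat-file -p` followed by an object ID prints an object in readable form.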

Originally published at https://gabriele-decapoa.github.io.

Written by Gabriele de Capoa