How Git Works Internally

Git is a distributed version control system that tracks changes to files using a small set of simple internal data structures. Understanding how Git works under the hood helps you use it more effectively and troubleshoot issues when they arise.

The Object Database

At its core, Git is a content-addressable filesystem. File contents, directories, and commits are all stored as objects in the .git/objects directory, and each object is identified by a SHA-1 hash computed from a short header plus its contents.

Git has four types of objects:

Blobs - Store file contents (just the raw data, no filename or metadata)
Trees - Represent directories, containing pointers to blobs and other trees
Commits - Point to a tree (the project snapshot) and contain metadata like author, timestamp, and parent commit(s)
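Tags - Annotated tags, which attach a human-readable name, a tagger, and a message to a specific object, usually a commit (lightweight tags are plain references, not objects)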

Understanding the .git Folder

When you run git init in a directory, Git creates a .git folder that contains everything needed to track your project's history. This hidden folder is the heart of your repository—it's where Git stores all its data and configuration. Understanding its structure helps demystify how Git works.

Core Structure

The .git folder typically contains these key components:

HEAD - A file containing a reference to the current branch, usually something like ref: refs/heads/main. This tells Git which branch you're currently on. In detached HEAD state, this file contains a commit SHA directly.

config - Your repository-specific configuration settings. This includes remote URLs, branch tracking information, and local overrides of global Git settings. Settings here take precedence over your global ~/.gitconfig.

description - Used only by GitWeb (a web interface for Git). You can generally ignore this file.

index - The binary file representing your staging area. This is where Git tracks which changes you've added with git add and are ready to commit. It's essentially a snapshot of what your next commit will look like.

hooks/ - A directory containing sample scripts that can trigger actions at specific points in Git's workflow (pre-commit, post-merge, etc.). To activate one, remove the .sample suffix from its name and make sure the file is executable.
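To make two of these files more concrete, here is a minimal Python sketch that interprets the text of a HEAD file and reads the fixed 12-byte header at the start of an index file (the signature DIRC, a version number, and an entry count). The function names describe_head and parse_index_header are made up for this illustration, and the sample inputs are constructed by hand rather than read from a real repository.

```python
import struct


def describe_head(head_text: str) -> dict:
    """Classify the contents of a .git/HEAD file (hypothetical helper)."""
    value = head_text.strip()
    if value.startswith("ref: "):
        # Symbolic reference: HEAD names a branch such as refs/heads/main.
        return {"detached": False, "ref": value[len("ref: "):]}
    # Otherwise HEAD holds a commit ID directly (detached HEAD state).
    return {"detached": True, "commit": value}


def parse_index_header(data: bytes) -> dict:
    """Read the 12-byte header of a .git/index file (hypothetical helper)."""
    # Big-endian: 4-byte signature, 4-byte version, 4-byte entry count.
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return {"version": version, "entries": entry_count}


print(describe_head("ref: refs/heads/main\n"))
print(describe_head("3b18e512dba79e4c8300dd08aeb37f8e728b8dad\n"))

# A hand-built header: version 2, three staged entries (illustrative values).
print(parse_index_header(b"DIRC" + struct.pack(">II", 2, 3)))
```

In a real repository you could pass the contents of .git/HEAD and the first bytes of .git/index to these functions; the rest of the index (the per-file entries) is not parsed here.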

Git Objects: Blob, Tree, Commit

Git's internal storage system revolves around three of its four object types: blobs, trees, and commits (annotated tags are the fourth). Together, they form a content-addressable database that tracks your project's complete history. Understanding these objects explains how Git deduplicates content, detects corruption, and compares snapshots quickly.

The Foundation: Content-Addressable Storage

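Every object in Git is stored in the .git/objects directory and identified by a 40-character hexadecimal SHA-1 hash, computed over a short header (the object type and size) followed by the contents. This hash serves as both the object's identifier and an integrity check: if the content changes even slightly, the hash changes completely. For loose objects, Git uses the first two characters as a directory name and the remaining 38 as the filename; objects that have been packed are stored together in packfiles under .git/objects/pack.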
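The naming scheme can be reproduced in a few lines of Python. This is a sketch, not Git's own code: it assumes the default SHA-1 object format, where the hash input is the object type, a space, the content length, a NUL byte, and then the content. The helper names git_object_id and loose_object_path are invented for this example.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Hash an object the way Git names it: header + NUL + content."""
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """First two hex characters form the directory, the other 38 the file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


oid = git_object_id("blob", b"hello world\n")
print(oid)                     # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(loose_object_path(oid))  # .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The printed ID matches what git hash-object reports for a file containing the single line "hello world".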

Blob Objects: Pure Content

A blob (binary large object) stores file contents—nothing more, nothing less. It contains no filename, no permissions, no metadata—just the raw data of a file.

When you stage a file with git add, Git prepends a short header to the file contents, compresses the result using zlib, and stores it as a blob object. The same content always produces the same SHA-1 hash, so if you have identical files in different locations or commits, Git stores only one blob for that content.
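Here is a toy illustration of that behavior, assuming an in-memory dictionary in place of the .git/objects directory. ToyObjectStore is a hypothetical class written for this post, not part of Git; it only shows why keying storage by content hash gives deduplication for free.

```python
import hashlib
import zlib


class ToyObjectStore:
    """In-memory stand-in for .git/objects (hypothetical, for illustration)."""

    def __init__(self):
        self.objects = {}  # object ID -> zlib-compressed payload

    def add_blob(self, content: bytes) -> str:
        # Payload is the header ("blob <size>" + NUL) followed by the content.
        payload = b"blob " + str(len(content)).encode() + b"\x00" + content
        oid = hashlib.sha1(payload).hexdigest()
        # Identical content hashes to the same key, so it is stored only once.
        self.objects.setdefault(oid, zlib.compress(payload))
        return oid

    def read_blob(self, oid: str) -> bytes:
        payload = zlib.decompress(self.objects[oid])
        # Everything after the first NUL byte is the original file content.
        _header, _, content = payload.partition(b"\x00")
        return content


store = ToyObjectStore()
first = store.add_blob(b"same text\n")
second = store.add_blob(b"same text\n")  # e.g. a copy under another filename
print(first == second, len(store.objects))  # True 1
```

Note that the store never sees a filename: two files with the same bytes are indistinguishable at this level.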

You can examine blob contents using git cat-file -p <hash>, which will show you the exact file contents. The command git hash-object <file> shows you what hash Git would assign to a file without actually storing it.

Key insight: Blobs are content-only. Two files with identical contents but different names share the same blob. This is how Git achieves efficient storage—deduplication happens automatically at the content level.

Tree Objects: Directory Structure

A tree object represents a directory. It contains a list of entries, where each entry includes a mode (file type and permission bits), an object type (blob or tree), the object's SHA-1 hash, and a filename.

When you commit, Git creates tree objects to represent your directory structure. Printed with git cat-file -p, a tree might contain entries like:

100644 blob a1b2c3d4...  README.md
100755 blob e5f6a7b8...  script.sh
040000 tree 9c0d1e2f...  src

The mode indicates the file type and permissions (100644 for regular files, 100755 for executables, 040000 for directories). Trees can point to other trees, creating the nested directory structure. The root of your project is represented by a single tree object that references all top-level files and directories.
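For the curious, the sketch below builds tree objects by hand in Python. It assumes the on-disk layout of a tree: for each entry, the mode as ASCII text, a space, the name, a NUL byte, and the 20 raw bytes of the object ID, with entries sorted by name. In this raw form the directory mode is written as 40000, without the leading zero that Git's display output adds. The helper names and the file contents are made up for the example.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash header + NUL + body, as Git does for every object."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def serialize_tree(entries) -> bytes:
    """Build a tree body from (mode, name, hex_object_id) tuples."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git sorts directory names as if they ended with "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, oid in sorted(entries, key=sort_key):
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(oid)
    return body


# Made-up file contents, hashed as blobs.
readme_id = git_object_id("blob", b"# Demo\n")
script_id = git_object_id("blob", b"#!/bin/sh\necho hi\n")
main_id = git_object_id("blob", b"print('hi')\n")

# The src directory is its own tree, referenced from the root tree.
src_id = git_object_id("tree", serialize_tree([("100644", "main.py", main_id)]))

root = serialize_tree([
    ("100644", "README.md", readme_id),
    ("100755", "script.sh", script_id),
    ("40000", "src", src_id),
])
print(git_object_id("tree", root))
```

Because the root tree embeds the ID of the src tree, changing any file under src changes the src tree's ID and therefore the root tree's ID as well.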

This is where filenames and directory structures live. A blob doesn't know its name—the tree that references it provides that context.

Commit Objects: Snapshots in Time

A commit object ties everything together. It represents a complete snapshot of your project at a specific moment and contains:

A pointer to a tree object (the root directory snapshot)
References to parent commit(s)—zero for initial commits, one for normal commits, two or more for merges
Author information (name, email, timestamp)
Committer information (usually same as author, but can differ)
A commit message

When you run git commit, Git creates tree objects from your staging area, then creates a commit object pointing to the root tree. The commit also points to its parent (the previous commit on the branch), creating a linked history.

You can inspect a commit with git cat-file -p <commit-hash> to see its tree reference, parent(s), author, committer, and message.
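A commit object is plain text, which makes it easy to sketch in Python. The example below assumes the usual layout: a tree line, zero or more parent lines, author and committer lines, a blank line, and the message. The identity "Ada Example", the timestamp, and the helper names are invented for illustration; the tree ID used is the well-known ID of an empty tree.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of a tree with no entries


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash header + NUL + body, as Git does for every object."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def serialize_commit(tree_id, parent_ids, author, committer, message) -> bytes:
    """Assemble the text body of a commit object."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {parent}" for parent in parent_ids]  # none for a root commit
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


# Hypothetical identity: name, email, Unix timestamp, timezone offset.
who = "Ada Example <ada@example.com> 1700000000 +0000"

root_body = serialize_commit(EMPTY_TREE, [], who, who, "Initial commit")
root_id = git_object_id("commit", root_body)

# The second commit names the first as its parent, forming the history chain.
child_body = serialize_commit(EMPTY_TREE, [root_id], who, who, "Second commit")
child_id = git_object_id("commit", child_body)

print(child_body.decode())
print(root_id, child_id)
```

Since each commit's ID depends on its parent's ID, rewriting an old commit changes the IDs of every commit that descends from it.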

