The "Partial Clone" feature is a performance optimization for Git that
allows Git to function without having a complete copy of the repository.
The goal of this work is to allow Git to better handle extremely large
repositories.
During clone and fetch operations, Git downloads the complete contents
and history of the repository. This includes all commits, trees, and
blobs for the complete life of the repository. For extremely large
repositories, clones can take hours (or days) and consume 100+ GiB of disk
space.
Often in these repositories there are many blobs and trees that the user
does not need, such as:
- files outside of the user’s work area in the tree. For example, in
  a repository with 500K directories and 3.5M files in every commit,
  we can avoid downloading many objects if the user only needs a
  narrow "cone" of the source tree.
- large binary assets. For example, in a repository where large build
  artifacts are checked into the tree, we can avoid downloading all
  previous versions of these non-mergeable binary assets and only
  download versions that are actually referenced.
Partial clone allows us to avoid downloading such unneeded objects in
advance during clone and fetch operations and thereby reduce download
times and disk usage. Missing objects can later be "demand fetched"
if/when needed.
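
As a rough illustration of the behavior described above, the following sketch builds a throwaway local "server" repository, partially clones it without blobs, and then reads a file so that the missing blob is demand fetched. The directory names, file name, and identity values are assumptions made up for this example; `--filter=blob:none` is only one of the available filter specifications, and the two `uploadpack.*` settings are set here on the assumption that the serving side must permit filtered and single-object requests.

```shell
#!/bin/sh
set -e

# Throwaway workspace; every path and name below is illustrative only.
work=$(mktemp -d)

# 1. Build a tiny "server" repository with one commit containing one file.
git init -q "$work/server"
git -C "$work/server" config user.name "Example User"
git -C "$work/server" config user.email "user@example.com"
echo "hello" > "$work/server/a.txt"
git -C "$work/server" add a.txt
git -C "$work/server" commit -q -m "add a.txt"

# 2. Allow the server side to honor object filters and requests for
#    individual objects (needed for later on-demand fetches).
git -C "$work/server" config uploadpack.allowFilter true
git -C "$work/server" config uploadpack.allowAnySHA1InWant true

# 3. Partial clone: omit all blobs, and skip the checkout so that no
#    blob is fetched yet.
git clone -q --no-checkout --filter=blob:none \
    "file://$work/server" "$work/client"

# 4. Count objects reachable from HEAD that are absent locally.
#    rev-list prints missing objects with a leading "?".
missing_before=$(git -C "$work/client" rev-list --objects \
    --missing=print HEAD | grep -c '^?' || true)

# 5. Reading the file needs the blob, so Git fetches it on demand
#    from the promisor remote.
content=$(git -C "$work/client" cat-file -p HEAD:a.txt)

# 6. Count missing objects again after the on-demand fetch.
missing_after=$(git -C "$work/client" rev-list --objects \
    --missing=print HEAD | grep -c '^?' || true)

echo "missing before: $missing_before, after: $missing_after"
echo "content: $content"
```

In this setup the single blob should be reported as missing right after the clone and be present once the file has been read. Against a real remote, only step 3 applies, with the `file://` URL replaced by the remote's URL.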
A remote that can later provide the missing objects is called a
promisor remote, as it promises to send the objects when
requested. Initially, Git supported only one promisor remote: the origin
remote from which the user cloned, which was configured in the
"extensions.partialClone" config option. Support for more than one
promisor remote was added later.
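
The sketch below shows one way to inspect how a partial clone records its promisor remote, and how a second remote might be marked as a promisor. It assumes a reasonably recent Git that records the promisor in the per-remote `remote.<name>.promisor` and `remote.<name>.partialCloneFilter` settings; the remote name `backup`, the paths, and the identity values are hypothetical and exist only for this example.

```shell
#!/bin/sh
set -e

# Throwaway workspace; every path and name below is illustrative only.
work=$(mktemp -d)

# A minimal "server" repository with a single (empty) commit.
git init -q "$work/server"
git -C "$work/server" config user.name "Example User"
git -C "$work/server" config user.email "user@example.com"
git -C "$work/server" commit -q --allow-empty -m "initial"
git -C "$work/server" config uploadpack.allowFilter true

# Partial clone; "origin" becomes the promisor remote.
git clone -q --filter=blob:none "file://$work/server" "$work/client"

# Inspect what the clone recorded for the origin remote.
promisor=$(git -C "$work/client" config --get remote.origin.promisor)
filter=$(git -C "$work/client" config --get remote.origin.partialCloneFilter)
echo "origin promisor: $promisor, filter: $filter"

# Mark a second, hypothetical remote as a promisor as well.
git -C "$work/client" remote add backup "file://$work/server"
git -C "$work/client" config remote.backup.promisor true

# Count the remotes that are now configured as promisors.
promisor_count=$(git -C "$work/client" config --name-only \
    --get-regexp '^remote\..*\.promisor$' | grep -c promisor)
echo "promisor remotes: $promisor_count"
```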
Use of partial clone requires that the user be online and the origin
remote or other promisor remotes be available for on-demand fetching
of missing objects. This may or may not be problematic for the user.
For example, if the user can stay within the pre-selected subset of
the source tree, they may not encounter any missing objects.
Alternatively, the user could try to pre-fetch various objects if they
know that they are going offline.