Some file formats in Git use a common concept of "chunks" to describe
sections of the file. This allows structured access to a large file by
scanning a small "table of contents" for the remaining data. This common
format is used by the commit-graph and multi-pack-index files. See
the multi-pack-index format in gitformat-pack(5) and
the commit-graph format in gitformat-commit-graph(5) for
how they use the chunks to describe structured data.
A chunk-based file format begins with some header information custom to
that format. That header should include enough information to identify
the file type, format version, and number of chunks in the file. From this
information, a reader can determine the start of the chunk-based region.
The chunk-based region starts with a table of contents describing where
each chunk starts and ends. This consists of (C+1) rows of 12 bytes each,
where C is the number of chunks. Consider the following table:
| Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
|--------------------|------------------------|
| ID[0]              | OFFSET[0]              |
| ...                | ...                    |
| ID[C-1]            | OFFSET[C-1]            |
| 0x00000000         | OFFSET[C]              |
Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset.
Each integer is stored in network byte order (big-endian).
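The row layout described above can be decoded with a few lines of C. This is a minimal sketch, not the code Git uses: the names `struct toc_row`, `read_be32`, `read_be64`, and `decode_toc_row` are hypothetical, and the caller is assumed to have already located the table of contents and checked that the buffer holds the row being read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOC_ROW_SIZE 12 /* 4-byte chunk ID + 8-byte offset */

/* One decoded table-of-contents row (hypothetical type for illustration). */
struct toc_row {
	uint32_t id;     /* chunk identifier, e.g. 0x4f494446 for "OIDF" */
	uint64_t offset; /* value of the 8-byte offset field */
};

/* Read a 32-bit integer stored in network byte order. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a 64-bit integer stored in network byte order, as two halves. */
static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/*
 * Decode row `i` of a table of contents that starts at `toc`.
 * Assumption: the caller guarantees that row `i` lies inside the buffer.
 */
static struct toc_row decode_toc_row(const unsigned char *toc, size_t i)
{
	const unsigned char *row;
	struct toc_row r;

	assert(toc != NULL);
	row = toc + i * TOC_ROW_SIZE;
	r.id = read_be32(row);
	r.offset = read_be64(row + 4);
	return r;
}
```

Assembling the integers byte by byte keeps the sketch independent of the host's endianness and of any alignment of the mapped file.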
The chunk identifier ID[i] is a label for the data stored within this
file from OFFSET[i] (inclusive) to OFFSET[i+1] (exclusive). Thus, the
size of the i-th chunk is equal to the difference between OFFSET[i+1]
and OFFSET[i]. This requires that the chunk data appear contiguously
in the same order as the table of contents.
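The size computation can be illustrated with a small lookup over already-decoded rows. This is a sketch under stated assumptions: `struct toc_entry` and `find_chunk` are hypothetical names, the rows are assumed to have been converted from network byte order beforehand, and the array is assumed to include the terminating row so that `toc[i + 1]` is always valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded table-of-contents row (hypothetical type for illustration). */
struct toc_entry {
	uint32_t id;
	uint64_t offset;
};

/*
 * Look up chunk `id` among the first `nr_chunks` rows of `toc`, which must
 * hold nr_chunks + 1 rows (the last one being the terminating row).
 * On success, store the chunk's start and size and return 0.
 * Return -1 if the chunk is absent or its end precedes its start.
 */
static int find_chunk(const struct toc_entry *toc, size_t nr_chunks,
		      uint32_t id, uint64_t *start, uint64_t *size)
{
	size_t i;

	assert(toc && start && size);
	for (i = 0; i < nr_chunks; i++) {
		if (toc[i].id != id)
			continue;
		if (toc[i + 1].offset < toc[i].offset)
			return -1; /* offsets are out of order */
		*start = toc[i].offset;
		*size = toc[i + 1].offset - toc[i].offset;
		return 0;
	}
	return -1;
}
```

For example, with rows at offsets 44, 1068, and 1108, the first chunk occupies 1024 bytes and the second 40 bytes.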
The chunk ID of the final entry in the table of contents must be four zero
bytes. This confirms that the table of contents has ended and provides the
offset for the end of the chunk-based data.
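From the writer's side, the terminating row falls out naturally when offsets are accumulated from the chunk sizes. The following is a rough sketch, not Git's implementation: `put_be32`, `put_be64`, and `write_toc` are hypothetical names, and it assumes that the first chunk begins immediately after the header and the table of contents and that a zero chunk ID is reserved for the terminating row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit integer in network byte order. */
static void put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Store a 64-bit integer in network byte order. */
static void put_be64(unsigned char *p, uint64_t v)
{
	put_be32(p, (uint32_t)(v >> 32));
	put_be32(p + 4, (uint32_t)v);
}

/*
 * Serialize a table of contents for `nr_chunks` chunks into `out`, which
 * must have room for (nr_chunks + 1) * 12 bytes. `header_size` is the
 * length of the format-specific header that precedes the table.
 * Returns the offset at which the chunk data ends.
 */
static uint64_t write_toc(unsigned char *out, const uint32_t *ids,
			  const uint64_t *sizes, size_t nr_chunks,
			  uint64_t header_size)
{
	/* Assumption: the first chunk directly follows header and table. */
	uint64_t offset = header_size + (uint64_t)(nr_chunks + 1) * 12;
	size_t i;

	assert(out && (nr_chunks == 0 || (ids && sizes)));
	for (i = 0; i < nr_chunks; i++) {
		assert(ids[i] != 0); /* assumed reserved for the terminator */
		put_be32(out + i * 12, ids[i]);
		put_be64(out + i * 12 + 4, offset);
		offset += sizes[i];
	}
	/* Terminating row: four zero bytes, then the end-of-data offset. */
	put_be32(out + nr_chunks * 12, 0);
	put_be64(out + nr_chunks * 12 + 4, offset);
	return offset;
}
```

A caller would write the header, then this table, then each chunk's data in the same order as the `ids` array, so that the recorded offsets match the file contents.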
Note: The chunk-based format expects that the file contains at least a
trailing hash after the end of the chunk-based data, that is, after the
offset stored in the final entry of the table of contents.
Functions for working with chunk-based file formats are declared in
chunk-format.h. Using these functions provides extra checks that assist
developers when creating new file formats.