Path collisions happen when two different paths correspond to the same
entry in the file system. E.g. the paths a and A would collide in a
case-insensitive file system.
The sequential checkout deals with collisions in the same way that it
deals with files that were already present in the working tree before
checkout. Basically, it checks if the path that it wants to write
already exists on disk, makes sure the existing file doesn’t have
unsaved data, and then overwrites it. (To be more pedantic: it deletes
the existing file and creates the new one.) So, if there are multiple
colliding files to be checked out, the sequential code will write each
one of them but only the last will actually survive on disk.
Parallel checkout aims to reproduce the same behavior. However, we
cannot let the workers racily write to the same file on disk. Instead,
the workers detect when the entry that they want to check out would
collide with an existing file, and mark it with PC_ITEM_COLLIDED.
Later, the main process can sequentially feed these entries back to
checkout_entry() without the risk of race conditions. On clone, this
also has the effect of marking the colliding entries to later emit a
warning for the user, like the classic sequential checkout does.
The workers are able to detect both collisions among the entries being
concurrently written and collisions between a parallel-eligible entry
and an ineligible entry. The general idea for collision detection is
quite straightforward: for each parallel-eligible entry, the main
process must remove all files that prevent this entry from being written
(before enqueueing it). This includes any non-directory file in the
leading path of the entry. Later, when a worker gets assigned the entry,
it looks again for the non-directory files and for an already existing
file at the entry’s path. If any of these checks finds something, the
worker knows that there was a path collision.
Because parallel checkout can distinguish path collisions from the case
where the file was already present in the working tree before checkout,
we could alternatively choose to skip the checkout of colliding entries.
However, each entry that doesn’t get written would have NULL lstat()
fields on the index. This could cause performance penalties for
subsequent commands that need to refresh the index, as they would have
to go to the file system to see if the entry is dirty. Thus, if we have
N entries in a colliding group and we decide to write and lstat() only
one of them, every subsequent git-status will have to read, convert,
and hash the written file N - 1 times. By checking out all colliding
entries (like the sequential code does), we only pay the overhead once,
during checkout.