This transformation is used to detect renames and copies, and is
controlled by the -M option (to detect renames) and the -C option
(to detect copies as well) to the git diff-* commands. If the
input contained these filepairs:
:100644 000000 0123456... 0000000... D fileX
:000000 100644 0000000... 0123456... A file0
and the contents of the deleted file fileX is similar enough to
the contents of the created file file0, then rename detection
merges these filepairs and creates:
:100644 100644 0123456... 0123456... R100 fileX file0
When the "-C" option is used, the original contents of modified files,
and deleted files (and also unmodified files, if the
"--find-copies-harder" option is used) are considered as candidates
of the source files in rename/copy operation. If the input were like
these filepairs, that talk about a modified file fileY and a newly
created file file0:
:100644 100644 0123456... 1234567... M fileY
:000000 100644 0000000... bcd3456... A file0
the original contents of fileY and the resulting contents of
file0 are compared, and if they are similar enough, they are
changed to:
:100644 100644 0123456... 1234567... M fileY
:100644 100644 0123456... bcd3456... C100 fileY file0
In both rename and copy detection, the same "extent of changes"
algorithm used in diffcore-break is used to determine if two
files are "similar enough", and can be customized to use
a similarity score different from the default of 50% by giving a
number after the "-M" or "-C" option (e.g. "-M8" to tell it to use
8/10 = 80%).
Note that when rename detection is on but both copy and break
detection are off, rename detection adds a preliminary step that first
checks if files are moved across directories while keeping their
filename the same. If there is a file added to a directory whose
contents are sufficiently similar to a file with the same name that got
deleted from a different directory, it will mark them as renames and
exclude them from the later quadratic step (the one that pairwise
compares all unmatched files to find the "best" matches, determined by
the highest content similarity). So, for example, if a deleted
docs/ext.txt and an added docs/config/ext.txt are similar enough, they
will be marked as a rename and prevent an added docs/ext.md that may
be even more similar to the deleted docs/ext.txt from being considered
as the rename destination in the later step. For this reason, the
preliminary "match same filename" step uses a bit higher threshold to
mark a file pair as a rename and stop considering other candidates for
better matches. At most, one comparison is done per file in this
preliminary pass; so if there are several remaining ext.txt files
throughout the directory hierarchy after exact rename detection, this
preliminary step may be skipped for those files.
Note. When the "-C" option is used with --find-copies-harder
option, git diff-* commands feed unmodified filepairs to
diffcore mechanism as well as modified ones. This lets the copy
detector consider unmodified files as copy source candidates at
the expense of making it slower. Without --find-copies-harder,
git diff-* commands can detect copies only if the file that was
copied happened to have been modified in the same changeset.