Git Pre-Commit Hook

Skip Git commit hooks

Maybe, from the git commit man page:

git commit --no-verify -m "commit message"
           ^^^^^^^^^^^
-n
--no-verify

This option bypasses the pre-commit and commit-msg hooks. See also githooks(5).

As commented by Blaise, -n can have a different role for certain commands.

For instance, git push -n is actually a dry-run push.

Only git push --no-verify would skip the (pre-push) hook.


Note: Git 2.14.x/2.15 improves the --no-verify behavior:

See commit 680ee55 (14 Aug 2017) by Kevin Willford.

(Merged by Junio C Hamano -- gitster -- in commit c3e034f, 23 Aug 2017)

commit: skip discarding the index if there is no pre-commit hook

"git commit" used to discard the index and re-read from the filesystem
just in case the pre-commit hook has updated it in the middle; this
has been optimized out when we know we do not run the pre-commit hook.


Davi Lima points out in the comments that git cherry-pick does not support --no-verify.

So if a cherry-pick triggers a pre-commit hook, you might, as in this blog post, have to comment out or otherwise disable that hook so that your git cherry-pick can proceed.

The same process would be necessary in case of a git rebase --continue, after a merge conflict resolution.
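If you would rather not edit the hook file, one alternative (my suggestion, not the blog post's method) is to override core.hooksPath, available since Git 2.9, for a single command. Pointing it at /dev/null means Git finds no hooks at all for that one invocation, so every hook is skipped, not only pre-commit. The helper name git_without_hooks is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: run one git command with all hooks disabled.
# core.hooksPath (Git 2.9+) tells Git where to look for hooks; /dev/null
# cannot contain any, so no hook is found for this single invocation.
git_without_hooks() {
    git -c core.hooksPath=/dev/null "$@"
}

# Example usage (not run here):
#   git_without_hooks cherry-pick <commit>
#   git_without_hooks rebase --continue
```

Because the override is passed with -c, it affects only that command and leaves the repository configuration untouched.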


With Git 2.36 (Q2 2022), the callers of run_commit_hook() can learn whether it returned "success" because the hook succeeded or because there was no hook at all.

See commit a8cc594 (fixed with commit 4369e3a1), commit 9f6e63b (07 Mar 2022) by Ævar Arnfjörð Bjarmason (avar).

(Merged by Junio C Hamano -- gitster -- in commit 7431379, 16 Mar 2022)

hooks: fix an obscure TOCTOU "did we just run a hook?" race

Signed-off-by: Ævar Arnfjörð Bjarmason

Fix a Time-of-check to time-of-use (TOCTOU) race in code added in 680ee55 ("commit: skip discarding the index if there is no pre-commit hook", 2017-08-14, Git v2.15.0-rc0 -- merge listed in batch #3).

This obscure race condition can occur if we e.g. ran the "pre-commit" hook and it modified the index, but hook_exists() returns false later on (e.g., because the hook itself went away, the directory became unreadable, etc.).

Then we won't call discard_cache() when we should have.

The race condition itself probably doesn't matter, and users would have been unlikely to run into it in practice.

This problem has been noted on-list when 680ee55 was discussed, but had not been fixed.

Let's also change this for the push-to-checkout hook.

Now instead of checking if the hook exists and either doing a push to checkout or a push to deploy we'll always attempt a push to checkout.

If the hook doesn't exist we'll fall back on push to deploy.

The same behavior as before, without the TOCTOU race.

See 0855331 ("receive-pack: support push-to-checkout hook", 2014-12-01, Git v2.4.0-rc0 -- merge) for the introduction of the previous behavior.

This leaves uses of hook_exists() in two places that matter.

The "reference-transaction" check in refs.c, see 6754159 ("refs: implement reference transaction hook", 2020-06-19, Git v2.28.0-rc0 -- merge listed in batch #7), and the "prepare-commit-msg" hook, see 66618a5 ("sequencer: run 'prepare-commit-msg' hook", 2018-01-24, Git v2.17.0-rc0 -- merge listed in batch #2).

In both of those cases we're saving ourselves CPU time by not preparing data for the hook that we'll then do nothing with if we don't have the hook.

So using this "invoked_hook" pattern doesn't make sense in those cases.

The "reference-transaction" and "prepare-commit-msg" hook also aren't racy.

In those cases we'll skip the hook runs if we race with a new hook being added, whereas in the TOCTOU races being fixed here we were incorrectly skipping the required post-hook logic.
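The "invoked_hook" idea can be mimicked in shell for illustration: instead of asking "does the hook exist?" a second time after running it, the runner records whether it actually ran a hook, and the caller acts on that record. This is not Git's code (the real change is in C, in run_commit_hook() and its callers); run_hook, after_pre_commit and the printed messages are hypothetical.

```sh
#!/bin/sh
# Illustration only: a shell analogue of the "invoked_hook" pattern.
# run_hook HOOK_PATH [ARGS...]
#   Runs HOOK_PATH if it is executable and records in the global variable
#   invoked_hook whether a hook was actually run (1) or not (0).
#   A missing hook counts as success, as it does in Git.
run_hook() {
    invoked_hook=0
    hook_path=$1
    shift
    if [ -x "$hook_path" ]; then
        invoked_hook=1
        "$hook_path" "$@"
        return $?
    fi
    return 0
}

# Caller: decide on the post-hook step from invoked_hook, not from a second
# existence check, so a hook that disappears after running cannot fool it.
after_pre_commit() {
    if [ "$invoked_hook" = 1 ]; then
        echo "re-read index"
    else
        echo "keep cached index"
    fi
}
```

A hook that deletes itself while running is exactly the case a later existence check would get wrong; with the recorded flag the caller still knows the hook ran.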

How to make sure my git pre-commit script won't get fooled?

Then it triggers my pre-commit script which happens to read test.py to make sure nothing's wrong with it. The thing is, the on-its-way-to-be-committed test.py and the one in my working tree are different !!

That's why you need to make sure your pre-commit script runs on files in the index, not on your work tree. It's actually very common for staged content to differ from what's actually in the work tree (consider, for example, git add -p, which lets you stage portions of files).

One way of handling this is to check out the index into a temporary directory and run your tests there. You can use the git checkout-index command to check out a copy of the index into a temporary directory.

Here's an example pre-commit hook that will reject a commit if any files contain the word BAD:

#!/bin/sh

echo "running checks"

# create a temporary directory outside the work tree
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/precommit.XXXXXX") || exit 1

# make sure we clean it up when we're done
trap 'rm -rf "$tmpdir"' EXIT

# check out the index (what is about to be committed), not the work tree
git checkout-index --prefix="$tmpdir/" -af

# run tests in a subshell so that we end up back in the current
# directory when everything finishes.
(
    cd "$tmpdir" || exit 1

    # search every checked-out file, including those in subdirectories
    if grep -rq BAD .; then
        echo "ERROR: found bad files"
        exit 1
    fi
)

I believe this also addresses your second question about ensuring that the tree you're testing stays consistent during the tests. Because you're working in a temporary directory with a copy of the index, you don't need to worry about anything changing.
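If the check is a simple search, a lighter alternative to checking out the index is git grep --cached, which searches the blobs registered in the index instead of the work-tree files. This is a sketch under that assumption; check_index_for_bad is a hypothetical name, and like the script above it looks at every file in the index, not only the changed ones.

```sh
#!/bin/sh
# Hypothetical helper: fail if any file *in the index* contains BAD.
# git grep --cached reads staged blobs, so unstaged edits in the work tree
# (e.g. left over from git add -p) neither hide nor cause a failure.
check_index_for_bad() {
    if git --no-pager grep --cached -q -e BAD; then
        echo "ERROR: found bad files:" >&2
        # list the offending files (--no-pager: never open a pager in a hook)
        git --no-pager grep --cached -l -e BAD >&2
        return 1
    fi
    return 0
}

# In .git/hooks/pre-commit (not run here):
#   check_index_for_bad || exit 1
```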

How can I manually run a Git pre-commit hook, without attempting a commit?

Just run the pre-commit script through the shell:

bash .git/hooks/pre-commit
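Running it through bash ignores the hook's shebang line and assumes the default hooks location. A slightly more careful sketch, assuming a reasonably recent Git: ask git rev-parse --git-path for the hook's path (which also takes core.hooksPath into account) and execute the file directly from the top of the work tree, where Git runs hooks. On Git 2.36 and later there is also git hook run pre-commit. run_pre_commit is a hypothetical helper; note that nothing sets GIT_INDEX_FILE here, so the hook sees the normal index.

```sh
#!/bin/sh
# Hypothetical helper: run the repository's pre-commit hook by hand.
run_pre_commit() (
    # Hooks normally run from the top of the work tree
    cd "$(git rev-parse --show-toplevel)" || exit 1
    # Resolve the hook path the way Git does (honours core.hooksPath)
    hook=$(git rev-parse --git-path hooks/pre-commit)
    if [ ! -x "$hook" ]; then
        echo "no executable pre-commit hook at $hook" >&2
        exit 1
    fi
    # Execute it directly so its own shebang line is respected
    "$hook"
)
```

The helper's exit status is the hook's exit status, just as git commit would see it.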

In Git pre-commit hook, temporarily remove all changes that are not about to be committed

You were actually quite close to a correct solution.

(In this answer, I'm going to use the word "cache" instead of "stage" because the latter one is too similar to "stash".)

In fact, the trick with using stash would work even if you were to commit files that are not cached. This is because Git changes the cache for the duration of running hooks, so it always contains the correct files. You can check it by adding the command git status to your pre-commit hook.

So you can use git stash push --include-untracked --keep-index.

The problem with conflicts when restoring the stash is also quite easily solvable. You already have all the changes backed up in the stash, so there is no risk of losing anything. Just remove all the current changes and apply the stash to a clean slate.

This can be done in two steps.
The command git reset --hard will discard all changes to tracked files.
The command git clean -d --force will remove all untracked files and directories (ignored files are kept).

After that you can run git stash pop --index without any risk of conflicts.



A simple hook would look like this:

#!/bin/sh

set -e

git stash push --include-untracked --keep-index --quiet --message='Backed up state for the pre-commit hook (if you can see it, something went wrong)'

# TODO: Tests go here. Don't exit here; store the exit code instead, e.g.:
tests_result=0
# run_my_tests || tests_result=$?    (run_my_tests is a placeholder)

git reset --hard --quiet
git clean -d --force --quiet
git stash pop --index --quiet

exit "$tests_result"

Let's break it down.

set -e ensures that the script stops immediately in case of an error so it won't do any further damage.
The stash entry with backup of all changes is done at the beginning so in case of an error you can take manual control and restore everything.

git stash push --include-untracked --keep-index --quiet --message='...' fulfills two purposes. It creates a backup of all current changes and removes all non-staged changes from the working directory.
The flag --include-untracked makes sure that untracked files are also backed up and removed.
The flag --keep-index cancels removal of the cached changes from the working directory (but they are still included in the stash).

The TODO comment is where your tests go.
Make sure you don't exit the script there. You still need to restore the stashed changes before doing that.
Instead of exiting with an error code, store that code in the variable tests_result.

git reset --hard --quiet discards all changes to tracked files in the working directory.
The flag --hard makes sure that both the cache and the working directory are reset to match HEAD, not just the cache.

git clean -d --force --quiet removes all the untracked files from the working directory.
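The flag -d tells Git to remove untracked directories too, not only untracked files.
The flag --force tells Git you know what you're doing and it is really supposed to delete all those files (git clean refuses to delete anything without it unless clean.requireForce is set to false).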

git stash pop --index --quiet restores all changes saved in the latest stash and removes it.
The flag --index tells it to restore the cache as well, so that it doesn't mix up which changes were cached and which were not.



Disadvantages of this method

This method is only semi-robust and it should be sufficient for simple use cases.
However, there are quite a few corner cases that may break something during real-life usage.

git stash push refuses to work with files that were only added with the flag --intent-to-add.
I'm not sure why that is and I haven't found a way to fix it.
You can bypass the problem by adding the file without the flag, or at least by adding it as an empty file and leaving only its content uncached.

Git tracks only files, not directories. However, the command git clean can remove directories. As a result, the script will remove empty directories (unless they are ignored).

Files that were added to .gitignore since the last commit will be deleted. I consider this a feature, but if you want to prevent it, you can do so by reversing the order of git reset and git clean.
Note that this works only if .gitignore is included in the current commit.

git stash push does not create a new stash if there are no changes, but it still returns 0. To handle commits without changes, such as changing the message using --amend, you would need to add some code that checks whether a stash was really created and pops it only if it was.
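One way to do that check, assuming nothing else touches the stash in the meantime, is to compare refs/stash before and after the push; git rev-parse -q --verify prints nothing and fails when the ref does not exist yet. push_backup_stash and stash_created are hypothetical names.

```sh
#!/bin/sh
# Hypothetical helper: push the backup stash and record in stash_created
# whether a new stash entry was actually made (1) or not (0).
push_backup_stash() {
    before=$(git rev-parse -q --verify refs/stash || true)
    git stash push --include-untracked --keep-index --quiet \
        --message='Backed up state for the pre-commit hook' || return 1
    after=$(git rev-parse -q --verify refs/stash || true)
    if [ "$before" != "$after" ]; then
        stash_created=1
    else
        stash_created=0
    fi
}

# Later, pop only what this hook pushed (not run here):
#   if [ "$stash_created" = 1 ]; then git stash pop --index --quiet; fi
```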

Git stash seems to remove the information about current merge, so using this code on a merge commit will break it.
To prevent that, you need to backup files .git/MERGE_* and restore them after popping the stash.



A robust solution

I've managed to iron out most of the kinks of this method (making the code much longer in the process).

The only remaining problem is removing empty directories and ignored files (as described above). I don't think these are severe enough issues to take time trying to bypass them. (It is doable, though.)

#!/bin/sh

# Keep the backup inside the Git directory, so that
# git stash --include-untracked and git clean never touch it.
git_dir=$(git rev-parse --git-dir) || exit 1
backup_dir="$git_dir/pre-commit-hook-backup"
if [ -e "$backup_dir" ]
then
    printf '"%s" already exists!\n' "$backup_dir" 1>&2
    exit 1
fi

intent_to_add_list_file="$backup_dir/intent-to-add"
remove_intent_to_add() {
    # Intent-to-add entries show up as added files in git diff (index vs. work tree)
    git diff -z --name-only --diff-filter=A >"$intent_to_add_list_file"
    xargs -0 -r -- git reset --quiet -- <"$intent_to_add_list_file"
}
readd_intent_to_add() {
    xargs -0 -r -- git add --intent-to-add --force -- <"$intent_to_add_list_file"
}

backup_merge_info() {
    echo 'If you can see this, tests in the `pre-commit` hook went wrong. You need to fix this manually.' >"$backup_dir/README"
    find "$git_dir" -maxdepth 1 -name 'MERGE_*' -exec cp {} "$backup_dir" \;
}
restore_merge_info() {
    find "$backup_dir" -maxdepth 1 -name 'MERGE_*' -exec mv {} "$git_dir" \;
}

create_stash() {
    old_stash=$(git rev-parse -q --verify refs/stash || true)
    git stash push --include-untracked --keep-index --quiet --message='Backed up state for the pre-commit hook (if you can see it, something went wrong)'
    new_stash=$(git rev-parse -q --verify refs/stash || true)
}
restore_stash() {
    git reset --hard --quiet
    git clean -d --force --quiet
    # git stash push creates nothing when there are no changes,
    # so pop only if this hook really created a stash entry
    if [ "$old_stash" != "$new_stash" ]
    then
        git stash pop --index --quiet
    fi
}

run_tests() (
    set +e
    # Put your tests here and send their output to stderr:
    # stdout is captured to obtain the exit status.
    printf 'TODO: Put your tests here.\n' 1>&2
    echo $?
)

set -e
mkdir "$backup_dir"
remove_intent_to_add
backup_merge_info
create_stash
tests_result=$(run_tests)
restore_stash
restore_merge_info
readd_intent_to_add
rm -r "$backup_dir"
exit "$tests_result"

Pre-commit hook based on an executable (not git repo)

The pre-commit framework accepts repo: local as a sentinel value in a repos entry, for hooks that are not fetched from a Git repository. The example config below runs black as installed on the system:

repos:
  - repo: local
    hooks:
      - id: black
        name: black
        language: system
        entry: black
        types: [python]

Calling git in pre-commit hook

  1. Am I allowed to call git inside git hooks?

Yes, but you must exercise caution, as there are a number of things set in the environment and you're working with something that is in the middle of being done:

  • GIT_DIR is set to the path to the Git directory.
  • GIT_WORK_TREE may be set to the path to the work-tree (from git --work-tree).
  • Other Git variables, such as GIT_NO_REPLACE_OBJECTS, may be set from the command line as well.

(You should leave these set if you're continuing to work with the current repository, but clear them out if you're working with a different repository.)
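For the "different repository" case, git rev-parse --local-env-vars prints the names of the repository-local variables (GIT_DIR, GIT_WORK_TREE, GIT_INDEX_FILE and others), so a hook can clear exactly those before touching another repository. A sketch; in_other_repo is a hypothetical helper, and it runs in a subshell so the hook's own environment stays intact.

```sh
#!/bin/sh
# Hypothetical helper: run a git command in another repository from a hook.
# The subshell ( ... ) keeps the unset local to this call.
in_other_repo() (
    repo=$1
    shift
    # Clear GIT_DIR, GIT_WORK_TREE, GIT_INDEX_FILE, etc. inherited from the hook.
    # Intentionally unquoted: one variable name per word.
    unset $(git rev-parse --local-env-vars)
    cd "$repo" || exit 1
    git "$@"
)

# Example (not run here):
#   in_other_repo ../docs-repo log -1 --oneline
```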


  2. If 1. is OK: when exactly is the pre-commit hook called if I do git commit -am"bla"? In particular, does Git do the staging first and then call the pre-commit hook, or not?

This is complicated.

There are three "modes" that git commit uses internally. (There are no promises about this, but that's how things have been implemented for many years now so this three-modes thing seems pretty stable.) The modes are:

  • git commit without -a, --include, --only, and/or any command-line-specified file names. This is the default or normal mode. The underlying implementation details do not show through.

  • git commit with -a or with command-line-specified file names. This divides into two sub-modes:

    • such a commit with --include, or
    • such a commit with --only.


    At this point, the underlying implementation shows through.

The underlying implementation details here involve the thing that Git calls, variously, the index, the staging area, and (rarely now) the cache, which is normally implemented as a file named $GIT_DIR/index (where $GIT_DIR is the environment variable from the note about point 1). Normally, there is only one of these: the index. It holds the content that you intend to commit.[1] When you run git commit, Git will package up whatever is in the index as the next commit.

But, during the operation of git commit, there may be up to three index files. For the normal git commit there's just the one index, and your pre-commit hook can use it and can even update it. (I advise against updating it, for reasons we'll see in a moment.)

But, if you do a git commit -a, or git commit --include file.ext, now there are two index files. There's the content that's ready to be committed—the regular index—and one extra index, which is the original index plus the result of doing a git add on file.ext or on all files (the equivalent of git add -u). So now there are two index files.

In this mode, Git leaves the regular index file as the regular index file. This file is in $GIT_DIR/index as usual. The second index file, with the extra added stuff, is in $GIT_DIR/index.lock and the environment variable GIT_INDEX_FILE is set to hold that path. If the commit fails, Git will remove the index.lock file and everything will be as if you had not run git commit at all. If the commit succeeds, Git will rename index.lock to index, releasing the lock and updating the (standard, regular) index all in one motion.

Finally, there's the third mode, which you get when you run git commit --only file.ext for instance. Here, there are now three index files:

  • $GIT_DIR/index: The standard index, which holds what it usually does.
  • $GIT_DIR/index.lock: A copy of the standard index to which file.ext has been git add-ed.
  • $GIT_DIR/indexsuffix: A copy of the HEAD commit[2] to which file.ext has been git add-ed.

The environment variable GIT_INDEX_FILE points to this third index. If the commit succeeds, Git will rename the index.lock file to index, so that it becomes the index. If the commit fails, Git will remove the index.lock file, so that the index goes back to the state it had before you started. (And in either case, Git removes the third index, which has now served its purpose.)

Note that from a pre-commit hook, you can detect whether git commit is a standard commit (GIT_INDEX_FILE is unset or set to $GIT_DIR/index) or one of the two special modes. In standard mode, if you want to update the index, you can do so as usual. In the two special modes, you can use git add to modify the file that GIT_INDEX_FILE names, which will modify what goes into the commit; and if you're in the --include style commit, this also modifies what will become the standard index on success. But if you're in the --only mode, modifying the proposed commit doesn't affect the standard index, nor the index.lock that will become the standard index.
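A sketch of that detection, based on the file names described above: the normal mode uses the standard index, -a and --include use index.lock, and --only uses a separate temporary index. It compares only the last path component because GIT_INDEX_FILE may be relative or absolute; that choice, and the assumption that the user has not pointed GIT_INDEX_FILE at a custom file, are mine. commit_index_mode is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper for a pre-commit hook: guess which git commit mode
# is running from the index file named by GIT_INDEX_FILE.
# Prints "normal", "include" (-a / --include) or "only" (--only).
commit_index_mode() {
    case ${GIT_INDEX_FILE:-} in
        "")                         echo normal ;;   # not set: standard index
        */index | index)            echo normal ;;   # $GIT_DIR/index
        */index.lock | index.lock)  echo include ;;  # standard index + added files
        *)                          echo only ;;     # third, temporary index
    esac
}
```

A hook could then refuse to run git add unless commit_index_mode prints normal.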

To consider a concrete example, suppose the user did:

git add file1 file2

so that the standard index matches HEAD except for file1 and file2. Then the user runs:

git commit --only file3

so that the proposed commit is a copy of HEAD with file3 added, and, if this commit succeeds, Git will replace the standard index with one in which file1, file2, and file3 are all added (but since file3 will match the new HEAD commit, only files 1 and 2 will be modified in the new index).

Now suppose your commit hook runs git add file4 and the process as a whole succeeds (the new commit is made successfully). The git add step will copy the work-tree version of file4 into the temporary index, so that the commit will have both file3 and file4 updated. Then Git will rename the index.lock file, so that file3 will match the new HEAD commit. But file4 in the index.lock was never updated, so it won't match the HEAD commit. It will appear to the user that somehow, file4 got reverted! A git status will show a pending change to it, staged for commit, and git diff --cached will show that the difference between HEAD and the index is that file4 has been changed back to match the file4 in HEAD~1.

You could have your pre-commit hook test for this mode and refuse to git add files when in this mode, to avoid the problem. (Or, you could even sneakily add file4 to index.lock, with a second git add command!) But it's generally better to have your hook just reject the commit, with advice to the user to do any git adds themselves, so that you don't have to know all of these implementation secrets about git commit in the first place.
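A sketch of the "just reject the commit" approach for a formatter, assuming the formatter rewrites files in place: fingerprint the unstaged changes before and after running it and, if they differ, tell the user instead of calling git add. reject_if_formatter_changes is hypothetical, and cksum is used only to compare the two diffs.

```sh
#!/bin/sh
# Hypothetical helper: run a formatter command ("$@") and reject the commit
# if it changed the work tree, leaving the git add decision to the user.
reject_if_formatter_changes() {
    # Fingerprint of unstaged changes (work tree vs. index) before formatting
    before=$(git diff --binary | cksum)
    "$@" || return 1
    after=$(git diff --binary | cksum)
    if [ "$before" != "$after" ]; then
        echo "pre-commit: the formatter modified files; review them," >&2
        echo "git add what you want to commit, and commit again." >&2
        return 1
    fi
    return 0
}

# In .git/hooks/pre-commit (the formatter command is only an example):
#   reject_if_formatter_changes black . || exit 1
```

Comparing before and after, rather than requiring a clean work tree, keeps the hook usable when the user deliberately left other changes unstaged.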


[1] The index holds some extra information as well: cache data about the work-tree. That's why it's sometimes called the cache. These extra copies that I describe here are made by copying the original index, so the extra copies also have the same cache data, except if and when they get updated via git add.

[2] It's not specified whether Git makes this copy via the internal equivalent of:

TMP=$GIT_DIR/index<digits>
cp $GIT_DIR/index $TMP
GIT_INDEX_FILE=$TMP git reset
GIT_INDEX_FILE=$TMP git add file3

or some other means (e.g., the internal equivalent of git read-tree), but since this particular copy is always just removed at the end of the process, it doesn't matter: any cache information for the work-tree becomes irrelevant.

git add same files in precommit after formatting, cannot add deleted files

Turns out it was too simple. Fixed by adding --diff-filter=d, which filters out deleted files (a lowercase letter excludes that change type instead of selecting it).

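#!/bin/sh
. "$(dirname "$0")/_/husky.sh"

# Staged files, excluding deleted ones (lowercase d = exclude deletions)
added_files=$(git diff --name-only --cached --diff-filter=d)

yarn fix

# Re-stage the formatted files; skip when nothing was staged.
# Note: ${added_files} is split on whitespace, so names with spaces break.
if [ -n "$added_files" ]; then
    git add -- ${added_files}
fi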
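Re-adding every staged file after yarn fix also stages any hunks the user deliberately left out (for example with git add -p). A hedged sketch of a guard that could run before the formatter: list files that have both staged and unstaged changes, and stop if there are any. partially_staged_files is a hypothetical helper; it works on newline-separated names, so file names containing newlines are not handled.

```sh
#!/bin/sh
# Hypothetical helper: print files that have staged changes *and* further
# unstaged changes, i.e. files a blanket git add would alter.
partially_staged_files() {
    staged=$(git diff --cached --name-only --diff-filter=d)
    unstaged=$(git diff --name-only)
    printf '%s\n' "$staged" | while IFS= read -r f; do
        [ -n "$f" ] || continue
        # Exact whole-line match against the list of unstaged files
        if printf '%s\n' "$unstaged" | grep -qxF -e "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# Possible use before "yarn fix" in the husky hook (not run here):
#   partial=$(partially_staged_files)
#   if [ -n "$partial" ]; then
#       echo "Partially staged files, not reformatting:" >&2
#       echo "$partial" >&2
#       exit 1
#   fi
```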

