Four Ways to Keep Junk Out of Git

Source control is a vital part of software development, but you don’t always want it to keep track of everything. Passwords, temporary changes, and desperate debugging edits are best kept out of version control. I’ve found four effective ways to handle this.

1. There’s Always .gitignore

The standard method for excluding whole files from Git is the .gitignore file. This is covered in Git Basics, so I won’t go into it here. But eventually, you’ll need a more fine-grained way to ignore changes. Read on.

2. Use a Pre-Commit Hook

I often find myself adding debug logging or other temporary changes that I don’t want to commit. Expecting myself to remember what not to commit, especially if other changes have happened in the meantime, is not always reasonable.

I’ve run into this problem so often that I now keep a generic Git pre-commit hook that I can drop into any project. As you may know, Git allows many actions to be scripted by using hooks. All you need to do is drop an executable file (named for the desired hook) into the .git/hooks directory of a Git repository.
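
The "drop it in" step can itself be scripted. Here is a minimal sketch, assuming the hook script is kept somewhere outside the repository; the function name install_pre_commit_hook and its arguments are my own placeholders, not anything Git provides. It also assumes a regular clone where .git is a directory.

```shell
#!/bin/bash

# Copy a hook script into a repository and mark it executable.
#   $1: path to the hook script (hypothetical, e.g. ~/hooks/pre-commit)
#   $2: repository root (defaults to the current directory)
install_pre_commit_hook() {
    local source_script=$1
    local repo_root=${2:-.}
    local hooks_dir=$repo_root/.git/hooks

    # Assumption: a regular clone, where .git is a directory.
    if [[ ! -d $hooks_dir ]]; then
        echo "No .git/hooks directory under $repo_root" >&2
        return 1
    fi

    # Git only runs a hook if the file is named for it and is executable.
    cp -- "$source_script" "$hooks_dir/pre-commit" &&
        chmod +x "$hooks_dir/pre-commit"
}
```

Calling install_pre_commit_hook ~/hooks/pre-commit from the root of a project would then be the whole setup.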

In my case, I want to prevent any commit that contains unwanted changes I’ve staged accidentally. I want to be able to do this with standard (GNU) tools, making it easy to drop into any project with no further thought.

What I do is tag some code (usually with a comment) using the text “NOCOMMIT”. My pre-commit hook can then look for that text and reject the commit.

#!/bin/bash

# Redirect output to stderr.
exec 1>&2

prevent_commit=0

# List added lines with a NOCOMMIT comment
nocommits=$(git diff --cached | grep -C2 '^+.*NOCOMMIT' | sed 's#\+ ##')
if [[ -n $nocommits ]]; then
    echo "Don't commit that!"
    echo "$nocommits"
    prevent_commit=1
fi

if [[ $prevent_commit == 1 ]]; then
    echo -e "\n(use the -n flag to bypass pre-commit hooks)"
    exit 1
fi

The interesting part is the one-liner before the if:

  • git diff --cached examines only changes that are currently staged for commit.
  • grep -C2 '^+.*NOCOMMIT' narrows the diff down to added lines (those starting with +) that contain NOCOMMIT, keeping two lines of context on either side of each match.
  • sed 's#\+ ##' tidies the output by stripping the first “+ ” (the diff marker and one space) from each line that has one.

If any added NOCOMMIT lines are found, the hook prints them and exits with a non-zero status. That non-zero exit is all it takes to make Git abort the commit.

I also like to include a hint that the hook can be bypassed by using git commit -n. Finally, I separate the NOCOMMIT detection from the commit rejection since I often extend this script to look for language-specific text (e.g. “console.log” in a JavaScript project).
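
As a rough sketch of that kind of extension, the detection step could loop over a list of patterns. Everything named here (forbidden_patterns, find_forbidden_additions, and the patterns themselves) is a hypothetical example rather than part of my actual hook. Unlike the one-liner above, this version prints no context lines and skips the “+++” file headers.

```shell
#!/bin/bash

# Hypothetical list of forbidden patterns (extended regexes); adjust per project.
forbidden_patterns=('NOCOMMIT' 'console\.log')

# Read a unified diff on stdin and print every added line that matches one
# of the forbidden patterns. Returns 0 if anything was found, 1 otherwise.
find_forbidden_additions() {
    local diff_text pattern matches found=1
    diff_text=$(cat)
    for pattern in "${forbidden_patterns[@]}"; do
        # Keep added lines only, and skip the "+++ b/file" headers.
        matches=$(printf '%s\n' "$diff_text" |
            grep '^+' | grep -v '^+++ ' | grep -E -- "$pattern" || true)
        if [[ -n $matches ]]; then
            echo "Forbidden text ($pattern):"
            echo "$matches"
            found=0
        fi
    done
    return "$found"
}

# In the hook itself, the call might look like this:
#   if git diff --cached | find_forbidden_additions; then
#       echo "(use the -n flag to bypass pre-commit hooks)"
#       exit 1
#   fi
```

Reading the diff from stdin keeps the detection separate from Git, so it can be tried against a saved diff before it is trusted in a hook.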

3. Sample Config Templates

Every project I’ve worked on has passwords, API keys, or other secrets that shouldn’t be checked into source control. Often, these secrets need to live in the same file as some other configuration, so the file can’t be outright .gitignored. The goals are to keep the configuration in source control and the passwords out of it.

Say you have a configuration file, App.config. Replace the sensitive information with some placeholder text, save it as App.config.sample, and commit the file to Git. Then, add a rule to .gitignore for App.config. If App.config is already tracked, also run git rm --cached App.config, since .gitignore has no effect on files Git is already tracking. Now, you’ve got a template that can be filled in quickly when you’re setting up the project for the first time.
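
The first-time setup can be a one-liner, but a small helper makes it harder to clobber a config that already holds real secrets. This is only a sketch: init_config is a made-up name, and the default file names are simply the ones from the example above.

```shell
#!/bin/bash

# Create the real config file from the committed template, unless it
# already exists (so local secrets are never overwritten).
#   $1: template path (hypothetical default: App.config.sample)
#   $2: target path   (hypothetical default: App.config)
init_config() {
    local template=${1:-App.config.sample}
    local target=${2:-App.config}

    if [[ -e $target ]]; then
        echo "$target already exists; leaving it alone."
        return 0
    fi
    if [[ ! -f $template ]]; then
        echo "Template $template not found." >&2
        return 1
    fi
    cp -- "$template" "$target"
    echo "Created $target from $template; fill in the placeholders."
}
```

Running init_config from the project root after a fresh clone leaves you with an ignored App.config that is ready to edit.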

The downside is that now you’ve got two files that need to be edited whenever a configuration setting is added or modified. I’ve found that this happens infrequently enough that it’s worth the trade-off for automatically excluding secrets from Git.

4. Pretend to Ignore It

If none of the solutions above work, you can also tell Git to pretend a file is unchanged. Just run git update-index --assume-unchanged followed by the paths of the files whose edits you want to ignore. You’ll notice that those changes no longer show up in git status output, and they won’t be staged for commits.

What happens is that Git skips the files you’ve specified when looking for changes. This causes Git to ignore changes in the entire file, while keeping the file in source control, but it’s up to you to remember which files you’ve assumed are unchanged! If you forget, you’ll be very confused later, wondering why your edits aren’t showing up in status output. To stop ignoring the file, use git update-index --no-assume-unchanged.
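
One way to guard against forgetting is to ask Git which files carry the flag. With the -v option, git ls-files marks assume-unchanged files with a lowercase status letter. The sketch below builds on that; the two function names are my own, and App.config is just the example file from the previous section.

```shell
#!/bin/bash

# Read `git ls-files -v` output on stdin and print the paths flagged as
# assume-unchanged. With -v, Git uses a lowercase status letter for them.
only_assumed_unchanged() {
    grep '^[[:lower:]] ' | cut -c3-
}

# List every assume-unchanged file in the current repository.
list_assumed_unchanged() {
    git ls-files -v | only_assumed_unchanged
}

# Typical round trip (App.config is a placeholder path):
#   git update-index --assume-unchanged App.config
#   list_assumed_unchanged          # should now include App.config
#   git update-index --no-assume-unchanged App.config
```

Keeping the filter separate from the Git call makes it easy to check against sample output before relying on it.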

I don’t use this technique much, since the preceding techniques tend to work just fine. But it could be useful in some circumstances.

What are your favorite techniques?

Conversation
  • Matt says:

    Good tips, thank you for sharing.
    Do you have any tips for remediating the inadvertent inclusion of sensitive data (e.g. access credentials)? I’ve just discovered months’ worth of hard-coded MySQL access credentials in a Python Git repo and am scratching my head over how to deal with this effectively.

    • Brian Vanderwal says:

      Hi Matt, I’ve used the BFG Repo-Cleaner (https://rtyley.github.io/bfg-repo-cleaner/) in the past for removing large binaries that were accidentally checked in, but it is designed to scrub snippets of sensitive data as well.

      It’s a bit disruptive, but there’s no way around that since any branch containing contaminated history will have to be modified (so the commit hashes will change). And everybody using the repo will need to clone a fresh copy once the BFG has done its job. Additionally, it would be a good idea to change those MySQL credentials after they’ve been removed.

  • Thanks for sharing! I’m using your nocommit, but when I commit (from VS Code), the no-commit-alert successfully shows, but the file still shows up as staged (and I have to go through a procedure to undo). Is there any way to fully cancel any and all files as if one didn’t hit commit, just showing the alert with no staging (so one can remove the nocommit comment)? Thanks!

    • Brian Vanderwal says:

      If you just want to make a change to address the nocommit comment and then retry the commit, there’s no need to undo anything. Just make your edit and stage it (this works even if your edit is removing something that was staged as an insertion).

      Git tracks changes, which are just grouped by file for presentation. So you can have some changes in a file that are staged, while other changes in the same file are not staged (in which case, the file would appear listed under both “staged” and “not staged”). Does that make sense?

      Since staging and committing are separate operations, there’s no reliable way to undo staging from a commit operation.
