Git Pathspecs and How to Use Them

Adam Giese

When I was looking through the documentation of git commands, I noticed that many of them had an option for <pathspec>. I initially thought that this was just a technical way to say “path,” and assumed that it could only accept directories and filenames. After diving into the rabbit hole of documentation, I found that the pathspec option of git commands is capable of so much more.

The pathspec is the mechanism that git uses for limiting the scope of a git command to a subset of the repository. If you have used much git, you have likely used a pathspec whether you know it or not. For example, in the command git add README.md, the pathspec is README.md. However, it is capable of much more nuance and flexibility.

So, why should you learn about pathspecs? Since pathspecs are part of many commands, those commands become much more powerful once you understand them. With git add, you can add just the files within a single directory. With git diff, you can examine just the changes made to files with an extension of .scss. You can git grep all files except for those in the dist/ directory.

In addition, pathspecs can help with the writing of more generic git aliases. For example, I have an alias named git todo, which will search all of my repository files for the string 'todo'. However, I would like for this to show all instances of the string, even if they are not within my current working directory. With pathspecs, we will see how this becomes possible.

File or directory

The most straightforward way to use a pathspec is with just a directory and/or filename. For example, in the git add commands below, the pathspecs are ., src/, and README respectively.

git add .      # add CWD (current working directory)
git add src/   # add src/ directory
git add README # add only the README file

You can also add multiple pathspecs to a command:

git add src/ server/ # adds both src/ and server/ directories

Sometimes, you may see a -- preceding the pathspec of a command. It removes any ambiguity about which arguments are pathspecs and which are options or revisions: everything after the -- is treated as a pathspec.
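As a rough illustration of when the -- matters, the sketch below builds a throwaway repository in which docs is both a directory and a branch name. The repository layout, the branch name, and the commit identity are all made up for the example.

```shell
# Throwaway repository; every name here is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir docs
echo 'hello' > docs/index.md
git add docs
git -c user.name='Example' -c user.email='example@example.com' commit -q -m 'add docs'
git branch docs   # a branch that shares its name with the docs/ directory

# Without --, git cannot tell whether "docs" is a revision or a path.
git log --oneline docs || echo 'ambiguous without --'

# With --, everything after it is treated as a pathspec.
path_log=$(git log --oneline -- docs)
echo "$path_log"
```

The first git log stops with an "ambiguous argument" error because docs names both a revision and a path. The second lists the commits that touched the docs/ directory.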

Wildcards

In addition to files & directories, you can match patterns using *, ?, and []. The * symbol is used as a wildcard and it will match the / in paths — in other words, it will search through subdirectories.

git log '*.js' # logs all .js files in CWD and subdirectories
git log '.*'   # logs all 'hidden' files and directories in CWD
git log '*/.*' # logs all 'hidden' files and directories in subdirectories

The quotes are important, especially when using *! They prevent your shell (such as bash or zsh) from attempting to expand the wildcards on its own. For example, let’s take a look at how git ls-files will list files with and without the quotes.

# example directory structure
#
# .
# ├── package-lock.json
# ├── package.json
# └── data
#     ├── bar.json
#     ├── baz.json
#     └── foo.json

git ls-files *.json 

# package-lock.json
# package.json

git ls-files '*.json'

# data/bar.json
# data/baz.json
# data/foo.json
# package-lock.json
# package.json

Since the shell is expanding the * in the first command, git ls-files receives the command as git ls-files package-lock.json package.json. The quotes ensure that git is the one to resolve the wildcard.

You can also use the ? character as a wildcard for a single character. For example, the following matches both mp3 and mp4 files (along with any other extension made of mp plus one character).

git ls-files '*.mp?'

Bracket expressions

You can also use “bracket expressions” to match a single character out of a set. For example, if you’d like to match either TypeScript or JavaScript files, you can use [tj]. This will match either a t or a j.

git ls-files '*.[tj]s'

This will match either .ts files or .js files. In addition to listing individual characters, you can reference certain named collections of characters within bracket expressions. For example, you can use [:digit:] within a bracket expression to match any decimal digit, or [:space:] to match any whitespace character.

git ls-files '*.mp[[:digit:]]' # mp0, mp1, mp2, mp3, ..., mp9
git ls-files '*[[:space:]]*' # matches any path containing a space

To read more about bracket expressions and how to use them, check out the GNU manual.

Magic signatures

Pathspecs also have a special tool in their arsenal called “magic signatures,” which unlock some additional functionality for your pathspecs. A magic signature is applied by writing :(signature) at the beginning of your pathspec. If this doesn’t make sense, don’t worry: some examples will hopefully help clear it up.

top

The top signature tells git to match the pattern from the root of the git repository rather than the current working directory. You can also use the shorthand :/ rather than :(top).

git ls-files ':(top)*.js'
git ls-files ':/*.js' # shorthand

This will list all files in your repository that have an extension of .js. With the top signature this can be called within any subdirectory in your repository. I find this to be especially useful when writing generic git aliases!

git config --global alias.js "ls-files -- ':(top)*.js'"

With this alias in place, you can run git js anywhere within your repository to get a list of all JavaScript files in your project.
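The git todo alias mentioned in the introduction can be built the same way. Here is a minimal sketch, assuming the alias is a case-insensitive git grep with line numbers; the exact flags, file names, and file contents are my assumptions, and the alias is set locally in a throwaway repository so the example does not touch your global config.

```shell
# Throwaway repository; file names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir src
echo 'TODO: write the docs' > README.md
echo '// todo: refactor' > src/app.js
git add .

# Local alias for the demo; use --global to make it available in every repository.
# -n prints line numbers, -i ignores case, ':(top)' searches from the repository root.
git config alias.todo "grep -n -i 'todo' -- ':(top)'"

cd src
todo_hits=$(git todo)   # run from src/, but README.md one level up is still searched
echo "$todo_hits"
```

Without the ':(top)' pathspec, the same alias run from src/ would only search files under src/.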

icase

The icase signature tells git to ignore case when matching. This is useful when you don’t know which case a filename uses. For example, jpg files sometimes carry the uppercase extension JPG.

git ls-files ':(icase)*.jpg'

literal

The literal signature tells git to treat all of your characters literally. This would be used if you want to treat characters such as * and ? as themselves, rather than as wildcards. Unless your repository has filenames with * or ?, I don’t expect that this signature would be used too often.

git log ':(literal)*.js' # returns log for the file '*.js'

glob

When I started learning pathspecs, I noticed that wildcards worked differently than I was used to. Typically I see a single asterisk * as being a wildcard that does not match anything through directories and consecutive asterisks (**) as a “deep” wildcard that does match names through directories. If you would prefer this style of wildcards, you can use the glob magic signature!

This can be useful if you want more fine-grained control over how you search through your project’s directory structure. As an example, take a look at how these two git ls-files can search through a React project.

git ls-files ':(glob)src/components/*/*.jsx' # 'top level' jsx components
git ls-files ':(glob)src/components/**/*.jsx' # 'all' jsx components
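To make the difference concrete, here is a small sketch that runs the same pattern with and without the glob signature. The component names and directory layout are invented for the example.

```shell
# Throwaway repository with a hypothetical React-style layout.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p src/components/Button src/components/Form/fields
touch src/components/App.jsx
touch src/components/Button/Button.jsx
touch src/components/Form/fields/Input.jsx
git add .

# Default matching: * may cross directory boundaries, so deeper files match too.
default_matches=$(git ls-files 'src/components/*/*.jsx')
# glob magic: each * stays within a single path segment.
single_star=$(git ls-files ':(glob)src/components/*/*.jsx')
# glob magic: **/ matches zero or more directories.
double_star=$(git ls-files ':(glob)src/components/**/*.jsx')

printf 'default:\n%s\n\nglob *:\n%s\n\nglob **:\n%s\n' \
  "$default_matches" "$single_star" "$double_star"
```

In this layout, the default pattern lists Button.jsx and the nested Input.jsx, the glob pattern with single asterisks lists only Button.jsx, and the glob pattern with ** lists all three files, including App.jsx directly under src/components/.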

attr

Git has the ability to set “attributes” on specific files. You can set these attributes using a .gitattributes file.

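# .gitattributes
#
# Set the 'vendored' attribute on vendor files.
# Comments need their own line: gitattributes does not support trailing comments.
src/components/vendor/*  vendored
src/styles/vendor/*      vendored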

The attr magic signature adds attribute requirements to your pathspec. For example, we might want to leave out the vendor files marked above.

git ls-files ':(attr:!vendored)*.js' # searches for non-vendored js files
git ls-files ':(attr:vendored)*.js'  # searches for vendored js files
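If an attr pathspec returns something unexpected, git check-attr shows which attribute value git sees for a given path. The sketch below pairs it with the two pathspecs above; the file names are hypothetical.

```shell
# Throwaway repository; file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p src/components/vendor
touch src/components/app.js src/components/vendor/lib.js
echo 'src/components/vendor/* vendored' > .gitattributes
git add .

# check-attr reports the attribute value that git sees for each path.
git check-attr vendored -- src/components/vendor/lib.js src/components/app.js

vendored=$(git ls-files ':(attr:vendored)*.js')    # attribute is set
authored=$(git ls-files ':(attr:!vendored)*.js')   # attribute is unspecified
echo "vendored: $vendored"
echo "authored: $authored"
```

Here check-attr reports vendored as set for lib.js and unspecified for app.js, which is why each file shows up under only one of the two pathspecs.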

exclude

Lastly, there is the “exclude” magic signature (shorthand of :! or :^). This signature works differently from the rest of the magic signatures. After all other pathspecs have been resolved, all pathspecs with an exclude signature are resolved and then removed from the returned paths. For example, you can search through all of your .js files while excluding the .spec.js test files.

git grep 'foo' -- '*.js' ':(exclude)*.spec.js' # search .js files excluding .spec.js
git grep 'foo' -- '*.js' ':!*.spec.js'         # shorthand for the same
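The same signature covers the earlier idea of grepping everything except a build directory. A minimal sketch, assuming a repository with hypothetical src/ and dist/ directories that both contain the search string:

```shell
# Throwaway repository; directory names and contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir src dist
echo 'const foo = 1;' > src/app.js
echo 'var foo=1;' > dist/app.js
git add .

# "." selects everything under the current directory;
# the exclude pathspec then removes anything under dist.
matches=$(git grep -l 'foo' -- . ':(exclude)dist')
echo "$matches"
```

Only src/app.js is listed, even though dist/app.js also contains foo.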

Combining signatures

There is nothing limiting you from using multiple magic signatures in a single pathspec! You can use multiple signatures by separating your magic words with commas within your parentheses. For example, you can do the following if you’d like to match from the base of your repository (using top), case insensitively (using icase), using only authored code (ignoring vendor files with attr), and using glob-style wildcards (using glob).

git ls-files -- ':(top,icase,glob,attr:!vendored)src/components/*/*.jsx'

The only two magic signatures that you are unable to combine are glob and literal, since they both affect how git deals with wildcards. This is referenced in the git glossary with perhaps my favorite sentence that I have ever read in any documentation.

Glob magic is incompatible with literal magic.


Pathspecs are an integral part of many git commands, but their flexibility is not immediately accessible. By learning how to use wildcards and magic signatures you can multiply your command of the git command line.