Understanding and Working with Submodules in Git


Most modern software projects depend on the work of others. It would be a waste of time to reinvent the wheel in your own code when someone else has already written a wonderful solution. That’s why so many projects use third-party code in the form of libraries or modules.

Git, the world’s most popular version control system, offers an elegant, robust way to manage these dependencies. Its “submodule” concept allows us to include and manage third-party libraries while keeping them cleanly separated from our own code.

In this article, you’ll learn why submodules in Git are so useful, what they actually are, and how they work.

Keeping Code Separate

To make clear why Git’s submodules are indeed an invaluable structure, let’s look at a case without submodules. When you need to include third-party code (such as an open-source library) you can of course go the easy way: just download the code from GitHub and dump it somewhere into your project. While certainly quick, this approach is definitely dirty for a couple of reasons:

  • By brute force copying third-party code into your project, you’re effectively mixing multiple projects into one. The line between your own project and that of someone else (the library) starts to get blurry.
  • Whenever you need to update the library code (because its maintainer delivered a great new feature or fixed a nasty bug) you again have to download, copy, and paste. This quickly becomes a tedious process.

The general rule in software development to “keep separate things separate” exists for a reason. And it’s certainly true for managing third-party code in your own projects. Luckily, Git’s submodule concept was made for exactly these situations.

But of course, submodules aren’t the only available solution for this kind of problem. You could also use one of the various “package manager” systems that many modern languages and frameworks provide. And there’s nothing wrong with that!

However, you could argue that Git’s submodule architecture comes with a couple of advantages:

  • Submodules provide a consistent, reliable interface — no matter what language or framework you’re using. Especially if you’re working with multiple technologies, each one might have its own package manager with its own set of rules and commands. Submodules, on the other hand, always work the same.
  • Not every piece of code might be available over a package manager. Maybe you just want to share your own code between two projects — a situation where submodules might offer the simplest possible workflow.

What Git Submodules Really Are

Submodules in Git are really just standard Git repositories. No fancy innovation, just the same Git repositories that we all know so well by now. This is also part of the power of submodules: they’re so robust and straightforward because they are so “boring” (from a technological point of view) and field-tested.

The only thing that makes a Git repository a submodule is that it’s placed inside another, parent Git repository.

Other than that, a Git submodule remains a fully functional repository: you can perform all the actions that you already know from your “normal” Git work — from modifying files, all the way to committing, pulling and pushing. Everything’s possible in a submodule.

Adding a Submodule

Let’s take the classic example and say we’d like to add a third-party library to our project. Before we go get any code, it makes sense to create a separate folder where things like these can have a home:

$ mkdir lib
$ cd lib

Now we’re ready to pump some third-party code into our project — but in an orderly fashion, using submodules. Let’s say we need a little “timezone converter” JavaScript library:

$ git submodule add https://github.com/spencermountain/spacetime.git

When we run this command, Git starts cloning the repository into our project, as a submodule:

Cloning into 'carparts-website/lib/spacetime'...
remote: Enumerating objects: 7768, done.
remote: Counting objects: 100% (1066/1066), done.
remote: Compressing objects: 100% (445/445), done.
remote: Total 7768 (delta 615), reused 975 (delta 588), pack-reused 6702
Receiving objects: 100% (7768/7768), 4.02 MiB | 7.78 MiB/s, done.
Resolving deltas: 100% (5159/5159), done.

And if we take a look at our working copy folder, we can see that the library files have in fact arrived in our project.

“So what’s the difference?” you might ask. After all, the third-party library’s files are here, just like they would be if we had copy-pasted them. The crucial difference is indeed that they are contained in their own Git repository! Had we just downloaded some files, thrown them into our project and then committed them — like the other files in our project — they would have been part of the same Git repository. The submodule, however, makes sure that the library files don’t “leak” into our main project’s repository.

Let’s see what else has happened: a new .gitmodules file has been created in the root folder of our main project. Here’s what it contains:

[submodule "lib/spacetime"]
  path = lib/spacetime
  url = https://github.com/spencermountain/spacetime.git

This .gitmodules file is one of multiple places where Git keeps track of the submodules in our project. Another one is .git/config, which now ends like this:

[submodule "lib/spacetime"]
  url = https://github.com/spencermountain/spacetime.git
  active = true

And finally, Git also stores each submodule’s actual repository data (what would normally be its .git folder) in an internal .git/modules folder.
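If you’d like to look at these places yourself, here is a minimal sketch that builds a throwaway project in a temporary folder. The repository names (library, project) and the lib/library path are hypothetical stand-ins for the spacetime example, and the protocol.file.allow setting is only there because the sketch clones from a local path instead of GitHub:

```shell
set -e
# Throwaway sandbox; "library", "project" and lib/library are hypothetical names.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A tiny stand-in for the third-party library.
git init -q "$sandbox/library"
echo "v1" > "$sandbox/library/version.txt"
git -C "$sandbox/library" add version.txt
git -C "$sandbox/library" commit -q -m "Release v1"

# The main project, with the library added as a submodule.
git init -q "$sandbox/project"
cd "$sandbox/project"
git commit -q --allow-empty -m "Initial commit"
# protocol.file.allow is only needed because the "remote" is a local path.
git -c protocol.file.allow=always submodule --quiet add "$sandbox/library" lib/library
git commit -q -m "Add library as a submodule"

# 1. .gitmodules (versioned and shared with the team)
git config -f .gitmodules --get submodule.lib/library.url
# 2. .git/config (local to this clone)
git config --get submodule.lib/library.url
# 3. .git/modules holds the submodule's repository data; the .git entry
#    inside the submodule folder is just a small pointer file.
cat lib/library/.git
# The parent's index records the submodule as a "gitlink" (mode 160000)
# that points at one specific commit.
git ls-files --stage lib/library
```

The last command is the interesting one: the parent repository doesn’t store the library’s files at all, only a single entry with the commit hash to use.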

All of these are technical details you don’t have to remember. However, it probably helps you to understand that the internal maintenance of Git submodules is quite complex. That’s why it’s important to take one thing away: don’t mess with Git submodule configuration by hand! If you want to move, delete, or otherwise manipulate a submodule, please do yourself a favor and do not try this manually. Either use the proper Git commands or a desktop GUI for Git like “Tower”, which takes care of these details for you.
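As a rough illustration of what “the proper Git commands” can look like, the following sketch moves and then removes a submodule in a throwaway sandbox. All repository names and paths (library, project, lib/library, lib/vendor-library) are made up for the example, and protocol.file.allow is only set because the sketch uses a local path as its remote:

```shell
set -e
# Throwaway sandbox; every name and path here is hypothetical.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$sandbox/library"
echo "v1" > "$sandbox/library/version.txt"
git -C "$sandbox/library" add version.txt
git -C "$sandbox/library" commit -q -m "Release v1"

git init -q "$sandbox/project"
cd "$sandbox/project"
git commit -q --allow-empty -m "Initial commit"
git -c protocol.file.allow=always submodule --quiet add "$sandbox/library" lib/library
git commit -q -m "Add library as a submodule"

# Move the submodule: git mv also updates the path stored in .gitmodules.
git mv lib/library lib/vendor-library
git commit -q -m "Move library submodule"
moved_path=$(git config -f .gitmodules --get submodule.lib/library.path)
echo "path after move: $moved_path"

# Remove the submodule: deinit clears the local configuration,
# git rm removes the folder and the .gitmodules entry.
git submodule --quiet deinit -f lib/vendor-library
git rm -q lib/vendor-library
git commit -q -m "Remove library submodule"
```

Note that the submodule’s repository data under .git/modules is left behind after removal, which is one more reason to let Git (or a GUI) do the bookkeeping.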

Let’s have a look at the status of our main project, now that we’ve added the submodule:

$ git status
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
  new file:   .gitmodules
  new file:   lib/spacetime

As you can see, Git regards adding a submodule as a change like any other. Accordingly, we have to commit this change like any other:

$ git commit -m "Add timezone converter library as a submodule"

Cloning a Project with Git Submodules

In our example above, we added a new submodule to an existing Git repository. But what about “the other way around”, when you clone a repository that already contains submodules?

If we performed a vanilla git clone <remote-URL> on the command line, we would download the main project — but we would find any submodule folder empty! This again is vivid proof that submodule files are separate and not included in their parent repositories.

In such a case, you can populate the submodules after cloning their parent repository by running git submodule update --init --recursive. Even better, add the --recurse-submodules option right when you call git clone in the first place.
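Both variants can be tried side by side in a sandbox. The sketch below uses hypothetical local repositories (library, project) in place of real remote URLs; protocol.file.allow is only required because of those local paths:

```shell
set -e
# Throwaway sandbox; every name and path here is hypothetical.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$sandbox/library"
echo "v1" > "$sandbox/library/version.txt"
git -C "$sandbox/library" add version.txt
git -C "$sandbox/library" commit -q -m "Release v1"

git init -q "$sandbox/project"
cd "$sandbox/project"
git commit -q --allow-empty -m "Initial commit"
git -c protocol.file.allow=always submodule --quiet add "$sandbox/library" lib/library
git commit -q -m "Add library as a submodule"

cd "$sandbox"
# Variant 1: plain clone. The submodule folder exists but is empty.
git clone -q project plain-clone
before=$(ls -A plain-clone/lib/library)
echo "files before init: '$before'"
# Populate it afterwards.
git -C plain-clone -c protocol.file.allow=always submodule --quiet update --init --recursive

# Variant 2: clone and populate the submodules in one step.
git -c protocol.file.allow=always clone -q --recurse-submodules project full-clone
```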

Checking Out Revisions

In a “normal” Git repository, we usually check out branches. By using git checkout <branchname> or the newer git switch <branchname>, we’re telling Git what our currently active branch should be. When new commits are made on this branch, the branch (and with it the HEAD pointer) automatically moves to the very latest commit. This is important to understand, because Git submodules work differently!

In a submodule, we’re always checking out a specific revision, not a branch! Even when you run a command like git checkout main in a submodule, what the parent repository records is the commit that branch currently points to, not the branch itself.

This behavior, of course, is not a mistake. Think about it: when you include a third-party library, you want to have complete control over what exact code is being used in your main project. When the library’s maintainer releases a new version, that’s all well and good … but you don’t necessarily want this new version to be automatically used in your project. Simply because you don’t know if those new changes might break your project!
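You can observe this “revision, not branch” behavior in a freshly cloned project. The following sketch uses hypothetical local repositories (library, project, checkout-demo) and compares the commit recorded in the main project with what is actually checked out in the submodule:

```shell
set -e
# Throwaway sandbox; every name and path here is hypothetical.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$sandbox/library"
echo "v1" > "$sandbox/library/version.txt"
git -C "$sandbox/library" add version.txt
git -C "$sandbox/library" commit -q -m "Release v1"

git init -q "$sandbox/project"
cd "$sandbox/project"
git commit -q --allow-empty -m "Initial commit"
# protocol.file.allow is only needed because the "remote" is a local path.
git -c protocol.file.allow=always submodule --quiet add "$sandbox/library" lib/library
git commit -q -m "Add library as a submodule"

cd "$sandbox"
git -c protocol.file.allow=always clone -q --recurse-submodules project checkout-demo
cd checkout-demo

# Commit recorded in the main project for the submodule path.
recorded=$(git rev-parse HEAD:lib/library)
# Commit actually checked out inside the submodule.
checked_out=$(git -C lib/library rev-parse HEAD)
echo "recorded:    $recorded"
echo "checked out: $checked_out"

# symbolic-ref fails when HEAD is detached, i.e. when no branch is checked out.
if git -C lib/library symbolic-ref -q HEAD >/dev/null; then
  head_state="on a branch"
else
  head_state="detached"
fi
echo "submodule HEAD is $head_state"
```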

If you want to find out what revision your submodules are using, you can request this information in your main project:

$ git submodule status
   ea703a7d557efd90ccae894db96368d750be93b6 lib/spacetime (6.16.3)

This returns the currently checked out revision of our lib/spacetime submodule. The name in parentheses also tells us that this revision is tagged “6.16.3”. It’s pretty common to use tags heavily when working with submodules in Git.

Let’s say you wanted your submodule to use an older version, which was tagged “6.14.0”. First, we have to change directories so that our Git command will be executed in the context of the submodule, not our main project. Then, we can simply run git checkout with the tag name:

$ cd lib/spacetime/
$ git checkout 6.14.0
Previous HEAD position was ea703a7 Merge pull request #301 from spencermountain/dev
HEAD is now at 7f78d50 Merge pull request #268 from spencermountain/dev

If we now go back into our main project and execute git submodule status again, we’ll see our checkout reflected:

$ cd ../..
$ git submodule status
+7f78d50156ae1205aa50675ddede81a61a45fade lib/spacetime (6.14.0)

Take a close look at the output, though: the little + symbol in front of that SHA-1 hash tells us that the submodule is at a different revision than is currently stored in the parent repository. As we just changed the checked out revision, this looks correct.

Calling git status in our main project now informs us about this fact, too:

$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
  modified:   lib/spacetime (new commits)

You can see that Git considers moving a submodule’s pointer a change like any other: we have to stage and commit it to the repository if we want it to be stored:

$ git add lib/spacetime
$ git commit -m "Changed checked out revision in submodule"
$ git push
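The whole cycle (check out another revision, stage the moved pointer, commit it) can be replayed in a sandbox. The sketch below assumes a hypothetical local library with two tags, v1 and v2, instead of the real spacetime repository, and leaves out the final push because there is no shared remote:

```shell
set -e
# Throwaway sandbox; every name, tag and path here is hypothetical.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in library with two tagged releases.
git init -q "$sandbox/library"
cd "$sandbox/library"
echo "v1" > version.txt
git add version.txt
git commit -q -m "Release v1"
git tag v1
echo "v2" > version.txt
git commit -q -am "Release v2"
git tag v2

# Main project using the library (initially at v2).
git init -q "$sandbox/project"
cd "$sandbox/project"
git commit -q --allow-empty -m "Initial commit"
git -c protocol.file.allow=always submodule --quiet add "$sandbox/library" lib/library
git commit -q -m "Add library as a submodule"

# Check out the older release inside the submodule.
git -C lib/library checkout -q v1
before=$(git submodule status)
echo "$before"

# Stage the moved pointer, then commit it.
git add lib/library
git commit -q -m "Pin library to v1"
after=$(git submodule status)
echo "$after"
```

The first status line should start with a +, the second one shouldn’t: after the commit, the recorded and the checked out revision agree again.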

Updating a Git Submodule

In the above steps, it was us who moved the submodule pointer: we were the ones who chose to check out a different revision, commit it, and push it to our team’s remote repository. But what if one of our colleagues changed the submodule revision, maybe because an interesting new version of the submodule was released and our colleague decided to use it in our project (after thorough testing, of course)?

Let’s do a simple git pull in our main project — as we would probably do quite often anyway — to get new changes from the shared remote repository:

$ git pull
From https://github.com/gntr/git-crash-course
   d86f6e0..055333e  main       -> origin/main
Updating d86f6e0..055333e
Fast-forward
   lib/spacetime | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

The second to last line indicates that something in our submodule has been changed. But let’s take a closer look:

$ git submodule status
+7f78d50156ae1205aa50675ddede81a61a45fade lib/spacetime (6.14.0)

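I’m sure you remember that little + sign: it means our locally checked out revision no longer matches the one recorded in the main project, because our teammate moved the submodule pointer. To update our locally checked out revision to the “official” one that our teammate chose, we can run the update command: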

$ git submodule update lib/spacetime 
Submodule path 'lib/spacetime': checked out '5e3d70a88180879ae0222b6929551c41c3e5309e'

Alright! Our submodule is now checked out at the revision that’s recorded in our main project repository!
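To replay this teammate scenario without a real team, here is a sketch with two local repositories: project plays the shared repository (and the teammate), mine is our own clone. All names, tags and paths are hypothetical, and protocol.file.allow is only set because everything lives on the local disk:

```shell
set -e
# Throwaway sandbox; every repository name, tag and path below is hypothetical.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
allow_local="protocol.file.allow=always"  # only needed for local-path "remotes"

# Stand-in library with two tagged releases.
git init -q "$sandbox/library"
cd "$sandbox/library"
echo "v1" > version.txt
git add version.txt
git commit -q -m "Release v1"
git tag v1
echo "v2" > version.txt
git commit -q -am "Release v2"
git tag v2

# Shared project that uses the library (currently at v2).
git init -q "$sandbox/project"
cd "$sandbox/project"
git commit -q --allow-empty -m "Initial commit"
git -c "$allow_local" submodule --quiet add "$sandbox/library" lib/library
git commit -q -m "Add library as a submodule"

# Our own clone of the shared project.
git -c "$allow_local" clone -q --recurse-submodules "$sandbox/project" "$sandbox/mine"

# A teammate pins the library to v1 in the shared project.
git -C lib/library checkout -q v1
git add lib/library
git commit -q -m "Pin library to v1"

# Back in our clone: pulling moves the recorded pointer, not our checkout.
cd "$sandbox/mine"
git -c "$allow_local" pull -q
stale=$(git submodule status)
echo "$stale"

# Bring the checkout in line with the recorded revision.
git -c "$allow_local" submodule --quiet update lib/library
git submodule status
```

If you’d rather not think about the second step, git pull also accepts a --recurse-submodules option that updates the submodule checkouts as part of the pull.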

Working with Submodules in Git

We’ve covered the basic building blocks of working with Git submodules. Other workflows are really quite standard!

Checking for new changes in a submodule, for example, works like in any other Git repository: you run a git fetch command inside the submodule repository, possibly followed by something like git pull origin main if you actually want to use the updates. Because this moves the submodule to a new revision, remember to commit the changed submodule pointer in the main project afterwards.
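Here is a sketch of that fetch-and-pull routine in a sandbox, including the final commit in the main project that records the new revision. The repositories (library, project) are hypothetical local stand-ins, and the branch name is looked up rather than assumed to be main:

```shell
set -e
# Throwaway sandbox; every name and path here is hypothetical.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$sandbox/library"
echo "v1" > "$sandbox/library/version.txt"
git -C "$sandbox/library" add version.txt
git -C "$sandbox/library" commit -q -m "Release v1"

git init -q "$sandbox/project"
cd "$sandbox/project"
git commit -q --allow-empty -m "Initial commit"
# protocol.file.allow is only needed because the "remote" is a local path.
git -c protocol.file.allow=always submodule --quiet add "$sandbox/library" lib/library
git commit -q -m "Add library as a submodule"

# Upstream publishes a new revision after we added the submodule.
echo "v2" > "$sandbox/library/version.txt"
git -C "$sandbox/library" commit -q -am "Release v2"
branch=$(git -C "$sandbox/library" symbolic-ref --short HEAD)

# Inside the submodule: fetch, then integrate the upstream branch.
cd "$sandbox/project/lib/library"
git fetch -q
git pull -q origin "$branch"

# Back in the main project: record the new revision.
cd "$sandbox/project"
git add lib/library
git commit -q -m "Update library to its latest revision"
```

As an alternative, git submodule update --remote performs the fetch and checkout from within the main project; the resulting pointer change still has to be committed.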

Making changes in a submodule might also be a use case for you, especially if you manage the library code yourself (because it’s an internal library, not from a third party). You can work with the submodule like with any other Git repository: you can make changes, commit them, push them, and so on.
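For an internal library, a typical round trip might look like the sketch below: change and commit inside the submodule, push the library, then record the new revision in the main project. The bare repository library.git stands in for the library’s remote; all names and paths are hypothetical:

```shell
set -e
# Throwaway sandbox; every name and path here is hypothetical.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A bare repository that plays the role of the library's remote.
git init -q "$sandbox/seed"
echo "v1" > "$sandbox/seed/version.txt"
git -C "$sandbox/seed" add version.txt
git -C "$sandbox/seed" commit -q -m "Release v1"
git clone -q --bare "$sandbox/seed" "$sandbox/library.git"

git init -q "$sandbox/project"
cd "$sandbox/project"
git commit -q --allow-empty -m "Initial commit"
# protocol.file.allow is only needed because the "remote" is a local path.
git -c protocol.file.allow=always submodule --quiet add "$sandbox/library.git" lib/library
git commit -q -m "Add library as a submodule"

# Work inside the submodule like in any other repository.
cd lib/library
echo "v1.1" > version.txt
git commit -q -am "Fix a bug in the library"
git push -q origin HEAD

# Then record the new library revision in the main project.
cd "$sandbox/project"
git add lib/library
git commit -q -m "Use fixed library revision"
```

In this sketch the submodule is still on a branch because it was just added. In a freshly cloned or updated submodule, HEAD is usually detached, so it’s a good idea to check out a branch inside the submodule before committing there.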

Using the Full Power of Git

Git has a whole lot of power under the hood. But many of its advanced tools — like Git submodules — aren’t well known. It’s really a pity that so many developers are missing out on a lot of powerful stuff!

If you want to go deeper and get a glimpse of some other advanced Git techniques, I highly recommend the “Advanced Git Kit”: it’s a (free!) collection of short videos that introduce you to topics like the Reflog, Interactive Rebase, Cherry-Picking, and even branching strategies.

Have fun becoming a better developer!

FAQs about Git Submodules

What is a Git submodule?

A Git submodule is a way to include another Git repository as a subdirectory within your own Git repository. It allows you to maintain a separate repository as a subproject within your main project.

Why use Git submodules?

Git submodules are useful for incorporating external repositories into your project, especially when you want to keep their development history separate from your main project. This is beneficial for managing dependencies or including external libraries.

What information is stored in the main project regarding submodules?

The main project stores each submodule’s path and URL in the .gitmodules file, and records the exact commit to use as a special entry (a so-called gitlink) in its own tree. This allows anyone cloning the main project to also clone the referenced submodules at that same revision.

How do I clone a Git repository with submodules?

When cloning a repository with submodules, you can pass the --recurse-submodules option (--recursive also works) to git clone to automatically initialize and clone the submodules. Alternatively, you can run git submodule update --init --recursive after cloning.

Can I have nested submodules?

Yes, Git supports nested submodules, meaning a submodule can contain its own submodules. However, managing nested submodules can become complex, and it’s essential to ensure that each submodule is correctly initialized and updated.
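A small sandbox sketch of a nested setup: a hypothetical project includes outer, which in turn includes inner. The --recursive option makes sure both levels are initialized and checked out; all names and paths are invented for the example:

```shell
set -e
# Throwaway sandbox; every name and path here is hypothetical.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
allow_local="protocol.file.allow=always"  # only needed for local-path "remotes"

# Innermost dependency.
git init -q "$sandbox/inner"
echo "inner" > "$sandbox/inner/inner.txt"
git -C "$sandbox/inner" add inner.txt
git -C "$sandbox/inner" commit -q -m "Add inner library"

# A library that itself uses a submodule.
git init -q "$sandbox/outer"
cd "$sandbox/outer"
git commit -q --allow-empty -m "Initial commit"
git -c "$allow_local" submodule --quiet add "$sandbox/inner" deps/inner
git commit -q -m "Add inner as a submodule"

# The main project includes the outer library.
git init -q "$sandbox/project"
cd "$sandbox/project"
git commit -q --allow-empty -m "Initial commit"
git -c "$allow_local" submodule --quiet add "$sandbox/outer" lib/outer
git commit -q -m "Add outer as a submodule"

# Initialize and check out every level of nesting.
git -c "$allow_local" submodule --quiet update --init --recursive
# List all submodules, including nested ones.
git submodule status --recursive
```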

Tobias Günther
Tobias Günther is a co-founder of “Tower”, the popular Git desktop client that helps more than 100,000 developers around the world to be more productive with Git.
