Note: If you don’t want to read the exposition and explanations and just want to know the steps I did, scroll to the summary at the bottom.
For a couple of years I have (with varying degrees of commitment) participated in Advent of Code, a yearly programming competition. It consists of fun little daily challenges. It is great to exercise your coding muscles and can provide opportunity to learn new languages and technologies.
So far I have created a separate repository for each year, with a directory per day. But I decided that I’d prefer to have a single repository, containing my solutions for all years. The main reason is that I tend to write little helpers that I would like to re-use between years.
When merging the repositories it was important to me to preserve the history of the individual years as well, though. I googled around for how to do this and the solutions I found didn’t quite work for me. So I thought I should document my own solution, in case anyone finds it useful.
You can see the result here. As you can see, there are four cleanly disjoint branches with separate histories. They then merge into one single commit.
One neat effect of this is that the merged repository functions as a normal
remote for all the four old repositories. It involves no rewrites of history
and all the previous commits are preserved exactly as-is. So you can just git pull
from this new repository and git will fast-forward the branch.
Step 1: Prepare individual repositories
First I went through all repositories and prepared them. I wanted to have the years in individual directories. In theory, it is possible to use git-filter-repo and similar tooling to automate this step. For larger projects this might be worth it.
I found it simpler to manually make the changes in the individual repositories
and commit them. In particular, I did not only need to move the files to the
sub directory, I also had to fix up Go module and import paths. Figuring out
how to automate that seemed like a chore. But doing it manually is a quick and
easy sed
command.
You can see an example of that in this commit. While that link points at the final, merged repository, I created the commit in the old repository. You can see that a lot of files simply moved. But some also had additional changes.
You can also see that I left the go.mod
in the top-level directory. That was
intentional - I want the final repository to share a single module, so that’s
where the go.mod
belongs.
After this I was left with four repositories, each of which had all the
solutions in their own subdirectory, with a go.mod
/go.sum
file with the
shared module path. I tested that all solutions still compile and appeared to
work and moved on.
Step 2: Prepare merged repository
The next step is to create a new repository which can reference commits and objects in all the other repos. After all, it needs to contain the individual histories. This is simple by setting the individual repositories as remotes:
$ mkdir ~/src/github.com/Merovius/AdventOfCode
$ cd ~/src/github.com/Merovius/AdventOfCode
$ git init
$ git remote add 2018 ~/src/github.com/Merovius/aoc18
$ git remote add 2020 ~/src/github.com/Merovius/aoc_2020
$ git remote add 2021 ~/src/github.com/Merovius/aoc_2021
$ git remote add 2022 ~/src/github.com/Merovius/aoc_2022
$ git fetch --multiple 2018 2020 2021 2022
$ git branch -a
remotes/2018/master
remotes/2020/main
remotes/2021/main
remotes/2022/main
One thing worth pointing out is that at this point, the merged AdventOfCode
repository does not have any branches itself. The only existing branches are
remotes/
references. This is relevant because we don’t want our resulting
histories to share any common ancestor. And because git behaves slightly
differently in an empty repository. A lot of commands operate on HEAD
(the
“current branch”), so they have special handling if there is no HEAD
.
Step 3: Create merge commit
A git commit can have an arbitrary number of “parents”:
- If a commit has zero parents, it is the start of the history. This is what
happens if you run
git commit
in a fresh repository. - If a commit has exactly one parent, it is a regular commit. This is what
happens when you run
git commit
normally. - If a parent has more than one parent, it is a merge commit. This is what
happens when you use
git merge
or merge a pull request in the web UI of a git hoster (like GitHub or Gitlab).
Normally merge commits have two parents - one that is the “main” branch and one that is being “merged into”. However, git does not really distinguish between “main” and “merged” branch. And it also allows a branch to have more than two parents.
We want to create a new commit with four parents: The HEAD
s of our four
individual repositories. I expected this to be simple, but:
$ git merge --allow-unrelated-histories remotes/2018/master remotes/2020/main remotes/2021/main remotes/2022/main
fatal: Can merge only exactly one commit into empty head
This command was supposed to create a merge commit with four parents. We have
to pass --allow-unrelated-histories
, as git otherwise tries to find a common
ancestor between the parents and complains if it can’t find any.
But the command is failing. It seems git is unhappy using git merge
with
multiple parents if we do not have any branch yet.
I suspect the intended path at this point would be to check out one of the branches and then merge the others into that. But that creates merge conflicts and it also felt… asymmetric to me. I did not want to give any of the base repositories preference. So instead I opted for a more brute-force approach: Dropping down to the plumbing layer.
First, I created the merged directory structure:
$ cp -r ~/src/github.com/Merovius/aoc18/* .
$ cp -r ~/src/github.com/Merovius/aoc_2020/* .
$ cp -r ~/src/github.com/Merovius/aoc_2021/* .
$ cp -r ~/src/github.com/Merovius/aoc_2022/* .
$ vim go.mod # fix up the merged list of dependencies
$ go mod tidy
$ git add .
Note: The above does not copy hidden files (like .gitignore
). If you do
copy hidden files, take care not to copy any .git
directories.
At this point the working directory contains the complete directory layout for
the merged commit and it is all in the staging area (or “index”). This is where
we normally run git commit
. Instead we do the equivalent steps manually,
allowing us to override the exact contents:
$ TREE=$(git write-tree)
$ COMMIT=$(git commit-tree $TREE \
-p remotes/2018/master \
-p remotes/2020/main \
-p remotes/2021/main \
-p remotes/2022/main \
-m "merge history of all years")
$ git branch main $COMMIT
The write-tree
command takes the content of the index and writes it to a
“Tree Object”
and then returns a reference to the Tree it has written.
A Tree is an immutable representation of a directory in git. It (essentially) contains a list of file name and ID pairs, where each ID points either to a “Blob” (an immutable file) or another Tree.
A Commit in git is just a Tree (describing the state of the files in the repository at that commit), a list of parents, a commit message and some meta data (like who created the commit and when).
The commit-tree
command is a low-level command to create such a Commit
object. We give it the ID of the Tree the Commit should contain and a list of
parents (using -p
) as well as a message (using -m
). It then writes out that
Commit to storage and returns its ID.
At this point we have a well-formed Commit, but it is just loosely stored in the repository. We still need a Branch to point at it, so it doesn’t get lost and we have a memorable handle.
You probably used the git branch
command before. In the form above, it
creates a new branch main
(remember: So far our repository had no branches)
pointing at the Commit we created.
And that’s it. We can now treat the repository as a normal git repo. All that is left is to publish it:
$ git remote add origin [email protected]:Merovius/AdventOfCode
$ git push --set-upstream origin main
Executive Summary
To summarize the steps I did:
- Create commits in each of the old repositories to move files around and fixing anticipated merge conflicts as needed.
- Create a pristine new repository without any branches:
$ git init merged $ cd merged
- Add the old repositories as remotes for the merged repo:
$ git remote add <repo1> /path/to/repo1 $ git fetch repo1 $ git remote add <repo2> /path/to/repo2 $ git fetch repo2 $ # …
- Copy files from old repositories into merged repo:
$ cp -r /path/to/repo1/* . $ cp -r /path/to/repo2/* . $ # …
- Create commit using plumbing commands:
$ git add . $ TREE=$(git write-tree) $ COMMIT=$(git commit-tree $TREE \ -m "merge repositories" \ -p remotes/repo1/main \ -p remotes/repo2/main) $ git branch main $COMMIT