Red Green Repeat Adventures of a Spec Driven Junkie

TIL: git and github diff Differently

My team switched over to the SkullCandy git workflow last spring. We put off making a new develop branch for a long time, because deleting the branch on github automatically deletes the branch of any open pull requests as well.

So, this week we ripped the band-aid off and remastered develop.

It’s been painful.

I was hoping a pull request from the develop branch into the master branch would tell us which commits on develop are not in master, so we could sort out the differences.

That pull request did not tell us anything useful. In fact, it revealed a disturbing fact: changes that I thought were in both branches showed up as missing from master. How can that be?

I ran experiments to see what’s going on. You can see it here.

Replication

This is what I replicated on the repository, which is the workflow used for SkullCandy:

  • start a master branch.
  • create a develop branch by branching off master.
  • when starting a new feature, branch off develop.
  • when ready to merge a change into develop, make a pull request into develop.
  • after the change is in develop and validated, cherry-pick the commit from the feature branch into master and make a new pull request.
  • done.
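The steps above can be sketched in a scratch repository. This is only an approximation under a few assumptions: a local merge stands in for the github pull request, the branch name feature and the commit messages are placeholders, and the committer dates are pinned so the cherry-picked commit is guaranteed to get a different SHA (in real use the cherry-pick simply happens later).

```shell
#!/bin/sh
set -eu

# Scratch repository in a temporary directory; nothing here touches a real repo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is master
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Dev"

# 1. start a master branch
echo "start of work stuff" > work_file.txt
git add work_file.txt
GIT_COMMITTER_DATE="1577836800 +0000" git commit -q -m "start of work"

# 2. create develop by branching off master
git checkout -q -b develop

# 3. start a feature by branching off develop
git checkout -q -b feature
echo "work3 stuff" >> work_file.txt
GIT_COMMITTER_DATE="1577836800 +0000" git commit -q -am "work3"

# 4. bring the feature into develop (stands in for the pull request)
git checkout -q develop
GIT_COMMITTER_DATE="1577836800 +0000" git merge -q --no-ff -m "merge feature" feature

# 5. cherry-pick the feature commit onto master at a later committer date
git checkout -q master
GIT_COMMITTER_DATE="1577923200 +0000" git cherry-pick feature >/dev/null

# Same file contents on both branches, but the change lives under two SHAs.
feature_sha=$(git rev-parse feature)
master_sha=$(git rev-parse master)
content_diff=$(git diff feature master)
echo "feature: $feature_sha"
echo "master:  $master_sha"
```

The cherry-pick creates a new commit object on master, so the change ends up with one SHA on the feature and develop side and another on master, even though the file contents match.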

After experimenting with different merge strategies (merge commit, squash commit, rebase commit), I started to notice a pattern: on github, a change on the master branch counted as the same change ONLY if its commit SHA matched.

When I checked the difference locally between master and the corresponding develop and feature branches, git reported none.

Example: develop3 and master

Let’s go through an example from the repository:

The master branch has all the work, and its file contents are:

start of work stuff
work stuff 1
work stuff 2
work stuff 3
work stuff 4
work2 changes
more work2 changes
work3 stuff
more work3 stuff

The develop3 branch has the same work, and its copy of the file contains:

start of work stuff
work stuff 1
work stuff 2
work stuff 3
work stuff 4
work2 changes
more work2 changes
work3 stuff
more work3 stuff

Locally

Doing a git diff on the command line produces no output:

vagrant@ubuntu-xenial:/vagrant$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
vagrant@ubuntu-xenial:/vagrant$ git diff develop3
vagrant@ubuntu-xenial:/vagrant$

On github

When making a Pull Request on github.com, the result is:

Develop3 to Master PR on Github

diff --git a/work_file.txt b/work_file.txt
index bd6764b..9e7d796 100644
--- a/work_file.txt
+++ b/work_file.txt
@@ -5,3 +5,5 @@ work stuff 3
 work stuff 4
 work2 changes
 more work2 changes
+work3 stuff
+more work3 stuff

This is pretty much as if the work never existed on master, even though it is there!

https://github.com/a-leung/commit_tests/compare/master…develop3?expand=1

Why does this matter?

It’s important because git and github do not report the same differences. I can’t trust github to be consistent with a local git diff, even for a simple change, if the SHAs do not match.

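git can recognize the same code appearing under different SHAs, while github relies on the commit history, and so on the SHAs, to compute differences between branches.

The reason for the difference? Locally, git diff compares the contents of the two branches directly. A github pull request instead shows a three-dot diff: the changes on the compare branch since the last commit it shares with the base branch. A cherry-picked commit gets a new SHA, so it is not part of that shared history and its change still shows up as a difference.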
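One way to see both behaviors locally is that git diff accepts a two-dot and a three-dot range. The sketch below rebuilds a tiny version of the master and develop3 situation in a scratch repository (file contents shortened, committer dates pinned so the cherry-pick gets a new SHA; both are assumptions for illustration). The three-dot form compares the merge base of the two branches with develop3, which is the same form the compare URL above uses.

```shell
#!/bin/sh
set -eu

# Scratch repo: the same change sits on both branches under different SHAs.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Dev"

printf 'work2 changes\nmore work2 changes\n' > work_file.txt
git add work_file.txt
GIT_COMMITTER_DATE="1577836800 +0000" git commit -q -m "work2"

git checkout -q -b develop3
printf 'work3 stuff\nmore work3 stuff\n' >> work_file.txt
GIT_COMMITTER_DATE="1577836800 +0000" git commit -q -am "work3"

git checkout -q master
# A later committer date guarantees the cherry-picked commit gets a new SHA.
GIT_COMMITTER_DATE="1577923200 +0000" git cherry-pick develop3 >/dev/null

# Two dots (same as `git diff master develop3`): compares the two branch tips.
two_dot=$(git diff master..develop3)

# Three dots: compares the merge base of master and develop3 with develop3.
three_dot=$(git diff master...develop3)
merge_base=$(git merge-base master develop3)

echo "merge base: $merge_base"
echo "two-dot diff: ${two_dot:-<empty>}"
echo "three-dot diff:"
echo "$three_dot"
```

The two-dot diff is empty because the branch tips hold identical files. The three-dot diff lists the work3 lines as additions, because the merge base predates both copies of the work3 commit.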

The only difference between the master and develop3 branches is the SHA of the commit that made the work3 change:

On master branch:

vagrant@ubuntu-xenial:/vagrant$ git blame -s work_file.txt
fabcea4b 1) start of work stuff
fabcea4b 2) work stuff 1
fabcea4b 3) work stuff 2
c92a36c5 4) work stuff 3
c92a36c5 5) work stuff 4
e492f5f3 6) work2 changes
e492f5f3 7) more work2 changes
de94346e 8) work3 stuff
de94346e 9) more work3 stuff

On develop3 branch:

vagrant@ubuntu-xenial:/vagrant$ git blame -s work_file.txt
fabcea4b 1) start of work stuff
fabcea4b 2) work stuff 1
fabcea4b 3) work stuff 2
c92a36c5 4) work stuff 3
c92a36c5 5) work stuff 4
e492f5f3 6) work2 changes
e492f5f3 7) more work2 changes
88e6fff2 8) work3 stuff
88e6fff2 9) more work3 stuff

So, that’s one area git and github differ!
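For sorting out which commits on one branch already have an equivalent on another despite different SHAs, git cherry may help, since it compares commits by the content of their patch rather than by SHA. A minimal sketch in a scratch repository, where the extra work4 commit is a hypothetical change that was never cherry-picked:

```shell
#!/bin/sh
set -eu

# Scratch repo: develop3 has two commits, only the first is cherry-picked.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Dev"

echo "work2 changes" > work_file.txt
git add work_file.txt
GIT_COMMITTER_DATE="1577836800 +0000" git commit -q -m "work2"

git checkout -q -b develop3
echo "work3 stuff" >> work_file.txt
GIT_COMMITTER_DATE="1577836800 +0000" git commit -q -am "work3"
echo "work4 stuff" >> work_file.txt       # hypothetical extra change
GIT_COMMITTER_DATE="1577836800 +0000" git commit -q -am "work4"

git checkout -q master
# Cherry-pick only work3; the later committer date gives it a new SHA.
GIT_COMMITTER_DATE="1577923200 +0000" git cherry-pick develop3~1 >/dev/null

# List develop3's commits relative to master:
#   "-" means master already has an equivalent patch,
#   "+" means master has no equivalent yet.
report=$(git cherry -v master develop3)
echo "$report"
```

Here the work3 commit is marked with a minus sign even though its SHA differs from the cherry-picked copy on master, and the work4 commit is marked with a plus sign as still missing from master.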

Lesson Learned

We have to adjust our workflow for the ways git and github treat differences in code. It’s a subtle difference, but the consequences are significant: we cannot rely on github's pull request view to show us what is really different, which adds work (that is not value add!)

For now, I will be remastering the develop branch with higher frequency.