Git – Remove empty commits in git


I just migrated a project from Mercurial to Git. Mercurial adds empty commits when you add tags, so I ended up with empty commits in Git that I would like to remove.

How do I remove empty commits (commits that do not have any files in them) from Git?


Best Solution

One simple (but slow) way to do this is with git filter-branch and --prune-empty. With no other filters, the contents of the other commits are not altered, but any empty ones are discarded. Every commit after a discarded one gets a new parent ID, and therefore a new commit ID of its own, so this still "rewrites history": not a big deal if this is your initial import from hg to git, but a big deal if others are already using this repository.

Note all the usual caveats with filter-branch. (Also, as a side note, an "empty commit" is really one that has the same tree as the previous commit: it's not that it has no files at all, it's that it has all the same files, with the same modes, and the same contents, as its parent commit. This is because git stores complete snapshots for each commit, not differences from one commit to the next.)
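To make the "same tree" point concrete, here is a small sketch that builds a throwaway repository and compares the tree of an empty commit with the tree of its parent. The temporary directory, file name, identity settings, and commit messages are all made up for illustration; only the git commands themselves are standard.

```shell
#!/bin/sh
# Sketch: show that an "empty" commit points at the same tree as its parent.
# Everything is created in a throwaway directory; names are illustrative only.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo hello > file.txt
git add file.txt
git commit -q -m "add file"

# --allow-empty records a commit even though nothing changed.
git commit -q --allow-empty -m "empty commit (like an hg tag marker)"

# ^{tree} resolves a commit to the tree (snapshot) it points at.
parent_tree=$(git rev-parse 'HEAD~1^{tree}')
head_tree=$(git rev-parse 'HEAD^{tree}')
echo "parent tree: $parent_tree"
echo "HEAD tree:   $head_tree"
```

The two printed IDs should be identical, and that is the condition --prune-empty looks for.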

Here is a minimal example; it glosses over many places where you could do fancier things:

$ ... create repository ...
$ cd some-tmp-dir; git clone --mirror file://path-to-original

(This first clone step is for safety: filter-branch can be quite destructive, so it's always good to run it on a fresh clone rather than on the original. Using --mirror means all of the original's branches and tags are mirrored in this copy, too.)
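As a rough illustration of what --mirror carries over, the sketch below creates a small repository with one extra branch and one tag, mirror-clones it, and lists the refs in the clone. The directory names, the branch name "feature", and the tag name "v1" are hypothetical, and a plain local path is used in place of a file:// URL.

```shell
#!/bin/sh
# Sketch: a --mirror clone copies every branch and tag, not just the default branch.
# All paths and ref names here are made up for the demonstration.
set -e
work=$(mktemp -d)
git init -q "$work/original"
cd "$work/original"
git config user.email "demo@example.com"
git config user.name "Demo"

echo hello > file.txt
git add file.txt
git commit -q -m "initial"
git branch feature
git tag v1

# A mirror clone is bare and keeps the refs under their original names.
git clone -q --mirror "$work/original" "$work/mirror.git"
mirror_refs=$(git --git-dir="$work/mirror.git" for-each-ref --format="%(refname)")
echo "$mirror_refs"
```

The list should include refs/heads/feature and refs/tags/v1 alongside the default branch.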

$ git filter-branch --prune-empty --tag-name-filter cat -- --all

(Now wait a potentially very long time; see the git filter-branch documentation for ways to speed this up a bit.)

$ git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

(This step is copied out of the documentation: it discards all the original branches from before running the filter. In other words, it makes the filtering permanent. Since this is a clone, that's reasonably safe to do even if the filter went wrong.)

$ cd third-tmp-dir; git clone --mirror file://path-to-filtered-tmp

(This makes a clean, nicely-compressed copy of the filtered copy, with no leftover objects from the filtering steps.)

$ git log     # and other commands to inspect
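One possible inspection is to count how many empty commits remain. The sketch below runs the whole sequence on a throwaway repository and counts single-parent commits whose tree matches their parent's, before and after filtering. The helper name count_empty_commits and the demo repository are assumptions made for this example, not part of git; FILTER_BRANCH_SQUELCH_WARNING only silences the startup warning in newer git versions.

```shell
#!/bin/sh
# Sketch: count empty commits before and after filter-branch --prune-empty.
# count_empty_commits is a hypothetical helper; the repository is a throwaway demo.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1

count_empty_commits() {
    # Only single-parent commits are checked; root and merge commits are skipped.
    git rev-list --all --min-parents=1 --max-parents=1 |
    while read -r commit; do
        if [ "$(git rev-parse "${commit}^{tree}")" = "$(git rev-parse "${commit}~1^{tree}")" ]; then
            echo "$commit"
        fi
    done | wc -l | tr -d ' '
}

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo one > a.txt
git add a.txt
git commit -q -m "first"
git commit -q --allow-empty -m "empty"
echo two > a.txt
git commit -q -am "second"

before=$(count_empty_commits)

git filter-branch --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1
# Drop the refs/original/ backups; otherwise --all would still reach the old commits.
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

after=$(count_empty_commits)
echo "empty commits before: $before, after: $after"
```

On a real repository, the same helper can be run inside the final clone; a result of 0 suggests that no single-parent empty commits are left.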

Once you're sure the clone of the filtered clone is good, you don't need the filtered clone, and can use the last clone in place of the first. (That is, you can replace the "true original" with the final clone.)