GitHub API: Retrieve all commits for all branches of a repo

github-api

According to the V2 documentation, you can list all commits for a branch with:

commits/list/:user_id/:repository/:branch

I am not seeing the same functionality in the V3 documentation.

I would like to collect all branches using something like:

https://api.github.com/repos/:user/:repo/branches

And then iterate through them, pulling all commits for each. Alternatively, if there's a way to pull all commits for all branches for a repo directly, that would work just as well if not better. Any ideas?

UPDATE: I tried passing the branch :sha as a param as follows:

params = {:page => 1, :per_page => 100, :sha => b}

The problem is that when I do this, it doesn't page the results properly. I feel like we're approaching this incorrectly. Any thoughts?

Best Solution

I have encountered the exact same problem. I did manage to acquire all the commits for all branches within a repository (probably not that efficient due to the API).

Approach to retrieve all commits for all branches in a repository

As you mentioned, first you gather all the branches:

# https://api.github.com/repos/:user/:repo/branches
https://api.github.com/repos/twitter/bootstrap/branches

The key point you are missing is that the v3 API for listing commits works from a reference commit: the sha parameter of the call that lists commits on a repository. So when you collect the branches, make sure you also pick up each branch's latest sha:

Trimmed result of branch API call for twitter/bootstrap

[
  {
    "commit": {
      "url": "https://api.github.com/repos/twitter/bootstrap/commits/8b19016c3bec59acb74d95a50efce70af2117382",
      "sha": "8b19016c3bec59acb74d95a50efce70af2117382"
    },
    "name": "gh-pages"
  },
  {
    "commit": {
      "url": "https://api.github.com/repos/twitter/bootstrap/commits/d335adf644b213a5ebc9cee3f37f781ad55194ef",
      "sha": "d335adf644b213a5ebc9cee3f37f781ad55194ef"
    },
    "name": "master"
  }
]
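
A minimal sketch of pulling the branch names and their latest sha out of that response, assuming the body has already been downloaded as a JSON string. The helper name branch_heads is made up for illustration and is not part of any GitHub client library:

```ruby
require 'json'

# Trimmed response body from the branches call shown above.
body = <<~JSON
  [
    {"commit": {"sha": "8b19016c3bec59acb74d95a50efce70af2117382"}, "name": "gh-pages"},
    {"commit": {"sha": "d335adf644b213a5ebc9cee3f37f781ad55194ef"}, "name": "master"}
  ]
JSON

# Hypothetical helper: map each branch name to its latest commit sha.
def branch_heads(json_text)
  JSON.parse(json_text).each_with_object({}) do |branch, heads|
    heads[branch['name']] = branch['commit']['sha']
  end
end

heads = branch_heads(body)
heads.each { |name, sha| puts "#{name}: #{sha}" }
```

Each value in heads is the starting point for walking that branch's history.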

Working with last commit's sha

As we can see, the two branches here have different sha values. Each one is the sha of the latest commit on that branch. What you can do now is walk through each branch, starting from its latest sha:

# With the sha parameter set to the branch's latest sha
# https://api.github.com/repos/:user/:repo/commits
https://api.github.com/repos/twitter/bootstrap/commits?per_page=100&sha=d335adf644b213a5ebc9cee3f37f781ad55194ef

The above API call lists the 100 most recent commits on the master branch of twitter/bootstrap. To get the next 100 commits, you have to tell the API which commit to start from. We can use the sha of the last commit in the response (which is 7a8d6b19767a92b1c4ea45d88d4eedc2b29bf1fa in the current example) as input for the next API call:

# Next API call for commits (use the last commit's sha)
# https://api.github.com/repos/:user/:repo/commits
https://api.github.com/repos/twitter/bootstrap/commits?per_page=100&sha=7a8d6b19767a92b1c4ea45d88d4eedc2b29bf1fa

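Repeat this process until the sha of the last commit in a response is the same as the sha parameter of the call, which means there are no older commits left. Note that each later response starts with the commit you passed as sha, so its first entry repeats the last entry of the previous response.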
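
Here is a rough sketch of that loop for a single branch. The names commits_url, branch_commits and HTTP_FETCH are my own, not from any library, and the fetcher is passed in as a lambda so the walk can be tried without hitting the network. It does no authentication, rate-limit handling or error handling, so treat it as an outline of the approach rather than a finished client:

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the list-commits URL for one page, starting from the given sha.
def commits_url(user, repo, sha, per_page = 100)
  query = URI.encode_www_form(per_page: per_page, sha: sha)
  "https://api.github.com/repos/#{user}/#{repo}/commits?#{query}"
end

# Default fetcher: GET the URL and parse the JSON array of commits.
# Assumes an unauthenticated request that succeeds.
HTTP_FETCH = lambda do |url|
  JSON.parse(Net::HTTP.get(URI(url)))
end

# Walk one branch from its latest sha back to its oldest commit and
# return every sha seen, newest first.
def branch_commits(user, repo, head_sha, fetch = HTTP_FETCH, per_page = 100)
  shas = []
  sha = head_sha
  loop do
    page = fetch.call(commits_url(user, repo, sha, per_page))
    last_sha = page.empty? ? sha : page.last['sha']
    new_shas = page.map { |commit| commit['sha'] }
    # Later pages start with the commit we asked for, which we already have.
    new_shas.shift unless shas.empty?
    shas.concat(new_shas)
    # Stop when the last commit returned is the one we started this page from.
    break if last_sha == sha
    sha = last_sha
  end
  shas
end
```

With this in place, branch_commits('twitter', 'bootstrap', 'd335adf644b213a5ebc9cee3f37f781ad55194ef') would walk the master branch from the example, one request per 100 commits plus a final request that confirms the end has been reached.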

Next branch

That is it for one branch. Now apply the same approach to each of the other branches (work from the branch's latest sha).


There is one large issue with this approach: since branches share many of the same commits, you will see the same commits over and over again as you move from one branch to the next.
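
One simple way to cope with the repeats is to filter on sha after collecting, since a commit's sha is the same on every branch that contains it. The sketch below uses short made-up sha values and a hypothetical helper name, unique_commits. It does not save any API calls, it only removes the duplicates from the result:

```ruby
require 'set'

# Made-up per-branch results for illustration (real sha values are 40 hex characters).
commits_by_branch = {
  'master'   => %w[d3 c2 b1 a0],
  'gh-pages' => %w[e4 b1 a0]
}

# Keep each sha once, together with the first branch it was seen on.
def unique_commits(commits_by_branch)
  seen = Set.new
  unique = []
  commits_by_branch.each do |branch, shas|
    shas.each do |sha|
      # Set#add? returns nil when the sha is already in the set.
      unique << [sha, branch] if seen.add?(sha)
    end
  end
  unique
end

unique_commits(commits_by_branch).each do |sha, branch|
  puts "#{sha} (first seen on #{branch})"
end
```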

I can imagine that there is a much more efficient way to accomplish this, but this worked for me.
