19

I'm trying to export a repo list and it always returns information about the first page. I can extend the number of items per page using URL+"?per_page=100", but that's not enough to get the whole list. I need to know how I can get the list by extracting data from pages 1, 2, ..., N. I'm using the Requests module, like this:

while i <= 2:
      r = requests.get('https://api.github.com/orgs/xxxxxxx/repos?page{0}&per_page=100'.format(i), auth=('My_user', 'My_passwd'))
      repo = r.json()
      j = 0
      while j < len(repo):
            print repo[j][u'full_name']
            j = j+1
      i = i + 1

I use that while condition because I know there are 2 pages, and I tried to increase it that way, but it doesn't work.

3
  • Print the URL generated in each iteration and check whether it is correct or not. Commented Nov 23, 2015 at 18:49
  • You have the line repo = p.json(). Is this a typo? Should it read r.json()? Commented Nov 23, 2015 at 18:51
  • docs.github.com/en/rest/using-the-rest-api/… Commented Jan 15 at 18:53

6 Answers

47
import requests

# A Personal Access Token; GitHub's API expects the "token " prefix
git_token = "token YOUR_PERSONAL_ACCESS_TOKEN"

url = "https://api.github.com/XXXX?simple=yes&per_page=100&page=1"
res = requests.get(url, headers={"Authorization": git_token})
repos = res.json()
# Requests parses the Link header into res.links; follow 'next' until it disappears
while 'next' in res.links:
    res = requests.get(res.links['next']['url'], headers={"Authorization": git_token})
    repos.extend(res.json())

If you aren't making a full-blown app, use a "Personal Access Token":

https://github.com/settings/tokens
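
To get the original question's output from that list, a minimal follow-up sketch (assuming each item in repos is a repository dict as returned by the API):

# Print every repository's full name from the accumulated list
for repo in repos:
    print(repo['full_name'])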


1 Comment

Thanks! I didn't know there was a response.links attribute. This answer should be selected as the best answer; every other solution is a hack!
7

From the GitHub docs:

Response:

Status: 200 OK
Link: <https://api.github.com/resource?page=2>; rel="next",
      <https://api.github.com/resource?page=5>; rel="last"
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4999

You get the links to the next and the last page of that organization. Just check the headers.

In Python Requests, you can access the response headers with:

response.headers

It is a dictionary containing the response headers. If a link header is present, then there are more pages, and it will contain the related URLs. It is recommended to traverse using those links instead of constructing your own.

You can try something like this:

import requests

url = 'https://api.github.com/orgs/xxxxxxx/repos?page=1&per_page=100'
response = requests.get(url)
# response.headers is case-insensitive, so 'link' matches the 'Link' header
link = response.headers.get('link', None)
if link is not None:
    print(link)

If link is not None, it will be a string containing the relevant links for your resource.

11 Comments

Thank you for your answer, but I've already found that in the GitHub docs and I have no idea how it works. I couldn't find a way to implement it using Python. Can you help me please?
Yeah, I am checking that at the moment. Do you have an org with a lot of pages?
I'm a beginner with APIs and Python, and I really appreciate your help. I have an organization with 44 users and 190 repos, so it's impossible to download them all in one shot because the max is 100 items.
Just check if the headers dict contains the key link, in which case you can find the next one to request, and do that until it doesn't show up.
I don't know how to do it. Can I contact you in order to learn something? I think I will be really quick by chatting
2

From my understanding, link will be None if only a single page of data is returned; otherwise, link will be present even when going beyond the last page, in which case it will contain the previous and first links.

Here is some sample Python which simply returns the link for the next page, and returns None if there is no next page, so it could be incorporated into a loop (it is wrapped in a helper function here so the return statements are valid):

def get_next_page(r):
    # The Link header may be absent when there is only a single page
    link = r.headers.get('link')
    if link is None:
        return None

    # The header is a comma-separated string of links
    links = link.split(',')

    for link in links:
        # If there is a 'next' link, return the URL between the angle brackets
        if 'rel="next"' in link:
            return link[link.find("<") + 1:link.find(">")]
    return None
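
For example, a usage sketch built on that helper (get_next_page is the hypothetical name used above, and the org URL and credentials are placeholders):

import requests

url = 'https://api.github.com/orgs/xxxxxxx/repos?per_page=100'
repos = []
while url is not None:
    r = requests.get(url, auth=('My_user', 'My_passwd'))
    repos.extend(r.json())
    url = get_next_page(r)  # None once there is no rel="next" link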

1 Comment

It would help if the accepted answer had been tested better; this solves the two different scenarios.
2

Extending the answers above, here is a recursive function to deal with GitHub pagination. It iterates through all pages, concatenating the list with each recursive call, and finally returns the complete list when there are no more pages to retrieve, unless the optional failsafe returns the list early once there are more than 500 items.

import requests

api_get_users = 'https://api.github.com/users'


def call_api(apicall, **kwargs):
    # Results accumulated so far are passed through the 'page' keyword
    data = kwargs.get('page', [])

    resp = requests.get(apicall)
    data += resp.json()

    # Failsafe: stop once more than 500 items have been collected
    if len(data) > 500:
        return data

    # Recurse while Requests still reports a 'next' link
    if 'next' in resp.links:
        return call_api(resp.links['next']['url'], page=data)

    return data


data = call_api(api_get_users)

1 Comment

It would be nicer if this were adapted to something like: data = call_api(...); while data: # do something with the data; data = call_api(...). Basically, to have some sort of iterator which allows you to retrieve data in batches, one page at a time :)
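
A minimal generator sketch of that idea (an illustration under the same resp.links assumption as the answer above, not the answer's own code), yielding one page at a time instead of recursing:

import requests

def iter_pages(url):
    # Yield each page's parsed JSON, following 'next' links lazily
    while url is not None:
        resp = requests.get(url)
        yield resp.json()
        url = resp.links.get('next', {}).get('url')

# Process one page (batch) at a time
for page in iter_pages('https://api.github.com/users'):
    for user in page:
        print(user['login'])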
2

First, print the link response header:

print(r.headers.get('link'))

This will give you the pagination links for the organization's repositories, similar to the below:

<https://api.github.com/organizations/xxxx/repos?page=2&type=all>; rel="next",
<https://api.github.com/organizations/xxxx/repos?page=8&type=all>; rel="last"

From this you can see that we are currently on the first page of repos: rel="next" says that the next page is 2, and rel="last" tells us that the last page is 8.

Once you know the number of pages to traverse, you just need to add '=' after 'page' in the request URL (the missing '=' is the bug in the question's code) and run the while loop up to the last page number, not len(repo), since that will just return 100 each time. For example:

i = 1
while i <= 8:
    r = requests.get('https://api.github.com/orgs/xxxx/repos?page={0}&type=all'.format(i),
                     auth=('My_user', 'My_passwd'))
    repo = r.json()
    for j in repo:
        # j is already a repository dict, so index it directly
        print(j['full_name'])
    i = i + 1
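
If you would rather not hardcode 8, a variant sketch that derives the last page number from the Link header (assuming a rel="last" entry formatted like the example above); you can then run the loop with while i <= last:

import re
import requests

r = requests.get('https://api.github.com/orgs/xxxx/repos?type=all',
                 auth=('My_user', 'My_passwd'))
last = 1
for part in r.headers.get('link', '').split(','):
    if 'rel="last"' in part:
        # Pull the page number out of the rel="last" URL
        match = re.search(r'[?&]page=(\d+)', part)
        if match:
            last = int(match.group(1))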

Comments

0
import re

def next_page_number(res):
    link = res.headers.get('link', None)
    if link is not None:
        link_next = [l for l in link.split(',') if 'rel="next"' in l]
        if len(link_next) > 0:
            # '[?&]page=' avoids matching the 'per_page' parameter
            match = re.search(r'[?&]page=(\d+)', link_next[0])
            if match:
                return int(match.group(1))
    return None
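
A short usage sketch for that helper (next_page_number is the hypothetical name given above; it returns the next page number, or None when there is no rel="next" link):

import requests

page = 1
repos = []
while page is not None:
    res = requests.get(
        'https://api.github.com/orgs/xxxxxxx/repos?per_page=100&page={0}'.format(page),
        auth=('My_user', 'My_passwd'))
    repos.extend(res.json())
    page = next_page_number(res)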

Comments
