StackOverflow API returns only 20 questions

Question

The following code returns only 20 questions/results. How can I retrieve the whole number of questions for that date?

import requests

base_url = 'https://api.stackexchange.com/2.3'
endpoint = '/questions'
# tag, start_date and end_date are assumed to be defined earlier
params = {
    'site': 'stackoverflow',
    'tagged': tag,
    'fromdate': start_date,
    'todate': end_date,
    'filter': 'default'  # Use 'default' or specify your desired filter
}

response = requests.get(base_url + endpoint, params=params)
data = response.json()

Thank you. Can you please elaborate? I looked at the page, but I can't figure out how to tackle the problem. Many thanks!
– Dimitris, Commented Jun 8, 2023 at 6:48
What is the problem? All APIs use some kind of paging for scalability reasons. Returning a ton of results means a lot of RAM and a lot of IO is used, while the database remains locked for a long time.
– Panagiotis Kanavos, Commented Jun 8, 2023 at 7:14
"I looked at the page, but I can't figure out how to tackle the problem". To be clear: the problem is that when you use the API, you only get 20 results at a time, right? See how it shows you that there is a pagesize parameter you can use when you make the API query? What happens if you try using different values for that? Do you see how that corresponds to the number of results you get? (What do you suppose a "page" might refer to, in this context? What would it mean if the "size" of the "page" changes?)
– Karl Knechtel, Commented Jun 9, 2023 at 3:59
Every query to the API returns a 'has_more' boolean, so one can increment the page and get the next batch of results. The question has been answered.
– Dimitris, Commented Jun 9, 2023 at 9:36
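
If "the whole number of questions" means only the count, rather than the questions themselves, paging may not be needed at all. A minimal sketch, assuming the API's built-in `total` filter, which as far as I can tell from the filter documentation returns only the `total` field of the response wrapper. `count_questions` and `get_json` are hypothetical names introduced here; `get_json` stands for any callable that performs the GET request and returns the parsed JSON.

```python
BASE_URL = 'https://api.stackexchange.com/2.3'


def count_questions(tag, start_date, end_date, get_json):
    """Return how many questions match, without downloading them."""
    params = {
        'site': 'stackoverflow',
        'tagged': tag,
        'fromdate': start_date,
        'todate': end_date,
        'filter': 'total',  # assumption: built-in filter that returns only "total"
    }
    payload = get_json(BASE_URL + '/questions', params)
    return payload['total']
```

With requests, `get_json` could be something like `lambda url, params: requests.get(url, params=params).json()`.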

safir · Accepted Answer · 2023-06-10 21:23:37Z

According to the paging documentation, pagesize can be any value between 0 and 100 and defaults to 30. If you only get 20 questions with the default values, it is probably because only that many questions match your tag in the given time span (I can't tell, as the values aren't included). Otherwise you would get 30 results and would need to step through the remaining pages of results with the page parameter, like so:

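import requests

base_url = 'https://api.stackexchange.com/2.3'
endpoint = '/questions'
page = 1
pagesize = 30
questions = []  # accumulates the items from every page
has_more = True
# tag, start_date and end_date are defined as in the question
while has_more:
    params = {
        'site': 'stackoverflow',
        'page': page,
        'pagesize': pagesize,
        'tagged': tag,
        'fromdate': start_date,
        'todate': end_date,
        'filter': 'default'  # Use 'default' or specify your desired filter
    }
    page_results = requests.get(base_url + endpoint, params=params).json()
    questions.extend(page_results.get('items', []))
    # 'has_more' is True while further pages remain; a missing key ends the loop
    has_more = page_results.get('has_more', False)
    page += 1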
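
A slightly fuller sketch of the same loop, written as a generator. It assumes the response wrapper fields `items`, `has_more`, `backoff`, `error_id` and `error_message` as described in the API docs. `iter_questions`, `fetch_page` and the `fetch`/`sleep` parameters are hypothetical names introduced here so the paging logic can be tried without network access. `pagesize=100` is the documented maximum, which keeps the number of requests down.

```python
import time

BASE_URL = 'https://api.stackexchange.com/2.3'
ENDPOINT = '/questions'


def fetch_page(params):
    """Fetch one page of results from the live API.

    requests is imported lazily so the paging logic below can be
    exercised with a fake fetcher and no network access.
    """
    import requests
    response = requests.get(BASE_URL + ENDPOINT, params=params, timeout=30)
    return response.json()


def iter_questions(tag, start_date, end_date, pagesize=100,
                   fetch=fetch_page, sleep=time.sleep):
    """Yield every matching question, requesting one page at a time."""
    page = 1
    while True:
        data = fetch({
            'site': 'stackoverflow',
            'page': page,
            'pagesize': pagesize,
            'tagged': tag,
            'fromdate': start_date,
            'todate': end_date,
            'filter': 'default',
        })
        # Error responses carry error_id/error_message instead of items.
        if 'error_id' in data:
            raise RuntimeError(data.get('error_message', 'API error'))
        yield from data.get('items', [])
        if not data.get('has_more'):
            break
        # "backoff", when present, is the number of seconds to wait
        # before the next request.
        if data.get('backoff'):
            sleep(data['backoff'])
        page += 1
```

Usage would look like `questions = list(iter_questions(tag, start_date, end_date))`, with `tag`, `start_date` and `end_date` defined as in the question.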

edited Jun 10, 2023 at 21:23

answered Jun 8, 2023 at 7:02

safir

16 Comments

blhsing Over a year ago

There are no such operators as || and ++ in Python.

Panagiotis Kanavos Over a year ago

@Dimitris you can't get all questions, in any API, not just StackOverflow's. Unless the API's creator knows only a few items will be returned, everyone implements some kind of paging.

Panagiotis Kanavos Over a year ago

No, what you tried to do would be very clunky. It would prevent you from asking this question, because the server would be frozen trying to return the 1M questions per day.

Panagiotis Kanavos Over a year ago

All the questions for all time, one day at a time, is still all questions, so you'll end up downloading everything. If you check the actual data dump you'll see the actual English SO Posts file is 18GB zipped. You'll probably download that file faster than trying to retrieve the same contents through the API. Download tools can easily recover from network problems, download in parallel or retry in chunks

Timus Over a year ago

@safir The len(page_results) == pagesize isn't the best approach: page_results["has_more"] gives you a True if there are more items (and False otherwise).
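
To illustrate Timus's point, here is a small offline sketch comparing the two stop conditions. The fake fetcher and all function names are hypothetical and only mimic the `items` and `has_more` fields of the response. Note also that calling len() on the parsed JSON object itself would count the wrapper's keys, not the questions, so the sketch measures `len(data['items'])` instead.

```python
def make_fetch(total, pagesize):
    """Return a fake page fetcher over `total` items, plus the list of pages requested."""
    requested = []

    def fetch(page):
        requested.append(page)
        start = (page - 1) * pagesize
        items = list(range(start, min(start + pagesize, total)))
        return {'items': items, 'has_more': start + pagesize < total}

    return fetch, requested


def collect_by_length(fetch, pagesize):
    """Stop when a page comes back shorter than pagesize."""
    items, page = [], 1
    while True:
        data = fetch(page)
        items.extend(data['items'])
        if len(data['items']) < pagesize:
            break
        page += 1
    return items


def collect_by_has_more(fetch):
    """Stop when the API reports that no further pages exist."""
    items, page = [], 1
    while True:
        data = fetch(page)
        items.extend(data['items'])
        if not data['has_more']:
            break
        page += 1
    return items
```

Both functions collect the same items, but when the number of results is an exact multiple of pagesize (say 60 results at 30 per page), the length check only stops after requesting an extra, empty third page, while `has_more` stops after the second. Each request counts against the API quota, so the difference is not purely cosmetic.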
