I am working with the API documented at http://json-homework.task-sss.krasilnikov.spb.ru/docs/9f66a575a6cfaaf7e43177317461d057 (unfortunately it is only in Russian, but I'll try to explain). I need to import the data about the group members from it. The problem is that each value of the page parameter returns only 5 members: increasing the page number returns the next 5 members instead of adding them to the previous five. Here is my code:

import pandas as pd
import requests as rq

url = 'http://json-homework.task-sss.krasilnikov.spb.ru/api/groups/getmembers?api_key=9f66a575a6cfaaf7e43177317461d057&group_id=4508123&page=1'
data = rq.get(url)
# Flatten the list of member records under "response" into a DataFrame.
# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize import.
data1 = pd.json_normalize(data.json()["response"])
data1

Here is what my output looks like: [image: my output]

By trying larger and larger page numbers, I found that the last of the data is on page 41, so I need to get the data from pages 1 through 41. How can I include all the pages in my code? Maybe it is possible with a loop or something similar, but I don't know how.

1 Answer

According to the API documentation, there is no parameter for the number of users returned per page, so you will have to fetch them 5 at a time. Since there are 41 pages, you can simply loop over the URLs.

import requests as rq
import json

all_users = []
for page in range(1, 42):  # pages 1 through 41
    url = f'http://json-homework.task-sss.krasilnikov.spb.ru/api/groups/getmembers?api_key=9f66a575a6cfaaf7e43177317461d057&group_id=4508123&page={page}'
    data = rq.get(url)
    # extend (not append) so all_users stays one flat list of member records
    all_users.extend(json.loads(data.text)["response"])

The above implementation does not, of course, check for API throttling: the API may return unexpected data if too many requests are made in a very short period, which you can mitigate with some well-placed delays.
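One way to add such delays, as a minimal sketch rather than a definitive implementation: the helper names fetch_members_page and collect_all_pages, the 0.5 second pause, the 10 second timeout and the max_pages cap are all illustrative choices, not taken from the API documentation. The sketch also assumes that a page past the last one returns an empty "response" list, which would remove the need to hard-code 41; if the API behaves differently, keep the fixed range instead.

```python
import time


def fetch_members_page(page):
    """Fetch one page of group members and return the list under "response"."""
    # Imported here so that collect_all_pages can be used with any other fetcher.
    import requests as rq

    url = "http://json-homework.task-sss.krasilnikov.spb.ru/api/groups/getmembers"
    params = {
        "api_key": "9f66a575a6cfaaf7e43177317461d057",
        "group_id": 4508123,
        "page": page,
    }
    reply = rq.get(url, params=params, timeout=10)
    reply.raise_for_status()  # fail on HTTP errors instead of parsing an error page
    return reply.json()["response"]


def collect_all_pages(fetch_page, delay_seconds=0.5, max_pages=1000):
    """Call fetch_page(1), fetch_page(2), ... and return one flat list of records."""
    all_users = []
    for page in range(1, max_pages + 1):
        users = fetch_page(page)
        if not users:
            # Assumption: an empty page means we are past the last page.
            break
        all_users.extend(users)
        time.sleep(delay_seconds)  # pause between requests to stay under any rate limit
    return all_users
```

Calling collect_all_pages(fetch_members_page) would then return the flat list, which can be passed to pd.json_normalize as in the question.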

7 Comments

I get this error now "JSONDecodeError: Expecting value: line 1 column 1 (char 0)"...
To get one flat list instead of a list of pages, make the last line of the loop all_users = all_users + json.loads(data.text)["response"]; then you can run all_users through json_normalize.
oh, it still gives me this error on this line all_users = all_users + json.loads(data.text)
I ran the above code and it works fine on my end. Can you print data.text within the loop and check whether you are getting valid JSON as a string? Also, what Python version are you using?
Here is my output of json_normalize(all_users):
```
            id first_name   last_name
0    212442289      Darya     Glazova
1    303958499     Alexey         Kim
2     51058438      Masha      Isaeva
3     97510651      Pavel   Kuznitsyn
4    175877310      Irina  Prokopyeva
..         ...        ...         ...
198  263149712     Fyodor      Osipov
199  169412107      Maria       Frank
200   68937678      Denis   Yurchenko
201  172735591      Sasha   Zavyalova
202  375195548      Sofia    Tarasova

[203 rows x 3 columns]
```
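The check suggested in the comments can be sketched as a small helper. "Expecting value: line 1 column 1 (char 0)" is what json.loads raises when the body is empty or is not JSON at all (an HTML error page, for example), so it helps to show which page failed and how the body starts. The name parse_members and the 80-character preview are illustrative choices.

```python
import json


def parse_members(text, page):
    """Parse one response body and return the list under "response"."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError as err:
        # Show the start of the body so an empty reply or an HTML page is easy to spot.
        raise ValueError(
            f"page {page}: body is not valid JSON, it starts with {text[:80]!r}"
        ) from err
    return payload["response"]
```

Inside the loop this would be used as all_users.extend(parse_members(data.text, page)).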