I want to iterate over the paginated tables at the link here, extract them, and concatenate them into a single DataFrame.

I have a loop that fetches each table, but I'm not sure how to combine all of the JSON responses or DataFrames into one.

Can anyone help? Thank you.

import requests
import json
import pandas as pd
import numpy as np

headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
        "Referer": "http://zjj.sz.gov.cn/projreg/public/jgys/jgysList.jsp"}
dfs = []
#dfs = pd.DataFrame()

for page in range(0, 5):
    data = {"limit": 100, "offset": page * 100, "pageNumber": page + 1}
    json_arr = requests.post("http://zjj.sz.gov.cn/projreg/public/jgys/webService/getJgysLogList.json", headers = headers, data = data).text
    d = json.loads(json_arr)
    df = pd.read_json(json.dumps(d['rows']) , orient='list')

Related question: Iterate and extract tables from web saving as excel file in Python

1 Answer

Use pd.concat to combine the per-page DataFrames:

import requests
import json
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36',
    'Referer': 'http://zjj.sz.gov.cn/projreg/public/jgys/jgysList.jsp'
}

dfs = pd.DataFrame()

for page in range(0, 5):
    data = {'limit': 100, 'offset': page * 100, 'pageNumber': page + 1}
    json_arr = requests.post(
        'http://zjj.sz.gov.cn/projreg/public/jgys/webService/getJgysLogList.json', 
        headers=headers, 
        data=data).text
    d = json.loads(json_arr)
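    # 'records' is the documented orient for a JSON list of row objects
    df = pd.read_json(json.dumps(d['rows']), orient='records')
    # Put the accumulated rows first so the pages stay in order
    dfs = pd.concat([dfs, df], sort=False)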

Or collect the per-page DataFrames in a list and concatenate once after the loop:

import requests
import json
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36',
    'Referer': 'http://zjj.sz.gov.cn/projreg/public/jgys/jgysList.jsp'
}

dfs = []

for page in range(0, 5):
    data = {'limit': 100, 'offset': page * 100, 'pageNumber': page + 1}
    json_arr = requests.post(
        'http://zjj.sz.gov.cn/projreg/public/jgys/webService/getJgysLogList.json', 
        headers=headers, 
        data=data).text
    d = json.loads(json_arr)
    # 'records' is the documented orient for a JSON list of row objects
    dfs.append(pd.read_json(json.dumps(d['rows']), orient='records'))

df = pd.concat(dfs, sort=False)

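PS: The second block is strongly preferred. You should never call DataFrame.append or pd.concat inside a for-loop: each call copies every row accumulated so far, so the total work grows quadratically with the number of pages. Collect the DataFrames in a list and call pd.concat once after the loop. Thanks @parfait!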
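To make the collect-then-concat pattern easy to try without hitting the site, here is a minimal offline sketch. The fetch_rows helper and its id/name fields are hypothetical stand-ins for d['rows'] from the real endpoint, and ignore_index=True is an optional extra that renumbers the combined rows.

```python
import pandas as pd


def fetch_rows(page, limit=3):
    """Hypothetical stand-in for one page of d['rows'] from the endpoint."""
    start = page * limit
    return [{"id": start + i, "name": f"project-{start + i}"} for i in range(limit)]


frames = []
for page in range(5):
    # Collect one DataFrame per page; nothing is concatenated inside the loop.
    frames.append(pd.DataFrame(fetch_rows(page)))

# A single concat copies each page's data once.
# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0..2 per page.
result = pd.concat(frames, ignore_index=True, sort=False)
```

Without ignore_index=True, the combined frame keeps each page's own row labels, so the index would repeat for every page.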

4 Comments

TypeError: cannot concatenate object of type '<class 'list'>'; only Series and DataFrame objs are valid
I think you forgot to change the expression dfs = []. Use dfs = pd.DataFrame() instead of a list.
That's absolutely true. If you allow me, I would like to add my answer. @Parfait
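The TypeError quoted in the comments appears to come from mixing the two approaches: keeping dfs = [] as the accumulator while also calling pd.concat([df, dfs]) inside the loop. A small sketch that reproduces it, using a made-up one-column DataFrame rather than the real data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})  # illustrative data, not from the endpoint
dfs = []  # a plain list used as the accumulator

try:
    # concat expects Series/DataFrame items; a nested list is rejected.
    pd.concat([df, dfs], sort=False)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# With a list accumulator, append each DataFrame and concat the list itself.
dfs.append(df)
combined = pd.concat(dfs, sort=False)
```

So either accumulator works, as long as a list accumulator is passed to pd.concat directly rather than as one of its items.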
