Python: Saving AJAX response data to .json and loading it into a pandas DataFrame

Question

Hello, and thank you for taking the time to read this.

I am looking to extract company information from a particular stock exchange and then save this information to a pandas DataFrame. Each firm has its own web page, identified by the "KodeEmiten" code at the end of the URL. These codes are stored in a column of the first DataFrame:
df = pd.DataFrame.from_dict(data['data'])

Now my goal is to use these codes to request each company's page individually and create a JSON file for each:

for i in range(len(df)):
    requests.get(f'https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfilesDetail?emitenType=&kodeEmiten={df.loc[i, "KodeEmiten"]}').json()

While this works, I can't save the results to a new DataFrame because of "list index out of range" and incorrect keyword errors. The XHR response contains significantly more information than I actually need, and I believe the differing structures cause the errors when I try to save them to a new DataFrame. I'm really only interested in the data under these keys of the XHR response:
AnakPerusahaan, Direktur, Komisaris, PemegangSaham

So my question is really two in one:
a) How can I extract only the information under those specific keys (all of them are tables)?
b) How can I save it to a new DataFrame (or even a list, I don't really mind)?

import requests
import pandas as pd
import json
import time

# gets broad data of main page of the stock exchange
sxow = requests.get('https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfiles?draw=1&columns%5B0%5D%5Bdata%5D=KodeEmiten&columns%5B0%5D%5Bname%5D&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=false&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=KodeEmiten&columns%5B1%5D%5Bname%5D&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=false&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=NamaEmiten&columns%5B2%5D%5Bname%5D&columns%5B2%5D%5Bsearchable%5D=true&columns%5B2%5D%5Borderable%5D=false&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B3%5D%5Bdata%5D=TanggalPencatatan&columns%5B3%5D%5Bname%5D&columns%5B3%5D%5Bsearchable%5D=true&columns%5B3%5D%5Borderable%5D=false&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=false&start=0&length=700&search%5Bvalue%5D&search%5Bregex%5D=false&_=155082600847')

data = sxow.json()  # parse the response body as JSON (gives a dict)
df = pd.DataFrame.from_dict(data['data'])  # build a DataFrame from the records under data['data']


# add: compare file contents and overwrite original if same

cdate = time.strftime("%Y%m%d")  # current date as a YYYYMMDD string
df.to_excel(f"{cdate}StockExchange_Overview.xlsx") # converts DataFrame to Excel file, can't overwrite existing file


for i in range(len(df)):
    requests.get(f'https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfilesDetail?emitenType=&kodeEmiten={df.loc[i, "KodeEmiten"]}').json()

# This is where I'm completely stuck

bigbounty · Accepted Answer · 2019-02-26 05:10:01Z

1

You don't need to convert the result to a DataFrame first. You can loop over the records in the JSON response and append each KodeEmiten to the detail URL to fetch that company's details.

Follow the code below:

import requests
import pandas as pd
import json
import time

# gets broad data of main page of the stock exchange
sxow = requests.get('https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfiles?draw=1&columns%5B0%5D%5Bdata%5D=KodeEmiten&columns%5B0%5D%5Bname%5D&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=false&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=KodeEmiten&columns%5B1%5D%5Bname%5D&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=false&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=NamaEmiten&columns%5B2%5D%5Bname%5D&columns%5B2%5D%5Bsearchable%5D=true&columns%5B2%5D%5Borderable%5D=false&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B3%5D%5Bdata%5D=TanggalPencatatan&columns%5B3%5D%5Bname%5D&columns%5B3%5D%5Bsearchable%5D=true&columns%5B3%5D%5Borderable%5D=false&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=false&start=0&length=700&search%5Bvalue%5D&search%5Bregex%5D=false&_=155082600847')

data = sxow.json()  # parse the response body as JSON (gives a dict)

list_of_json = []
for nested_json in data['data']:
    # fetch the detail record for this company's code and keep the parsed JSON
    list_of_json.append(requests.get('https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfilesDetail?emitenType=&kodeEmiten='+nested_json['KodeEmiten']).json())
    time.sleep(1)  # pause one second between requests

list_of_json will then contain all the JSON results you requested, one per company.

Here nested_json is the loop variable; it takes each record in data['data'] in turn, one per KodeEmiten.
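To get from list_of_json to DataFrames containing only the four keys named in the question, something like the following might work. It is a minimal sketch, not tested against the live endpoint: it assumes each of those keys holds a list of dicts (the question only says they are tables), and the sample values and the field names "Nama" and "Persentase" are made up for illustration. The helper name sections_to_frames is likewise hypothetical.

```python
import pandas as pd

# Keys of interest, as listed in the question.
SECTIONS = ["AnakPerusahaan", "Direktur", "Komisaris", "PemegangSaham"]


def sections_to_frames(codes, details, sections=SECTIONS):
    """Return one DataFrame per section, with each row tagged by company code."""
    rows = {name: [] for name in sections}
    for code, detail in zip(codes, details):
        for name in sections:
            # "or []" tolerates a missing key as well as a null value
            for record in detail.get(name) or []:
                rows[name].append({"KodeEmiten": code, **record})
    return {name: pd.DataFrame(records) for name, records in rows.items()}


# Stand-in data; with the real responses this would be
# codes = [row["KodeEmiten"] for row in data["data"]] and details = list_of_json.
codes = ["AAAA", "BBBB"]
details = [
    {
        "Direktur": [{"Nama": "A"}],
        "Komisaris": [],
        "PemegangSaham": [{"Nama": "X", "Persentase": 60.0}],
    },
    {"Direktur": [{"Nama": "B"}, {"Nama": "C"}], "AnakPerusahaan": None},
]

frames = sections_to_frames(codes, details)
print(frames["Direktur"])
```

Keeping one DataFrame per key avoids forcing tables with different columns into a single frame, which is a likely source of the errors described in the question.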

edited Feb 26, 2019 at 5:10

answered Feb 26, 2019 at 4:59


3 Comments

Nick Over a year ago

I don't really understand how but it works perfectly, thanks a lot! What exactly is each_json though, because it's not defined anywhere?

bigbounty Over a year ago

I have edited the code and put some comments. nested_json is a loop variable. Please accept the answer as it has solved your question

panda Over a year ago

@bigbounty could you please help the following question stackoverflow.com/questions/54865312/…

Nick · Accepted Answer · 2019-03-17 08:42:30Z

1

This is a slight improvement on @bigbounty's approach.
Since the aim is to save the information to a list and then use that list further in the script, a list comprehension is actually a tad faster.

i.e.

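url = 'https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfilesDetail?emitenType=&kodeEmiten='
list_of_json = [requests.get(url + nested_json["KodeEmiten"]).json() for nested_json in data["data"]]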
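If the list is also meant to end up on disk as one .json file per company, as the question title suggests, a sketch along these lines could do it. The helper name save_json_per_company, the file naming scheme and the sample records are assumptions for illustration; only the standard library is used.

```python
import json
import tempfile
from pathlib import Path


def save_json_per_company(codes, details, out_dir):
    """Write each detail record to <out_dir>/<code>.json and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for code, detail in zip(codes, details):
        path = out_dir / f"{code}.json"
        with path.open("w", encoding="utf-8") as fh:
            # ensure_ascii=False keeps non-ASCII characters readable in the file
            json.dump(detail, fh, ensure_ascii=False, indent=2)
        paths.append(path)
    return paths


# Stand-in data; with the real responses this would be
# codes = [row["KodeEmiten"] for row in data["data"]] and details = list_of_json.
codes = ["AAAA", "BBBB"]
details = [{"Direktur": [{"Nama": "A"}]}, {"Direktur": []}]

out_dir = tempfile.mkdtemp()  # temporary folder, just for the demonstration
paths = save_json_per_company(codes, details, out_dir)

# Read the first file back to confirm it round-trips
with paths[0].open(encoding="utf-8") as fh:
    reloaded = json.load(fh)
```

Existing files with the same name are overwritten, so running the script again on another day replaces the earlier snapshot unless a date is added to the file name.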

answered Mar 17, 2019 at 8:42

