Hello and thank your for taking the time to have a read at this,
I am looking to extract company information from a particular stock exchange and then save this information to a pandas DataFrame.
Each firm has it's own webpage that are all determined by the "KodeEmiten" ending. These codes are saved in a column of the first Dataframe:
df = pd.DataFrame.from_dict(data['data'])
Now my goal is to use these codes to call each companies website individually and create a json file for each
for i in range (len(df)):
requests.get(f'https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfilesDetail?emitenType=&kodeEmiten={df.loc[i, "KodeEmiten"]}').json()
While this works i can't save this to a new DataFrame due list index out of range and incorrect keyword errors. There is significantly more information in the xhr than i actually need and the different structures are what I believe to cause the error trying to save them to a new DataFrame. I'm really just interested in getting the data in these xhr headers:
AnakPerusahaan:, Direktur:, Komisaris, PemegangSaham:
So my question is kind of two-in-one:
a) How can I just extract the information from those specific xhr headers (all of them are tables)
b) how can i save those to a new dataframe (or even list I don't really mind)
import requests
import pandas as pd
import json
import time
# gets broad data of main page of the stock exchange
sxow = requests.get('https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfiles?draw=1&columns%5B0%5D%5Bdata%5D=KodeEmiten&columns%5B0%5D%5Bname%5D&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=false&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=KodeEmiten&columns%5B1%5D%5Bname%5D&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=false&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=NamaEmiten&columns%5B2%5D%5Bname%5D&columns%5B2%5D%5Bsearchable%5D=true&columns%5B2%5D%5Borderable%5D=false&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B3%5D%5Bdata%5D=TanggalPencatatan&columns%5B3%5D%5Bname%5D&columns%5B3%5D%5Bsearchable%5D=true&columns%5B3%5D%5Borderable%5D=false&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=false&start=0&length=700&search%5Bvalue%5D&search%5Bregex%5D=false&_=155082600847')
data = sxow.json() # save the request as .json file
df = pd.DataFrame.from_dict(data['data']) #creates DataFrame based on the data (.json) file
# add: compare file contents and overwrite original if same
cdate = time.strftime ("%Y%m%d") # creating string-variable w/ current date year|month|day
df.to_excel(f"{cdate}StockExchange_Overview.xlsx") # converts DataFrame to Excel file, can't overwrite existing file
for i in range (len(df)) :
requests.get(f'https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfilesDetail?emitenType=&kodeEmiten={df.loc[i, "KodeEmiten"]}').json()
#This is where I'm completely stuck