I'm making multiple API calls, each of which returns JSON. The responses all follow the same schema. I want to merge them into one file so I can do two things:
1) Extract all the IP addresses from the JSON to work with later
2) Convert the JSON into a pandas DataFrame
When I first wrote the code, I made a single request and it returned a JSON that I could work with. Now I have used a for loop to collect multiple JSONs and append them to a list called results_list so that the next JSON does not overwrite the previous one I requested.
Here's the code:
import json
import requests

headers = {
    'Accept': 'application/json',
    'key': 'MY_API_KEY'
}
query_type = 'QUERY_TYPE'
locations_list = ['London', 'Amsterdam', 'Berlin']
results_list = []
for location in locations_list:
    url = 'https://API_URL'
    r = requests.get(url, params={'query': str(query_type) + str(location)}, headers=headers)
    # Store the decoded JSON body (a dict), not the Response object itself
    results_list.append(r.json())
# Write all collected responses to one file as a single JSON array
with open('my_search_results.json', 'w') as outfile:
    json.dump(results_list, outfile)
The JSON file my_search_results.json is a top-level array with one element per API query, e.g. index 0 is London, 1 is Amsterdam, 2 is Berlin, and so on. Like this:
[
  {
    "complete": true,
    "count": 51,
    "data": [
      {
        "actor": "unknown",
        "classification": "malicious",
        "cve": [],
        "first_seen": "2020-03-11",
        "ip": "1.2.3.4",
        "last_seen": "2020-03-28",
        "metadata": {
          "asn": "xxxxx",
          "category": "isp",
          "city": "London",
          "country": "United Kingdom",
          "country_code": "GB",
          "organization": "British Telecommunications PLC",
          "os": "Linux 2.2-3.x",
          "rdns": "xxxx",
          "tor": false
        },
        "raw_data": {
          "ja3": [],
          "scan": [
            {
              "port": 23,
              "protocol": "TCP"
            },
            {
              "port": 81,
              "protocol": "TCP"
            }
          ],
          "web": {}
        },
        "seen": true,
        "spoofable": false,
        "tags": [
          "some tag"
        ]
      }
(I've redacted any sensitive data. There is a separate row in the JSON for each API request, representing each city, but it's too big to show here)
Now I want to go through the JSON and pick out all the IP addresses:
for d in results_list['data']:
    ips = (d['ip'])
    print(ips)
However this gives the error:
TypeError: list indices must be integers or slices, not str
When I was working with a single JSON from a single API request this worked fine. Now it seems that either the JSON is not formatted properly or Python is treating my combined JSON as a list rather than a dictionary, even though I called json.dump() on results_list earlier in the script. I'm sure it has to do with the way I append each API response to a list, but I can't work out where I'm going wrong.
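To make the structure concrete, here is a minimal sketch of how the IPs might be collected. It assumes results_list holds one decoded dict per request, each with its own "data" list as in the sample output. The two-element results_list below is a made-up, heavily trimmed stand-in and not real API output.

```python
# Hypothetical stand-in for results_list: one dict per API call,
# trimmed down to the keys that matter for this example.
results_list = [
    {"complete": True, "count": 2, "data": [{"ip": "1.2.3.4"}, {"ip": "5.6.7.8"}]},
    {"complete": True, "count": 1, "data": [{"ip": "9.10.11.12"}]},
]

# results_list is a list, so results_list['data'] raises
# "TypeError: list indices must be integers or slices, not str".
# Each element of the list is a dict with its own 'data' list,
# so the outer list has to be iterated first.
ips = []
for result in results_list:
    for entry in result.get("data", []):
        ips.append(entry["ip"])

print(ips)
```

Note that json.dump() only writes a serialized copy to disk; it does not change the type of results_list in memory, which stays a Python list.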
I'm struggling to figure out how to pick out the IP addresses or if there is just a better way to collect and merge multiple JSONs. Any advice appreciated.
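One possible way to merge the per-city responses is to flatten them into a single list of records, one per IP entry. This is only a sketch, assuming each response is a dict with a "data" list; merge_results and the query_location key are hypothetical names introduced here for illustration, and the sample data is invented.

```python
import json

# Hypothetical, trimmed stand-ins for the real query locations and responses.
locations_list = ["London", "Amsterdam"]
results_list = [
    {"count": 2, "data": [{"ip": "1.2.3.4", "classification": "malicious"},
                          {"ip": "5.6.7.8", "classification": "benign"}]},
    {"count": 1, "data": [{"ip": "9.10.11.12", "classification": "unknown"}]},
]


def merge_results(locations, results):
    """Flatten a list of per-query responses into one list of records."""
    rows = []
    for location, result in zip(locations, results):
        for entry in result.get("data", []):
            row = dict(entry)                 # copy so the original is untouched
            row["query_location"] = location  # remember which query produced it
            rows.append(row)
    return rows


rows = merge_results(locations_list, results_list)
merged = json.dumps(rows, indent=2)  # one JSON array containing every record
print(merged)
```

A flat list of dicts like rows is the kind of input that pandas.DataFrame(rows) or pandas.json_normalize(rows) accepts, so it may be a reasonable starting point for the DataFrame step.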