1

I've scraped a webpage using an API and want to access one part of the results, but am having difficulty accessing it:

import requests
import json
headers = {'x-api-key': 'my_key'}
test_url = "https://api.propublica.org/congress/v1/statements/date/2018-05-22.json"
resp = requests.get(test_url, headers=headers).json()

The results are coming out in a dictionary format that looks like this:

[{'chamber': 'Senate',
          'congress': 115,
          'party': 'R',
          'state': 'NC',
          'url': 'url1_goes_here'},
{'chamber': 'Senate',
          'congress': 115,
          'party': 'R',
          'state': 'ND',
          'url': 'url2_goes_here'}]

I want to extract the 'url' value from each entry, but the entries don't seem to have a key by which I can drill down. How can I go about accessing these? I thought that:

resp["url"]

Would work, but I didn't have any luck. The output I'd ideally want would be something like:

[url1, url2]
1
  • After trying out your code, I figured out what's probably your problem. Your request returns {'message': None} for me, so there will obviously be an error when you try to get url, which isn't there. The site also shows you need an API key, so if you don't have one you need to sign up to get your actual data from the API. Commented May 24, 2018 at 15:44

3 Answers

2

You need to extract each URL in turn out of your resp list. A simple list-comprehension would do:

urls = [entry['url'] for entry in resp]

2 Comments

This makes sense, but is throwing me an error "TypeError: string indices must be integers"? @mathias
@sabrina if resp is indeed what you posted in the question, this shouldn't happen. Are you sure resp is a list of dictionaries?
2

What you have is a list of dicts. So you have to first get the elements of that list, before you can treat them as dicts.

For example, the first URL is results[0]['url']. Or, if you want to do something with every URL, you have to write for result in results: dosomething(result['url']).

So, what if you want to get a list of all the URLs?

urls = []
for result in results:
    url = result['url']
    urls.append(url)

Of course you can make this more compact if you understand list comprehensions:

urls = [result['url'] for result in results]
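As a quick sanity check, here are both versions run against the two sample entries from the question. This is only a sketch: results is hard-coded here rather than fetched from the API, and urls_loop is just an illustrative name to keep the two variants apart.

```python
# Sample data copied from the question (not a live API response).
results = [
    {'chamber': 'Senate', 'congress': 115, 'party': 'R',
     'state': 'NC', 'url': 'url1_goes_here'},
    {'chamber': 'Senate', 'congress': 115, 'party': 'R',
     'state': 'ND', 'url': 'url2_goes_here'},
]

# A single entry: index into the list first, then into the dict.
first_url = results[0]['url']

# Explicit loop version.
urls_loop = []
for result in results:
    urls_loop.append(result['url'])

# Equivalent list comprehension.
urls = [result['url'] for result in results]

print(first_url)
print(urls)
```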

If you're going to be doing a lot of complicated lookups on this structure, there are two options to consider.


First, you can restructure the data into a form that's easier to use. For example, if you're going to need to do a lot of things like look up the senator from North Dakota, it would be nice if you could write senate['ND'] instead of [result for result in results if result['state'] == 'ND' and result['chamber'] == 'Senate']. You can do that with:

senate = {result['state']: result for result in results if result['chamber'] == 'Senate'}
house = {result['state']: result for result in results if result['chamber'] == 'House'}

Obviously this is complicated, and it's only useful if it saves you more complexity elsewhere, multiple times.
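Here is a sketch of that restructuring on the question's sample data. One thing to watch: a dict keyed by state keeps only the last matching entry for each state, so if the real data has several entries per state and chamber, you would probably want lists as values instead. The senate_by_state name below is only illustrative for that variant.

```python
from collections import defaultdict

# Sample data copied from the question (not a live API response).
results = [
    {'chamber': 'Senate', 'congress': 115, 'party': 'R',
     'state': 'NC', 'url': 'url1_goes_here'},
    {'chamber': 'Senate', 'congress': 115, 'party': 'R',
     'state': 'ND', 'url': 'url2_goes_here'},
]

# One entry per state: a later entry for the same state overwrites an earlier one.
senate = {result['state']: result
          for result in results if result['chamber'] == 'Senate'}
print(senate['ND']['url'])

# Variant that keeps every entry for a state in a list.
senate_by_state = defaultdict(list)
for result in results:
    if result['chamber'] == 'Senate':
        senate_by_state[result['state']].append(result)
print(len(senate_by_state['NC']))
```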


There are also (at least) three different mini-languages for searching nested-list-and-dict structures with key paths as strings, with wildcards—jsonpath, dpath, and kvc—and they all have libraries on PyPI you can look for. They all have a bit of a learning curve, and are overkill if you're just doing one simple search on the data. But if you're going to be doing a lot of searches, the fact that you can write each one as, say, urls = search(results, '*.url') instead of urls = [result['url'] for result in results] can sometimes pay off.
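To give a feel for what such a key-path search does, here is a minimal pure-Python sketch. The search function below is hypothetical and written just for this answer; it is not the API of jsonpath, dpath, or kvc, and those libraries each have their own path syntax.

```python
def search(data, path):
    """Hypothetical helper: follow a dotted key path; '*' matches every item."""
    current = [data]
    for part in path.split('.'):
        next_level = []
        for node in current:
            if part == '*':
                # Wildcard: descend into every value of a dict or item of a list.
                if isinstance(node, dict):
                    next_level.extend(node.values())
                elif isinstance(node, list):
                    next_level.extend(node)
            elif isinstance(node, dict) and part in node:
                next_level.append(node[part])
            elif isinstance(node, list) and part.isdigit() and int(part) < len(node):
                # Numeric path parts index into lists.
                next_level.append(node[int(part)])
        current = next_level
    return current


# Sample data copied from the question (not a live API response).
results = [
    {'chamber': 'Senate', 'congress': 115, 'party': 'R',
     'state': 'NC', 'url': 'url1_goes_here'},
    {'chamber': 'Senate', 'congress': 115, 'party': 'R',
     'state': 'ND', 'url': 'url2_goes_here'},
]

urls = search(results, '*.url')
print(urls)
```

For a single lookup like this one, the list comprehension is still shorter and clearer; a helper of this kind only starts to pay off with many different paths.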

1 Comment

Similar to above, when I try to call: urls = [resp['url'] for resp in resp] I get thrown "TypeError: string indices must be integers" ? @abarnert
-1

Was able to get this working with:

for each in resp['results']:
    print(each['url'])
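This also seems to explain the "string indices must be integers" error from the comments: if resp is a dict wrapping the list, iterating over resp yields its keys, which are strings. A sketch under that assumption follows. Only the 'results' key is confirmed by the working code above; the 'status' key is a made-up placeholder for whatever other top-level keys the real response has.

```python
# Assumed response shape: a dict with the list stored under 'results'.
# 'status' is a placeholder key, not taken from the question.
resp = {
    'status': 'OK',
    'results': [
        {'chamber': 'Senate', 'congress': 115, 'party': 'R',
         'state': 'NC', 'url': 'url1_goes_here'},
        {'chamber': 'Senate', 'congress': 115, 'party': 'R',
         'state': 'ND', 'url': 'url2_goes_here'},
    ],
}

# Iterating over a dict yields its string keys, so entry['url'] indexes a str.
error = None
try:
    urls = [entry['url'] for entry in resp]
except TypeError as exc:
    error = exc
    print('TypeError:', exc)

# Drill into the 'results' list first, then into each dict.
urls = [entry['url'] for entry in resp['results']]
print(urls)
```

Printing type(resp) and, if it is a dict, list(resp.keys()) is a quick way to check which shape you actually got back.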

1 Comment

Had you indicated that the list in your question was under the 'results' key, we could have included it in our answers. For now, resp['results'] is pure magic given the current state of the question.
