
I want to extract the full address from a webpage using BeautifulSoup and JSON. Here's my code:

import json

import requests
from bs4 import BeautifulSoup

url = 'xxxxxxxxxxxxxxxxx'
response = requests.get(url)
html = response.text
soup = BeautifulSoup(html, 'html.parser')

for i in soup.find_all('div', attrs={'data-integration-name':'redux-container'}):
    info = json.loads(i.get('data-payload'))

I printed 'info' out:

{'storeName': None, 'props': {'locations': [{'dirty': False, 'updated_at': '2016-05-05T07:57:19.282Z', 'country_code': 'US', 'company_id': 106906, 'longitude': -74.0001954, 'address': '5 Crosby St  3rd Floor', 'state': 'New York', 'full_address': '5 Crosby St  3rd Floor, New York, 10013, New York, USA', 'country': 'United States', 'id': 17305, 'to_params': 'new-york-us', 'latitude': 40.719753, 'region': '', 'city': 'New York', 'description': '', 'created_at': '2015-01-19T01:32:16.317Z', 'zip_code': '10013', 'hq': True}]}, 'name': 'LocationsMapList'}

What I want is the "full_address" value under "locations", so my code was:

info = json.loads(i.get('data-payload'))
for i in info['props']['locations']:
    print(i['full_address'])

But I got this error:

----> 5     for i in info['props']['locations']:

KeyError: 'locations'

I want to print the full address out, which is '5 Crosby St 3rd Floor, New York, 10013, New York, USA'.

Thanks a lot!

While iterating, your second info doesn't have a locations key in its 'props' value. Commented Sep 16, 2017 at 23:49

2 Answers


The data you are parsing seems to be inconsistent: not every object contains these keys.

If you still want to loop over it, either wrap the lookup in a try/except statement to catch the KeyError, or use the dictionary's get method to supply a fallback value when a key might be missing.

info = json.loads(i.get('data-payload'))
for item in info['props'].get('locations', []):
    print(item.get('full_address', 'no address'))

get('locations', []): returns an empty list if the key locations doesn't exist, so the loop body never runs.

get('full_address', 'no address'): returns 'no address' if there is no such key.
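The answer also mentions try/except as an alternative to get. Below is a minimal sketch of that variant as an illustrative helper, full_addresses (the name is not from the question). It catches TypeError as well as KeyError, because a payload whose props is null becomes None after json.loads, and indexing None raises TypeError rather than KeyError. The sample payloads are trimmed-down, made-up versions of the object printed in the question.

```python
import json


def full_addresses(info):
    """Return the full_address values from one decoded data-payload dict.

    try/except variant: a missing 'props' or 'locations' key raises KeyError,
    and a null 'props' (None after json.loads) raises TypeError when indexed.
    """
    try:
        locations = info['props']['locations']
    except (KeyError, TypeError):
        return []  # this payload carries no locations
    addresses = []
    for item in locations:
        try:
            addresses.append(item['full_address'])
        except KeyError:
            addresses.append('no address')  # same fallback as the get() version
    return addresses


# Made-up payloads shaped like the ones in the question.
with_locations = json.loads(
    '{"props": {"locations": [{"full_address": "5 Crosby St  3rd Floor, New York, 10013, New York, USA"}]}}'
)
null_props = json.loads('{"props": null}')
print(full_addresses(with_locations))  # ['5 Crosby St  3rd Floor, New York, 10013, New York, USA']
print(full_addresses(null_props))  # []
```

The try/except form keeps the happy path readable (plain indexing) and only deals with missing data when it actually shows up.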


EDIT:

The data is inconsistent (never trust data). Some JSON objects have a props key whose value is null, which json.loads turns into None. The following fix handles that:

info = json.loads(i.get('data-payload'))
if info.get('props'):
    for item in info['props'].get('locations', []):
        print(item.get('full_address', 'no address'))
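Putting the get fallbacks and the None check together, here is a sketch of a hypothetical helper, collect_full_addresses, that takes the raw data-payload strings from every redux-container div and returns every full_address it finds. The expression info.get('props') or {} covers both a missing and a null props in one step. Unlike the snippet above, it skips locations without an address instead of printing 'no address', and the second sample payload (named "Other") is invented for illustration.

```python
import json


def collect_full_addresses(raw_payloads):
    """Decode each data-payload string and gather every full_address found.

    `info.get('props') or {}` maps both a missing and a null 'props' to an
    empty dict, and `or []` does the same for 'locations', so no lookup can
    raise KeyError or AttributeError on the shapes seen in the question.
    """
    addresses = []
    for raw in raw_payloads:
        if not raw:
            continue  # div without a data-payload attribute
        info = json.loads(raw)
        props = info.get('props') or {}
        for item in props.get('locations') or []:
            address = item.get('full_address')
            if address:
                addresses.append(address)  # locations without one are skipped
    return addresses


payloads = [
    '{"name": "LocationsMapList", "props": {"locations": '
    '[{"full_address": "5 Crosby St  3rd Floor, New York, 10013, New York, USA"}]}}',
    '{"name": "Other", "props": null}',  # hypothetical second payload
]
print(collect_full_addresses(payloads))
```

With the question's soup object, the input list would be built as [div.get('data-payload') for div in soup.find_all('div', attrs={'data-integration-name': 'redux-container'})], which also avoids overwriting info on every loop iteration.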

1 Comment

It got another error: ----> 5 for item in info['props'].get('locations', []): AttributeError: 'NoneType' object has no attribute 'get'

Your first object is fine, but your second object clearly has no locations key anywhere, nor a full_address key.

3 Comments

'props': {'locations': [{'dirty': False, 'updated_at': '2016-05-05T07:57:19.282Z', 'country_code': 'US', 'company_id': 106906, 'longitude': -74.0001954, 'address': '5 Crosby St 3rd Floor', 'state': 'New York', 'full_address': '5 Crosby St 3rd Floor, New York, 10013, New York, USA', 'country': 'United States', 'id': 17305, 'to_params': 'new-york-us', 'latitude': 40.719753, 'region': '', 'city': 'New York', 'description': '', 'created_at': '2015-01-19T01:32:16.317Z', 'zip_code': '10013', 'hq': True}]}, 'name': 'LocationsMapList'}
@Laura that's not the second sample object you posted. You can simply Ctrl+F for locations on this page to see that it's in the first object, but not the second.
Yes, I only pasted a few lines of the output. All I want is the full_address value under locations in the first object.
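Since only the first object holds the address, another option is to select that object directly. The printed object has 'name': 'LocationsMapList', so the sketch below (hypothetical helper name; it assumes that name is stable across pages of this site) returns the first full_address from the payload with that name, or None if there is none. The "Other" payload is made up for illustration.

```python
import json


def address_from_named_payload(raw_payloads, component_name='LocationsMapList'):
    """Return the first full_address in the payload whose 'name' is component_name.

    Assumes the address always lives in the 'LocationsMapList' payload, as in
    the single object printed in the question. Returns None if nothing matches.
    """
    for raw in raw_payloads:
        if not raw:
            continue  # div without a data-payload attribute
        info = json.loads(raw)
        if info.get('name') != component_name:
            continue  # another widget's payload; not the one we want
        for item in (info.get('props') or {}).get('locations') or []:
            if item.get('full_address'):
                return item['full_address']
    return None


payloads = [
    '{"name": "Other", "props": null}',  # hypothetical non-matching payload
    '{"name": "LocationsMapList", "props": {"locations": '
    '[{"full_address": "5 Crosby St  3rd Floor, New York, 10013, New York, USA"}]}}',
]
print(address_from_named_payload(payloads))
```

Filtering on name makes the intent explicit, at the cost of breaking silently (returning None) if the site renames that component.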
