json error dump during web scraping in python

Question

I am trying to download the thumbnails from the Digital Commonwealth website in order to make an ImageJ visualization. Everything prints correctly up to the JSON dump step. I have code written by a friend to download the images, but I need a JSON file of the URLs before I can continue. At the end I get the error "Object of type Tag is not JSON serializable".

Sorry for the spacing; I'm new to Stack Overflow, and when I copy and paste from Sublime the indentation gets messed up.

from bs4 import BeautifulSoup
import requests
import re
import json

all_my_data = []

url = "https://www.digitalcommonwealth.org/search?f%5Bcollection_name_ssim%5D%5B%5D=Produce+Crate+Labels&f%5Binstitution_name_ssim%5D%5B%5D=Boston+Public+Library&per_page=50"
results_page = requests.get(url)
page_html = results_page.text
soup = BeautifulSoup(page_html, "html.parser")

all_labels = soup.find_all("div", attrs = {'class': 'document'})

for items in all_labels:
    my_data = {
    "caption": None,
        "url": None,
    "image url": None,
    }
    item_link = items.find('a') 
abs_url = "https://www.digitalcommonwealth.org/search?f%5Bcollection_name_ssim%5D%5B%5D=Produce+Crate+Labels&f%5Binstitution_name_ssim%5D%5B%5D=Boston+Public+Library&per_page=50" + item_link["href"]
my_data["url"] = abs_url

#print(abs_url)

item_request = requests.get(abs_url)
    item_html = item_request.text
item_soup = BeautifulSoup(item_html, "html.parser")

all_field_divs = item_soup.find_all("div", attrs={'class': 'caption'})

for field in all_field_divs:
    caption = field.find("a")
    cpation = caption.text
    my_data["caption"] = caption
    #print(caption)

all_photo_urls = item_soup.find_all("div", attrs={'class': 'thumbnail'})

for photo_url in all_photo_urls:
    photo = photo_url.find('img')
    photo_abs_url = "https://www.digitalcommonwealth.org/search?f%5Bcollection_name_ssim%5D%5B%5D=Produce+Crate+Labels&f%5Binstitution_name_ssim%5D%5B%5D=Boston+Public+Library&per_page=50" + photo['src']
    my_data['image url'] = photo_abs_url
    #print(photo_abs_url)

all_my_data.append(my_data)

#print(all_my_data)


with open('fruit_crate_labels.json', 'w') as file_object:
    json.dump(all_my_data, file_object, indent=2)
    print('Your file is now ready')

It prints this:

Traceback (most recent call last):
  File "dh.py", line 54, in <module>
    json.dump(all_my_data, file_object, indent=2)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 429, in _iterencode
    yield from _iterencode_list(o, _current_indent_level)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Tag is not JSON serializable

Thanks for the help!

Error - Syntactical Remorse · Accepted Answer · 2019-04-30 20:19:26Z

The following code on line 35:

cpation = caption.text

should be:

caption = caption.text

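Because of the typo, the extracted string is assigned to cpation, so caption is still a BeautifulSoup Tag when it is stored in my_data, and json.dump cannot serialize a Tag. With the fix, your code appears to work as you intended.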
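To illustrate why the one-character typo triggers this error, here is a minimal sketch that does not need BeautifulSoup installed. FakeTag is a hypothetical stand-in for a bs4 Tag (any object with a .text attribute), and try_dump is a helper invented for this example; neither appears in the question.

```python
import json


class FakeTag:
    """Hypothetical stand-in for a bs4 Tag: an object with a .text attribute."""

    def __init__(self, text):
        self.text = text


def try_dump(value):
    """Return the JSON string, or the TypeError message if value cannot be serialized."""
    try:
        return json.dumps({"caption": value})
    except TypeError as exc:
        return f"TypeError: {exc}"


caption = FakeTag("Sample crate label")

# With the typo, the Tag-like object itself is stored, so serialization fails.
broken = try_dump(caption)

# With the fix, the plain string from .text is stored, which json can handle.
fixed = try_dump(caption.text)

print(broken)
print(fixed)
```

The same reasoning applies to any value placed in my_data: only plain strings, numbers, lists, dicts, booleans, and None survive json.dump by default.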

edited Apr 30, 2019 at 20:19

answered Apr 30, 2019 at 20:12


2 Comments

ceagle Over a year ago

Oh haha, thanks for catching that. Just wondering: did you look at the results? When I opened the JSON file, it listed only one result, but 50 times.

Error - Syntactical Remorse Over a year ago

Check the code you are running vs your question. When I run the code in your question with the typo fix it only outputs one entry.
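The symptoms described in these comments (a single entry, or one entry repeated) often come from where the dict is created and where append is called relative to the loop. The following is only a sketch of the usual pattern, under assumptions: build_records, BASE_URL, and the sample hrefs are hypothetical and use plain dicts instead of scraped tags. It creates a fresh dict per item, appends inside the loop, and joins relative links to the site root with urljoin rather than to the full search URL.

```python
from urllib.parse import urljoin

# Assumed site root; the question concatenates hrefs onto the full search URL instead.
BASE_URL = "https://www.digitalcommonwealth.org"


def build_records(items):
    """Build one record per item, appending inside the loop."""
    records = []
    for item in items:
        # A new dict each iteration, so entries do not share state.
        record = {
            "caption": item["caption"],
            "url": urljoin(BASE_URL, item["href"]),
        }
        # Appending here (not after the loop) keeps every item, not just the last.
        records.append(record)
    return records


# Hypothetical sample data standing in for the scraped results.
sample = [
    {"caption": "Label A", "href": "/search/item-a"},
    {"caption": "Label B", "href": "/search/item-b"},
]
records = build_records(sample)
print(records)
```

If append sits outside the loop, only the final item is stored; if the dict is created once before the loop and appended each time, every list entry points at the same dict and shows the last item's values.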
