1

I am trying to download the thumbnails from the digital commons website in order to make a imageJ visualization. Everything prints up until the JSON dump file. I have a code written by my friend to download the image but I need to have a json file of the URLs before I continue. At the end it gives me the error that " Object of type Tag is not JSON serializable".

Sorry for the spaces, I'm new to stack overflow and when I copy and past from Sublime it is messed up.

from bs4 import BeautifulSoup
import requests
import re
import json

all_my_data = []

url = "https://www.digitalcommonwealth.org/search?f%5Bcollection_name_ssim%5D%5B%5D=Produce+Crate+Labels&f%5Binstitution_name_ssim%5D%5B%5D=Boston+Public+Library&per_page=50"
results_page = requests.get(url)
page_html = results_page.text
soup = BeautifulSoup(page_html, "html.parser")

all_labels = soup.find_all("div", attrs = {'class': 'document'})

for items in all_labels:
    my_data = {
    "caption": None,
        "url": None,
    "image url": None,
    }
    item_link = items.find('a') 
abs_url = "https://www.digitalcommonwealth.org/search?f%5Bcollection_name_ssim%5D%5B%5D=Produce+Crate+Labels&f%5Binstitution_name_ssim%5D%5B%5D=Boston+Public+Library&per_page=50" + item_link["href"]
my_data["url"] = abs_url

#print(abs_url)

item_request = requests.get(abs_url)
    item_html = item_request.text
item_soup = BeautifulSoup(item_html, "html.parser")

all_field_divs = item_soup.find_all("div", attrs={'class': 'caption'})

for field in all_field_divs:
    caption = field.find("a")
    cpation = caption.text
    my_data["caption"] = caption
    #print(caption)

all_photo_urls = item_soup.find_all("div", attrs={'class': 'thumbnail'})

for photo_url in all_photo_urls:
    photo = photo_url.find('img')
    photo_abs_url = "https://www.digitalcommonwealth.org/search?f%5Bcollection_name_ssim%5D%5B%5D=Produce+Crate+Labels&f%5Binstitution_name_ssim%5D%5B%5D=Boston+Public+Library&per_page=50" + photo['src']
    my_data['image url'] = photo_abs_url
    #print(photo_abs_url)

all_my_data.append(my_data)

#print(all_my_data)


with open('fruit_crate_labels.json', 'w') as file_object:
    json.dump(all_my_data, file_object, indent=2)
    print('Your file is now ready')

It prints this:

Traceback (most recent call last): File "dh.py", line 54, in json.dump(all_my_data, file_object, indent=2) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/init.py", line 179, in dump for chunk in iterable: File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 429, in _iterencode yield from _iterencode_list(o, _current_indent_level) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 325, in _iterencode_list yield from chunks File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 405, in _iterencode_dict yield from chunks File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 438, in _iterencode o = _default(o) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 179, in default raise TypeError(f'Object of type {o.class.name} ' TypeError: Object of type Tag is not JSON serializable

thanks for the help!

1 Answer 1

0

The following code on line 35:

cpation = caption.text

should be:

caption = caption.text

Then your code appears to work as you intended.

Sign up to request clarification or add additional context in comments.

2 Comments

OH haha thanks for catching that. Just wondering...did you look at the results because when I opened the JSON file it only listed one result but 50 times.
Check the code you are running vs your question. When I run the code in your question with the typo fix it only outputs one entry.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.