I have a bunch of URLs that I want to scrape. Some of them contain values that others don't. How can I avoid getting an error when a value is missing from a URL's response?

I tried try/except, but it didn't work.

In this link, you will see 4 values under examInformation. I wrote the expression for 5 values and it gives me an error. I want it to just skip if the value doesn't exist.

Here is my code:

    try:
        location_1 = schools['examInformation'][0]['cbrLocationShortName']
    except:
        pass
    try:
        location_2 = schools['examInformation'][1]['cbrLocationShortName']
    except:
        pass
    try:
        location_3 = schools['examInformation'][2]['cbrLocationShortName']
    except:
        pass
    try:
        location_4 = schools['examInformation'][3]['cbrLocationShortName']
    except:
        pass
    try:
        location_5 = schools['examInformation'][4]['cbrLocationShortName']
    except:
        pass

    yield {
        "Location 1": location_1 if location_1 else "N/A",
        "Location 2": location_2 if location_2 else "N/A",
        "Location 3": location_3 if location_3 else "N/A",
        "Location 4": location_4 if location_4 else "N/A",
        "Location 5": location_5 if location_5 else "N/A",
    }

I am getting the following error:

UnboundLocalError: local variable 'location_5' referenced before assignment

NOTE: I am using Scrapy with the json library.

CodeWithAwais, I deleted my answer because I was too quick to assume what the error was. I suggest you go with @Madjazz's loop solution; it's a better design than the repetition. Commented Oct 20, 2020 at 22:00

1 Answer

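When a lookup fails, the except block only passes, so the variable is never assigned and the later reference raises the UnboundLocalError. The easiest fix is to assign None or your final 'N/A' value to each variable in the except block, e.g.: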

    try:
        location_5 = schools['examInformation'][4]['cbrLocationShortName']
    except (IndexError, KeyError, TypeError):
        # Missing entry or key: fall back to the placeholder
        location_5 = 'N/A'

    yield {
        "Location 5": location_5,
    }
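To see why the original version fails, here is a minimal sketch that reproduces the error outside Scrapy and contrasts it with the fallback assignment. The function names `broken` and `fixed` and the `sample` data are made up for illustration; only the `examInformation` and `cbrLocationShortName` keys come from the question.

```python
def broken(schools):
    try:
        location = schools['examInformation'][4]['cbrLocationShortName']
    except Exception:
        pass  # on failure, `location` is never bound
    return location  # raises UnboundLocalError if the lookup failed


def fixed(schools, default='N/A'):
    try:
        location = schools['examInformation'][4]['cbrLocationShortName']
    except (IndexError, KeyError, TypeError):
        location = default  # always bind the name, even on failure
    return location


# Hypothetical response with a single exam entry, so index 4 does not exist
sample = {'examInformation': [{'cbrLocationShortName': 'A'}]}
```

Calling `broken(sample)` raises the same UnboundLocalError as in the question, while `fixed(sample)` returns the placeholder.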

If you want to avoid the code duplication and exception handling in your example, I would pack it into a loop that uses the dictionary's get method with a default value:

    locations = {}
    exam_information = schools['examInformation']

    # Build "Location 1", "Location 2", ... for however many entries exist
    for i, exam in enumerate(exam_information, start=1):
        locations[f'Location {i}'] = exam.get('cbrLocationShortName', 'N/A')

    yield locations
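Note that the loop only produces as many keys as there are entries under examInformation. If every yielded item should carry the same five keys, as in the question, one option is to pad the missing ones. This is a sketch under the assumption that each entry is a dict; the helper name `location_fields` and its `count` and `default` parameters are hypothetical.

```python
def location_fields(schools, count=5, default='N/A'):
    """Return exactly `count` 'Location N' keys, padding missing ones."""
    # Treat a missing or None examInformation as an empty list
    exams = schools.get('examInformation') or []
    fields = {}
    for i in range(count):
        # Use an empty dict when the list has no entry at this index
        exam = exams[i] if i < len(exams) else {}
        # `or default` also covers present-but-empty values
        fields[f'Location {i + 1}'] = exam.get('cbrLocationShortName') or default
    return fields
```

Inside the spider callback this would be used as `yield location_fields(schools)`.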