
I'm trying to scrape some information from webpages that are inconsistent about where the info is located. I've got code to handle each of several possibilities; what I want is to try them in sequence, then if none of them work I'd like to fail gracefully and move on.

That is, in pseudocode:

try:
    info = look_in_first_place()
otherwise try:
    info = look_in_second_place()
otherwise try:
    info = look_in_third_place()
except AttributeError:
    info = "Info not found"

I could do this with nested try statements, but if I need 15 possibilities to try then I'll need 15 levels of indentation!

This seems like a trivial enough question that I feel like I'm missing something, but I've searched it into the ground and can't find anything that looks equivalent to this situation. Is there a sensible and Pythonic way to do this?

EDIT: As John's (pretty good) solution below points out, for brevity I've written each lookup above as a single function call, whereas in reality it's usually a small block of BeautifulSoup calls such as soup.find('h1', class_='parselikeHeader'). Of course I could wrap these in functions, but that seems a bit inelegant for such simple blocks. Apologies if my shorthand changes the problem, though.

This may be a more useful illustration:

try:
    info = soup.find('h1', class_='parselikeHeader').get('href')
if that fails try:
    marker = soup.find('span', class_='header')
    info = '_'.join(marker.stripped_strings)
if that fails try:
    (other options)
except AttributeError:
    info = "Info not found"
  • Why do you need a try/except? Use if/elif/else. Commented Jul 8, 2014 at 11:37
  • Thanks Padraic, but in addition to an instinct to favor EAFP over LYBL, I'm not sure I'd be able to predict the if conditions to check for as it could go wrong in many different ways, so handling a bounded but wide range of exceptions seemed like a natural fit. Commented Jul 8, 2014 at 12:40
  • find will be empty if it does not match anything, so if find... will only be True if there is a match, and if/elif/else would work. You could put all the patterns in a list and loop over it using an if check, with an else for when none match. Commented Jul 8, 2014 at 12:55
  • I was starting to like this option, but then I thought of another problem if I need three lines to find the info (e.g., find the 'a class=123' tag, then find the last div before that, then find the text within that div). Any of those could go wrong, so I think I'd literally need to have an if condition check every single line of code I use (even if I do wrap each section in functions)! This seems like the whole point of EAFP -- I can run all the code within a try statement and not care where it goes wrong, just log it and move on. Unless I'm missing something? Commented Jul 8, 2014 at 16:45
  • You could probably use if all() for multiple conditions. You should post an example of the functions you are using. Commented Jul 8, 2014 at 16:48
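
For comparison, the list-of-patterns idea from the comments might look like the sketch below. The names PATTERNS, first_match, and the OnlySpans stand-in class are hypothetical and introduced only for illustration; with a real page, soup would be a BeautifulSoup object, whose find returns None when nothing matches.

```python
# Patterns to try, in order: (tag name, keyword arguments passed to find).
PATTERNS = [
    ('h1', {'class_': 'parselikeHeader'}),
    ('span', {'class_': 'header'}),
]

def first_match(soup, patterns=PATTERNS):
    """Return the first tag that any pattern matches, or None if none match."""
    for name, kwargs in patterns:
        tag = soup.find(name, **kwargs)
        if tag is not None:
            return tag
    return None

class OnlySpans:
    """Hypothetical stand-in for a parsed page that contains only a span."""
    def find(self, name, **kwargs):
        return "<span tag>" if name == 'span' else None

print(first_match(OnlySpans()))
```

This covers only the find step itself. Any further processing of the matched tag can still fail, which is the objection raised in the reply above.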

1 Answer


If each lookup is a separate function, you can store all the functions in a list and then iterate over them one by one.

lookups = [
    look_in_first_place,
    look_in_second_place,
    look_in_third_place
]

info = None

for lookup in lookups:
    try:
        info = lookup()
        # exit the loop on success
        break    
    except AttributeError:
        # repeat the loop on failure
        continue

# when the loop is finished, check if we found a result or not
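if info:
    # success: use the value that was found
    print(info)
else:
    # failure: fall back to the placeholder from the question
    info = "Info not found"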
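
If the same pattern is needed for several pieces of information, the loop can be wrapped in a small helper. This is only a sketch: first_successful is a hypothetical name, and the two lookup functions are stand-ins that simulate a failing and a succeeding lookup rather than real scraping code.

```python
def first_successful(lookups, exceptions=(AttributeError,), default="Info not found"):
    """Return the result of the first lookup that does not raise.

    Lookups are tried in order; if every one raises one of `exceptions`,
    `default` is returned instead.
    """
    for lookup in lookups:
        try:
            return lookup()
        except exceptions:
            continue
    return default

# Stand-in lookups (assumptions, not real scraping code).
def look_in_first_place():
    # Simulates find() returning None and the next call failing.
    return None.get('href')

def look_in_second_place():
    return "found in second place"

info = first_successful([look_in_first_place, look_in_second_place])
print(info)
```

Returning from inside the loop avoids the separate success check afterwards, so a legitimately empty result is not mistaken for a failure.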

2 Comments

I do like that, but at the moment the lookup code is not actually in separate functions but is usually 2-3 lines of BeautifulSoup calls. I suppose I could write wrappers for all the possibilities, but this seems like overkill since I am checking for several bits of info, each of which could have several lookups to try... The list strategy does seem very Pythonic though, so I may use this if there isn't a better solution.
Also having thought about it, doing it this way would mean that the later functions might only ever be useful if called in the particular order that the list specifies (if, for example, I want to make each successive try more permissive). Having functions lying around that produce bad data unless they're used right after another function seems like it could be a dangerous encapsulation strategy?
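
One possible way to address both concerns, sketched under assumptions: give each 2-3 line block its own small function that takes the parsed page as an argument and depends on nothing else, so no lookup relies on having been called after another. The function names, the LOOKUPS list, and the EmptyPage stand-in are hypothetical; the bodies are the two blocks from the question.

```python
def header_href(soup):
    # First block from the question.
    return soup.find('h1', class_='parselikeHeader').get('href')

def header_span_text(soup):
    # Second block from the question.
    marker = soup.find('span', class_='header')
    return '_'.join(marker.stripped_strings)

# The order lives in this list only, not inside the functions.
LOOKUPS = [header_href, header_span_text]

def find_info(soup, lookups=LOOKUPS, default="Info not found"):
    """Try each lookup on the same page; return the first result that does not raise."""
    for lookup in lookups:
        try:
            return lookup(soup)
        except AttributeError:
            continue
    return default

class EmptyPage:
    """Hypothetical stand-in for a page on which every find() misses."""
    def find(self, *args, **kwargs):
        return None

# Every lookup raises AttributeError here, so the default comes back.
print(find_info(EmptyPage()))
```

Making successive lookups more permissive is then a matter of list order, while each function remains safe to call on its own. Note that with a real BeautifulSoup tag, get returns None for a missing attribute rather than raising, so header_href may need an explicit check if that case matters.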
