
The following is a simple script I wrote in Python to scrape specific information from numerically ascending URLs. It works great, and I can see the results in Python IDLE.

from urllib.request import urlopen
from bs4 import BeautifulSoup

for i in range(35, 345):
    url = 'https://www.example.com/ID=' + str(i)
    html = urlopen(url)
    soup = BeautifulSoup(html, "html.parser")
    information1 = soup.find(text='sam')
    information2 = soup.find(text='john')
    print(information1, information2, i)

So the results look like this:

None None 35
None None 36
None sam 37
john None 38
None None 39
....
None None 344

Now this is great and is exactly what I need, but I would like to improve my code by having execution stop at "john None 38", once everything I need has been found, so there won't be 300-plus unnecessary extra lines.

Now there are two things you should know. First, information1 and information2 will never be on the same webpage; they will always be on separate URLs. Second, information1 appeared before information2 in the output above, but the reverse could happen if I searched for different strings.

So the solution needs to account for the fact that information1 and information2 will appear in the results on different rows, and that either one could turn up first.

I'm really struggling to form the "if" logic for the conditions mentioned above. I'd appreciate any help. Thank you.

2 Answers

Initialize both results to None before the loop, only assign each one the first time it is found, and break once both are set:

# Default to None
information1 = None
information2 = None
for i in range(35, 345):
    ...  # the URL-building, fetching and parsing lines from the question go here
    # If already set, don't override
    information1 = information1 or soup.find(text='sam')
    # Same here
    information2 = information2 or soup.find(text='john')
    if information1 and information2:
        # We have both information1 and information2, so break out of the for loop
        break
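
This works because "or" short-circuits in Python: once information1 already holds a match, the soup.find call on the right-hand side is skipped and the stored value is kept, so each result is effectively written only once before the loop breaks.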

You can store your trackers outside the loop so they persist between iterations:

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Trackers that persist across iterations
info1 = None
info2 = None

for i in range(35, 345):
    url = 'https://www.example.com/ID=' + str(i)
    html = urlopen(url)
    soup = BeautifulSoup(html, "html.parser")
    information1 = soup.find(text='sam')
    information2 = soup.find(text='john')

    # Keep the first match and never overwrite it on later pages
    if information1 is not None and info1 is None:
        info1 = information1

    if information2 is not None and info2 is None:
        info2 = information2

    # Stop as soon as both strings have been found, in either order
    if info1 and info2:
        break

print('Information 1: {}'.format(info1))
print('Information 2: {}'.format(info2))
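
Not part of either answer, but worth noting: if any ID in the range has no page, urlopen raises an HTTPError and the loop stops with a traceback. Below is a minimal sketch of the same early-exit loop that skips missing pages; the try/except is an assumption about how you might want to handle them, not something from the original code.

import urllib.error
from urllib.request import urlopen
from bs4 import BeautifulSoup

info1 = None
info2 = None

for i in range(35, 345):
    url = 'https://www.example.com/ID=' + str(i)
    try:
        html = urlopen(url)
    except urllib.error.HTTPError:
        # Treat a missing page like a page without a match and move on
        continue
    soup = BeautifulSoup(html, "html.parser")
    info1 = info1 or soup.find(text='sam')
    info2 = info2 or soup.find(text='john')
    if info1 and info2:
        # Both strings found, possibly on different pages and in either order
        break

print('Information 1: {}'.format(info1))
print('Information 2: {}'.format(info2))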
