Data scraping: How to check if a web page contains a specific string

Question

I need to write an "if statement" that checks whether the string "cette entreprise est membre de la FVE" appears in a web page.

item_url = "http://www.fveconstruction.ch/anDetails.asp?RT=2&M=01&R=1&ID=42105701"
response = requests.get(item_url)
soup = BeautifulSoup(response.text, 'html.parser')
test = soup.findAll(text = re.compile('cette entreprise est membre de la FVE.\w+..\w+'))
print(test)

It prints an empty list. Does anyone have an idea why? I would also like to know how to test the result. If I write:

if soup.findAll(text = re.compile('cette entreprise est membre de la FVE.\w+..\w+')): 
     do smth
else:
     do smth

If the string I'm looking for isn't there, it's supposed to evaluate to false, right?

Yes, if findAll returns an empty list, it will be treated as false and the code will go to the else.
– Alex Hall, Commented May 2, 2016 at 22:17
Before the edit, this post said .findAll() ... It's best to use .find_all() if you're using BS4 because findAll is from bs3
– JasTonAChair, Commented May 2, 2016 at 22:55
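
The point about empty results can be checked without touching the network. The sketch below uses re.findall from the standard library as a stand-in for find_all (both hand back a list-like result), and page_text is a hard-coded, hypothetical sample rather than the real page:

```python
import re

# Hypothetical page text, hard-coded so the example runs offline.
page_text = "Cette entreprise est membre de la FVE"

# Lowercase "c" does not occur in the sample, so findall returns [].
hits = re.findall(r"cette entreprise", page_text)
misses_are_falsy = not hits  # an empty list is falsy

# Matching the capital "C" yields a non-empty (truthy) list.
matches = re.findall(r"Cette entreprise", page_text)

if matches:
    outcome = "found"
else:
    outcome = "not found"

print(hits, misses_are_falsy, outcome)
```

So a bare "if matches:" is enough; there is no need to compare against an empty list or check the length.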

Dr. Cool · Accepted Answer · 2016-05-03 17:30:22Z

I checked the HTML of the page you provided in your code and noticed two things. Here is the actual HTML of the text you're trying to find:

<span class="entrepriseDef">Cette entreprise est membre de la FVE&nbsp;&nbsp;</span>

I see two problems in your code: you're searching for a lower-case "c" rather than an upper-case "C", and you're searching for a period at the end of the text that isn't there. When you're screen-scraping a website, view the HTML of the page (press Control+U in your browser to see the source) and search for the exact text. Then copy and paste that text into your code so it matches precisely.

Your code should be like this:

import re
import requests
from bs4 import BeautifulSoup

item_url = "http://www.fveconstruction.ch/anDetails.asp?RT=2&M=01&R=1&ID=42105701"
response = requests.get(item_url)
soup = BeautifulSoup(response.text, 'html.parser')
# Match the phrase itself; in the HTML it is followed only by non-breaking spaces
test = soup.find_all(text=re.compile('Cette entreprise est membre de la FVE'))
print(test)
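
To see how sensitive the match is, the span quoted above can be tested offline with only the standard library. This is a rough sketch, assuming the quoted span is an accurate copy of the page; RAW_SPAN and the variable names are made up for illustration:

```python
import html
import re

# The span quoted in this answer, hard-coded so nothing is fetched.
RAW_SPAN = ('<span class="entrepriseDef">'
            'Cette entreprise est membre de la FVE&nbsp;&nbsp;</span>')

# html.unescape turns &nbsp; into the character U+00A0 (non-breaking space).
text = html.unescape(RAW_SPAN)

PHRASE = 'entreprise est membre de la FVE'

# Exact phrase with capital "C": matches.
phrase_found = re.search('Cette ' + PHRASE, text) is not None

# Lowercase "c": no match unless case is ignored.
lowercase_found = re.search('cette ' + PHRASE, text) is not None
ignore_case_found = re.search('cette ' + PHRASE, text, re.IGNORECASE) is not None

# Requiring word characters right after "FVE" fails, because \w does not
# match the non-breaking spaces that follow the phrase.
suffix_found = re.search('Cette ' + PHRASE + r'\w+..\w+', text) is not None

print(phrase_found, lowercase_found, ignore_case_found, suffix_found)
```

If the capitalisation on the page is not known in advance, re.IGNORECASE is one way to make the pattern tolerant of it.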

Joe T. Boka · Accepted Answer · 2016-05-02 22:53:49Z

I have no way of knowing if your regex works or not, as your regex is not part of your post.

This answer is to show you how to check if the "webpage contains a specific string", without the regex issue.

import requests
r = requests.get('http://www.fveconstruction.ch/anDetails.asp?RT=2&M=01&R=1&ID=42105701')

if 'cette entreprise est membre de la FVE.' in r.text:
    print('Yes')
else:
    print('No')

edited May 2, 2016 at 22:53

answered May 2, 2016 at 22:32

2 Comments

jjyoh Over a year ago

Thanks for the answer, but why does it return "No" when it should return "Yes"? Is there something special about the site? If you look at the page at that URL, just below the bold title "A.GUIDO & FILS SA", it says "Cette entreprise est membre de la FVE".

Joe T. Boka Over a year ago

You have cette in your code, but in your comment you wrote Cette with a capital C. Also, there is a period (.) in your code that's not in your comment. Make sure that the string in your if statement is exactly the same as the text on the page.
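
If an exact, case-sensitive match is too strict, the substring test can be relaxed. The following is only a sketch: contains_phrase is a hypothetical helper, and sample_source is a hard-coded stand-in for r.text based on the span quoted elsewhere in this thread:

```python
import html


def contains_phrase(page_source: str, phrase: str) -> bool:
    """Hypothetical helper: case-insensitive substring test on HTML source."""
    # Decode entities such as &nbsp; so they compare as real characters.
    decoded = html.unescape(page_source)
    # casefold() is a more aggressive lower(), meant for caseless comparison.
    return phrase.casefold() in decoded.casefold()


# Stand-in for r.text, so the example runs without a network request.
sample_source = ('<span class="entrepriseDef">'
                 'Cette entreprise est membre de la FVE&nbsp;&nbsp;</span>')

# The original test: lowercase "c" plus a trailing period, so it fails.
exact_with_period = 'cette entreprise est membre de la FVE.' in sample_source

# Ignoring case fixes the "c" problem...
relaxed = contains_phrase(sample_source, 'cette entreprise est membre de la FVE')

# ...but the period is still not on the page, so this stays False.
still_missing = contains_phrase(sample_source, 'cette entreprise est membre de la FVE.')

print(exact_with_period, relaxed, still_missing)
```

Ignoring case only helps with the "c"; punctuation that is not in the page source still has to be removed from the search string.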
