
I need to create an "if statement" to check whether the string "cette entreprise est membre de la FVE" is part of a web page.

import re
import requests
from bs4 import BeautifulSoup

item_url = "http://www.fveconstruction.ch/anDetails.asp?RT=2&M=01&R=1&ID=42105701"
response = requests.get(item_url)
soup = BeautifulSoup(response.text, 'html.parser')
test = soup.findAll(text=re.compile(r'cette entreprise est membre de la FVE.\w+..\w+'))
print(test)

It prints an empty list. Does anyone have an idea why? I would also like to know how to check the result in an if statement. If I write:

if soup.findAll(text=re.compile(r'cette entreprise est membre de la FVE.\w+..\w+')):
    print('found')      # do something
else:
    print('not found')  # do something else

If the string I'm looking for isn't there, it's supposed to evaluate as false, right?

  • Why can't you just use 'text' in response.text? Commented May 2, 2016 at 22:06
  • Yes, if findAll returns an empty list, it will be treated as false and the code will go to the else. Commented May 2, 2016 at 22:17
  • Before the edit, this post said .findAll() ... It's best to use .find_all() if you're using BS4, because findAll is the BS3 name. Commented May 2, 2016 at 22:55
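To illustrate the point about truthiness: find_all() returns a ResultSet, which is a subclass of list, so an empty result is falsy exactly like []. The sketch below uses plain lists as stand-ins for that result; report is a hypothetical helper, not part of BeautifulSoup.

```python
def report(matches):
    """Hypothetical helper: branch on whether a search returned anything."""
    # An empty list (or an empty ResultSet, which subclasses list) is falsy,
    # so the fallback branch runs when nothing matched.
    if matches:
        return "found"
    return "not found"


print(report([]))  # not found
print(report(["Cette entreprise est membre de la FVE\xa0\xa0"]))  # found
```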

2 Answers


I checked the HTML of the page you provided in your code and noticed two things. Here is the actual HTML of the text you're trying to find:

<span class="entrepriseDef">Cette entreprise est membre de la FVE&nbsp;&nbsp;</span>

The two problems I see in your code are that you're searching for a lowercase "c" instead of an uppercase "C", and that your pattern expects more characters after "FVE" that aren't there: in a regex, . matches any character and \w+ needs at least one letter, digit or underscore, but the text ends with two non-breaking spaces (&nbsp;). When you're screen-scraping a website, view the HTML of the page (press Control+U in your browser) and search for the exact text. Then copy and paste that text into your code so it is precise.
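A quick way to see both problems is to run the patterns against the span's text directly. This sketch assumes the parser has decoded each &nbsp; into U+00A0 (a non-breaking space), which is what BeautifulSoup's html.parser backend does. Because find_all(string=regex) tests each string with the regex's search() method, re.search reproduces its decision here.

```python
import re

# Assumption: each &nbsp; has already been decoded to U+00A0 by the parser
span_text = "Cette entreprise est membre de la FVE\u00a0\u00a0"

patterns = {
    "question's pattern": r"cette entreprise est membre de la FVE.\w+..\w+",
    "capital C, same tail": r"Cette entreprise est membre de la FVE.\w+..\w+",
    "exact text only": r"Cette entreprise est membre de la FVE",
}

for label, pattern in patterns.items():
    # U+00A0 is not a word character, so any \w+ after "FVE" fails
    print(f"{label}: {bool(re.search(pattern, span_text))}")

# If the capitalisation might vary, ignore case instead of guessing it
print(bool(re.search(r"cette entreprise est membre de la fve", span_text, re.IGNORECASE)))
```

Only the pattern that stops at "FVE" (or the case-insensitive one) matches; keeping the \w+..\w+ tail fails regardless of the capital C.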

Your code should look like this:

import re
import requests
from bs4 import BeautifulSoup

item_url = "http://www.fveconstruction.ch/anDetails.asp?RT=2&M=01&R=1&ID=42105701"
response = requests.get(item_url)
soup = BeautifulSoup(response.text, 'html.parser')
# Match the exact, case-sensitive text as it appears in the page source
test = soup.find_all(string=re.compile('Cette entreprise est membre de la FVE'))
print(test)
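If you want to try the corrected pattern without hitting the live site or installing bs4, the standard library's html.parser can stand in for BeautifulSoup. This is a swapped-in technique, not BeautifulSoup's API: TextCollector and find_strings are hypothetical names, and the input is only the <span> quoted above rather than the full page.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every text node, roughly like soup.find_all(string=True)."""

    def __init__(self):
        # convert_charrefs=True (the default) decodes &nbsp; to '\xa0'
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        self.texts.append(data)


def find_strings(html, pattern):
    """Return the text nodes in which re.search(pattern) finds a match."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    regex = re.compile(pattern)
    return [text for text in collector.texts if regex.search(text)]


# The <span> quoted above, used here instead of fetching the live page
snippet = ('<span class="entrepriseDef">'
           'Cette entreprise est membre de la FVE&nbsp;&nbsp;</span>')

if find_strings(snippet, r'Cette entreprise est membre de la FVE'):
    print('Yes')
else:
    print('No')
```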

I have no way of knowing if your regex works or not, as your regex is not part of your post.

This answer shows how to check whether a web page contains a specific string, without involving a regex at all.

import requests
r = requests.get('http://www.fveconstruction.ch/anDetails.asp?RT=2&M=01&R=1&ID=42105701')

if 'cette entreprise est membre de la FVE.' in r.text:
    print('Yes')
else:
    print('No')

2 Comments

Thanks for the answer, but why does it return "No" when it should return "Yes"? Is there something special about the site? If you look at the URL, just below the bold title "A.GUIDO & FILS SA", it says "Cette entreprise est membre de la FVE".
You have "cette" in your code, but in your comment you wrote "Cette" with a capital C. Also, there is a period (.) in your code that's not in your comment. Make sure that the string in your if statement is exactly the same as the text on the page.
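A sketch of the same substring check that makes the mismatch visible. page_contains is a hypothetical helper, and the html string stands in for r.text, which still holds the raw &nbsp; entities because HTML entities are not unescaped in r.text. The case-insensitive option (via str.casefold()) is an addition here, not part of the original answer.

```python
def page_contains(page_text, phrase, ignore_case=False):
    """Plain substring check; optionally case-insensitive via str.casefold()."""
    if ignore_case:
        return phrase.casefold() in page_text.casefold()
    return phrase in page_text


# Stand-in for r.text: the raw HTML of the relevant <span>
html = '<span class="entrepriseDef">Cette entreprise est membre de la FVE&nbsp;&nbsp;</span>'

print(page_contains(html, 'cette entreprise est membre de la FVE.'))  # False: lowercase c and trailing period
print(page_contains(html, 'Cette entreprise est membre de la FVE'))   # True: exact text
print(page_contains(html, 'cette entreprise est membre de la FVE', ignore_case=True))  # True
```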
