I just started a Python web course and was trying to parse HTML data using BeautifulSoup when I came across this error. I researched it but couldn't find a precise, definite solution. Here is the code:

   import requests
   from bs4 import BeautifulSoup

   request = requests.get("http://www.johnlewis.com/toms-berkley-slipper-grey/p3061099")
   content = request.content
   soup = BeautifulSoup(content, 'html.parser')
   element = soup.find(" span", {"itemprop ": "price ", "class": "now-price"})
   string_price = (element.text.strip())
   print(int(string_price))


  # <span itemprop="price" class="now-price"> £40.00 </span>

And this is the error I get:

   C:\Users\IngeniousAmbivert\venv\Scripts\python.exe C:/Users/IngeniousAmbivert/PycharmProjects/FullStack/price-eg/src/app.py
   Traceback (most recent call last):
     File "C:/Users/IngeniousAmbivert/PycharmProjects/FullStack/price-eg/src/app.py", line 8, in <module>
       string_price = (element.text.strip())
   AttributeError: 'NoneType' object has no attribute 'text'

   Process finished with exit code 1

Any help will be appreciated.


2 Answers


The problem is the extra space characters inside the tag name, the attribute name and the attribute value. BeautifulSoup compares these strings exactly, so nothing matches and find() returns None. Replace:

element = soup.find(" span", {"itemprop ": "price ", "class": "now-price"})

with:

element = soup.find("span", {"itemprop": "price", "class": "now-price"})
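To see the difference without hitting the live site, here is a minimal sketch that runs both calls against the single span quoted in the question. The inline `html` string is an assumption standing in for the real page, which may have changed since.

```python
from bs4 import BeautifulSoup

# Inline copy of the markup quoted in the question, so no network request is needed.
html = '<span itemprop="price" class="now-price"> £40.00 </span>'
soup = BeautifulSoup(html, "html.parser")

# Tag names, attribute names and attribute values are compared exactly,
# so the stray spaces mean nothing in the document can match.
with_spaces = soup.find(" span", {"itemprop ": "price ", "class": "now-price"})
without_spaces = soup.find("span", {"itemprop": "price", "class": "now-price"})

print(with_spaces)                  # None, hence the AttributeError on .text
print(repr(without_spaces.text))    # the price text, still padded with spaces
```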

After that, two more things to fix when converting the string:

  • strip the £ character from the left
  • use float() instead of int()
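A quick sketch of why both steps are needed, using only built-ins. The `text` value is assumed to be what `.strip()` leaves you with once the element is found.

```python
text = "£40.00"

# Step 1: drop the currency sign from the left-hand side.
number_part = text.lstrip("£")

# Step 2: int() only accepts whole-number strings, so "40.00" is rejected.
try:
    int(number_part)
    failure = None
except ValueError as error:
    failure = str(error)

# float() accepts the decimal point.
price = float(number_part)

print(failure)
print(price)  # 40.0
```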

Fixed version:

element = soup.find("span", {"itemprop": "price", "class": "now-price"})
string_price = (element.get_text(strip=True).lstrip("£"))
print(float(string_price))

You would see 40.0 printed (float() does not keep the trailing zero).
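If the page layout changes, find() will return None again and you are back to the same AttributeError. One way to guard against that is sketched below; `extract_price` is a hypothetical helper name, not part of bs4, and it takes the HTML as a string so it can be tried without a network request.

```python
from bs4 import BeautifulSoup


def extract_price(html):
    """Return the price as a float, or None if the price element is missing.

    Hypothetical helper for illustration; assumes the markup from the question.
    """
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find("span", {"itemprop": "price", "class": "now-price"})
    if element is None:
        # find() found nothing: report that instead of crashing on .get_text()
        return None
    return float(element.get_text(strip=True).lstrip("£"))


print(extract_price('<span itemprop="price" class="now-price"> £40.00 </span>'))  # 40.0
print(extract_price("<p>no price here</p>"))                                      # None
```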

3 Comments

Thanks mate. It worked well. But if you could elaborate on the code, that'd be great. As I mentioned, I am a newbie to Python and I couldn't comprehend this statement: string_price = (element.get_text(strip=True).lstrip("£")). Thanks
@user7338971 absolutely. The .get_text(strip=True) helps to get the text of an element and strip all the extra newlines and whitespaces around the text - normally you would do it via .strip(), but bs4 has this get_text() method which accepts a strip argument - quite handy. After that we left-strip the pound sign. Hope that makes things clearer.
I am really grateful. Thanks for your help. I appreciate it.
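For anyone else puzzling over that statement, here is a small sketch comparing the two ways of stripping. The `nested` snippet is made up purely to show where they differ; for a single piece of text like the price, both give the same result.

```python
from bs4 import BeautifulSoup

# The price span from the question: one piece of text, padded with spaces.
simple = BeautifulSoup('<span class="now-price"> £40.00 </span>', "html.parser").span

# A made-up element with a nested tag, only to show the difference.
nested = BeautifulSoup("<p> a <b> b </b></p>", "html.parser").p

# Same result when there is a single piece of text.
print(repr(simple.text.strip()))            # '£40.00'
print(repr(simple.get_text(strip=True)))    # '£40.00'

# .text.strip() trims only the outer ends of the combined text;
# get_text(strip=True) strips every piece before joining them.
print(repr(nested.text.strip()))
print(repr(nested.get_text(strip=True)))

# lstrip("£") then removes the pound sign from the left.
print(simple.get_text(strip=True).lstrip("£"))   # 40.00
```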

You can also try it like this, using a CSS selector:

import requests
from bs4 import BeautifulSoup

request = requests.get("http://www.johnlewis.com/toms-berkley-slipper-grey/p3061099")
content = request.content
soup = BeautifulSoup(content, 'html.parser')
# print(soup)
element = soup.select("div p.price span.now-price")[0]
print(element)
string_price = (element.text.strip())
print(int(float(string_price[1:])))

Output:

<span class="now-price" itemprop="price">
                                            £40.00
                                                </span>
40
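A small variation on the same idea: select(...)[0] raises an IndexError when nothing matches, whereas select_one() returns None, which is easier to check. This is a sketch against an inline snippet; the surrounding div and p.price markup is assumed from the selector above, not copied from the real page.

```python
from bs4 import BeautifulSoup

# Assumed markup, reconstructed from the selector and the output shown above.
html = """
<div>
  <p class="price">
    <span class="now-price" itemprop="price">
        £40.00
    </span>
  </p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first match, or None when there is no match.
element = soup.select_one("div p.price span.now-price")
missing = soup.select_one("div p.price span.was-price")

if element is not None:
    # [1:] drops the leading pound sign, as in the answer above.
    price = int(float(element.get_text(strip=True)[1:]))
    print(price)  # 40

print(missing)  # None
```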
