I have:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url='https://www.zoopla.co.uk/for-sale/property/london/west-wickham/?q=West%20Wickham%2C%20London&results_sort=newest_listings&search_source=home'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html,'html.parser')
containers = page_soup.findAll("div",{"class":"listing-results-wrapper"})
listing_price = []
listing_nobed = []
for c in containers:
    listing_price.append(c.findAll("a",{"class":"listing-results-price text-price"}))
    listing_nobed.append(c.findAll("h3",{"class":"listing-results-attr"}))
print(listing_price[0])
print('----------------------------')
print(listing_nobed[0])
Results:
[<a class="listing-results-price text-price" href="/for-sale/details/50924268">
£500,000
<span class="price-modifier">Offers over</span>
</a>]
----------------------------
[<h3 class="listing-results-attr">
<span class="num-icon num-beds" title="3 bedrooms"><span class="interface"></span>3</span> <span class="num-icon num-baths" title="1 bathroom"><span class="interface"></span>1</span> <span class="num-icon num-reception" title="2 reception rooms"><span class="interface"></span>2</span>
</h3>]
I want:
Price    NoBeds  NoBaths  NoRec
500,000  3       1        2
xxx      x       x        NaN
Here xxx is the price, and x stands for the corresponding count. Some listings are missing one of these tags; in that case the table should show NaN or 0.
I tried the approach from "Python - Beautiful Soup - Remove Tags" to extract the (3, 1, 2) values, to no avail.
To extract the price, I thought of using regex, but many comments here recommend against it.
I am still learning how HTML tags and data extraction work, so any suggestions are greatly appreciated.
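For reference, here is a minimal sketch of the kind of extraction I am after, run against the markup printed above rather than the live page. It assumes each listing-results-wrapper div holds exactly one listing, and that the price is the first piece of text inside the price link. The second listing in the sample (with no reception-room span) and the helper names parse_price and parse_count are hypothetical, added only to exercise the missing-value case.

```python
from bs4 import BeautifulSoup

NAN = float("nan")

# First listing: markup copied from the output above.
# Second listing: hypothetical, with no reception-room span.
SAMPLE_HTML = """
<div class="listing-results-wrapper">
<a class="listing-results-price text-price" href="/for-sale/details/50924268">
£500,000
<span class="price-modifier">Offers over</span>
</a>
<h3 class="listing-results-attr">
<span class="num-icon num-beds" title="3 bedrooms"><span class="interface"></span>3</span> <span class="num-icon num-baths" title="1 bathroom"><span class="interface"></span>1</span> <span class="num-icon num-reception" title="2 reception rooms"><span class="interface"></span>2</span>
</h3>
</div>
<div class="listing-results-wrapper">
<a class="listing-results-price text-price" href="/for-sale/details/0">
£350,000
</a>
<h3 class="listing-results-attr">
<span class="num-icon num-beds" title="2 bedrooms"><span class="interface"></span>2</span> <span class="num-icon num-baths" title="1 bathroom"><span class="interface"></span>1</span>
</h3>
</div>
"""


def parse_price(container):
    """Return the listing price as an int, or NaN if it cannot be read."""
    link = container.find("a", class_="listing-results-price")
    if link is None:
        return NAN
    # The first non-empty string in the link is the price; the modifier
    # ("Offers over") sits in a nested span and comes after it.
    text = next(link.stripped_strings, "")
    digits = text.replace("£", "").replace(",", "")
    return int(digits) if digits.isdigit() else NAN


def parse_count(container, css_class):
    """Return the number inside the span with the given class, or NaN."""
    span = container.find("span", class_=css_class)
    if span is None:
        return NAN
    text = span.get_text(strip=True)
    return int(text) if text.isdigit() else NAN


page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = [
    {
        "Price": parse_price(c),
        "NoBeds": parse_count(c, "num-beds"),
        "NoBaths": parse_count(c, "num-baths"),
        "NoRec": parse_count(c, "num-reception"),
    }
    for c in page_soup.find_all("div", class_="listing-results-wrapper")
]

for row in rows:
    print(row)
```

Using find instead of findAll returns a single tag or None, which makes the missing-tag case easy to detect. If pandas is available, the list of dicts could be passed to pandas.DataFrame(rows) to get the table layout shown above.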