
I would like to read all the details of the ads on this page:

https://www.nepremicnine.net/oglasi-prodaja/gorenjska/kranj/kranj/stanovanje/letnik-od-1980-do-1989/

When I read the element "//div[@class='seznam']" I get basic information about all the ads. But I would also like to read every <meta itemprop="position" content="1"> or <meta itemprop="price" content="229000.00"> element, and I always get the data of the first ad on the page. I tried scrolling the page, and I also tried the function find_elements instead of find_element.

from selenium.common.exceptions import TimeoutException
from selenium.webdriver import ActionChains, Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support import expected_conditions as EC
import time
from seleniumbase import Driver
from selenium.webdriver.remote.webelement import WebElement

driver = Driver(uc=True)
wait = WebDriverWait(driver, 60)
URL = "https://www.nepremicnine.net/oglasi-prodaja/gorenjska/kranj/kranj/stanovanje/letnik-od-1980-do-1989/"
driver.open(URL)

wait.until(EC.element_to_be_clickable((By.ID,"CybotCookiebotDialogBodyButtonDecline"))).click()
cnt_oglasov = wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@class='oglasi_cnt']")))
print(cnt_oglasov.text)

if (cnt_oglasov.text.find('Št. ustreznih oglasov: '))>=0:
    st_oglasov_str=cnt_oglasov.text.removeprefix("Št. ustreznih oglasov: ")
    print("st. najdenih oglasov: " + st_oglasov_str)

#list of ads
driver.execute_script("window.scrollTo(0, 1000);")
time.sleep(5)
vsebina = wait.until(EC.visibility_of_element_located((By.XPATH, "//div[@class='seznam']")))
#print(vsebina.text)
#ActionChains(driver).move_to_element(vsebina)
posamezni = vsebina.find_elements(By.CLASS_NAME, 'property-details')
title = []
for one in posamezni:
    title = one.find_element(By.XPATH, "//div[@class='propertydetails']")
    title1 = title.get_attribute('data-href')
    print(title1)

    time.sleep(5)
3
  • I see find_element (without the s) inside the for loop. It may need a leading . to make the XPath relative: ".//div[@class='propertydetails']". Without the . it searches the whole document, so it can return the same (first) element every time. Commented Jun 4 at 9:15
  • I get NoSuchElementException. Commented Jun 4 at 9:30
  • Don't use sleep() in this context. If you use WebDriverWait properly, it's simply not necessary, and it's likely to be unreliable in any case. Commented Jun 4 at 9:45
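To illustrate the last comment: an explicit wait polls a condition until it holds or a timeout expires, instead of pausing for a fixed time. The sketch below shows only that polling idea in plain Python. The names wait_until and ready are hypothetical, and this is not Selenium's implementation; in real code you would use WebDriverWait with an expected condition.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Stand-in for "the ads have loaded": succeeds on the third check.
attempts = []


def ready():
    attempts.append(1)
    return "loaded" if len(attempts) >= 3 else None


RESULT = wait_until(ready, timeout=2.0, poll=0.01)
print(RESULT, "after", len(attempts), "checks")
```

The wait ends as soon as the condition holds, so a fast page costs almost nothing and a slow page is still handled up to the timeout.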

4 Answers

0

It looks like all you're interested in is the data-href attributes, in which case:

from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.remote.webdriver import WebDriver
from collections.abc import Iterator

URL = "https://www.nepremicnine.net/oglasi-prodaja/gorenjska/kranj/kranj/stanovanje/letnik-od-1980-do-1989/"


def decline_cookies(driver: WebDriver) -> None:
    wait = WebDriverWait(driver, 5)
    ec = EC.element_to_be_clickable
    locator = (By.ID, "CybotCookiebotDialogBodyButtonDecline")
    try:
        button = wait.until(ec(locator))
        button.click()
    except Exception:
        # The cookie banner may not appear at all; carry on without it.
        pass


def ads(driver: WebDriver) -> Iterator[str]:
    wait = WebDriverWait(driver, 10)
    ec = EC.presence_of_all_elements_located
    locator = (By.CSS_SELECTOR, "div.col-md-6 div.property-details")
    for div in wait.until(ec(locator)):
        result = div.get_attribute("data-href")
        if isinstance(result, str):
            yield result

if __name__ == "__main__":
    with Chrome() as driver:
        driver.get(URL)
        decline_cookies(driver)
        for href in ads(driver):
            print(href)

Output:

https://www.nepremicnine.net/oglasi-prodaja/kranj-stanovanje_6980815/
https://www.nepremicnine.net/oglasi-prodaja/kranj-stanovanje_6993792/
https://www.nepremicnine.net/oglasi-prodaja/kranj-stanovanje_6849482/
https://www.nepremicnine.net/oglasi-prodaja/kranj-stanovanje_6895075/
https://www.nepremicnine.net/oglasi-prodaja/kranj-stanovanje_6965426/
https://www.nepremicnine.net/oglasi-prodaja/kranj-stanovanje_6836639/
https://www.nepremicnine.net/oglasi-prodaja/kranj-stanovanje_6988403/
https://www.nepremicnine.net/oglasi-prodaja/kranj-stanovanje_6817465/
https://www.nepremicnine.net/oglasi-prodaja/kranj-planina-stanovanje_6987918/
https://www.nepremicnine.net/oglasi-prodaja/kranj-planina-stanovanje_6989373/
https://www.nepremicnine.net/oglasi-prodaja/kranj-planina-1-stanovanje_6985299/

1 Comment

Thank you very much for this example. I have to learn a lot :)
0

See the simplified working code below; the comments explain each step.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://www.nepremicnine.net/oglasi-prodaja/gorenjska/kranj/kranj/stanovanje/letnik-od-1980-do-1989/")
driver.maximize_window()
wait = WebDriverWait(driver, 10)

# accept cookies
wait.until(EC.element_to_be_clickable((By.ID, "CybotCookiebotDialogBodyButtonAccept"))).click()
# wait for the ads to load and then store them in a variable
all_Ads = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='property-details']")))

# Loop through each ad and print its text
for ad in all_Ads:
    print(ad.text)
    print(ad.get_attribute("data-href"))
    print("---------------------")

Result:

Prodaja: Stanovanje, 1-sobno
KRANJ
47 m2, 1-sobno, zgrajeno l. 1986, adaptirano l. 2021, 7. nad., prodamo. Cena: 229.000,00 EUR
47,00 m2 1986 7
229.000,00 €
Fesst nepremičnine d.o.o.
https://www.nepremicnine.net/oglasi-prodaja/kranj-stanovanje_6980815/
---------------------
Prodaja: Stanovanje, 1,5-sobno
NOVO
KRANJ
47,4 m2, 1,5-sobno, zgrajeno l. 1989, 1/5 nad., prodamo. Cena: 198.000,00 EUR
47,40 m2 1989 1/5
198.000,00 €
BAZA agencija d.o.o.
https://www.nepremicnine.net/oglasi-prodaja/kranj-stanovanje_6993792/
---------------------
Prodaja: Stanovanje, 2-sobno
KRANJ
72,3 m2, 2-sobno, zgrajeno l. 1982, adaptirano l. 2009, 4/7 nad., prodamo. Cena: 252.000,00 EUR
72,30 m2 1982 4/7
252.000,00 €
Nepremičnina posredovanje d.o.o.
https://www.nepremicnine.net/oglasi-prodaja/kranj-stanovanje_6849482/
---------------------
Prodaja: Stanovanje, 3-sobno
KRANJ
73,9 m2, 3-sobno, zgrajeno l. 1989, 4/4 nad., prodamo. Cena: 230.000,00 EUR
73,90 m2 1989 4/4
230.000,00 €
Fesst nepremičnine d.o.o.
https://www.nepremicnine.net/oglasi-prodaja/kranj-stanovanje_6895075/
---------------------
Prodaja: Stanovanje, 3-sobno
KRANJ
77 m2, 3-sobno, zgrajeno l. 1989, 1/5 nad., prodamo. Cena: 259.000,00 EUR
77,00 m2 1989 1/5
259.000,00 €
BAZA agencija d.o.o.
https://www.nepremicnine.net/oglasi-prodaja/kranj-stanovanje_6965426/
---------------------
Prodaja: Stanovanje, 3,5-sobno
KRANJ
77 m2, 3,5-sobno, zgrajeno l. 1989, adaptirano l. 2023, 1/5 nad., prodamo. Cena: 259.000,00 EUR
77,00 m2 1989 1/5
259.000,00 €
Zasebna ponudba
https://www.nepremicnine.net/oglasi-prodaja/kranj-stanovanje_6836639/
---------------------
Prodaja: Stanovanje, 3-sobno
KRANJ
106 m2, 3-sobno, zgrajeno l. 1983, adaptirano l. 2013, 2. nad., prodamo. Cena: 322.000,00 EUR
106,00 m2 1983 2
322.000,00 €
Fesst nepremičnine d.o.o.
https://www.nepremicnine.net/oglasi-prodaja/kranj-stanovanje_6988403/
---------------------
Prodaja: Stanovanje, 3-sobno
KRANJ
118 m2, 3-sobno, zgrajeno l. 1980, adaptirano l. 2018, 2/2 nad., prodamo. Cena: 310.000,00 EUR
118,00 m2 1980 2/2
310.000,00 €
Aeon nepremičnine, d.o.o., PE Izola
https://www.nepremicnine.net/oglasi-prodaja/kranj-stanovanje_6817465/
---------------------
Prodaja: Stanovanje, 3-sobno
KRANJ, PLANINA
63 m2, 3-sobno, zgrajeno l. 1987, adaptirano l. 2022, P/4 nad., klet 8 m2, stanovanje predelano v 3-sobno. Nahaja se v p...
63,00 m2 1987 P/4
249.000,00 €
Zasebna ponudba
https://www.nepremicnine.net/oglasi-prodaja/kranj-planina-stanovanje_6987918/
---------------------
Prodaja: Stanovanje, 3-sobno
KRANJ, PLANINA
106 m2, 3-sobno, zgrajeno l. 1983, adaptirano l. 2013, 2/3 nad., prodamo. Cena: 322.000,00 EUR
106,00 m2 1983 2/3
322.000,00 €
Zasebna ponudba
https://www.nepremicnine.net/oglasi-prodaja/kranj-planina-stanovanje_6989373/
---------------------
Prodaja: Stanovanje, 1,5-sobno
KRANJ, PLANINA 1
53 m2, 1,5-sobno, zgrajeno l. 1980, adaptirano l. 2015, 2/4 nad., Prodam stanovanje Planina 1, prodamo. Cena: cca 220.00...
53,00 m2 1980 2/4
220.000,00 €
Zasebna ponudba
https://www.nepremicnine.net/oglasi-prodaja/kranj-planina-1-stanovanje_6985299/
---------------------

Process finished with exit code 0

UPDATE: Answer to second question asked in the comments.

# Extracting meta content for position
metaContent = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='col-md-6 col-md-12 position-relative']//meta[@itemprop='position']")))
for content in metaContent:
    print(content.get_attribute("content"))

Result:

1
2
3
4
5
6
7
8
9
10
11

Process finished with exit code 0
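Note that get_attribute("content") returns strings. If you want numbers, a small conversion step can help. This is only a sketch: the function name to_records is hypothetical, and the sample values stand in for lists collected the same way as above (one for itemprop='position', one for itemprop='price').

```python
from decimal import Decimal


def to_records(positions, prices):
    """Pair position and price strings and convert them to numbers.

    Both lists are assumed to come from the same page in the same order.
    """
    if len(positions) != len(prices):
        raise ValueError("positions and prices must have the same length")
    return [(int(pos), Decimal(price)) for pos, price in zip(positions, prices)]


# Sample values in the format of the meta tags' content attribute.
RECORDS = to_records(["1", "2"], ["229000.00", "198000.00"])
for position, price in RECORDS:
    print(position, price)
```

Decimal is used instead of float so that prices keep their exact two decimal places.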

5 Comments

Thanks a lot. How can I get the 'content' attribute of the element "//meta[@itemprop='position']"?
<meta> tags usually provide information and are not displayed on the web page. Which exact element's attribute do you want to retrieve? Can you point it out on the web page?
OK, check the updated answer. I hope this is what you are after.
Thanks again. I didn't know how nested elements are addressed, or which function to use to check whether an element is present.
0

In your for loop, the selector you are using returns the divs with the property-details class. You already used the same selector to populate the list posamezni.

You need to read the attribute from the items of the list, instead of finding them again.

The following prints the data-href of each ad, and you don't need to click on the cookie banner to access this information.

URL = "https://www.nepremicnine.net/oglasi-prodaja/gorenjska/kranj/kranj/stanovanje/letnik-od-1980-do-1989/"
driver.open(URL)

posamezni = driver.find_elements(By.CSS_SELECTOR, 'div.property-details')
title = []
for one in posamezni:
    # The original line,
    #   title = one.find_element(By.XPATH, "//div[@class='property-details']")
    # uses the same selector as the one used to populate posamezni.
    # That is why it only returns the first result.
    title1 = one.get_attribute('data-href')
    print(title1)

Comments

0

To fix your code, simply add a leading dot to the XPath when you search within an element:

    title = one.find_element(By.XPATH, "//div[@class='propertydetails']") #OLD

    title = one.find_element(By.XPATH, ".//div[@class='propertydetails']") #NEW

The leading dot makes the XPath relative, so the search uses the element (node) as its parent instead of the whole document.
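Here is a small sketch of the same idea outside the browser. It uses Python's standard xml.etree.ElementTree instead of Selenium, and a hypothetical, simplified snippet in place of the real page (the actual nesting on the site may differ). The point is only that a path starting with .// is evaluated relative to each ad, so every ad yields its own values.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; not copied from the real page.
SNIPPET = """
<div class="seznam">
  <div class="property-details" data-href="/ad-1/">
    <meta itemprop="position" content="1" />
    <meta itemprop="price" content="229000.00" />
  </div>
  <div class="property-details" data-href="/ad-2/">
    <meta itemprop="position" content="2" />
    <meta itemprop="price" content="198000.00" />
  </div>
</div>
"""


def read_ads(markup):
    """Return (href, position, price) for every ad in the markup."""
    root = ET.fromstring(markup)
    rows = []
    for ad in root.findall(".//div[@class='property-details']"):
        # The leading dot scopes each search to the current ad only.
        position = ad.find(".//meta[@itemprop='position']")
        price = ad.find(".//meta[@itemprop='price']")
        rows.append((ad.get("data-href"), position.get("content"), price.get("content")))
    return rows


ADS = read_ads(SNIPPET)
for row in ADS:
    print(row)
```

With Selenium the pattern is the same: call find_element on the ad element and start the XPath with a dot.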
