Python playwright locator not returning expected value

Question

I'm not getting the expected value returned from the below code.

from playwright.sync_api import sync_playwright
import time
import random

def main():
    with sync_playwright() as p:
        browser = p.firefox.launch(headless=False)
        page = browser.new_page()
        url = "https://www.useragentlist.net/"
        page.goto(url)
        time.sleep(random.uniform(2,4))

        test = page.locator('xpath=//span[1][@class="copy-the-code-wrap copy-the-code-style-button copy-the-code-inside-wrap"]/pre/code/strong').inner_text()
        print(test)

        count = page.locator('xpath=//span["copy-the-code-wrap copy-the-code-style-button copy-the-code-inside-wrap"]/pre/code/strong').count()
        print(count)


        browser.close()


if __name__ == '__main__':
    main()

page.locator().count() returns 0. I have no issue getting the text from the lines above it, but I need to access all of the elements. What is wrong with my use of locator() and count()?

The random sleep is almost always necessary to avoid being flagged as a bot, so I always add it as a preventative measure; it's easier to do that first than to have to add it later.
– Jacob, Commented Aug 7, 2024 at 22:12
True, sleeps can sometimes help with that, although I usually only try them after I've been flagged; otherwise they're a big performance hit. Setting a user agent is probably a better first preventative step, and it doesn't incur a performance penalty: Playwright's default user agent essentially says "I am a bot." Also, in this case, you're not actually interacting with the page, just visiting it and then leaving.
– ggorlen, Commented Aug 7, 2024 at 22:16
There are two reasons I have it set up like this. First, I'm going to use this code as a template while I migrate my projects from Selenium to Playwright. Second, I'll use the data I get from this website to generate a random user agent each time I run my other web scrapers.
– Jacob, Commented Aug 7, 2024 at 22:22
You might consider using a library that generates a random user agent, which is a bit faster and more reliable (the user agent site might have downtime, causing a disruption).
– ggorlen, Commented Aug 7, 2024 at 22:24

ggorlen · Accepted Answer · 2024-08-07 22:26:39Z


Your second locator's XPath has no @class=, so it's different from the first one, which works. Store the selector string in a variable so you don't have to type it twice or risk copy-paste and stale-data errors.

In any case, your approach seems overcomplicated. Each user agent is in a <code> tag, so just scrape that:

from playwright.sync_api import sync_playwright # 1.44.0


def main():
    with sync_playwright() as p:
        browser = p.firefox.launch()
        page = browser.new_page()
        url = "https://www.useragentlist.net/"
        page.goto(url, wait_until="domcontentloaded")
        agents = page.locator("code").all_text_contents()
        print(agents)
        browser.close()


if __name__ == "__main__":
    main()

Locator actions such as inner_text() auto-wait, so there's no need to sleep. Avoid XPaths 99% of the time; they're brittle and difficult to read and maintain. Just use CSS selectors or user-visible locators. The goal is to choose the simplest selector that disambiguates the elements you want, and nothing more. span/pre/code/strong is a rigid hierarchy: if any one of these elements changes, your code breaks unnecessarily.
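To illustrate the brittleness point, here is a sketch that compares a rigid child-combinator selector with the plain "code" selector before and after a hypothetical markup change (a <div> wrapped around the <pre>). It uses Beautiful Soup's CSS support with the built-in html.parser instead of a browser so it runs offline; the same reasoning applies to Playwright's CSS locators:

```python
from bs4 import BeautifulSoup

# Hypothetical markup before and after a redesign that wraps <pre> in a <div>.
BEFORE = '<span class="wrap"><pre><code><strong>UA 1</strong></code></pre></span>'
AFTER = '<span class="wrap"><div><pre><code><strong>UA 1</strong></code></pre></div></span>'

RIGID = "span > pre > code > strong"  # mirrors the XPath span/pre/code/strong
SIMPLE = "code"  # the simplest selector that still isolates the user agents


def count_matches(html: str, selector: str) -> int:
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(selector))


for label, html in (("before", BEFORE), ("after", AFTER)):
    print(label, "rigid:", count_matches(html, RIGID), "simple:", count_matches(html, SIMPLE))
```

The rigid selector drops from 1 match to 0 after the change, while "code" keeps working.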

By the way, the user agents are in the static HTML, so unless you're trying to circumvent a block, you can do this faster with requests and Beautiful Soup:

from requests import get  # 2.31.0
from bs4 import BeautifulSoup  # 4.10.0

response = get("https://www.useragentlist.net")
response.raise_for_status()
print([x.text for x in BeautifulSoup(response.text, "lxml").select("code")])
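If you want to test the parsing step without hitting the site, one option is to split fetching from parsing. This sketch assumes the same <code> structure; the 10-second timeout, the whitespace stripping, and the switch to the built-in html.parser (so lxml doesn't need to be installed) are my own choices, not part of the answer above:

```python
import requests
from bs4 import BeautifulSoup

URL = "https://www.useragentlist.net"


def parse_agents(html: str) -> list[str]:
    """Return the whitespace-stripped text of every <code> element."""
    soup = BeautifulSoup(html, "html.parser")
    return [code.get_text(strip=True) for code in soup.select("code")]


def fetch_agents(url: str = URL) -> list[str]:
    # A timeout keeps the script from hanging indefinitely if the site is slow.
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return parse_agents(response.text)
```

Call print(fetch_agents()) for the live data, or pass saved HTML to parse_agents() in tests.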

Better still (possibly), use a library like fake_useragent to generate your random user agent.

edited Aug 7, 2024 at 22:26 · answered Aug 7, 2024 at 21:13 by ggorlen


Yeah, that fixed it. Almost every time I use XPaths, it's a typo or some other minor error that causes a catastrophic failure.
– Jacob
