0

It's easy to get text by XPath, but is there any way to get the XPath from the text in Python?

For example:

 <html><h1>Hello World</h1></html>

How can I get the XPath of the element that contains Hello World?

2 Answers

6

I used the following function for the same problem. I hope this general example helps.

First, define the following function (taken from an external source):

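import itertools


def xpath_soup(element):
    """
    Generate xpath of soup element
    :param element: bs4 text or node
    :return: xpath as string
    """
    components = []
    # A text node has no tag name, so start from its enclosing tag.
    child = element if element.name else element.parent
    for parent in child.parents:  # each parent is a bs4.element.Tag
        # Locate child by identity: list.index() compares by equality, and
        # bs4 tags with identical markup compare equal.
        position = next(i for i, node in enumerate(parent.contents) if node is child)
        previous = itertools.islice(parent.children, 0, position)
        xpath_tag = child.name
        # XPath positions are 1-based and count only same-named siblings.
        xpath_index = sum(1 for i in previous if i.name == xpath_tag) + 1
        components.append(xpath_tag if xpath_index == 1 else '%s[%d]' % (xpath_tag, xpath_index))
        child = parent
    components.reverse()
    return '/%s' % '/'.join(components)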

Then, in the Python interpreter, run:

>>> import re
>>> import itertools
>>> from bs4 import BeautifulSoup
>>> html = '<html><body><div><p>Hello World</p></div></body></html>'
>>> soup = BeautifulSoup(html, 'lxml')
>>> elem = soup.find(string=re.compile('Hello World'))
>>> xpath_soup(elem)
'/html/body/div/p'

This gives you the XPath of the element that contains the given text.
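One way to sanity-check a generated path is to evaluate it against the same markup and confirm it resolves back to the expected element. The snippet below is only a sketch: it assumes lxml is installed and reuses the sample HTML and the resulting path from the session above.

```python
from lxml import etree

html = '<html><body><div><p>Hello World</p></div></body></html>'

# Parse with lxml's HTML parser; etree.HTML returns the root <html> element.
root = etree.HTML(html)

# Evaluate the absolute path produced for the sample document.
matches = root.xpath('/html/body/div/p')
texts = [m.text for m in matches]
print(texts)
```

If the list contains exactly one entry with the text you searched for, the path identifies that element unambiguously.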


3

You can use contains().

1. If you want to get an element by the text inside a specific tag (for example, h1), use:

xpath('//h1[contains(text(),"Hello World")]')

2. If you want to get all elements that contain the text 'Hello World', use:

xpath('//*[contains(text(),"Hello World")]')
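The expressions above find the elements, but the question asks for their paths. A minimal sketch of how the two could be combined with lxml follows; it assumes lxml is installed, and `xpaths_by_text` is a hypothetical helper name, not part of any library. It passes the search text as an XPath variable (`$needle`) to avoid quoting problems and uses `getpath()` on the element tree to turn each match into an absolute path.

```python
from lxml import html

# Sample document (an assumption for illustration; any parsed tree works).
doc = html.fromstring(
    '<html><body><h1>Hello World</h1><p>Other</p></body></html>'
)


def xpaths_by_text(root, text):
    """Return absolute XPaths of elements whose own text contains `text`.

    Hypothetical helper: `root` is an lxml element, `text` a plain string.
    """
    # $needle is an XPath variable bound through the keyword argument.
    matches = root.xpath('//*[contains(text(), $needle)]', needle=text)
    tree = root.getroottree()
    return [tree.getpath(el) for el in matches]


paths = xpaths_by_text(doc, 'Hello World')
print(paths)  # ['/html/body/h1']
```

Note that in XPath 1.0, `contains(text(), ...)` only inspects the first text node of each element, so text that follows a child element is not matched. Using `contains(., ...)` looks at all descendant text instead, but it then also matches every ancestor of the element.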
