Consider an HTML page with 3 tables in it.

I want to loop through each table and, at the same time, print something whenever the content corresponds to something I want.

I need to keep track of the table I'm at.

As you can see in the code below, I have the page variable, which is an HTML string.

I can return the content of all the tables at once (in a list).

I'd like to loop through them.

from lxml import html
from bs4 import BeautifulSoup

page = """
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>cv</title>
</head>
<body>

    <table>
        <tr>
            <td>table1 td1</td>
            <td>table1 td2</td>
        </tr>
    </table>

    <table>
        <tr>
            <td>table2 td1</td>
            <td>table2 td2</td>
        </tr>
    </table>

    <table>
        <tr>
            <td>table3 td1</td>
            <td>table3 td2</td>
        </tr>
    </table>

</body>
</html>
"""

soup = str(BeautifulSoup(page, 'html.parser'))

tree = html.fromstring(soup)

tds = tree.xpath('//table/tr/td/text()')

for td in tds:
    print(td + '\n')

print('Ready !!')

1 Answer

You mean you need to process each table on its own?

for table in tree.xpath(".//table"):
    print("---  new table: ---")
    # the relative path .//td only matches cells inside the current table
    for td in table.xpath(".//td"):
        print(td.text)
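
If you also need to know which table you are in and only print cells that match some text, a minimal sketch could look like the one below. The helper name find_cells, the trimmed sample page, and the substring check are assumptions made for illustration; they are not part of the original question's code.

```python
from lxml import html

# Hypothetical sample page, trimmed down from the one in the question.
page = """
<html><body>
<table><tr><td>table1 td1</td><td>table1 td2</td></tr></table>
<table><tr><td>table2 td1</td><td>table2 td2</td></tr></table>
<table><tr><td>table3 td1</td><td>table3 td2</td></tr></table>
</body></html>
"""

tree = html.fromstring(page)


def find_cells(tree, wanted):
    """Return (table_number, cell_text) pairs for cells whose text contains `wanted`."""
    matches = []
    # enumerate() keeps track of the table we are currently in (1-based).
    for table_number, table in enumerate(tree.xpath(".//table"), start=1):
        # The leading dot keeps the search inside the current table.
        for td in table.xpath(".//td"):
            text = td.text_content().strip()
            if wanted in text:
                matches.append((table_number, text))
    return matches


for table_number, text in find_cells(tree, "td2"):
    print(f"table {table_number}: {text}")
```

Using enumerate() avoids a manual counter, and returning the pairs instead of printing inside the loop makes the matching logic easier to reuse.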

4 Comments

It works, but what does the dot in front of the slashes in .// mean? If I don't use it, I don't get the desired output.
It means that the XPath expression is evaluated relative to the current context node. The expression .//td searches for all td descendants of the current node (it's a short form of descendant::td). An expression starting with / always starts at the root node of the document, so //td selects all td nodes in the document, regardless of the context node.
Now I have another HTML document with many levels of child and sub-child elements. I want to write the absolute path to an element, but I can't spell out all the intermediate elements because there are too many. Consider the example above, and imagine the <td> tags had many parents and grandparents. What would be the correct XPath to each td? xpath("/table.//td") is not working. How can I retrieve the td from the context?
You should ask a new question with the exact problem description and example input / expected output; comments are not the right place to ask new questions.
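
To illustrate the difference between .// and // explained in the comments above, here is a small sketch that evaluates both forms from the same context node. The trimmed sample page and the variable names are assumptions made for this example only.

```python
from lxml import html

# Hypothetical sample page, trimmed down from the one in the question.
page = """
<html><body>
<table><tr><td>table1 td1</td><td>table1 td2</td></tr></table>
<table><tr><td>table2 td1</td><td>table2 td2</td></tr></table>
<table><tr><td>table3 td1</td><td>table3 td2</td></tr></table>
</body></html>
"""

tree = html.fromstring(page)
first_table = tree.xpath(".//table")[0]

# Relative: starts at the context node, so only cells of the first table match.
relative = first_table.xpath(".//td/text()")

# Absolute: starts at the document root, so every cell in the page matches,
# even though the call is made on first_table.
absolute = first_table.xpath("//td/text()")

# Explicit axis form; selects the same cells as the relative expression.
explicit = first_table.xpath("descendant::td/text()")

print(relative)
print(len(absolute))
print(explicit == relative)
```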
