I have some HTML text that I later want to convert into a pandas DataFrame.
The text looks like this:
<tr>
<td -some attributes- >Val1</td>
<td -some attributes- >Val2</td>
<td -some attributes- >Val3</td>
</tr>
<tr>
<td -some attributes- >Val4</td>
<td -some attributes- >Val5</td>
<td -some attributes- >Val6</td>
</tr>
and I have the regex `<td.*>(.*)</td>`, but it doesn't capture each value separately; it matches almost all of the text...
After I capture all the values, I put them in a DataFrame.
So why doesn't this regex capture the values as it should?

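`<td.*>(.*)</td>` is greedy (see the [documentation](docs.python.org/3.6/library/re.html)), so it captures more than necessary: each `.*` matches as much text as it can, so the first `.*` runs past the intermediate tags up to the last `>` that still allows a match.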
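
A minimal sketch of the difference, assuming the real HTML has several cells on one line (here the newlines are stripped to reproduce that). The `class="a"` attribute and the names `html`, `one_line`, `greedy`, `lazy` and `rows` are made up for illustration. The fix shown is one possible approach: use `[^>]*` so the opening tag cannot run past its own `>`, and the non-greedy `.*?` for the cell content.

```python
import re

# Hypothetical sample input; the attribute is a placeholder for "-some attributes-".
html = """<tr>
<td class="a">Val1</td>
<td class="a">Val2</td>
<td class="a">Val3</td>
</tr>
<tr>
<td class="a">Val4</td>
<td class="a">Val5</td>
<td class="a">Val6</td>
</tr>"""

# Put everything on one line so that "." can cross cell boundaries.
one_line = html.replace("\n", "")

# Greedy: the first ".*" swallows every cell except the last one.
greedy = re.findall(r"<td.*>(.*)</td>", one_line)

# "[^>]*" stops at the end of the opening tag; ".*?" stops at the first "</td>".
lazy = re.findall(r"<td[^>]*>(.*?)</td>", one_line)

# Group the cells by row: one inner list per <tr>.
rows = [
    re.findall(r"<td[^>]*>(.*?)</td>", tr)
    for tr in re.findall(r"<tr[^>]*>(.*?)</tr>", one_line)
]

print(greedy)  # ['Val6']
print(lazy)    # ['Val1', 'Val2', 'Val3', 'Val4', 'Val5', 'Val6']
print(rows)    # [['Val1', 'Val2', 'Val3'], ['Val4', 'Val5', 'Val6']]
```

Passing `rows` to `pandas.DataFrame(rows)` should then give one DataFrame row per `<tr>`. For anything beyond simple, regular markup, an HTML parser is likely to be more robust than a regex.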