I want to get all the <tr> elements inside my <table>. I wrote this code:
matchObj = re.search(r'<tr>(.*?)</tr>', txt, re.M|re.I|re.S)
But I only get the first match.
How can I get all of them?
Thanks in advance :)
Use findall:
matchObj = re.findall(r'<tr>(.*?)</tr>', txt, re.M|re.I|re.S)
search only finds the first match in the given string; findall returns every non-overlapping match.
You can read more about the different methods the re module offers in its documentation.
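To make the difference concrete, here is a minimal sketch comparing re.search, re.findall and re.finditer. The sample string in txt is made up for illustration, since the question does not show the real HTML.

```python
import re

# Hypothetical sample input; the original question does not show its HTML.
txt = "<table><tr><td>a</td></tr>\n<TR><td>b</td></TR></table>"

pattern = r'<tr>(.*?)</tr>'
flags = re.I | re.S  # re.I also matches <TR>; re.S lets '.' cross newlines

# search stops at the first match and returns a single match object (or None).
first = re.search(pattern, txt, flags)
print(first.group(1))

# findall returns a list holding the captured group of every match.
rows = re.findall(pattern, txt, flags)
print(rows)

# finditer yields match objects, which is handy when positions are needed too.
spans = [(m.start(), m.group(1)) for m in re.finditer(pattern, txt, flags)]
print(spans)
```

Note that findall returns plain strings here because the pattern has exactly one capturing group; with several groups it would return tuples instead.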
However, it looks like you are parsing HTML. Why not use an HTML parser, such as BeautifulSoup?

To get more than one match, use re.findall().
However, using regular expressions to parse HTML is going to get ugly and complicated fast. Use a proper HTML parser instead.
Python has several to choose from:
ElementTree example:
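from xml.etree import ElementTree

# ElementTree is an XML parser, so the file must be well-formed XML/XHTML.
tree = ElementTree.parse('filename.html')
# iter() walks the whole tree; findall('tr') only matches direct children of the root.
for elem in tree.iter('tr'):
    print(ElementTree.tostring(elem, encoding='unicode'))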
BeautifulSoup example:
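from bs4 import BeautifulSoup

# Name the parser explicitly; 'html.parser' ships with Python.
with open('filename.html') as f:
    soup = BeautifulSoup(f, 'html.parser')
# select() takes a CSS selector: every <tr> inside a <table>.
for row in soup.select('table tr'):
    print(row)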
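If installing a third-party package is not an option, the standard library's html.parser module can do a similar job. The following is only a rough sketch: the RowCollector class and the sample string are hypothetical names made up for this example, and it does not try to handle nested tables.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TR> arrives here as 'tr'.
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Append text only while inside a cell.
        if self._in_cell:
            self.rows[-1][-1] += data


# Hypothetical sample input, not taken from the question.
sample = "<table><tr><td>a</td><td>b</td></tr><TR><td>c</td></TR></table>"
parser = RowCollector()
parser.feed(sample)
parser.close()
print(parser.rows)
```

Unlike the regex, this does not care about tag case, attributes on the tags, or whitespace inside them.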