I have this list
[<th align="left">
<a href="blablabla">F</a>ojweousa</th>,
<th align="left">
<a href="blablabla">S</a>awdefrgt</th>, ...]
and I want the single character after "> and the multiple characters between </a> and </th> to be concatenated, so that I can move on with my life.
Here is my code
item2 = []
for element in items2:
    first_letter = re.search('">.</a', str(items2))
    second_letter = re.search(r'</a>[a-zA-Z0-9]</th>,', str(items2))
    item2.append([str(first_letter) + str(second_letter)])
I know I should do something like item2.group or item2.join, but if I do, the output gets even messier. Here is the output with the current code:
[['<re.Match object; span=(155, 161), match=\'">F</a\'>None'],
['<re.Match object; span=(155, 161), match=\'">F</a\'>None'],
...]]
I would like the output to look like this so that I can use it in pd.DataFrame:
[Fojweousa, Sawdefrgt, ...]
It is a list, which is why I can't use the bs4 HTML or select methods on it.
item2 = [re.sub(r'<[^>]*>', '', x).strip() for x in items2]
But using BeautifulSoup would be the best solution, since it can strip the tags for you.

items2 = table.find_all('th', attrs={'align': 'left'})[1:]
I cannot combine two bs4 methods like get_text() and find_all(). Every time I call find_all() I get a list and afterwards need to rely on regex, which is annoying.

result = [x.get_text() for x in table.find_all('th', attrs={'align': 'left'})[1:]]
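For anyone who still wants the regex route, here is a minimal sketch of why the original loop printed Match objects and None, and two ways around it. It assumes the list elements look like the two samples in the question. The sample strings are hypothetical stand-ins for the bs4 Tag objects, and str() is applied to each element so the same code should work on either. The names pattern, by_groups and by_sub are made up for illustration.

```python
import re

# Hypothetical stand-ins for the elements of items2 (reconstructed from the question).
# With real bs4 Tag objects, str(element) yields the same kind of markup string.
items2 = [
    '<th align="left">\n<a href="blablabla">F</a>ojweousa</th>',
    '<th align="left">\n<a href="blablabla">S</a>awdefrgt</th>',
]

# Approach 1: capture groups.
# Differences from the original loop:
#   - search str(element), not str(items2), so each pass looks at its own element
#   - use \w+ (one or more characters) instead of a single [a-zA-Z0-9]
#   - do not require the trailing comma, which only exists in the printed list
#   - read the captured text with .group(), instead of calling str() on the Match
pattern = re.compile(r'">(.)</a>(\w+)</th>')

by_groups = []
for element in items2:
    match = pattern.search(str(element))
    if match:  # re.search returns None when nothing matches
        by_groups.append(match.group(1) + match.group(2))

# Approach 2: remove every tag and keep whatever text is left.
by_sub = [re.sub(r'<[^>]*>', '', str(x)).strip() for x in items2]

print(by_groups)  # ['Fojweousa', 'Sawdefrgt']
print(by_sub)     # ['Fojweousa', 'Sawdefrgt']
```

Both lists are flat lists of strings, so either one can be passed on as a column of values. The capture-group version silently skips elements that do not fit the pattern, while the tag-stripping version keeps all the text of every element.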