I'm trying to extract text from the webpage snippet below:
<div class="MYCLASS">Category1: <a id=category1 href="SomeURL" >
Text1 I want</a> > Category2: <a href="SomeURL" >Text2 I want</a></div>
I tried:
for div in soup.find_all('div', class_='MYCLASS'):
    for url in soup.find_all('a', id='category1'):
        print(url)
And it returned:
<a href="someURL" id="category1">Text1 I want</a>
So I split the text...
for div in soup.find_all('div', class_='MYCLASS'):
    for url in soup.find_all('a', id='category1'):
        category1 = str(url).split('category1">')[1].split('</a>')[0]
        print(category1)
and that extracted "Text1 I want", but I'm still missing "Text2 I want". Any ideas? Thank you.
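As a side note, the string splitting can probably be avoided, since a BeautifulSoup tag exposes get_text(), which returns the text inside the tag. A minimal sketch, assuming the page is parsed with the built-in html.parser and that the html variable below (just the sample snippet from this post) matches the real page:

```python
from bs4 import BeautifulSoup

# Sample markup copied from the snippet in this post (assumed to match the real page).
html = (
    '<div class="MYCLASS">Category1: <a id=category1 href="SomeURL" >\n'
    'Text1 I want</a> > Category2: <a href="SomeURL" >Text2 I want</a></div>'
)

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None when nothing matches.
link = soup.find("a", id="category1")

# get_text(strip=True) returns the tag's text with surrounding whitespace removed.
category1 = link.get_text(strip=True) if link is not None else None
print(category1)
```

This only covers the link that carries the id, so it still does not reach "Text2 I want".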
EDIT:
There are other <a>...</a> tags in the source code, so if I remove id= from my code, it returns all the other link texts that I don't need. For example,
<div class="MYClass"><span class="Class">RandomText.<br>RandomText.<br>
<a href=someURL>RandomTextExtracted.</a><br>
Also,
</div><div class=MYClass>
<a href="SomeURL">RandomTextExtracted</a>
Text1 has the id category1, but "Text2 I want" doesn't have that id, so I can't just drop id='category1' from the find_all().
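One approach that might work here, sketched under assumptions: use the link with id category1 only as an anchor, climb to the div that contains it with find_parent(), and then collect every link inside that one div. The helper name category_texts and the sample html string are hypothetical, made up for illustration, and this assumes the second link always sits in the same div as the first.

```python
from bs4 import BeautifulSoup

# Sample markup based on the snippets in this post (assumed, not the real page).
html = """
<div class="MYCLASS">Category1: <a id=category1 href="SomeURL" >
Text1 I want</a> > Category2: <a href="SomeURL" >Text2 I want</a></div>
<div class="MYCLASS"><span class="Class">RandomText.<br>RandomText.<br>
<a href="SomeURL">RandomTextExtracted.</a><br></span></div>
"""

soup = BeautifulSoup(html, "html.parser")


def category_texts(soup, anchor_id="category1"):
    """Return the text of every link in the div that holds the anchor link."""
    # Locate the one link that is known to carry the id.
    anchor = soup.find("a", id=anchor_id)
    if anchor is None:
        return []
    # Climb to the nearest enclosing div, which should be the category div.
    container = anchor.find_parent("div")
    if container is None:
        return []
    # Only links inside this div are collected, so other divs are ignored.
    return [a.get_text(strip=True) for a in container.find_all("a")]


texts = category_texts(soup)
print(texts)
```

With the sample markup this yields both "Text1 I want" and "Text2 I want" and skips "RandomTextExtracted.", because that link lives in a different div.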