I would like to extract data from a website, and I need to know whether it lists certain equipment. In the example below, A has CD but does not have CDA.
HTML:
<div class="ABC">
  <h3>A</h3>
  <ul>
    <li class="specChecked"><p>CD</p></li>
    <li class="specChecked"><p>VCD</p></li>
    <li class=""><p>CDA</p></li>
  </ul>
  <h3>B</h3>
  <div class="buyCarDetailContentSpecContent ">
    <ul>
      <li>
        <p>b1<span>1</span></p>
      </li>
      <li>
        <p>b2<span>2</span></p>
      </li>
    </ul>
  </div>
</div>
My code:
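import requests
from bs4 import BeautifulSoup

res = requests.get('https://www.acd.com/carinfo-4434.php')
soup = BeautifulSoup(res.text, 'lxml')
# Print every <li> found anywhere inside each element with class "ABC"
for item in soup.find_all(attrs={'class': 'ABC'}):
    for link in item.find_all('li'):
        print(link)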
My code prints every li element inside the div, like this:
<li class="specChecked"><p>CD</p></li>
<li class="specChecked"><p>VCD</p></li>
<li class=""><p>CDA</p></li>
<li>
<p>b1<span>1</span></p>
</li>
<li>
<p>b2<span>2</span></p>
</li>
But that's not what I want. What I want is to extract each li's class attribute together with its text, so that the result looks like this:
specChecked, CD
specChecked, VCD
, CDA
(Or maybe I could just replace specChecked with 1 and an empty class with 0.)
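
One direction that might work, as a minimal sketch rather than a definitive answer: read the class attribute of each li with `li.get("class")` and its text with `li.get_text()`. The sketch assumes that the equipment list is always the first ul inside the ABC div, which holds for the sample HTML but may not hold on the real page. `SAMPLE_HTML` and `extract_specs` are hypothetical names used only for illustration, the sample string stands in for `res.text`, and the built-in `html.parser` is used instead of `lxml` so that nothing beyond bs4 is needed.

```python
from bs4 import BeautifulSoup

# Stand-in for res.text: the sample HTML from this question.
SAMPLE_HTML = """
<div class="ABC">
<h3>A</h3>
<ul>
<li class="specChecked"><p>CD</p></li>
<li class="specChecked"><p>VCD</p></li>
<li class=""><p>CDA</p></li>
</ul>
<h3>B</h3>
<div class="buyCarDetailContentSpecContent ">
<ul>
<li><p>b1<span>1</span></p></li>
<li><p>b2<span>2</span></p></li>
</ul>
</div>
</div>
"""


def extract_specs(html):
    """Return (class, text) pairs for the first <ul> in each div.ABC."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all("div", class_="ABC"):
        # Assumption: the equipment list is the first <ul> in the block.
        spec_list = block.find("ul")
        if spec_list is None:
            continue
        for li in spec_list.find_all("li", recursive=False):
            # "class" is parsed as a list of names; it can be empty or missing.
            classes = li.get("class") or []
            rows.append((" ".join(classes).strip(), li.get_text(strip=True)))
    return rows


rows = extract_specs(SAMPLE_HTML)
for cls, text in rows:
    print(f"{cls}, {text}")
# specChecked, CD
# specChecked, VCD
# , CDA

# Optional 1/0 form: 1 if the item carries specChecked, otherwise 0.
flags = {text: int("specChecked" in cls.split()) for cls, text in rows}
print(flags)
# {'CD': 1, 'VCD': 1, 'CDA': 0}
```

Restricting the loop to the first ul is what keeps b1 and b2 out of the result. If the real page arranges its lists differently, that selection step would need to change.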