
I'm trying to extract data with the BeautifulSoup library in Python. I'm using zip together with soup.select to pull out the fields.

My HTML looks like this:

<li>

    <ul class="features">

        <li>Year: <strong>2016</strong></li>

        <li>Kilometers: <strong>81,000</strong></li>

    </ul>
    <ul class="features">

        <li>Doors: <strong>2 door</strong></li>

        <li>Color: <strong>White</strong></li>

    </ul>
    <ul class="features">

    </ul>

</li>

Here I want to get the year, kilometers, doors and color in separate variables, but when I run my code they all come out together.

My code:


for title, price, date, features  in zip(soup.select('.listing-item .title'),
                            soup.select('.listing-item .price'),
                            soup.select('.listing-item .date'),
                            soup.select('.listing-item .features')):


    title = title.get_text().strip()
    price = price.get_text().strip()
    date = date.get_text().strip()
    features = features.get_text().strip()

    print(features)


Output:

Year: 2016
Kilometers: 81,000
Doors: 2 door
Color: White

How can I store the year, kilometers, doors and color in separate variables?

2 Answers

You can try:

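from bs4 import BeautifulSoup as bs

data = """<li>
    <ul class="features">
        <li>Year: <strong>2016</strong></li>
        <li>Kilometers: <strong>81,000</strong></li>
    </ul>
    <ul class="features">
        <li>Doors: <strong>2 door</strong></li>
        <li>Color: <strong>White</strong></li>
    </ul>
    <ul class="features">
    </ul>
</li>"""

# Pass the markup string directly and name the parser explicitly;
# this avoids the "no parser was explicitly specified" warning.
soup = bs(data, 'html.parser')

# Each feature <li> reads "Label: value"; keep the text after the first colon.
# The unpacking assumes exactly four feature items, in this order.
Year, Km, Doors, Color = [li.text.split(':', 1)[1].strip()
                          for li in soup.select('.features > li')]
print(Year, Km, Doors, Color)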
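
If the number or order of the feature items can vary between listings, unpacking into four names will fail or mix the values up. One possible alternative, shown here only as a sketch on the same sample HTML, is to collect the items into a dict keyed by the label and then look up the ones you need. The names `features`, `year`, `kilometers`, `doors` and `color` are my own choices and not part of the question.

```python
from bs4 import BeautifulSoup

html = """<li>
    <ul class="features">
        <li>Year: <strong>2016</strong></li>
        <li>Kilometers: <strong>81,000</strong></li>
    </ul>
    <ul class="features">
        <li>Doors: <strong>2 door</strong></li>
        <li>Color: <strong>White</strong></li>
    </ul>
    <ul class="features">
    </ul>
</li>"""

soup = BeautifulSoup(html, "html.parser")

# Map each label ("Year", "Kilometers", ...) to its value.
features = {}
for li in soup.select("ul.features > li"):
    # "Year: 2016" -> label "Year", value " 2016"; split on the first colon only.
    label, _, value = li.get_text().partition(":")
    features[label.strip()] = value.strip()

# .get() returns None when a listing lacks that feature instead of raising.
year = features.get("Year")
kilometers = features.get("Kilometers")
doors = features.get("Doors")
color = features.get("Color")

print(year, kilometers, doors, color)
```

The lookup by label means an empty `ul.features` block, or a listing with an extra feature, does not break the assignment.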

Find each li element that contains the label text, then take the next strong tag after it. Declare empty lists and append the values to them.

Code:

from bs4 import BeautifulSoup

html='''<li>

    <ul class="features">

        <li>Year: <strong>2016</strong></li>

        <li>Kilometers: <strong>81,000</strong></li>

    </ul>
    <ul class="features">

        <li>Doors: <strong>2 door</strong></li>

        <li>Color: <strong>White</strong></li>

    </ul>
    <ul class="features">

    </ul>

</li>
'''
soup = BeautifulSoup(html, 'html.parser')
Year = []
KiloMeter = []
Doors = []
Color = []

# Note: soupsieve 2.1+ deprecates ':contains' in favour of ':-soup-contains';
# on older versions, use ':contains' instead.
for year, km, dor, colr in zip(
        soup.select('ul.features li:-soup-contains("Year:")'),
        soup.select('ul.features li:-soup-contains("Kilometers:")'),
        soup.select('ul.features li:-soup-contains("Doors:")'),
        soup.select('ul.features li:-soup-contains("Color:")')):
    # The value sits in the <strong> tag that follows the label text.
    Year.append(year.find_next('strong').text)
    KiloMeter.append(km.find_next('strong').text)
    Doors.append(dor.find_next('strong').text)
    Color.append(colr.find_next('strong').text)

print(Year, KiloMeter, Doors, Color)

Output (each variable is a list):

['2016'] ['81,000'] ['2 door'] ['White']

4 Comments

I don't want to add it to an array, so can I just do kilometer = km.find_next('strong').text? Am I correct?
If you don't use a list and there is more than one element, the variable is overwritten on each iteration and you will always end up with the value of the last element.
But I'm getting all the years printed.
If you only want the first value, do this after the for loop: Year = ''.join(Year[:1]) and then print(Year). Do the same for the others.
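
To illustrate the point about values being overwritten: one way to keep plain values per car, without four parallel lists, might be to loop over each listing container and build one dict per listing. This is only a sketch. The class names `listing-item`, `title`, `price` and `features` come from the selectors in the question, but the surrounding markup, the sample values and the helper name `parse_listing` are assumptions made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; only the class names are taken from the question.
html = """
<ul>
  <li class="listing-item">
    <span class="title">Car A</span>
    <span class="price">10,000</span>
    <ul class="features">
      <li>Year: <strong>2016</strong></li>
      <li>Kilometers: <strong>81,000</strong></li>
    </ul>
    <ul class="features">
      <li>Doors: <strong>2 door</strong></li>
      <li>Color: <strong>White</strong></li>
    </ul>
    <ul class="features"></ul>
  </li>
  <li class="listing-item">
    <span class="title">Car B</span>
    <span class="price">12,500</span>
    <ul class="features">
      <li>Year: <strong>2018</strong></li>
      <li>Kilometers: <strong>40,000</strong></li>
    </ul>
  </li>
</ul>
"""


def parse_listing(item):
    """Return one dict for a single listing element (hypothetical helper)."""
    record = {
        "title": item.select_one(".title").get_text(strip=True),
        "price": item.select_one(".price").get_text(strip=True),
    }
    # Search only inside this listing, so features never leak between cars.
    for li in item.select("ul.features > li"):
        label, _, value = li.get_text().partition(":")
        record[label.strip()] = value.strip()
    return record


soup = BeautifulSoup(html, "html.parser")
listings = [parse_listing(item) for item in soup.select(".listing-item")]

for car in listings:
    # Inside the loop these are plain strings (or None if the feature is absent).
    year = car.get("Year")
    color = car.get("Color")
    print(car["title"], year, car.get("Kilometers"), car.get("Doors"), color)
```

Iterating over the listing elements also sidesteps a pitfall of zipping separate select() results: each listing has several `ul.features` blocks, so the features list does not line up one-to-one with the titles and prices.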
