I have the following html code:

<div>
    <span class="test">
     <span class="f1">
      5 times
     </span>
    </span>

    </span>
   </div>

<div>

</div>

<div>
    <span class="test">
     <span class="f1">
      6 times
     </span>
    </span>

    </span>
   </div>

I managed to navigate the tree, but when trying to print I get the following error:

AttributeError: 'list' object has no attribute 'text'

This Python code works:

x=soup.select('.f1')
print(x) 

gives the following:

[]
[]
[]
[]
[<span class="f1"> 19 times</span>]
[<span class="f1"> 12 times</span>]
[<span class="f1"> 6 times</span>]
[]
[]
[]
[<span class="f1"> 6 times</span>]
[<span class="f1"> 1 time</span>]
[<span class="f1"> 11 times</span>]

but print(x.prettify) throws the error above. I am basically trying to get the text between the span tags for every instance: an empty string when there is no match, and the text when one is available.

1
  • shouldn't it throw: AttributeError: 'list' object has no attribute 'prettify' ? Commented Oct 9, 2018 at 12:39

3 Answers

1

select() always returns a list of results, even when nothing matches (in which case the list is empty). A list has no text attribute, which is why you get the AttributeError.

Likewise, prettify() is a method of an individual tag (or of the soup itself) that returns nicely indented HTML; it is not available on the list, and it is not a way to extract text.
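
To make the point above concrete, here is a minimal sketch. The html string and the .nope selector are made up for illustration; only select(), .text and get_text() come from Beautiful Soup itself. It shows that the result of select() is list-like, that asking it for .text raises AttributeError, and that the text lives on the individual elements.

```python
from bs4 import BeautifulSoup

# Illustrative markup (an assumption, not the asker's real page)
html = '<div><span class="f1"> 5 times </span></div><div></div>'
soup = BeautifulSoup(html, "html.parser")

matches = soup.select(".f1")        # list-like collection of Tag objects
no_matches = soup.select(".nope")   # empty collection, never None

# Asking the collection itself for .text fails
try:
    matches.text
    raised = False
except AttributeError:
    raised = True

# The text lives on each element, so index or iterate instead
first_text = matches[0].text
all_texts = [tag.get_text(strip=True) for tag in matches]

print(raised, len(no_matches), repr(first_text), all_texts)
```

The exact wording of the error message depends on the installed Beautiful Soup version, so the sketch only checks the exception type.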

If all you're looking to do is extract the texts when available:

texts = [''.join(i.stripped_strings) for i in x if i]

# ['5 times', '6 times']

This removes the superfluous space/newline characters and gives you just the bare text. The trailing if i is only a guard against falsy entries; the items returned by select() are tags, never None, so for this list the condition is always true and can be dropped.

If you want to keep the spaces/newlines, do this instead:

texts  = [i.text for i in x if i]

# ['\n      5 times\n     ', '\n      6 times\n     ']
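
If you also need a blank entry for every container that has no match, as the question asks, one possible approach is to loop over the containers and use select_one(), which returns a single tag or None. This is only a sketch: it assumes the containers are the div elements shown in the question, and the markup below is a tidied copy of that HTML.

```python
from bs4 import BeautifulSoup

# Tidied copy of the question's markup (assumed structure: one entry per <div>)
html = """
<div><span class="test"><span class="f1">
      5 times
</span></span></div>
<div></div>
<div><span class="test"><span class="f1">
      6 times
</span></span></div>
"""
soup = BeautifulSoup(html, "html.parser")

texts = []
for div in soup.find_all("div"):
    span = div.select_one(".f1")  # a Tag, or None when this div has no match
    texts.append(span.get_text(strip=True) if span is not None else "")

print(texts)  # ['5 times', '', '6 times']
```

Here the None check does real work, because select_one() (unlike the items inside a select() result) can return None.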

0
from bs4 import BeautifulSoup
html = '''<div>
    <span class="test">
     <span class="f1">
      5 times
     </span>
    </span>
    </span>
   </div>
<div>
</div>
<div>
    <span class="test">
     <span class="f1">
      6 times
     </span>
    </span>
    </span>
   </div>'''


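soup = BeautifulSoup(html, 'html.parser')
# find_all returns every <span class="f1">; the result is simply empty when nothing matches
aaa = soup.find_all('span', attrs={'class':'f1'})
for i in aaa:
    # strip=True drops the surrounding newlines and indentation
    print(i.get_text(strip=True))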

Output:

5 times
6 times


0

I'd recommend using the .find_all() method (.findAll is its older alias) and looping over the matched spans.

Example:

from bs4 import BeautifulSoup

# html is assumed to hold the markup from the question
soup = BeautifulSoup(html, 'lxml')

for span in soup.find_all("span", class_="f1"):
    # skip spans whose text is only whitespace
    if not span.text.isspace():
        print(span.text)

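The .isspace() method checks whether a string consists only of whitespace. A plain truthiness check won't work here, because a visually empty HTML span still contains spaces or newlines, and such a string is non-empty and therefore truthy. Note that .isspace() returns False for a truly empty string, so a span with no content at all is not skipped by this check.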
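
The difference between these checks can be seen with plain strings, no parsing needed. The sample values below are made up for illustration; the sketch compares truthiness, .isspace(), and truthiness after .strip(), the last of which treats both empty and whitespace-only strings as blank.

```python
# Made-up sample strings standing in for span.text values
samples = ["", "   ", "\n  \n", " 6 times "]

results = {
    s: (bool(s), s.isspace(), bool(s.strip()))
    for s in samples
}

for s, (truthy, only_space, has_content) in results.items():
    print(repr(s), truthy, only_space, has_content)

# Keep only strings with real content, whitespace removed
kept = [s.strip() for s in samples if s.strip()]
print(kept)  # ['6 times']
```

Testing s.strip() is a common way to cover both the empty and the whitespace-only case with a single condition.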
