Extracting Content Within Multiple Span Tags in BeautifulSoup

Question

I'm trying to extract string content from within multiple span tags. A snapshot of the HTML page is:

<div class="secondary-attributes">
    <span class="neighborhood-str-list">
        Southeast
    </span>
    <address>
        1234 Python Blvd S<br>Somewhere, NV 98765
    </address>
    <span class="biz-phone">
        (555) 123-4567
    </span>
</div>

Specifically, I'm trying to extract the phone number nestled between the <span class="biz-phone"></span> tags. I attempted to do so with the following code:

import requests
from bs4 import BeautifulSoup

res = requests.get(url)
soup = BeautifulSoup(res.text, "html.parser")

phone_number_results = [phone_numbers for phone_numbers in soup.find_all('span','biz-phone')]

The code ran without any errors, but it didn't give me the result I was hoping for:

['<span class="biz-phone">\n        (702) 476-5050\n    </span>',
 '<span class="biz-phone">\n        (702) 253-7296\n    </span>',
 '<span class="biz-phone">\n        (702) 385-7912\n    </span>',
 '<span class="biz-phone">\n        (702) 776-7061\n    </span>',
 '<span class="biz-phone">\n        (702) 221-7296\n    </span>',
 '<span class="biz-phone">\n        (702) 252-7296\n    </span>',
 '<span class="biz-phone">\n        (702) 659-9101\n    </span>',
 '<span class="biz-phone">\n        (702) 355-9445\n    </span>',
 '<span class="biz-phone">\n        (702) 396-3333\n    </span>',
 '<span class="biz-phone">\n        (702) 643-9851\n    </span>',
 '<span class="biz-phone">\n        (702) 222-1441\n    </span>']

My question has two parts:

Why are the span tags appearing when I run the program?
How do I get rid of them? I could just do string editing, but I feel like I wouldn't be taking full advantage of the BeautifulSoup package. Is there a more elegant way?

NOTE: the page contains many more snippets of HTML like the one shown above, each with its own <span class="biz-phone"> (555) 123-4567 </span> element (i.e., more phone numbers) that needs to be extracted, which is why I was thinking of using find_all().

Thank you in advance.

use phone_numbers.text or even phone_numbers.text.strip()
– furas, Commented Oct 30, 2016 at 20:50

dmcc · Accepted Answer · 2016-10-30 20:53:58Z

find_all() returns a list of tags (bs4.element.Tag), not strings.
As @furas points out, you want to access the text property on each of the tags to extract the text within the tag:

phone_number_results = [phone_numbers.text.strip() for phone_numbers in soup.find_all('span', 'biz-phone')]

(the strip() call removes the newlines and indentation that surround the number inside each tag)
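As a minimal sketch of the point above, the snippet below parses an inline copy of the HTML from the question instead of fetching a page. The `html` string and the second phone number are illustrative assumptions, not part of the original page. It shows that find_all() yields Tag objects and that the text is available through .text or get_text(strip=True).

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Inline stand-in for the fetched page; the second number is made up for illustration.
html = """
<div class="secondary-attributes">
    <span class="biz-phone">
        (555) 123-4567
    </span>
</div>
<div class="secondary-attributes">
    <span class="biz-phone">
        (555) 765-4321
    </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all("span", class_="biz-phone")

# Each result is a Tag, which is why printing it shows the <span> markup.
all_tags = all(isinstance(span, Tag) for span in spans)

# .text returns the text inside the tag; strip() trims the surrounding whitespace.
phone_numbers = [span.text.strip() for span in spans]

# get_text(strip=True) is an equivalent spelling for this simple case.
phone_numbers_alt = [span.get_text(strip=True) for span in spans]

print(phone_numbers)
```

For spans that hold a single run of text, as here, both spellings should give the same list of bare phone numbers.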

daOnlyBG Over a year ago

Thank you, the .text did the trick! I wasn't aware of that property; I tried a few others (e.g., .contents), but they didn't seem to help. Your solution worked, though.
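For context on why .contents did not help: it returns a list of the tag's child nodes rather than a single string. The sketch below uses an inline `html` string (an assumption standing in for one span from the page) to compare the two.

```python
from bs4 import BeautifulSoup

# Inline stand-in for a single span from the page.
html = '<span class="biz-phone">\n        (555) 123-4567\n    </span>'
span = BeautifulSoup(html, "html.parser").find("span", class_="biz-phone")

# .contents is a list of child nodes; here it holds one text node,
# still wrapped in the original newlines and indentation.
contents = span.contents

# .text is a single string, so it can be stripped directly.
text = span.text.strip()

print(type(contents).__name__, len(contents))
print(text)
```

With .contents, the text still has to be pulled out of the list and stripped, which is the step that .text.strip() does in one expression.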
