Extract string from HTML tag with Beautiful Soup

Question

I have an HTML file like this in a subdirectory, the_files:

<div class='log'>start</div>
<div class='ts'>2017-03-14 09:17:52.859 +0800&nbsp;</div><div class='log'>bla bla bla</div>
<div class='ts'>2017-03-14 09:17:55.619 +0800&nbsp;</div><div class='log'>aba aba aba</div>
...
...

I want to extract the string in each tag and print it like this in the terminal:

2017-03-14 09:17:52.859 +0800 , bla bla bla
2017-03-14 09:17:55.619 +0800 , aba aba aba
...
...

I want to ignore the first line of <div class='log'>start</div>.

My code so far

import os

from bs4 import BeautifulSoup

path = "the_files/"

def do_task_html():
    dir_path = os.listdir(path)
    for file in dir_path:
        if file.endswith(".html"):
            # Parse each HTML file found in the subdirectory
            with open(path + file) as f:
                soup = BeautifulSoup(f, "html.parser")
            item1 = [element.text for element in soup.find_all("div", "ts")]
            string1 = ''.join(item1)
            item2 = [element.text for element in soup.find_all("div", "log")]
            string2 = ''.join(item2)
            print(string1 + "," + string2)

This code produces the following result:

2017-03-14 09:17:52.859 +0800 2017-03-14 09:17:55.619 +0800 , start bla bla bla  aba aba aba ... ...

Is there a way to fix this?

Thank you for your help.

Zroq · Accepted Answer · 2017-03-24 10:54:04Z


Find each div with class ts, then print its text together with the text of its next_sibling, which is the log div that directly follows it.

for div in soup.find_all("div", class_="ts"):
    # Pair each timestamp with the log div immediately after it
    print("%s, %s" % (div.get_text(strip=True), div.next_sibling.get_text(strip=True)))

Outputs:

2017-03-14 09:17:52.859 +0800, bla bla bla
2017-03-14 09:17:55.619 +0800, aba aba aba
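
The next_sibling lookup relies on each log div following its ts div with nothing in between, as in the sample. If the real files may have a newline or other whitespace between the two tags, next_sibling would return that text node instead. A minimal sketch of a more tolerant variant, assuming the same ts/log class names, is below. The function name extract_rows and the inline sample string are hypothetical, used only for illustration, and the parser choice ("html.parser") is an assumption.

```python
from bs4 import BeautifulSoup


def extract_rows(html):
    """Return (timestamp, message) pairs for each ts div that has a following log div."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for ts in soup.find_all("div", class_="ts"):
        # find_next_sibling skips text nodes (e.g. newlines) between the two divs
        log = ts.find_next_sibling("div", class_="log")
        if log is not None:
            rows.append((ts.get_text(strip=True), log.get_text(strip=True)))
    return rows


# Hypothetical sample: the second pair is split across two lines on purpose
sample = """<div class='log'>start</div>
<div class='ts'>2017-03-14 09:17:52.859 +0800&nbsp;</div><div class='log'>bla bla bla</div>
<div class='ts'>2017-03-14 09:17:55.619 +0800&nbsp;</div>
<div class='log'>aba aba aba</div>"""

for ts, msg in extract_rows(sample):
    print("%s , %s" % (ts, msg))
```

The leading start line is skipped because it has no ts div before it. Note that find_next_sibling returns the first matching sibling, so a ts div with no log of its own would be paired with a later log div. If that can happen in the real files, an extra check would be needed.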


1 Comment

Ling Over a year ago

Thank you for your fast response and great answer. It worked!
