I am using the code below, which runs BeautifulSoup on multiple .html files and saves the extracted text into .txt files. Instead, I want to save each extracted record as a separate row in a DataFrame.

I want the results stored in a DataFrame with a single column named "file". How can I achieve this?

import glob
import os.path
from bs4 import BeautifulSoup
dir_path = r"C:\My_folder\tmp"
results_dir = r"C:\My_folder\tmp\working"

for file_name in glob.glob(os.path.join(dir_path, "*.html")):
    with open(file_name) as html_file:
        soup = BeautifulSoup(html_file, "html.parser")

    results_file = os.path.splitext(file_name)[0] + '.txt'
    with open(results_file, 'w') as outfile:        
        for i in soup.select('font[color="#FF0000"]'):
            print(i.text)
            outfile.write(i.text + '\n')
  • Can you please provide the code that you tried to use to solve this so far? We need to see what you tried to be able to help you. :) Commented Apr 9, 2019 at 11:02
  • I have attached the code now. Commented Apr 9, 2019 at 11:45
  • stackoverflow.com/questions/31674557/… Commented Apr 9, 2019 at 11:52

1 Answer

You could create an empty dataframe at the beginning of your code, and then append to it row by row within the loop.

import pandas as pd
df = pd.DataFrame(columns=['columname'])

Then in your loop (at the place where print(i.text) is at the moment), you could add each value as a new row:

df.loc[len(df)] = [i.text]


Alternatively, you could create a list, add every i.text to it inside the loop, and then turn the list into a DataFrame after the loop:

df = pd.DataFrame({'columname':created_list})
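
Putting the list approach together with the loop from the question, a minimal sketch might look like the following. The function name collect_red_text, the "html.parser" choice, the UTF-8 encoding, and the sorted file order are assumptions made for illustration; the column is named "file" as the question asks.

```python
import glob
import os.path
import tempfile

import pandas as pd
from bs4 import BeautifulSoup


def collect_red_text(dir_path):
    """Return a DataFrame with one row per matching <font> element.

    Hypothetical helper: the name and signature are not from the question.
    """
    created_list = []
    # sorted() only makes the row order predictable; glob order is arbitrary.
    for file_name in sorted(glob.glob(os.path.join(dir_path, "*.html"))):
        # Assumption: the files are UTF-8; adjust the encoding if they are not.
        with open(file_name, encoding="utf-8") as html_file:
            soup = BeautifulSoup(html_file, "html.parser")
        for i in soup.select('font[color="#FF0000"]'):
            created_list.append(i.text)
    # Build the DataFrame once, after the loop, from the collected list.
    return pd.DataFrame({"file": created_list})


if __name__ == "__main__":
    # Small self-contained demo using a temporary folder instead of C:\My_folder\tmp.
    with tempfile.TemporaryDirectory() as demo_dir:
        with open(os.path.join(demo_dir, "demo.html"), "w", encoding="utf-8") as f:
            f.write('<font color="#FF0000">first</font><font color="#FF0000">second</font>')
        print(collect_red_text(demo_dir))
```

Collecting into a list and building the DataFrame once is usually cheaper than growing a DataFrame row by row, since each row-wise enlargement may copy the existing data.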

1 Comment

Great, thanks a lot. The second approach (created_list) worked.
