
I'm using Beautiful Soup 4 to extract text from HTML files. With get_text() I can easily extract just the text, but when I try to write that text to a plain text file, I get the message "416". Here's the code I'm using:

from bs4 import BeautifulSoup
markup = open("example1.html")
soup = BeautifulSoup(markup)
f = open("example.txt", "w")
f.write(soup.get_text())

And the output to the console is 416 but nothing gets written to the text file. Where have I gone wrong?

  • You need to close the file. Commented Apr 26, 2013 at 16:51
  • alternatively you can use, in 2.5+, the with statement to have that handled for you Commented Apr 26, 2013 at 16:52
  • Have you tried inspecting soup and soup.get_text()? Commented Apr 26, 2013 at 17:04
  • right, I wasn't closing the file - rookie mistake Commented Apr 26, 2013 at 17:05
  • 416 is probably the return value of f.write() (the number of characters written, in Python 3 text mode). Writes are buffered by default; flush the application buffers with f.flush(), or close the file (f.close(), or a with statement that does it for you), to see the data in the file from outside the Python process. Note: this doesn't guarantee that the data is physically saved to disk. Depending on your OS, filesystem, and drive, that may take a while (it usually only matters if there is a power failure). os.fsync() asks the OS to flush its own buffers. Commented Apr 26, 2013 at 17:49

1 Answer

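The output file is never closed, so the buffered text never reaches the disk. BeautifulSoup accepts either a string or an open file handle, so calling markup.read() is optional, but you should close both files when you're done: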

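from bs4 import BeautifulSoup

# Read the HTML, then close the input file
markup = open("example1.html")
# Naming the parser is optional, but it avoids a warning in recent bs4 versions
soup = BeautifulSoup(markup.read(), "html.parser")
markup.close()

# Write the extracted text; close() flushes the buffer to the file
f = open("example.txt", "w")
f.write(soup.get_text())
f.close()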

Or, in a more Pythonic style, use with statements, which close the files for you:

from bs4 import BeautifulSoup

with open("example1.html") as markup:
    # Naming the parser is optional, but it avoids a warning in recent bs4 versions
    soup = BeautifulSoup(markup.read(), "html.parser")

with open("example.txt", "w") as f:
    f.write(soup.get_text())

as @bernie suggested
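
To see where the 416 comes from, here is a minimal standard-library sketch (no Beautiful Soup needed). It assumes Python 3, where write() on a text-mode file returns the number of characters written; an interactive session echoes that return value, which is most likely the number seen on the console. The helper name write_text and the temporary path are made up for illustration, and the flush()/os.fsync() calls are optional extras for when the data must reach the disk before the block ends.

```python
import os
import tempfile


def write_text(path, text):
    """Write text to path and return whatever write() reported."""
    with open(path, "w", encoding="utf-8") as f:
        written = f.write(text)  # Python 3 text mode: number of characters written
        f.flush()                # push Python's own buffer to the OS (optional here)
        os.fsync(f.fileno())     # ask the OS to commit its buffers to disk (optional)
    # Leaving the with block closes the file, which also flushes it
    return written


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.txt")
    text = "x" * 416
    print(write_text(path, text))  # prints 416
    with open(path, encoding="utf-8") as f:
        print(f.read() == text)    # prints True
```

In a script, the return value of f.write() is silently discarded, so the number only shows up when the code is run line by line in the interpreter.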
