I need to save the HTML code of any website in a txt file. It is a very easy exercise, but I am stuck, because I have a function that should do this:

import urllib.request

def get_html(url):
    f=open('htmlcode.txt','w')
    page=urllib.request.urlopen(url)
    pagetext=page.read()  # read the HTML, then write it to the file below
    f.write(pagetext)
    f.close()

But this doesn't work.

  • You can ask your browser to save the HTML for a page. Why do it this way? There are programs like wget (on Unix/Linux, probably also on OS X, and on Windows as part of Cygwin) that can download a complete website. Commented Jun 19, 2014 at 1:09
  • Lots of programmers use Python to download URLs. I do. I guess I could hire a bunch of people to click save in the browser, and send them email telling them which pages I want. But Python is less expensive. Commented Jun 19, 2014 at 1:13
  • I got a strange error, something like: "No str, needed bytes" Commented Jun 19, 2014 at 1:41
  • Great! The problem is that page.read() returns bytes, and a file opened in text mode ('w') expects a string. pagetext = page.read().decode() is probably all you need. decode() assumes UTF-8 by default. Commented Jun 19, 2014 at 2:03
  • Yes, you're right! I finally got it, thanks for everything :D Commented Jun 19, 2014 at 2:20
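
To make the fix from the comments concrete, here is a minimal sketch of the question's function with the decode step added. The `path` parameter, the charset lookup from the response headers, and the UTF-8 fallback are my own assumptions, not part of the original code:

```python
import urllib.request


def get_html(url, path="htmlcode.txt"):
    """Download url and save its HTML as text in path (Python 3)."""
    with urllib.request.urlopen(url) as page:
        raw = page.read()  # bytes, not str
        # Assumption: use the charset the server declares, else fall back to UTF-8.
        charset = page.headers.get_content_charset() or "utf-8"
    text = raw.decode(charset, errors="replace")
    # The file is opened in text mode, so it must receive a str.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

If you do not need the text as a string at all, another option is to open the file with 'wb' and write the bytes from page.read() unchanged.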

2 Answers

The easiest way is to use urlretrieve. In Python 2:

import urllib

urllib.urlretrieve("http://www.example.com/test.html", "test.txt")

For Python 3.x the code is as follows:

import urllib.request    
urllib.request.urlretrieve("http://www.example.com/test.html", "test.txt")

1 Comment

Thanks! I did it the following way (Python 2), and it works:

    import urllib2

    def Obtener_Html(url):
        file("my_file.txt", "w").write(urllib2.urlopen(url).read())

    if __name__ == '__main__':
        url = raw_input("Say me a website: ")
        Obtener_Html("http://" + url)


I use Python 3. Install the requests library with pip install requests; after that you can save a webpage to a txt file:

import requests

url = "https://stackoverflow.com/questions/24297257/save-html-of-some-website-in-a-txt-file-with-python"

r = requests.get(url)
with open('file.txt', 'w') as file:
    file.write(r.text)

1 Comment

You might also want to check r.status_code to make sure you are not getting an HTTP 404 or a server error. For a successful request it should be 200, and r.ok is True.
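
One way to add that check is sketched below. The function name `save_page`, the `path` parameter, and the 10-second timeout are hypothetical choices of mine, not something from the answer:

```python
import requests


def save_page(url, path="file.txt"):
    """Download url and write its HTML to path; raise on HTTP errors."""
    r = requests.get(url, timeout=10)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses,
    # so an error page is never written to the file.
    r.raise_for_status()
    with open(path, "w", encoding="utf-8") as f:
        f.write(r.text)
    return r.status_code
```

If you would rather not raise, test `r.ok` (or compare `r.status_code` with 200) and skip the write when it fails.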
