
I have encoding problems when serving a simple web page in Python 3 using BaseHTTPRequestHandler.

Here is a minimal example that reproduces the problem:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

from http.server import BaseHTTPRequestHandler, HTTPServer
from os import curdir, sep, remove

HTML_FILE_NAME = 'test.html'
PORT_NUMBER = 8080

# This class handles any incoming request from the browser
class myHandler(BaseHTTPRequestHandler):

    # Handler for the GET requests
    def do_GET(self):
        self.path = HTML_FILE_NAME
        try:
            with open(curdir + sep + self.path, 'r') as f:
                self.send_response(200)
                self.send_header('Content-type', 'text/html')
                self.end_headers()
                self.wfile.write(bytes(f.read(), 'UTF-8'))
            return
        except IOError:
            self.send_error(404, 'File Not Found: %s' % self.path)

try:
    with open(HTML_FILE_NAME, 'w') as f:
        f.write('<!DOCTYPE html><html><body> <p> My name is Jérôme </p> </body></html>')

    # Create a web server and define the handler to manage the incoming requests
    server = HTTPServer(('', PORT_NUMBER), myHandler)
    print('Started httpserver on port %i.' % PORT_NUMBER)

    # Wait forever for incoming HTTP requests
    server.serve_forever()

except KeyboardInterrupt:
    print('Interrupted by the user - shutting down the web server.')
    server.socket.close()
    remove(HTML_FILE_NAME)

The expected result is to serve a web page displaying My name is Jérôme.

Instead, I get: My name is JÃ©rÃ´me

As you can see, the HTML page is correctly encoded with self.wfile.write(bytes(f.read(), 'UTF-8')), so I think the problem comes from the web server.

How do I tell the web server to serve the page as UTF-8?

2 Answers


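Your web server is already sending the text encoded as UTF-8, but you need to tell the browser which encoding the bytes it receives are in. The original HTTP/1.1 specification (RFC 2616) declares ISO-8859-1 as the default charset for text media types, so without that information the browser decodes your UTF-8 bytes as Latin-1.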
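The garbled output can be reproduced in plain Python, which shows exactly what the browser is doing: the server sends the UTF-8 bytes of the text, and the browser, lacking a charset, decodes them as ISO-8859-1. This is a small sketch using only the built-in str/bytes methods.

```python
text = 'My name is Jérôme'

# What the server actually sends: é and ô each become two UTF-8 bytes.
payload = text.encode('utf-8')

# What a browser assuming ISO-8859-1 shows: each byte becomes one character.
seen_by_browser = payload.decode('iso-8859-1')
print(seen_by_browser)  # prints: My name is JÃ©rÃ´me

# Decoding with the right charset recovers the original text.
print(payload.decode('utf-8'))  # prints: My name is Jérôme
```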

The standard HTTP way to do this is to add a charset parameter to the Content-Type header value.

Therefore, you should change your code to read:

self.send_header('Content-type', 'text/html; charset=utf-8')
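Put together, a handler that encodes the page once and declares the charset might look like the sketch below. It is not the asker's full script: the in-memory PAGE string, the Utf8Handler class and the run() helper are hypothetical names, and the file reading is left out for brevity. Sending Content-Length is optional but lets the client know where the body ends.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page content, kept in memory instead of test.html for brevity.
PAGE = '<!DOCTYPE html><html><body><p>My name is Jérôme</p></body></html>'


class Utf8Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGE.encode('utf-8')  # encode the text to UTF-8 bytes once
        self.send_response(200)
        # Declare the charset so the browser decodes the bytes as UTF-8.
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8080):
    """Serve Utf8Handler on the given port until interrupted with Ctrl+C."""
    server = HTTPServer(('', port), Utf8Handler)
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Calling run() and opening http://localhost:8080/ should then display the name correctly.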

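Also, watch out for the encoding of your HTML file. If you don't pass an encoding to open(), Python uses your locale's preferred encoding. That won't break anything here unless you end up running the script under a C or POSIX locale or on a Windows system with a non-Latin code page, where writing or reading "Jérôme" may fail or be garbled. Passing encoding='utf-8' to open() avoids the problem.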
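A sketch of the file handling with an explicit encoding, assuming the file is meant to be stored as UTF-8; write_page and read_page_bytes are hypothetical helper names. Reading in binary mode is an alternative to the question's read-then-bytes() approach: the bytes on disk are already UTF-8, so they can be passed straight to self.wfile.write() without decoding and re-encoding.

```python
HTML = '<!DOCTYPE html><html><body><p>My name is Jérôme</p></body></html>'


def write_page(path, html):
    # An explicit encoding makes the file contents independent of the locale.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(html)


def read_page_bytes(path):
    # Binary mode returns the stored UTF-8 bytes unchanged,
    # ready to be written to the response with self.wfile.write().
    with open(path, 'rb') as f:
        return f.read()
```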


1 Comment

The "Also" hint saved my day; I added a reference :)

It works without any problem if I add:

<meta content="text/html;charset=utf-8" http-equiv="Content-Type">
<meta content="utf-8" http-equiv="encoding">

to the <head> of my HTML.

1 Comment

Using the header is better: for instance, non-HTML resources such as .js files can't have a <meta> tag.
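For resources that can't carry a <meta> tag, the charset has to come from the header. One way to do that, sketched here under the assumption that every text file served is stored as UTF-8, is to derive the Content-Type from the file extension with the standard mimetypes module; content_type_for is a hypothetical helper name. The exact MIME type returned for .js can vary between Python versions and system configurations, which is why the check below only looks for "javascript".

```python
import mimetypes


def content_type_for(path, charset='utf-8'):
    """Build a Content-Type value for a file, adding a charset for text types.

    Assumes (hypothetically) that all text files served are stored as UTF-8.
    """
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        # Unknown extension: fall back to a generic binary type.
        return 'application/octet-stream'
    if mime.startswith('text/') or mime.endswith('javascript'):
        return '%s; charset=%s' % (mime, charset)
    # Binary types such as images get no charset parameter.
    return mime
```

Inside a handler this would be used as self.send_header('Content-Type', content_type_for(path)).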
