56

I have a problem with HTTP headers: they are encoded in ASCII, and I want to provide a view for downloading files whose names may be non-ASCII.

response['Content-Disposition'] = 'attachment; filename="%s"' % (vo.filename.encode("ASCII","replace"), )

I don't want to fall back on static file serving, because it has the same issue with non-ASCII file names; there the problem would be the file system and its file name encoding. (I don't know the target OS.)

I've already tried urllib.quote(), but it raises KeyError exception.

Possibly I'm doing something wrong but maybe it's impossible.

1
  • 2
    I realise I'm years late, but ... the KeyError exception really bugs me. I don't just mean "every once in awhile I run into this problem," I mean, I submitted a patch to Python to fix this years ago, argued for awhile, then decided they didn't want to change Python 2. I did fix this problem in Python 3, but they never accepted my patch in Python 2. The work-around is to .encode('utf-8') first, and then use urllib.quote. But that's for URL-encoding which isn't the standard way to put these in headers. Commented Apr 11, 2011 at 0:36
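The workaround mentioned in the comment above can be sketched as follows. This is a minimal illustration written for Python 3, where `urllib.quote` lives at `urllib.parse.quote`; the sample filename is made up for the example.

```python
from urllib.parse import quote

# Hypothetical non-ASCII filename, used only for illustration.
filename = "R\u00e9sum\u00e9 2011.pdf"

# Encode to UTF-8 bytes first, then percent-encode those bytes.
# On Python 2 the equivalent call is urllib.quote(filename.encode('utf-8')).
quoted = quote(filename.encode("utf-8"))
print(quoted)
```

As the comment notes, this is URL encoding; on its own it is not the standard way to put a filename into a header.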

7 Answers

35

This is a FAQ.

There is no interoperable way to do this. Some browsers implement proprietary extensions (IE, Chrome), others implement RFC 2231 (Firefox, Opera).

See test cases at http://greenbytes.de/tech/tc2231/.

Update: as of November 2012, all current desktop browsers support the encoding defined in RFC 6266 and RFC 5987 (Safari >= 6, IE >= 9, Chrome, Firefox, Opera, Konqueror).


6 Comments

Thanks! Easiest things are the hardest to find ;)
More recently, Julian has put together a profile of RFC2231 for this purpose: datatracker.ietf.org/doc/draft-reschke-rfc2231-in-http
Does this apply for multipart/form-data support, because right now I can see raw UTF-8 bytes sent in 'filename' parameter when uploading a file from a form in Chrome
RFC 5987 has been obsoleted by RFC 8187
32

Don't send a filename in Content-Disposition. There is no way to make non-ASCII header parameters work cross-browser(*).

Instead, send just “Content-Disposition: attachment”, and leave the filename as a URL-encoded UTF-8 string in the trailing (PATH_INFO) part of your URL, for the browser to pick up and use by default. UTF-8 URLs are handled much more reliably by browsers than anything to do with Content-Disposition.

(*: actually, there's not even a current standard that says how it should be done as the relationships between RFCs 2616, 2231 and 2047 are pretty dysfunctional, something that Julian is trying to get cleared up at a spec level. Consistent browser support is in the distant future.)
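A rough sketch of this approach, assuming a hypothetical URL layout of `/download/<id>/<filename>`; the helper names `download_url` and `filename_from_path` are invented for illustration and are not part of any framework.

```python
from urllib.parse import quote, unquote


def download_url(base_path, filename):
    """Append the filename as a UTF-8, percent-encoded final path segment."""
    # safe="" also escapes "/" so the filename stays a single path segment.
    return base_path.rstrip("/") + "/" + quote(filename, safe="")


def filename_from_path(path):
    """Recover the original filename from the last path segment."""
    return unquote(path.rsplit("/", 1)[-1])


# Hypothetical base path and filename, for illustration only.
url = download_url("/download/42", "R\u00e9sum\u00e9.pdf")

# The response served at that URL carries no filename parameter at all.
headers = {"Content-Disposition": "attachment"}
```

The browser then derives the default save name from the last segment of the URL rather than from the header.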

2 Comments

The top answer contains some great information, but you've actually solved the problem. Thanks!
Since this answer has come out, an RFC on this topic has been issued. Of note is the filename*= construct which only newer browsers support and is guaranteed to let you use UTF-8, encoded as in RFC 5987. tools.ietf.org/html/rfc6266#appendix-D
32

Note that in 2011, RFC 6266 (especially Appendix D) weighed in on this issue and has specific recommendations to follow.

Namely, you can issue a filename with only ASCII characters, followed by filename* with a RFC 5987-formatted filename for those agents that understand it.

Typically this will look like filename="my-resume.pdf"; filename*=UTF-8''My%20R%C3%A9sum%C3%A9.pdf, where the Unicode filename ("My Résumé.pdf") is encoded into UTF-8 and then percent-encoded (note, do NOT use + for spaces).

Please do actually read RFC 6266 and RFC 5987 (or use a robust and tested library that abstracts this for you), as my summary here is lacking in important detail.
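As a minimal sketch of the `filename` plus `filename*` pattern described above, using only the Python 3 standard library: the function name `content_disposition` and the accent-stripping fallback are assumptions made for this example, and the code does not cover every detail of RFC 6266 or RFC 5987 (for instance, control characters in the name).

```python
import unicodedata
from urllib.parse import quote


def content_disposition(filename, disposition="attachment"):
    """Build a header value with an ASCII fallback and an RFC 5987 filename*."""
    # ASCII fallback: decompose accented characters, then drop non-ASCII ones.
    fallback = (
        unicodedata.normalize("NFKD", filename)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Simplification: replace characters that would need escaping
    # inside a quoted-string instead of escaping them.
    fallback = fallback.replace("\\", "_").replace('"', "_")

    header = '{}; filename="{}"'.format(disposition, fallback)
    if fallback != filename:
        # quote() percent-encodes the UTF-8 bytes and emits %20 for spaces,
        # never "+" (quote_plus would be wrong here).
        header += "; filename*=UTF-8''{}".format(quote(filename, safe=""))
    return header


print(content_disposition("My R\u00e9sum\u00e9.pdf"))
```

Agents that understand `filename*` use the UTF-8 name; the rest fall back to the plain ASCII `filename`.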

1 Comment

This is what I needed for a file download endpoint in my Django project. Thank you!
20

Starting with Django 2.1 (see issue #16470), you can use FileResponse, which will correctly set the Content-Disposition header for attachments. Starting with Django 3.0 (issue #30196) it will also set it correctly for inline files.

For example, to return a file named my_img.jpg with MIME type image/jpeg as an HTTP response:

response = FileResponse(open("my_img.jpg", 'rb'), as_attachment=True, content_type="image/jpeg")
return response

Starting with Django 4.2 you also have the option of setting the content disposition header with a utility function, for example

response.headers["Content-Disposition"] = content_disposition_header(...)

Django < 2.1

If you can't use FileResponse, you can use the relevant part from FileResponse's source to set the Content-Disposition header yourself. Here's what that source currently looks like:

from urllib.parse import quote

disposition = 'attachment' if as_attachment else 'inline'
try:
    filename.encode('ascii')
    file_expr = 'filename="{}"'.format(filename)
except UnicodeEncodeError:
    file_expr = "filename*=utf-8''{}".format(quote(filename))
response.headers['Content-Disposition'] = '{}; {}'.format(disposition, file_expr)

3 Comments

NOTE: with as_attachment=False (that is, when Content-Disposition is inline), the header is not set in either Django 2.1 or Django 2.2. As of 21.05.2019 the fix is only in the Django dev version, so for inline responses I use the manual version.
For more info about @don_vanchos's comment, see Django issue #30196.
Django 4.2 broke this logic out into a standalone function: django.utils.http.content_disposition_header()
11

I can say that I've had success using the newer (RFC 5987) format of specifying a header encoded with the e-mail form (RFC 2231). I came up with the following solution which is based on code from the django-sendfile project.

import unicodedata
# Note: urlquote was removed in Django 4.0; on newer versions use urllib.parse.quote instead.
from django.utils.http import urlquote

def rfc5987_content_disposition(file_name):
    ascii_name = unicodedata.normalize('NFKD', file_name).encode('ascii','ignore').decode()
    header = 'attachment; filename="{}"'.format(ascii_name)
    if ascii_name != file_name:
        quoted_name = urlquote(file_name)
        header += '; filename*=UTF-8\'\'{}'.format(quoted_name)

    return header

# e.g.
# response['Content-Disposition'] = rfc5987_content_disposition(file_name)

I have only tested my code on Python 3.4 with Django 1.8, so the similar solution in django-sendfile may suit you better.

There's a long-standing ticket in Django's tracker which acknowledges this, but no patches have yet been proposed afaict. So unfortunately this is as close to using a robust tested library as I could find; please let me know if there's a better solution.

1 Comment

Awesome! The that need to!
3

The escape_uri_path function from Django is the solution that worked for me.

Read the Django Docs here to see which RFC standards are currently specified.

from django.utils.encoding import escape_uri_path

file = "response.zip"
response = HttpResponse(content_type='application/zip')
response['Content-Disposition'] = f"attachment; filename*=utf-8''{escape_uri_path(file)}"


-1

A hack:

if (Request.UserAgent.Contains("IE"))
{
  // IE will accept URL encoding, but spaces don't need to be, and since they're so common..
  filename = filename.Replace("%", "%25").Replace(";", "%3B").Replace("#", "%23").Replace("&", "%26");
}

1 Comment

User-agent sniffing stinks in general; the buggy servers that use it are responsible for a lot of the tc2231/rfc6266 test cases.
