
I want to retrieve JSON data from the MyMemory web API (using Python).

Here is a simple request for this API:

http://api.mymemory.translated.net/get?langpair=en|fr&q=something+to+translate

When I try to get the data with my browser or with curl, it works well, giving me back the JSON object, like this:

{"responseData":{"translatedText":"quelque chose \u00e0 traduire","match":0.85},"responseDetails":"","responseStatus":200,"responderId":"239","matches":[{"id":0,"segment":"something to translate","translation":"quelque chose \u00e0 traduire","quality":70,"reference":"Machine Translation provided by Google, Microsoft, Worldlingo or MyMemory customized engine.","usage-count":1,"subject":false,"created-by":"MT!","last-updated-by":"MT!","create-date":"2014-12-31 02:47:09","last-update-date":"2014-12-31 02:47:09","tm_properties":"","match":0.85},{"id":"443388028","segment":"to translate","translation":"traduire","quality":"68","reference":" |@| ","usage-count":1,"subject":" ","created-by":"IATE","last-updated-by":"IATE","create-date":"2014-11-04 01:54:57","last-update-date":"2014-11-04 01:54:57","tm_properties":null,"match":0.74},{"id":"476882062","segment":"To translate:","translation":"A traduire","quality":"74","reference":"","usage-count":1,"subject":"All","created-by":"Matecat","last-updated-by":"Matecat","create-date":"2014-12-04 11:04:23","last-update-date":"2014-12-04 11:04:23","tm_properties":"","match":0.71}]}

But with Python, using urllib and exactly the same URL, the server only gives me this output:

can't open file

I wrote a short Python example demonstrating my problem:

#!/usr/bin/python3
# coding: utf-8

import urllib.parse
import urllib.request

# The "Get" function of MyMemory API needs 2 mandatory parameters in the URL :
#  - the text to translate (named 'q'),
#  - and the two languages used to perform the translation,
#    combined together with a pipe sign (|),
#    for example : es|en (named 'langpair').

# An example list of URLs:
URLs = {
    'MyMemory search, with mandatory "langpair" attribute' :
        "http://api.mymemory.translated.net/get?" +
        urllib.parse.urlencode({
            'q' : 'something to translate',
            'langpair' : 'en|fr',
        }),
    # http://api.mymemory.translated.net/get?langpair=en%7Cfr&q=something+to+translate
    # ==> Response data: "can't open file"
    # Didn't work: no JSON data at all, only this error message

    'MyMemory subjects' : "http://api.mymemory.translated.net/subjects",
    # ==> Response data: '["Accounting","Aerospace","Agriculture_and_Farming","Archeol'
    # Ok, it worked

    'Wikipedia' : "http://www.wikipedia.org",
    # http://www.wikipedia.org
    # ==> Response data: '<!DOCTYPE html>\n<html lang="mul" dir="ltr">\n<head>\n<!-- Syso' ...
    # Ok, it worked
}

if __name__ == "__main__":
    # For each URL in the list above:
    for title, url in URLs.items():
        # Display info:
        print("Getting {} :".format(title))
        print('  URL : ' + url)

        # Open the URL:
        data = urllib.request.urlopen(url)

        # Print the beginning of the data received:
        print('  Response data : {}\n'.format(data.read(60)))

Here is the output:

Getting MyMemory search, with mandatory "langpair" attribute :
  URL : http://api.mymemory.translated.net/get?q=something+to+translate&langpair=en%7Cfr
  Response data : b"can't open file"

Getting Wikipedia :
  URL : http://www.wikipedia.org
  Response data : b'<!DOCTYPE html>\n<html lang="mul" dir="ltr">\n<head>\n<!-- Syso'

Getting MyMemory subjects :
  URL : http://api.mymemory.translated.net/subjects
  Response data : b'["Accounting","Aerospace","Agriculture_and_Farming","Archeol'

Where does it go wrong? It seems the API doesn't like the 'langpair' parameter (because of the pipe sign, maybe?), but I don't understand why!

Any idea?

Comments:

  • Using import requests, data = requests.get(url) and printing data.content works. (A minimal sketch of that approach follows these comments.)
  • Thank you, yes, it works very well! I'm still wondering why my example doesn't work, but this solution is perfect.
  • Not sure, to be honest, but I always use requests; it makes life easier more often than not.
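Spelling out that workaround as a minimal sketch (assuming the requests package is installed; same URL as in the question):

import requests

url = "http://api.mymemory.translated.net/get?langpair=en|fr&q=something+to+translate"

response = requests.get(url)
print(response.content[:60])                               # raw bytes, as in the comment above
print(response.json()["responseData"]["translatedText"])   # quelque chose à traduire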

1 Answer


Just to explain what the problem was: you needed to provide a User-Agent header:

import urllib.request

request = urllib.request.Request(url)
request.add_header('User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')

data = urllib.request.urlopen(request)

I tried it, and it prints:

Getting MyMemory search, with mandatory "langpair" attribute :
  URL : http://api.mymemory.translated.net/get?langpair=en%7Cfr&q=something+to+translate
  Response data : b'{"responseData":{"translatedText":"quelque chose \\u00e0 trad'
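Putting the fix together as a self-contained sketch (the User-Agent value is just an example browser string, and json.loads then decodes the body):

#!/usr/bin/python3
import json
import urllib.parse
import urllib.request

url = ("http://api.mymemory.translated.net/get?"
       + urllib.parse.urlencode({'q': 'something to translate', 'langpair': 'en|fr'}))

# Any browser-like User-Agent seems to do; this one is only an example.
request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})

with urllib.request.urlopen(request) as response:
    payload = json.loads(response.read().decode('utf-8'))

print(payload["responseData"]["translatedText"])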

Aside from that, yes, using requests usually saves a lot of time and headaches; it's "HTTP for Humans", after all.


Comments:

  • I was just going to try this, saved me some typing ;)
  • It works, but I will try the requests lib, it looks very easy. With urllib, do you need to put this header each time?
  • @yolenoyer Yup, each time (see the sketch below for setting it once on an opener).
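If repeating the header gets tedious, one option (not from the answer itself, just a sketch using stock urllib machinery) is to install a global opener that sends it on every call:

import urllib.request

# Build an opener whose default headers include a browser-like User-Agent,
# then make it the opener used by urllib.request.urlopen() from now on.
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

# Subsequent calls pick up the header automatically:
data = urllib.request.urlopen("http://api.mymemory.translated.net/get?langpair=en|fr&q=something+to+translate")
print(data.read(60))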
