
I want to retrieve JSON data from the MyMemory web API (using Python).

Here is a simple request for this API:

http://api.mymemory.translated.net/get?langpair=en|fr&q=something+to+translate

When I try to get the data with my browser or with curl, it works well, giving me back the JSON object, like this:

{"responseData":{"translatedText":"quelque chose \u00e0 traduire","match":0.85},"responseDetails":"","responseStatus":200,"responderId":"239","matches":[{"id":0,"segment":"something to translate","translation":"quelque chose \u00e0 traduire","quality":70,"reference":"Machine Translation provided by Google, Microsoft, Worldlingo or MyMemory customized engine.","usage-count":1,"subject":false,"created-by":"MT!","last-updated-by":"MT!","create-date":"2014-12-31 02:47:09","last-update-date":"2014-12-31 02:47:09","tm_properties":"","match":0.85},{"id":"443388028","segment":"to translate","translation":"traduire","quality":"68","reference":" |@| ","usage-count":1,"subject":" ","created-by":"IATE","last-updated-by":"IATE","create-date":"2014-11-04 01:54:57","last-update-date":"2014-11-04 01:54:57","tm_properties":null,"match":0.74},{"id":"476882062","segment":"To translate:","translation":"A traduire","quality":"74","reference":"","usage-count":1,"subject":"All","created-by":"Matecat","last-updated-by":"Matecat","create-date":"2014-12-04 11:04:23","last-update-date":"2014-12-04 11:04:23","tm_properties":"","match":0.71}]}

But with Python, using urllib and exactly the same URL, the server only gives me this output:

can't open file

I wrote a short Python example demonstrating my problem:

#!/usr/bin/python3
# coding: utf-8

import urllib.parse
import urllib.request

# The "Get" function of MyMemory API needs 2 mandatory parameters in the URL :
#  - the text to translate (named 'q'),
#  - and the two languages used to perform the translation,
#    combined together with a pipe sign (|),
#    for example : es|en (named 'langpair').

# An example list of URLs:
URLs = {
    'MyMemory search, with mandatory "langpair" attribute' :
        "http://api.mymemory.translated.net/get?" +
        urllib.parse.urlencode({
            'q' : 'something to translate',
            'langpair' : 'en|fr',
        }),
    # http://api.mymemory.translated.net/get?langpair=en%7Cfr&q=something+to+translate
    # ==> Response data: "can't open file"
    # Didn't work: no JSON data at all, only this error message

    'MyMemory subjects' : "http://api.mymemory.translated.net/subjects",
    # ==> Response data: '["Accounting","Aerospace","Agriculture_and_Farming","Archeol'
    # Ok, it worked

    'Wikipedia' : "http://www.wikipedia.org",
    # http://www.wikipedia.org
    # ==> Response data: '<!DOCTYPE html>\n<html lang="mul" dir="ltr">\n<head>\n<!-- Syso' ...
    # Ok, it worked
}

if __name__ == "__main__":
    # For each URL in the list above:
    for title, url in URLs.items():
        # Display info:
        print("Getting {} :".format(title))
        print('  URL : ' + url)

        # Open the URL:
        data = urllib.request.urlopen(url)

        # Print the beginning of the data received:
        print('  Response data : {}\n'.format(data.read(60)))

Here is the output:

Getting MyMemory search, with mandatory "langpair" attribute :
  URL : http://api.mymemory.translated.net/get?q=something+to+translate&langpair=en%7Cfr
  Response data : b"can't open file"

Getting Wikipedia :
  URL : http://www.wikipedia.org
  Response data : b'<!DOCTYPE html>\n<html lang="mul" dir="ltr">\n<head>\n<!-- Syso'

Getting MyMemory subjects :
  URL : http://api.mymemory.translated.net/subjects
  Response data : b'["Accounting","Aerospace","Agriculture_and_Farming","Archeol'

Where does it go wrong? It seems the API doesn't like the 'langpair' parameter (because of the pipe sign, maybe?), but I don't understand why!

Any idea?

Comments:

  • Using import requests, data = requests.get(url) and printing data.content works. (A minimal sketch of that approach follows these comments.)
  • Thank you, yes, it works very well! I'm still wondering why my example doesn't work, but this solution is perfect.
  • Not sure, to be honest, but I always use requests; it makes life easier more often than not.
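Spelling out that workaround as a minimal sketch (assuming the requests package is installed; same URL as in the question):

import requests

url = "http://api.mymemory.translated.net/get?langpair=en|fr&q=something+to+translate"

response = requests.get(url)
print(response.content[:60])                               # raw bytes, as in the comment above
print(response.json()["responseData"]["translatedText"])   # quelque chose à traduire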

1 Answer


Just to explain what the problem was: you needed to provide a User-Agent header:

import urllib.request

request = urllib.request.Request(url)
request.add_header('User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')

data = urllib.request.urlopen(request)

I tried it, and it prints:

Getting MyMemory search, with mandatory "langpair" attribute :
  URL : http://api.mymemory.translated.net/get?langpair=en%7Cfr&q=something+to+translate
  Response data : b'{"responseData":{"translatedText":"quelque chose \\u00e0 trad'
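Putting the fix together as a self-contained sketch (the User-Agent value is just an example browser string, and json.loads then decodes the body):

#!/usr/bin/python3
import json
import urllib.parse
import urllib.request

url = ("http://api.mymemory.translated.net/get?"
       + urllib.parse.urlencode({'q': 'something to translate', 'langpair': 'en|fr'}))

# Any browser-like User-Agent seems to do; this one is only an example.
request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})

with urllib.request.urlopen(request) as response:
    payload = json.loads(response.read().decode('utf-8'))

print(payload["responseData"]["translatedText"])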

Aside from that, yes, using requests usually saves a lot of time and headaches; it's "HTTP for Humans", after all.


Comments:

  • I was just going to try this, saved me some typing ;)
  • It works, but I will try the requests lib, it looks very easy. With urllib, do you need to put this header each time?
  • @yolenoyer Yup, each time (see the sketch below for setting it once on an opener).
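If repeating the header gets tedious, one option (not from the answer itself, just a sketch using stock urllib machinery) is to install a global opener that sends it on every call:

import urllib.request

# Build an opener whose default headers include a browser-like User-Agent,
# then make it the opener used by urllib.request.urlopen() from now on.
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)

# Subsequent calls pick up the header automatically:
data = urllib.request.urlopen("http://api.mymemory.translated.net/get?langpair=en|fr&q=something+to+translate")
print(data.read(60))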
