
I'm trying to get the JSON response from the following API endpoint: https://datos.madrid.es/egob/catalogo/205026-0-cementerios.json. My code is:

import requests

url = 'https://datos.madrid.es/egob/catalogo/205026-0-cementerios.json'
r = requests.get(url)
r.json()

It fails with the error:

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

If I check the encoding of the response, it's empty. So I tried forcing the encoding before accessing the JSON, with no success:

import requests

url = 'https://datos.madrid.es/egob/catalogo/205026-0-cementerios.json'
r = requests.get(url)
r.encoding = 'utf-8'
r.json()

This gives the same error.

r.text

returns something like:

'\x00\x00\x01\x00\x01\x00  \x00\x00\x01\x00\x18\x0 .......

so it looks like the response is not being decoded properly.

How can I get it successfully decoded?

2 Answers


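The server is doing something funky with the User-Agent header: if it doesn't recognise the user agent, it returns the site's favicon instead of the JSON (the \x00\x00\x01\x00 bytes at the start of your r.text are the signature of an ICO file). You can work around this by forcing the user agent: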

import requests

url = 'https://datos.madrid.es/egob/catalogo/205026-0-cementerios.json'
# Send a user agent the server recognises so it returns JSON, not the favicon
r = requests.get(url, headers={"User-Agent": "curl/7.61.0"})
print(r.json())
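
If you want the failure to be more obvious than a JSONDecodeError, one option is to check the body before parsing it. The following is only a minimal sketch: the helper names looks_like_ico and parse_json_body are hypothetical, and it assumes the body is UTF-8 JSON whenever it is not an icon.

```python
import json

# ICO files start with a reserved field (0) followed by the image type (1 = icon).
ICO_MAGIC = b"\x00\x00\x01\x00"


def looks_like_ico(body: bytes) -> bool:
    """Return True if the body starts with the ICO file signature."""
    return body.startswith(ICO_MAGIC)


def parse_json_body(body: bytes):
    """Hypothetical helper: parse a response body, failing loudly on an icon."""
    if looks_like_ico(body):
        raise ValueError("body looks like an ICO image, not JSON")
    return json.loads(body.decode("utf-8"))
```

With requests you would call it as parse_json_body(r.content), which raises a descriptive ValueError when the server falls back to the favicon.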

1 Comment

works perfectly thanks! I didn't realize about the favicon stuff!

It seems to be gzip-compressed. Decompress it and then parse it with json.loads. The encoding is utf-8.

Example:

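import zlib
# f is a file-like object holding the raw gzip-compressed response body;
# 16 + zlib.MAX_WBITS tells zlib to expect a gzip header and trailer
decompressed_data = zlib.decompress(f.read(), 16 + zlib.MAX_WBITS)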
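
To show the whole decompress-then-parse path in one place, here is a small self-contained sketch. The function name decode_gzip_json and the sample payload are made up for illustration; the payload is compressed locally to stand in for a raw gzip response body. Note that requests normally decompresses gzip-encoded bodies for you in r.content and r.text, so this only applies when you are holding the raw compressed bytes.

```python
import gzip
import json
import zlib


def decode_gzip_json(raw: bytes):
    """Decompress a gzip-framed body and parse it as UTF-8 JSON."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressed = zlib.decompress(raw, 16 + zlib.MAX_WBITS)
    return json.loads(decompressed.decode("utf-8"))


# Simulated compressed body (made-up sample data, not the real API output).
sample = gzip.compress(json.dumps({"title": "ejemplo"}).encode("utf-8"))
print(decode_gzip_json(sample))
```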

Your URL is public, so you can test it with your favorite browser. Chrome shows the following response headers:

Cache-Control: no-cache
Connection: Keep-Alive
Content-disposition: inline;filename=205026-0-cementerios.json
Content-Encoding: gzip
Content-Length: 4383
Content-Type: application/json;charset=UTF-8
Date: Thu, 20 Dec 2018 12:19:33 GMT
OT-force-Account-Verify: true
Vary: Accept-Encoding
X-Frame-Options: SAMEORIGIN
X-UA-Compatible: IE=8
Xonnection: close

After decompressing, it looks like valid JSON:

{
"@context": {
    "c": "http://www.w3.org/2002/12/cal#",
    "dcterms": "http://purl.org/dc/terms/",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "loc": "http://purl.org/ctic/infraestructuras/localizacion#",
    "org": "http://purl.org/ctic/infraestructuras/organizacion#",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
    "title": "vcard:fn",
    "id": "dcterms:identifier",
    "relation": "dcterms:relation",
    "references": "dcterms:references",
    "address": "vcard:adr",
    "area": "loc:barrio",
    "district": "loc:distrito",
    "locality": "vcard:locality",
    "postal-code": "vcard:postal-code",
    "street": "vcard:street-address",
    "location": "vcard:geo",
    "latitude": "geo:lat",
    "longitude": "geo:long",
....

1 Comment

thanks! I marked the other answer as the accepted one only because it requires less code, but yours works OK also, so I've upvoted it
