import urllib
from urllib.request import urlopen
import xml.etree.ElementTree as etree
response = urllib.request.urlopen("http://regnskaber.virk.dk/32673592/eGJybHN0b3JlOi8vWC1GNzY5MUY0Ny0yMDE0MDMyOV8xMzQxNThfMTc5L3hicmw.xml")

print(response.getcode())

print(response.readline())  # prints the first line, in case you need to check the output

Please help me fix this encoding problem. I need to parse the XML content.

  • @haspander - it's not a built-in one. I have some restrictions on installing or using those libraries. Commented Jul 2, 2018 at 13:23
  • @9769953 - the output is: b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x00\xed=\xd9r\x1b9\x92\xef\x1d\xd1\xffP\xeb\x87\x8d\x99\x08\x89\xe2}x=\x8a\x95,\xb9\xc7\xdb\xb6\xe5\xb04\x9e\xddG\x90\x05\x920\x8bU\x1c\x00\xa4\xc5\x0f\xd8Oi\x7f\x83\xdf\xf9c\x9b\x99@\xdd\x07\x8b\x94\xdcR\xefL\x84\xc3\x92X@"\xef\x0b@\xf1\xd5\xfdXz\xe2%\xfe\xef\xdc/=_\xfd\xe5\xc5\\\xeb\xd5\xcb\xb3\xb3\xaf_\xbf6\xf0\xe3F gg\xedf\xb3s&|\xa5\x99?\xe1/\xcc\xc8\x97\xd3h,\x8ds\'\x13\xd6p\x17g*\x18\x87#\xc6\xc5#\xb8\xaf\xe5\xf6\x92y\x08\xecv\xce\xb9\xbe\x98L\x82\xb5\xaf\xdf\x04r\xf9\xd6\x9f\x04K Commented Jul 2, 2018 at 13:33
  • It is not the right one; I need the XML to parse. Commented Jul 2, 2018 at 13:33
  • Add extra / new information into your question, not as a comment (comments should not be necessary). A traceback would also have been useful. But see my other comment, and my answer. Commented Jul 2, 2018 at 13:41

1 Answer


The magic bytes 0x1f 0x8b at the start of the response indicate gzip compression. Servers often compress data for transport, and browsers decompress it automatically. Here, you'll have to do the second step yourself:

from urllib.request import urlopen
import xml.etree.ElementTree as ET
import gzip

response = urlopen("http://regnskaber.virk.dk/32673592/eGJybHN0b3JlOi8vWC1GNzY5MUY0Ny0yMDE0MDMyOV8xMzQxNThfMTc5L3hicmw.xml")
print(response.getcode())

# The body is gzip-compressed, so read the raw bytes and decompress them.
data = response.read()
text = gzip.decompress(data)

# Parse the decompressed XML; fromstring returns the root element.
tree = ET.fromstring(text)
print(tree)

Output:

200
<Element '{http://www.xbrl.org/2003/instance}xbrl' at 0x104d09098>
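
If you are not sure whether a given response will be compressed, one option is to check for the gzip magic bytes before decompressing. The following is only a sketch: the helper name parse_possibly_gzipped_xml and the sample document are made up for illustration, and it runs offline rather than against the real URL.

```python
import gzip
import xml.etree.ElementTree as ET

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def parse_possibly_gzipped_xml(data: bytes) -> ET.Element:
    """Decompress data if it starts with the gzip magic bytes, then parse it."""
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return ET.fromstring(data)


# Offline demonstration with a made-up document instead of the real download.
sample = b"<root><item>42</item></root>"
root_plain = parse_possibly_gzipped_xml(sample)
root_gzipped = parse_possibly_gzipped_xml(gzip.compress(sample))
print(root_plain.tag, root_gzipped.find("item").text)
```

With the response from the code above, this would be called as parse_possibly_gzipped_xml(response.read()).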