How to parse with xml.etree in Python?

Question

Python 3.5

Here is the code:

import urllib.request
from xml.etree import ElementTree as ET

url = 'http://www.sat.gob.mx/informacion_fiscal/tablas_indicadores/Paginas/tipo_cambio.aspx'


def conectar(url):
    page = urllib.request.urlopen(url)
    return page.read()

root = ET.fromstring(conectar(url))
s = root.findall("//*[contains(.,'21/')]")

I need to extract '21/', but it returns this error:

Traceback (most recent call last):
  File "crawler.py", line 11, in <module>
    root = ET.fromstring(conectar(url))
  File "/home/rg3915/.pyenv/versions/3.5.0/lib/python3.5/xml/etree/ElementTree.py", line 1321, in XML
    parser.feed(text)
xml.etree.ElementTree.ParseError: unbound prefix: line 146, column 8

But I do not know how to solve this error.
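Separately from the parse error, the findall() call would fail even on a well-formed document. ElementTree implements only a limited XPath subset: a path starting with / is rejected when called on an Element, and functions such as contains() are not supported (lxml's xpath() method does support full XPath 1.0). A minimal sketch, using a made-up table instead of the real page, that reproduces the rejection and performs the contains() test in plain Python; the helper names are hypothetical:

```python
import xml.etree.ElementTree as ET

# Small, well-formed stand-in for the page (hypothetical content).
DOC = ET.fromstring(
    '<table><tr><td>21/12/2015</td><td>17.10</td></tr>'
    '<tr><td>22/12/2015</td><td>17.20</td></tr></table>'
)


def unsupported_query(root):
    """Run the query from the question; ElementTree rejects the absolute path."""
    try:
        return root.findall("//*[contains(.,'21/')]")
    except SyntaxError as exc:
        return 'SyntaxError: %s' % exc


def elements_containing(root, needle):
    """Plain-Python replacement for contains(): test each element's own text."""
    return [el for el in root.iter() if el.text and needle in el.text]


print(unsupported_query(DOC))  # SyntaxError: cannot use absolute path on element
print([el.text for el in elements_containing(DOC, '21/')])  # ['21/12/2015']
```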

Why not use BeautifulSoup?

– Nazaf Anwar, Commented Dec 22, 2015 at 14:18
How would that work in this case?

– Regis Santos, Commented Dec 22, 2015 at 14:20

DavinirJr · Accepted Answer · 2015-12-23 22:50:34Z

1

You could start with:

import urllib2
from bs4 import BeautifulSoup

url = 'http://www.sat.gob.mx/informacion_fiscal/tablas_indicadores/Paginas/tipo_cambio.aspx'
response = urllib2.urlopen(url)
html = response.read()
dom = BeautifulSoup(html, 'html.parser')

tables = dom.find_all("table")
if len(tables):
    table = tables[0]
    print table

(tested in Python 2.7)
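The snippet above is Python 2 (urllib2, print statement); on the asker's Python 3.5 the download call is urllib.request.urlopen. As a hedged, dependency-free sketch, the version below swaps BeautifulSoup for the standard library's html.parser.HTMLParser (the parser BeautifulSoup uses when given 'html.parser'), which does no namespace processing and therefore is not bothered by <gcse:search>. It assumes the '21/' values sit inside <td>/<th> cells of the real page; the names fetch_html, CellCollector and cells_containing are hypothetical.

```python
import urllib.request
from html.parser import HTMLParser

# URL from the question; only fetch_html() uses it, and that needs network access.
SAT_URL = 'http://www.sat.gob.mx/informacion_fiscal/tablas_indicadores/Paginas/tipo_cambio.aspx'


def fetch_html(url):
    """Python 3 replacement for urllib2.urlopen(url).read(), decoded to str."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset, errors='replace')


class CellCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell.

    HTMLParser does no namespace processing, so a tag such as <gcse:search>
    is handled as an ordinary unknown tag instead of raising an error.
    """

    CELL_TAGS = ('td', 'th')

    def __init__(self):
        super().__init__()
        self.cells = []
        self._buffer = None  # text pieces of the cell currently open, or None

    def _flush(self):
        # Store the open cell, if any; this also covers cells whose </td> is omitted.
        if self._buffer is not None:
            self.cells.append(''.join(self._buffer).strip())
            self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag in self.CELL_TAGS:
            self._flush()
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in self.CELL_TAGS or tag in ('tr', 'table'):
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def cells_containing(html, needle):
    """Return the stripped text of every table cell that contains needle."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return [cell for cell in parser.cells if needle in cell]


# Usage (needs network access):
# for cell in cells_containing(fetch_html(SAT_URL), '21/'):
#     print(cell)
```

If the values turn out not to be in table cells on the real page, CELL_TAGS is the part to adjust.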

answered Dec 23, 2015 at 22:50

DavinirJr


Gary van der Merwe · Accepted Answer · 2015-12-22 15:08:22Z

1

While the document you are trying to parse claims to be XHTML, it is not well-formed XML because it uses an unbound namespace prefix:

<gcse:search></gcse:search>

The gcse namespace prefix is never declared (no xmlns:gcse="..." attribute is in scope), so an XML parser rejects the element.
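To see the failure in isolation, here is a minimal sketch with a hypothetical fragment: ElementTree rejects the undeclared gcse: prefix with the same "unbound prefix" ParseError, and accepts the fragment once the prefix is bound. The namespace URI urn:example:gcse is made up; this only illustrates the cause and is not a fix for a page you do not control.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mimicking the page: gcse: is used but never declared.
UNDECLARED = '<html><body><gcse:search></gcse:search></body></html>'

# Same fragment with the prefix bound to a made-up URI.
DECLARED = ('<html xmlns:gcse="urn:example:gcse">'
            '<body><gcse:search></gcse:search></body></html>')


def try_parse(text):
    """Return (root, None) on success or (None, error message) on ParseError."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as exc:
        return None, str(exc)


root, error = try_parse(UNDECLARED)
print(error)  # an 'unbound prefix: line 1, column ...' message

root, error = try_parse(DECLARED)
# Namespaced tags are stored in Clark notation: {uri}localname.
print(root.find('.//{urn:example:gcse}search') is not None)  # True
```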

BeautifulSoup would probably be much better suited for what you are trying to do, because it is not fussy about the document being 100% valid.

edited Dec 22, 2015 at 15:08

answered Dec 22, 2015 at 15:01

Gary van der Merwe
