Reading element values from XML using Python LXML

Question

<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
 <S I="50" N="Football">
  <C I="65" N="Russia">
    <L I="167" N="Premier League">
      <E I="1049367" DT="2015-04-05T15:00:00" ISH="0" BKS="20" T1="Ufa" T2="Terek Groznyi" T1I="79698" T2I="44081">
        <M K="1x2">
          <B I="81" BTDT="2015-03-23T23:04:00,825">
            <O N="1" V="3"/>
            <O N="X" V="3.1"/>
            <O N="2" V="2.25"/>
        </B>
      </M>
     </E>
    </L>
   </C>
 </S>
</markets>

I am trying to parse this XML using etree in Python. I have done XML parsing before, but the documents have always been in the format:

  <tag> value </tag>

I am unsure how to isolate the "D" attribute of <markets>, as well as all the other values.

This is how I open and parse the XML document:

import gzip
from lxml import etree

z = gzip.open("code2.zip", "r")

tree = etree.parse(z)
print(etree.tostring(tree, pretty_print=True))

I tried:

for markets in tree.findall('markets'):
    print "found"

However, this doesn't work. I would appreciate some tips or advice. Hopefully, once I get the first "D" extracted, I'll be able to get the rest.

har07 · Accepted Answer · 2015-03-24 01:06:17Z

2

This is a common error when dealing with XML that has a default namespace. Your XML has a default namespace, i.e. a namespace declared without a prefix, here:

xmlns="http://www.eoddsmaker.net/schemas/markets/1.0"

Therefore, in your case, all elements are implicitly in that namespace. One possible way to query elements in a namespace is to use xpath():

# ... (open and parse the document into `tree`, as in the question)

# create a prefix-to-namespace-URI mapping
ns = {'d': 'http://www.eoddsmaker.net/schemas/markets/1.0'}

# use the registered prefix with the element name in the query, and pass the mapping via the namespaces argument
markets = tree.xpath('//d:markets', namespaces=ns)[0]

# get and print the value of the D attribute of <markets>
print(markets.get('D'))
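To see why the bare name 'markets' matches nothing, it helps to look at the tag the parser actually stores. The sketch below uses the standard library's xml.etree.ElementTree so it runs without lxml installed; lxml's find()/findall() accept the same namespaces mapping. The trimmed XML, the NS_URI name and the "d" prefix are illustrative choices, not part of the question. Note also that <markets> is the root itself, and tree.findall(path) searches starting at the root, so tree.findall('markets') would look for children named markets even if there were no namespace.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://www.eoddsmaker.net/schemas/markets/1.0"

# Trimmed copy of the question's document; the default namespace is kept.
XML = """<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
  <S I="50" N="Football">
    <C I="65" N="Russia">
      <L I="167" N="Premier League">
        <E I="1049367" T1="Ufa" T2="Terek Groznyi"/>
      </L>
    </C>
  </S>
</markets>"""

root = ET.fromstring(XML)

# Namespaced tags are stored as "{uri}localname", so a bare "markets" or "E"
# never matches anything in this document.
print(root.tag)  # {http://www.eoddsmaker.net/schemas/markets/1.0}markets

# <markets> is the root element itself, so read its attribute directly.
print(root.get("D"))  # 2015-03-23T23:12:34

# The prefix "d" is arbitrary; only the URI has to match the document.
ns = {"d": NS_URI}
events = root.findall(".//d:E", ns)
print([e.get("T1") for e in events])  # ['Ufa']

# Without the mapping, the same search finds nothing.
print(root.findall(".//E"))  # []
```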

edited Mar 24, 2015 at 1:06

answered Mar 24, 2015 at 0:54

har07


itzMEonTV · Accepted Answer · 2015-03-23 23:32:45Z

Try this with xml.etree.ElementTree from the standard library:

import xml.etree.ElementTree as ET
root = ET.fromstring("""<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
     <S I="50" N="Football">
      <C I="65" N="Russia">
        <L I="167" N="Premier League">
          <E I="1049367" DT="2015-04-05T15:00:00" ISH="0" BKS="20" T1="Ufa" T2="Terek Groznyi" T1I="79698" T2I="44081">
            <M K="1x2">
              <B I="81" BTDT="2015-03-23T23:04:00,825">
                <O N="1" V="3"/>
                <O N="X" V="3.1"/>
                <O N="2" V="2.25"/>
            </B>
          </M>
         </E>
        </L>
       </C>
     </S>
    </markets>""")

>>> print root.attrib
{'CNT': '1521', 'D': '2015-03-23T23:12:34'}
>>> print root[0].attrib
{'I': '50', 'N': 'Football'}
# ...and so on for the nested elements

To parse from an XML file instead:

import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()

For more, refer to https://docs.python.org/2/library/xml.etree.elementtree.html
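Rather than indexing with root[0], root[0][0] and so on, you can walk every element with iter(). This is a minimal sketch assuming Python 3 and a trimmed copy of the question's XML; local_name is a hypothetical helper that strips the "{uri}" prefix for display. Because every tag here is namespaced, filtering iter() by a bare name such as "E" yields no elements (which is what happens with root.iter("E") in the comments on another answer); the qualified "{uri}E" form is needed.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://www.eoddsmaker.net/schemas/markets/1.0"

XML = """<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
  <S I="50" N="Football">
    <C I="65" N="Russia">
      <L I="167" N="Premier League">
        <E I="1049367" T1="Ufa" T2="Terek Groznyi">
          <M K="1x2">
            <B I="81">
              <O N="1" V="3"/>
              <O N="X" V="3.1"/>
              <O N="2" V="2.25"/>
            </B>
          </M>
        </E>
      </L>
    </C>
  </S>
</markets>"""


def local_name(tag):
    """Hypothetical helper: drop the '{uri}' prefix that namespaced tags carry."""
    return tag.split("}", 1)[-1]


root = ET.fromstring(XML)

# iter() with no argument visits every element depth-first, root included.
for element in root.iter():
    print(local_name(element.tag), element.attrib)

# To filter by tag, pass the fully qualified name; root.iter("E") yields nothing.
events = list(root.iter("{%s}E" % NS_URI))
print(len(events), events[0].get("T2"))  # 1 Terek Groznyi
```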

tenwest · Accepted Answer · 2015-03-23 23:35:01Z

0

I am answering this question with no knowledge of etree. I simply opened the following page: https://docs.python.org/2/library/xml.etree.elementtree.html#parsing-xml

What you are looking for are attributes, and the page shows quite clearly how to read them:

tree = etree.parse(z)
root = tree.getroot()
print root.attrib

That prints all of the attributes of the <markets> element, such as D and CNT.

You should be able to figure out the rest on your own: loop through the children of each element and read .attrib from each.
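The full path from the question's gzip file to the root attributes might look like the sketch below. It assumes Python 3 and uses the standard library's xml.etree.ElementTree; lxml's etree.parse likewise accepts a file object. The temporary file and its name feed.xml.gz are hypothetical stand-ins for the question's compressed feed.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET

XML = b"""<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
  <S I="50" N="Football"/>
</markets>"""

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the question's gzip-compressed file.
    path = os.path.join(tmp, "feed.xml.gz")
    with gzip.open(path, "wb") as f:
        f.write(XML)

    # parse() accepts a file object, so the gzip stream can be passed straight in.
    with gzip.open(path, "rb") as z:
        tree = ET.parse(z)

# The tree wrapper has no .attrib; the root element does.
root = tree.getroot()
print(root.get("D"), root.get("CNT"))  # 2015-03-23T23:12:34 1521

# Loop over the children and read .attrib from each.
for child in root:
    print(child.attrib)
```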

Considering I found this answer so easily, please do a bit more research before posting a question :)

P.S. This answer was written for Python 2.7. For Python 3, it would be print(root.attrib).

edited Mar 23, 2015 at 23:35

answered Mar 23, 2015 at 23:26

tenwest


7 Comments

user1235720 Over a year ago

This is giving an error for me and was one of the methods I tried. AttributeError: 'lxml.etree._ElementTree' object has no attribute 'attrib'

tenwest Over a year ago

My apologies. As the linked page shows, you first get the root element (<markets> in your case) from the tree using tree.getroot(), and then call root.attrib on that root.

user1235720 Over a year ago

Had just realised that. Thanks for the help. Have the D isolated so the rest should (hopefully) be straightforward.

tenwest Over a year ago

Good luck! The answer above mine is a bit more explanatory.

user1235720 Over a year ago

Think my optimism was misplaced. Can't get iter working. for element in root.iter("E"): returns None


derf · Accepted Answer · 2015-07-09 21:52:37Z

print(markets.get('D'))

prints the D attribute of markets (the root element).

for element in tree.iterfind(".//{*}<Tag to search for>"):
    print(element.get("<Attribute to look for>"))

This iterates over every element below the current node with the given tag, in any namespace, and prints the specified attribute of each one.

For example:

for element in tree.iterfind(".//{*}O"):
    print(element.get("N"))

This prints:

1
X
2

Also note: if the XML document uses multiple namespaces and you only want matches from one of them, put that namespace's URI inside the curly braces in the path passed to iterfind():

for element in tree.iterfind(".//{http://www.eoddsmaker.net/schemas/markets/1.0}<Tag to search for>"):
    print(element.get("<Attribute to look for>"))
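Putting the wildcard and explicit forms together, here is a minimal sketch that collects every event's odds into a dictionary. It uses the standard library's ElementTree, where "{*}" in a path requires Python 3.8 or later; the odds layout and the "T1 v T2" key format are illustrative choices, not part of the feed's schema.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://www.eoddsmaker.net/schemas/markets/1.0"

XML = """<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
  <S I="50" N="Football">
    <C I="65" N="Russia">
      <L I="167" N="Premier League">
        <E I="1049367" T1="Ufa" T2="Terek Groznyi">
          <M K="1x2">
            <B I="81">
              <O N="1" V="3"/>
              <O N="X" V="3.1"/>
              <O N="2" V="2.25"/>
            </B>
          </M>
        </E>
      </L>
    </C>
  </S>
</markets>"""

root = ET.fromstring(XML)

# "{*}" matches the tag in any namespace.
wildcard = [o.get("N") for o in root.iterfind(".//{*}O")]

# Spelling out the URI restricts the match to that one namespace.
explicit = [o.get("N") for o in root.iterfind(".//{%s}O" % NS_URI)]

# Collect each event's odds into a plain dict keyed by outcome name.
odds = {}
for event in root.iterfind(".//{*}E"):
    match = "%s v %s" % (event.get("T1"), event.get("T2"))
    odds[match] = {o.get("N"): float(o.get("V")) for o in event.iterfind(".//{*}O")}

print(wildcard, explicit)  # ['1', 'X', '2'] ['1', 'X', '2']
print(odds)
```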
