<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
  <S I="50" N="Football">
    <C I="65" N="Russia">
      <L I="167" N="Premier League">
        <E I="1049367" DT="2015-04-05T15:00:00" ISH="0" BKS="20" T1="Ufa" T2="Terek Groznyi" T1I="79698" T2I="44081">
          <M K="1x2">
            <B I="81" BTDT="2015-03-23T23:04:00,825">
              <O N="1" V="3"/>
              <O N="X" V="3.1"/>
              <O N="2" V="2.25"/>
            </B>
          </M>
        </E>
      </L>
    </C>
  </S>
</markets>

I am trying to parse this XML using etree in Python. I have done XML parsing before, but the documents have always been in this format:

  <tag> value </tag>

I am unsure how to extract the "D" attribute from the <markets> element, as well as all the other attribute values.

This is how I open and parse the XML Doc:

import gzip
from lxml import etree

z = gzip.open("code2.zip", "r")
tree = etree.parse(z)
print(etree.tostring(tree, pretty_print=True))

I tried:

for markets in tree.findall('markets'):
    print("found")

However, this doesn't work. I would appreciate some tips/advice. Hopefully once I get the first "D" extracted, I'll be able to get the rest.

4 Answers

This is a common error when dealing with XML that has a default namespace, that is, a namespace declared without a prefix. Your XML declares one here:

xmlns="http://www.eoddsmaker.net/schemas/markets/1.0"

Therefore, in your case, all elements are implicitly in that namespace. One way to query namespaced elements is with xpath():

# ... parse the document into `tree` as in the question ...

# create a prefix-to-namespace-URI mapping
ns = {'d': 'http://www.eoddsmaker.net/schemas/markets/1.0'}

# use the prefix together with the element name in the query,
# and pass the mapping as the second argument
markets = tree.xpath('//d:markets', namespaces=ns)[0]

# get and print the value of the D attribute of <markets>
print(markets.get('D'))
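The same prefix-to-URI mapping idea also works with the standard library's xml.etree.ElementTree, whose find()/findall() accept a namespaces dict as a second argument. This is a minimal sketch swapping lxml's xpath() for stdlib findall(), assuming a trimmed inline copy of the question's document (the SAMPLE string and the "d" prefix are illustrative choices). It also shows why tree.findall('markets') found nothing: the root's tag is stored with the namespace URI in braces, and findall() only searches below the element it is called on, while <markets> is the root itself.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document from the question (illustrative only).
SAMPLE = """<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
  <S I="50" N="Football">
    <C I="65" N="Russia">
      <L I="167" N="Premier League">
        <E I="1049367" DT="2015-04-05T15:00:00" T1="Ufa" T2="Terek Groznyi">
          <M K="1x2">
            <B I="81" BTDT="2015-03-23T23:04:00,825">
              <O N="1" V="3"/>
              <O N="X" V="3.1"/>
              <O N="2" V="2.25"/>
            </B>
          </M>
        </E>
      </L>
    </C>
  </S>
</markets>"""

# Any prefix works; it only has to map to the document's default namespace URI.
ns = {'d': 'http://www.eoddsmaker.net/schemas/markets/1.0'}

root = ET.fromstring(SAMPLE)

# The root *is* <markets>; its tag carries the namespace in {uri}name form.
print(root.tag)       # {http://www.eoddsmaker.net/schemas/markets/1.0}markets
print(root.get('D'))  # 2015-03-23T23:12:34

# Prefixed paths are resolved through the mapping passed as the second argument.
for o in root.findall('.//d:O', ns):
    print(o.get('N'), o.get('V'))
```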

Try this with xml.etree.ElementTree:

import xml.etree.ElementTree as ET
root = ET.fromstring("""<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
     <S I="50" N="Football">
      <C I="65" N="Russia">
        <L I="167" N="Premier League">
          <E I="1049367" DT="2015-04-05T15:00:00" ISH="0" BKS="20" T1="Ufa" T2="Terek Groznyi" T1I="79698" T2I="44081">
            <M K="1x2">
              <B I="81" BTDT="2015-03-23T23:04:00,825">
                <O N="1" V="3"/>
                <O N="X" V="3.1"/>
                <O N="2" V="2.25"/>
            </B>
          </M>
         </E>
        </L>
       </C>
     </S>
    </markets>""")

print(root.attrib)     # the D and CNT attributes of <markets>
print(root.get('D'))   # 2015-03-23T23:12:34
print(root[0].attrib)  # {'I': '50', 'N': 'Football'}
# ...and so on down the tree (root[0][0] is <C>, and so forth)

To parse from an XML file instead:

import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()

For more, see https://docs.python.org/2/library/xml.etree.elementtree.html
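Since the question reads a compressed file, it may help that ET.parse() also accepts an open file object, so the stream returned by gzip.open() can be passed straight in. This sketch assumes the file really is gzip-compressed (a genuine .zip archive cannot be read with gzip and would need the zipfile module instead); the temporary file name markets.xml.gz is hypothetical and is created first only so the example runs on its own.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET

xml_bytes = b'''<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
  <S I="50" N="Football"/>
</markets>'''

# Write a small gzip-compressed sample under a hypothetical name so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'markets.xml.gz')
with gzip.open(path, 'wb') as f:
    f.write(xml_bytes)

# ET.parse() accepts a file object, so the decompressed stream is parsed directly.
with gzip.open(path, 'rb') as f:
    tree = ET.parse(f)

root = tree.getroot()
print(root.attrib['D'], root.attrib['CNT'])
```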


I am answering this question with no knowledge of etree. I simply opened the following page: https://docs.python.org/2/library/xml.etree.elementtree.html#parsing-xml

What you are looking for are attributes, and the page shows quite clearly how to get them:

tree = etree.parse(z)
root = tree.getroot()
print root.attrib

That gives you all of the attributes of the <markets> element, such as D and CNT.

You should be able to figure out the rest on your own: loop through the children of each element and read .attrib from each one.
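A minimal sketch of that loop, written as a recursive walk that records each element's depth, its local name (with the "{namespace-uri}" part ElementTree adds to namespaced tags stripped off) and its .attrib dict. The helpers local_name and walk and the shortened SAMPLE string are illustrative, not part of any library.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<markets xmlns="http://www.eoddsmaker.net/schemas/markets/1.0" D="2015-03-23T23:12:34" CNT="1521">
  <S I="50" N="Football">
    <C I="65" N="Russia">
      <L I="167" N="Premier League"/>
    </C>
  </S>
</markets>"""


def local_name(tag):
    """Drop the '{namespace-uri}' prefix that ElementTree puts on namespaced tags."""
    return tag.rsplit('}', 1)[-1]


def walk(element, depth=0, out=None):
    """Collect (depth, local tag name, attributes) for element and all its descendants."""
    if out is None:
        out = []
    out.append((depth, local_name(element.tag), dict(element.attrib)))
    for child in element:  # iterating an element yields its direct children
        walk(child, depth + 1, out)
    return out


rows = walk(ET.fromstring(SAMPLE))
for depth, name, attrs in rows:
    print('  ' * depth + name, attrs)
```

Collecting rows first and printing afterwards keeps the traversal reusable, for example to build dicts or a table instead of printing.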

Considering I found this answer so easily, please do a bit more research before posting a question :)

P.S. This answer was written for Python 2.7. For Python 3, it would be print(root.attrib).

Comments

This is giving an error for me and was one of the methods I tried. AttributeError: 'lxml.etree._ElementTree' object has no attribute 'attrib'
My apologies: as the linked page shows, you first get the root element (<markets> in your case) from the tree with tree.getroot(), and then call root.attrib on that root.
Had just realised that. Thanks for the help. Have the D isolated so the rest should (hopefully) be straightforward.
Good luck! The answer above mine is a bit more explanatory.
Think my optimism was misplaced. Can't get iter working: for element in root.iter("E"): finds nothing.
markets = tree.getroot()
print(markets.get('D'))

This prints the 'D' attribute of markets (the root element).

for element in tree.iterfind(".//{*}<Tag to search for>"):
    print(element.get("<Attribute to look for>"))

This iterates over every element below the current node whose tag matches, in any namespace, and prints the specified attribute of each one.

For example:

for element in tree.iterfind(".//{*}O"):
    print(element.get("N"))

This prints:

1
X
2

Also note that if the XML document uses multiple namespaces, you'll need to put the namespace URI inside the curly braces of the path passed to iterfind() so that only elements in that namespace are matched:

for element in tree.iterfind(".//{http://www.eoddsmaker.net/schemas/markets/1.0}<Tag to search for>"):
    print(element.get("<Attribute to look for>"))
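This also explains the root.iter("E") problem from the comments: iter() and iterfind() match against the full {uri}tag name, so a bare "E" matches nothing in a document with a default namespace. The sketch below uses the standard library and assumes Python 3.8 or later, which is when xml.etree gained the {*} wildcard; the NS constant and the cut-down SAMPLE document are illustrative.

```python
import xml.etree.ElementTree as ET

NS = 'http://www.eoddsmaker.net/schemas/markets/1.0'

# Cut-down version of the question's document (some levels and attributes omitted).
SAMPLE = f"""<markets xmlns="{NS}" D="2015-03-23T23:12:34" CNT="1521">
  <S I="50" N="Football">
    <L I="167" N="Premier League">
      <E I="1049367" T1="Ufa" T2="Terek Groznyi">
        <O N="1" V="3"/>
        <O N="X" V="3.1"/>
        <O N="2" V="2.25"/>
      </E>
    </L>
  </S>
</markets>"""

root = ET.fromstring(SAMPLE)

# A bare tag name does not include the default namespace, so nothing matches.
bare = list(root.iter('E'))

# Spelling out the namespace as {uri}E finds the element.
events = list(root.iter(f'{{{NS}}}E'))

# The {*} wildcard (Python 3.8+) matches the tag in any namespace.
odds = [(o.get('N'), o.get('V')) for o in root.iterfind('.//{*}O')]

print(len(bare), [(e.get('T1'), e.get('T2')) for e in events], odds)
```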
