I'm trying to parse an XML document in Python so that I can manipulate the data and write out a new file. The full file that I'm working with is here, but here is an excerpt:

<?xml version="1.0" encoding="UTF-8"?>
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
    <ERRORCODE>0</ERRORCODE>
    <PRODUCT BUILD="09-11-2013" NAME="FileMaker" VERSION="ProAdvanced 12.0v5"/>
    <DATABASE DATEFORMAT="M/d/yyyy" LAYOUT="" NAME="All gigs 88-07.fmp12" RECORDS="746" TIMEFORMAT="h:mm:ss a"/>
    <METADATA>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Country" TYPE="TEXT"/>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Year" TYPE="TEXT"/>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="City" TYPE="TEXT"/>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="State" TYPE="TEXT"/>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Theater" TYPE="TEXT"/>
    </METADATA>
    <RESULTSET FOUND="746">
        <ROW MODID="3" RECORDID="32">
            <COL>
                <DATA/>
            </COL>
            <COL>
                <DATA>1996</DATA>
            </COL>
            <COL>
                <DATA>Pompano Beach</DATA>
            </COL>
            <COL>
                <DATA>FL</DATA>
            </COL>
            <COL>
                <DATA>First Presbyterian Church</DATA>
            </COL>
        </ROW>
        <ROW MODID="3" RECORDID="33">
            <COL>
                <DATA/>
            </COL>
            <COL>
                <DATA>1996</DATA>
            </COL>
            <COL>
                <DATA>Hilton Head</DATA>
            </COL>
            <COL>
                <DATA>SC</DATA>
            </COL>
            <COL>
                <DATA>Self Family Arts Center</DATA>
            </COL>
        </ROW>
        <!-- snip many more ROW elements -->
    </RESULTSET>
</FMPXMLRESULT>

Eventually, I want to use the information from the METADATA field to parse the columns in the RESULTSET, but for now I’m having trouble just getting a handle on the data. Here is what I’ve tried to get the contents of the METADATA element:

import xml.etree.ElementTree as ET

tree = ET.parse('giglist.xml')
root = tree.getroot()
print(root)
metadata = tree.find("METADATA")
print(metadata)

This prints out:

<Element '{http://www.filemaker.com/fmpxmlresult}FMPXMLRESULT' at 0x10f982cd0>
None

Why is metadata empty? Am I misusing the find() method?

  • You cannot 'print root' before you assign root; typo? Commented Oct 4, 2014 at 19:38
  • @OlofBjarnason typo. (in the post, not in the code) Commented Oct 4, 2014 at 19:39
  • FWIW, I find xmltodict more convenient than elementtree; it makes xml feel more like json. Commented Oct 4, 2014 at 20:28

1 Answer

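You need to handle namespaces. The xmlns attribute on the root element puts every element in the document into that namespace, so ElementTree stores the tag as {http://www.filemaker.com/fmpxmlresult}METADATA, and a search for the bare name METADATA matches nothing.

Since only a default namespace is declared, you can find the element by qualifying its name with the namespace URI: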

import xml.etree.ElementTree as ET

ns = 'http://www.filemaker.com/fmpxmlresult'

tree = ET.parse('giglist.xml')
root = tree.getroot()

metadata = root.find("{%s}METADATA" % ns)
print(metadata)  # prints <Element '{http://www.filemaker.com/fmpxmlresult}METADATA' at 0x103ccbe90>
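
The same lookup can also be written with a prefix map passed to find() and findall() instead of string formatting. This is only a minimal sketch: it assumes Python 3 and a cut-down inline copy of the excerpt rather than giglist.xml, and the prefix fm and the names XML, NS and names are arbitrary choices made for illustration.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the excerpt from the question (assumption: an inline
# string instead of giglist.xml, so the sketch is self-contained).
XML = """<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
    <ERRORCODE>0</ERRORCODE>
    <METADATA>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Country" TYPE="TEXT"/>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Year" TYPE="TEXT"/>
    </METADATA>
</FMPXMLRESULT>"""

root = ET.fromstring(XML)

# Every tag is stored with its namespace URI in braces.
print(root.tag)
print([child.tag for child in root])

# The unqualified name matches nothing, which is why find() returned None.
print(root.find("METADATA"))

# "fm" is an arbitrary prefix chosen for this sketch; only the URI matters.
NS = {"fm": "http://www.filemaker.com/fmpxmlresult"}
metadata = root.find("fm:METADATA", NS)
names = [field.get("NAME") for field in metadata.findall("fm:FIELD", NS)]
print(names)
```

Printing root.tag and the child tags first is a quick way to see which qualified names the tree actually contains before writing a search path.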

UPD (getting the list of results):

import xml.etree.ElementTree as ET

ns = 'http://www.filemaker.com/fmpxmlresult'

tree = ET.parse('giglist.xml')
root = tree.getroot()

# Field names from METADATA, in column order
keys = [field.attrib['NAME'] for field in root.findall(".//{%(ns)s}METADATA/{%(ns)s}FIELD" % {'ns': ns})]
# One dict per ROW, pairing each field name with the text of the matching COL/DATA
results = [dict(zip(keys, [col.text for col in row.findall(".//{%(ns)s}COL/{%(ns)s}DATA" % {'ns': ns})]))
           for row in root.findall(".//{%(ns)s}RESULTSET/{%(ns)s}ROW" % {'ns': ns})]

print(results)

Prints:

[
    {'City': 'Pompano Beach', 'Country': None, 'State': 'FL', 'Theater': 'First Presbyterian Church', 'Year': '1996'}, 
    {'City': 'Hilton Head', 'Country': None, 'State': 'SC', 'Theater': 'Self Family Arts Center', 'Year': '1996'}
]
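
For the stated goal of changing the data and writing out a new file, one possible approach is sketched below. It is not taken from the original answer. It assumes Python 3, an inline two-column copy of the excerpt, and a made-up edit (filling empty Country cells with "USA"). It also assumes that registering an empty prefix with ET.register_namespace is acceptable, so that the output keeps the default xmlns declaration instead of generated ns0: prefixes. The output filename in the last comment is hypothetical.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://www.filemaker.com/fmpxmlresult"
NS = {"fm": NS_URI}  # "fm" is an arbitrary prefix chosen for this sketch

# Assumption: a two-column inline copy of the excerpt, not the real file.
XML = """<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
    <METADATA>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="Country" TYPE="TEXT"/>
        <FIELD EMPTYOK="YES" MAXREPEAT="1" NAME="City" TYPE="TEXT"/>
    </METADATA>
    <RESULTSET FOUND="1">
        <ROW MODID="3" RECORDID="32">
            <COL>
                <DATA/>
            </COL>
            <COL>
                <DATA>Pompano Beach</DATA>
            </COL>
        </ROW>
    </RESULTSET>
</FMPXMLRESULT>"""

# Serialize elements in this namespace without a prefix.
# Note that this is a module-wide setting.
ET.register_namespace("", NS_URI)

root = ET.fromstring(XML)
keys = [f.get("NAME") for f in root.findall("fm:METADATA/fm:FIELD", NS)]

for row in root.findall("fm:RESULTSET/fm:ROW", NS):
    cells = row.findall("fm:COL/fm:DATA", NS)
    for key, cell in zip(keys, cells):
        # Hypothetical edit: fill in empty Country cells.
        if key == "Country" and cell.text is None:
            cell.text = "USA"

output = ET.tostring(root, encoding="unicode")
print(output)

# To write a file instead (hypothetical filename):
# ET.ElementTree(root).write("giglist-new.xml", encoding="UTF-8", xml_declaration=True)
```

Editing the DATA elements in place, rather than rebuilding the document from the list of dictionaries, leaves the rest of the tree (ERRORCODE, PRODUCT, the ROW attributes) untouched when it is written back out.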

9 Comments

Ooh, I had always assumed that namespace was just XML noise and clutter. I’ll give it a try; thanks.
This seemingly also returns the <METADATA> opening tag. Is there any way to avoid this, besides just filtering it out when I use the results?
@ZevEisenberg yup, the code I've posted returns the metadata tag. What data do you want to get from the xml file? What is your desired output? Thanks.
I'm trying to get the ROW elements into an array of dictionaries/tuples/objects that I can work with in Python. Was assuming I would use METADATA to figure out what each element inside a ROW is.
@ZevEisenberg thank you for the information - I'll take a look later today.