How to build large XML documents in memory with Python using the standard library?

Question

I'm trying to create a large XML file in memory that will be inserted into a Blob field in an ESRI feature class.

I attempted to use ElementTree, but Python would eventually crash. I probably wasn't doing it the best way. Here is an example of my code (not exact):

import xml.etree.ElementTree as ET

with update_cursor on feature class:
    for row in update_cursor:
        root = ET.Element("root")
        tree = ET.ElementTree(root)
        for id in id_list:
            if row[0] in id:
                equipment = ET.Element("equipment")
                root.append(equipment)

                attrib1 = ET.Element("attrib1")
                equipment.append(attrib1)
                attrib1.text = "myattrib1"

                attrib2 = ET.Element("attrib2")
                equipment.append(attrib2)
                attrib2.text = "myattrib2"

                ....and about 5 more of these appended to equipment

        xml_data = ET.tostring(root)

        insert xml_data into blob field

Example of the XML:

<root>
  <equipment>
    <attrib1>One</attrib1>
    <attrib2>Two</attrib2>
    <attrib3>Three</attrib3>
    ...
    <attrib10>Ten</attrib10>
  </equipment>
  <equipment>
    <attrib1>One</attrib1>
    <attrib2>Two</attrib2>
    <attrib3>Three</attrib3>
    ...
    <attrib10>Ten</attrib10>
  </equipment>
</root>

Now I realize this is probably a pretty amateur way of doing this, but I'm not sure of the best way to build this XML in memory.

For each row in the update_cursor, there could be multiple "equipment" elements added to the root, and each "equipment" element will have the exact same child elements but with different values.

I ran this and there were about 200 ids that matched a single row, so it had to create the equipment element and all the children of the equipment 200 times in memory.

So what is the best way to create XML in memory with Python using a standard library?

It would help us greatly if you describe what the input looks like (i.e. the row and id_list).
– Hai Vu, Commented Mar 11, 2014 at 16:58
This is working with spatial data. row just grabs the unique ID of the point, and id_list is a list of IDs that match this unique ID. If the ID matches, the XML is filled in with the attributes of that ID from the list. Each unique ID can have multiple matches in id_list, which represent equipment.
– ianbroad, Commented Mar 12, 2014 at 14:12
I'm just wondering if there is a better way to write the XML than what I have here.
– ianbroad, Commented Mar 12, 2014 at 15:08

Ben · Accepted Answer · 2014-03-12 16:06:41Z

2

+25

Your data structure looks dead simple. Do not bother using an XML library. Just write your lines directly into a cStringIO.StringIO (on Python 3, where cStringIO no longer exists, use io.StringIO).

import cStringIO  # Python 2 only

with update_cursor on feature class:
    for row in update_cursor:
        buffer = cStringIO.StringIO()
        buffer.write("<root>\n")
        for id in id_list:
            if row[0] in id:
                buffer.write("    <equipment>\n")
                buffer.write("        <attrib1>One</attrib1>\n")
                buffer.write("        <attrib2>Two</attrib2>\n")
                buffer.write("        <attrib3>Three</attrib3>\n")

                ....and about 5 more of these appended to equipment

                buffer.write("    </equipment>\n")

        buffer.write("</root>\n")

        xml_data = buffer.getvalue()

        insert xml_data into blob field
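
For reference, here is a minimal runnable sketch of the same buffer-writing idea on Python 3. It is not the code from the answer: the function name build_equipment_xml and the list-of-dicts input shape are assumptions made for illustration, and io.StringIO stands in for cStringIO.StringIO. Because no XML library is checking the output, the sketch passes each text value through xml.sax.saxutils.escape so that &, < and > in the data do not break the document. Tag names are written as-is and are assumed to be valid XML names.

```python
import io
from xml.sax.saxutils import escape


def build_equipment_xml(records):
    """Build the <root>/<equipment> document as a string.

    `records` is a hypothetical list of dicts, one per matching piece of
    equipment, mapping child tag names (e.g. "attrib1") to text values.
    """
    buffer = io.StringIO()  # Python 3 stand-in for cStringIO.StringIO
    buffer.write("<root>\n")
    for record in records:
        buffer.write("    <equipment>\n")
        for tag, value in record.items():
            # escape() replaces &, < and > so the text stays well-formed
            text = escape(str(value))
            buffer.write("        <{0}>{1}</{0}>\n".format(tag, text))
        buffer.write("    </equipment>\n")
    buffer.write("</root>\n")
    return buffer.getvalue()


xml_data = build_equipment_xml([
    {"attrib1": "One", "attrib2": "Two & a half"},
    {"attrib1": "Three", "attrib2": "<Four>"},
])
```

The returned string is what would then be written to the blob field. If the field expects bytes, xml_data.encode("utf-8") would be one way to convert it.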


2 Comments

ianbroad Over a year ago

Well, I guess what I didn't mention is that I'm not always creating this data from scratch. Sometimes there will be existing data in the blob, and I will have to read it, add new equipment to it, and rewrite the XML to the blob. But I didn't know about cStringIO, which is very cool.

Ben Over a year ago

Either way, I think you will need something that is a bit tailored to your use case rather than a more general-purpose XML tool. If you can use a non-bundled package, lxml might do better. If you can write a C extension and use a lighter-weight DOM library, that may also be worthwhile. You can also give xml.etree.cElementTree a shot, but I am not sure its memory requirements are that much different. Another idea would be to work on smaller files that you can concatenate at the end.
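
For the case raised in the comments, where the blob already holds XML that needs more equipment added, one standard-library option is to parse the existing content, append to it, and serialize it again. The sketch below assumes Python 3. The helper name append_equipment and the list-of-dicts shape of new_records are hypothetical, and how the blob is read and written back is left out.

```python
import xml.etree.ElementTree as ET


def append_equipment(existing_xml, new_records):
    """Parse existing blob content, append equipment, and re-serialize.

    `existing_xml` is the bytes or str read from the blob field.
    `new_records` is a hypothetical list of dicts of tag name -> text.
    """
    root = ET.fromstring(existing_xml)
    for record in new_records:
        equipment = ET.SubElement(root, "equipment")
        for tag, value in record.items():
            ET.SubElement(equipment, tag).text = str(value)
    return ET.tostring(root)  # bytes on Python 3


existing = b"<root><equipment><attrib1>One</attrib1></equipment></root>"
updated = append_equipment(existing, [{"attrib1": "Two"}])
```

This keeps only one row's document in memory at a time, which matches the per-row loop in the question. Whether it avoids the crash described there is not something this sketch can confirm.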

Bertrand Croq · Accepted Answer · 2014-03-13 15:07:16Z

2

You can use ET.SubElement to create and append elements:

equipment = ET.SubElement(root, "equipment")
ET.SubElement(equipment, "attrib1").text = "One"
ET.SubElement(equipment, "attrib2").text = "Two"
ET.SubElement(equipment, "attrib3").text = "Three"
...

It is shorter and clearer.
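
As a fuller illustration, the SubElement calls can be driven by a loop over the matching records. This is only a sketch: the matches list is made-up sample data, and in the question's code the values would come from id_list instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: one dict per matching id, tag name -> text.
matches = [
    {"attrib1": "One", "attrib2": "Two", "attrib3": "Three"},
    {"attrib1": "Four", "attrib2": "Five", "attrib3": "Six"},
]

root = ET.Element("root")
for match in matches:
    # SubElement creates the child and appends it to its parent in one call
    equipment = ET.SubElement(root, "equipment")
    for tag, value in match.items():
        ET.SubElement(equipment, tag).text = value

xml_data = ET.tostring(root)  # bytes on Python 3
```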

