Creating a Nested XML document in Python

Question

I'm an occasional scripter. Searching this forum has taken me most of the way, but I'm now stuck and looking for help. I am trying to create an XML document from a CSV file. The aim is to take input that looks like this:

ID,Type,Currency,Notional,Underlying,Maturity Date,Representation Type
ID1,COMMIT,EUR,100,,2018-06-01,Bond
ID2,COMMIT,AUD,110,,2018-03-25,Stock

and transform it to look like this:

<tradeRequests>
    <tradeRequest>
        <id>ID1</id>
        <newDeals size="1">
            <deal>
                <id>ID1</id>
                <terms>
                    <id>ID1</id>
                    <MaturityDate>2018-06-01</MaturityDate>
                </terms>
            </deal>
        </newDeals>
    </tradeRequest>
    <tradeRequest>
        <id>ID2</id>
        <newDeals size="1">
            <deal>
                <id>ID2</id>
                <terms>
                    <id>ID2</id>
                    <MaturityDate>2018-03-25</MaturityDate>
                </terms>
            </deal>
        </newDeals>
    </tradeRequest>
</tradeRequests>

The problem is that my script doesn't format the items correctly: every row should become its own tradeRequest, but that is not what I get.

Here is a snippet of my code, which extracts a subset of columns from a much wider CSV file.

import csv
import xml.etree.ElementTree as ET
import xml.dom.minidom

tradeRequests = ET.Element("tradeRequests")
tradeRequest = ET.SubElement(tradeRequests, "tradeRequest")
newDeals = ET.SubElement(tradeRequest, "newDeals")
deal = ET.SubElement(newDeals, "deal")
dealid = ET.SubElement(deal, "id")

with open('TestCase.csv') as csvfile:
    reader = csv.DictReader(csvfile)

    for row in reader:
        ET.SubElement(tradeRequest, "id").text = row['ID']
        ET.SubElement(tradeRequest, "newDeals", {'size':"1"} )
        ET.SubElement(dealid, "id").text = row['ID']
        ET.SubElement(dealid, "maturityDate").text = row['Maturity Date']
        tree = ET.ElementTree(tradeRequests)
        tree.write("Testcase.xml" )

xml = xml.dom.minidom.parse('Testcase.xml')
pretty_xml_as_string = xml.toprettyxml()

print pretty_xml_as_string

The problem is that I can't nest the items properly. I've tried creating a parent/child combination, without success. Instead, that code produces output that looks like this:

<tradeRequests>
    <tradeRequest>
        <newDeals>
            <deal>
                <id>
                    <id>ID1</id>
                    <maturityDate>2018-06-01</maturityDate>
                    <id>ID2</id>
                    <maturityDate>2018-03-25</maturityDate>
                </id>
            </deal>
        </newDeals>
        <id>ID1</id>
        <newDeals size="1"/>
        <id>ID2</id>
        <newDeals size="1"/>
    </tradeRequest>
</tradeRequests>

Any help appreciated as always.

I hadn't anticipated this use case, where I need to loop and create elements dynamically:

ID1,COMMIT,EUR,100,,2018-06-01,Bond
ID2,110,2018-03-25,Stock
ID2,110,2018-03-26,A
ID2,110,2018-03-26,B
ID2,110,2018-03-26,C

So in effect I need to loop over the ID2 rows and create a new element for each one. The number of rows per ID is not known in advance.

So my expected result would be something like:

<tradeRequests>
    <ids>
    <id>ID1</id>
            <element>
                <maturityDate>2018-06-01</maturityDate>
                <type>Stock</type>
            </element>
        </id>
        <id>ID2</id>
            <element>
                <maturityDate>2018-03-25</maturityDate>
                <type>A</type>
            </element>
            <element>
                <maturityDate>2018-03-25</maturityDate>
                <type>B</type>
            </element>
            <element>
                <maturityDate>2018-03-25</maturityDate>
                <type>C</type>
            </element>
        </id>
</tradeRequests>

nosklo · Accepted Answer · 2018-06-26 21:36:29Z

1

I strongly suggest using the excellent lxml library. It is really fast because it wraps the C library libxml2, and it includes the element builder object E, which makes your job really easy:

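import csv
import lxml.etree
from lxml.builder import E

with open('TestCase.csv') as csvfile:
    # The generator expression yields one tradeRequest subtree per CSV row;
    # the leading * unpacks them as children of tradeRequests.
    results = E.tradeRequests(*(
        E.tradeRequest(
            E.id(row['ID']),
            E.newDeals(
                E.deal(
                    E.id(row['ID']),
                    E.terms(
                        E.id(row['ID']),
                        E.MaturityDate(row['Maturity Date']),
                    )
                ),
                size="1",  # keyword arguments become XML attributes
            )
        ) for row in csv.DictReader(csvfile))
    )

# encoding="unicode" returns text rather than bytes, so print shows plain XML
print(lxml.etree.tostring(results, pretty_print=True, encoding="unicode"))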

results:

<tradeRequests>
  <tradeRequest>
    <id>ID1</id>
    <newDeals size="1">
      <deal>
        <id>ID1</id>
        <terms>
          <id>ID1</id>
          <MaturityDate>2018-06-01</MaturityDate>
        </terms>
      </deal>
    </newDeals>
  </tradeRequest>
  <tradeRequest>
    <id>ID2</id>
    <newDeals size="1">
      <deal>
        <id>ID2</id>
        <terms>
          <id>ID2</id>
          <MaturityDate>2018-03-25</MaturityDate>
        </terms>
      </deal>
    </newDeals>
  </tradeRequest>
</tradeRequests>
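
If installing lxml is not an option, the same per-row nesting can probably be reproduced with the standard library's xml.etree.ElementTree, which the question already uses. This is only a sketch: the helper name build_trade_requests and the inline SAMPLE_CSV (standing in for TestCase.csv) are assumptions made for illustration. The change that matters compared with the question's code is that every element is created inside the loop, so each row gets its own tradeRequest subtree instead of appending to one shared parent.

```python
import csv
import io
import xml.etree.ElementTree as ET


def build_trade_requests(rows):
    """Build one <tradeRequest> subtree per CSV row (hypothetical helper)."""
    root = ET.Element("tradeRequests")
    for row in rows:
        # Everything below is created per row, so nothing is shared between rows.
        trade_request = ET.SubElement(root, "tradeRequest")
        ET.SubElement(trade_request, "id").text = row["ID"]
        new_deals = ET.SubElement(trade_request, "newDeals", {"size": "1"})
        deal = ET.SubElement(new_deals, "deal")
        ET.SubElement(deal, "id").text = row["ID"]
        terms = ET.SubElement(deal, "terms")
        ET.SubElement(terms, "id").text = row["ID"]
        ET.SubElement(terms, "MaturityDate").text = row["Maturity Date"]
    return root


# Assumed inline sample data standing in for TestCase.csv.
SAMPLE_CSV = """ID,Type,Currency,Notional,Underlying,Maturity Date,Representation Type
ID1,COMMIT,EUR,100,,2018-06-01,Bond
ID2,COMMIT,AUD,110,,2018-03-25,Stock
"""

root = build_trade_requests(csv.DictReader(io.StringIO(SAMPLE_CSV)))
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The printed string is not indented; it can be passed through xml.dom.minidom.parseString(xml_text).toprettyxml() for readable output, much as the question does with its written file.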


5 Comments

emmon simbo Over a year ago

Thanks Nosklo, I'll give this a whirl but looks good. And thanks for the suggestion above. You've saved me a lot of pain.

nosklo Over a year ago

@emmonsimbo if my answer is good, please, mark it as accepted by clicking the green checkmark below the voting buttons

emmon simbo Over a year ago

Sorry to come back to you on this, and if the protocol is to create a new post, I'm happy to do that. I have a new use case that needs a loop, and I've struggled to integrate it. I've added it to the question above.

nosklo Over a year ago

@emmonsimbo it seems you can do the same thing - *(E.element(E.maturityDate(r['Maturity Date']), E.type(r['Type'])) for r in id2_rows) - look into itertools.groupby to group the elements by id, to make it easier. If you still have problems I suggest asking a new question.

emmon simbo Over a year ago

thanks. let me give that a whirl. I have actually been looking at the for logic as I noticed you do that in your original post so I just need to identify the id2_rows. Thanks again
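
The itertools.groupby suggestion from the comments could be sketched roughly as follows. Several things here are assumptions rather than anything confirmed in the thread: the follow-up rows are padded out to the original seven-column header, the type element is taken from the "Representation Type" column, the output layout (one tradeRequest per ID with repeated element children) is one guess at the intended structure, and the helper name build_grouped is made up. The sketch uses the standard library's ElementTree rather than lxml's E builder so that it runs without extra installs; the grouping logic would be the same either way.

```python
import csv
import io
import xml.etree.ElementTree as ET
from itertools import groupby
from operator import itemgetter

# Assumed sample data: the follow-up rows padded to the original header.
SAMPLE_CSV = """ID,Type,Currency,Notional,Underlying,Maturity Date,Representation Type
ID1,COMMIT,EUR,100,,2018-06-01,Bond
ID2,COMMIT,AUD,110,,2018-03-25,Stock
ID2,COMMIT,AUD,110,,2018-03-26,A
ID2,COMMIT,AUD,110,,2018-03-26,B
"""


def build_grouped(rows):
    """Group rows by ID and emit one <element> per row (hypothetical helper)."""
    root = ET.Element("tradeRequests")
    # groupby only merges adjacent items, so sort by the grouping key first.
    ordered = sorted(rows, key=itemgetter("ID"))
    for trade_id, group in groupby(ordered, key=itemgetter("ID")):
        trade_request = ET.SubElement(root, "tradeRequest")
        ET.SubElement(trade_request, "id").text = trade_id
        # However many rows share this ID, each one becomes its own <element>.
        for row in group:
            element = ET.SubElement(trade_request, "element")
            ET.SubElement(element, "maturityDate").text = row["Maturity Date"]
            ET.SubElement(element, "type").text = row["Representation Type"]
    return root


grouped = build_grouped(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(ET.tostring(grouped, encoding="unicode"))
```

Because sorted is stable, rows that share an ID keep their original file order inside each group.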
