I'm an occasional scripter. Searching this forum has taken me a long way, but I'm now stuck and looking for help. I am trying to create an XML document from a CSV file, and the aim is to take input that looks like this:

ID,Type,Currency,Notional,Underlying,Maturity Date,Representation Type
ID1,COMMIT,EUR,100,,2018-06-01,Bond
ID2,COMMIT,AUD,110,,2018-03-25,Stock

and transform it to look like this:

<tradeRequests>     
<tradeRequest>
    <id>ID1</id>
    <newDeals size="1">
        <deal>
            <id>ID1</id>
            <terms>
                <id>ID1</id>
                <MaturityDate>2018-06-01</MaturityDate>                 
            </terms>
        </deal>
    </newDeals>     
</tradeRequest>
<tradeRequest>
    <id>ID2</id>
    <newDeals size="1">
        <deal>
            <id>ID2</id>
            <terms>
                <id>ID2</id>
                <MaturityDate>2018-03-25</MaturityDate>
            </terms>
        </deal>
    </newDeals>
</tradeRequest>
</tradeRequests>

The problem is that my script doesn't nest the items correctly: every row should become its own tradeRequest, but that is not the structure I get.

Here is a snippet of my code, which extracts a subset of columns from a much larger set.

import csv
import xml.etree.ElementTree as ET
import xml.dom.minidom

tradeRequests = ET.Element("tradeRequests")
tradeRequest = ET.SubElement(tradeRequests, "tradeRequest")
newDeals = ET.SubElement(tradeRequest, "newDeals")
deal = ET.SubElement(newDeals, "deal")
dealid = ET.SubElement(deal, "id")

with open('TestCase.csv') as csvfile:
    reader = csv.DictReader(csvfile)

    for row in reader:
        ET.SubElement(tradeRequest, "id").text = row['ID']
        ET.SubElement(tradeRequest, "newDeals", {'size':"1"} )
        ET.SubElement(dealid, "id").text = row['ID']
        ET.SubElement(dealid, "maturityDate").text = row['Maturity Date']
        tree = ET.ElementTree(tradeRequests)
        tree.write("Testcase.xml" )

xml = xml.dom.minidom.parse('Testcase.xml')
pretty_xml_as_string = xml.toprettyxml()

print pretty_xml_as_string

I can't seem to nest the items properly. I've tried creating a parent/child combination, but this hasn't been successful. Instead, that code produces output that looks like this:

<tradeRequests>
    <tradeRequest>
        <newDeals>
            <deal>
                <id>
                    <id>ID1</id>
                    <maturityDate>2018-06-01</maturityDate>
                    <id>ID2</id>
                    <maturityDate>2018-03-25</maturityDate>
                </id>
            </deal>
        </newDeals>
        <id>ID1</id>
        <newDeals size="1"/>
        <id>ID2</id>
        <newDeals size="1"/>
    </tradeRequest>
</tradeRequests>

Any help appreciated as always.

I hadn't anticipated this use case, where I need to loop and create elements dynamically:

ID1,COMMIT,EUR,100,,2018-06-01,Bond
ID2,110,2018-03-25,Stock
ID2,110,2018-03-26,A
ID2,110,2018-03-26,B
ID2,110,2018-03-26,C

So in effect I need to loop through the ID2 rows and create a new element for each one; the number of rows is not known in advance.

So my expected result would be something like:

<tradeRequests>
    <ids>
    <id>ID1</id>
            <element>
                <maturityDate>2018-06-01</maturityDate>
                <type>Stock</type>
            </element>
        </id>
        <id>ID2</id>
            <element>
                <maturityDate>2018-03-25</maturityDate>
                <type>A</type>
            </element>
            <element>
                <maturityDate>2018-03-25</maturityDate>
                <type>B</type>
            </element>
            <element>
                <maturityDate>2018-03-25</maturityDate>
                <type>C</type>
            </element>
        </id>
</tradeRequests>

1 Answer

I strongly suggest using the excellent lxml library. It is fast because it wraps the C library libxml2, and it includes the element builder object E, which makes this job easy:

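import csv
import lxml.etree
from lxml.builder import E

with open('TestCase.csv') as csvfile:
    # Build one <tradeRequest> per CSV row. E.tag(...) creates <tag>;
    # positional arguments become children or text, keywords become attributes.
    results = E.tradeRequests(*(
        E.tradeRequest(
            E.id(row['ID']),
            E.newDeals(
                E.deal(
                    E.id(row['ID']),
                    E.terms(
                        E.id(row['ID']),
                        E.MaturityDate(row['Maturity Date']),
                    )
                ),
                size="1",
            )
        ) for row in csv.DictReader(csvfile))
    )

# encoding='unicode' returns text rather than bytes, so Python 3 prints
# the XML itself instead of a b'...' literal.
print(lxml.etree.tostring(results, pretty_print=True, encoding='unicode'))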

results:

<tradeRequests>
  <tradeRequest>
    <id>ID1</id>
    <newDeals size="1">
      <deal>
        <id>ID1</id>
        <terms>
          <id>ID1</id>
          <MaturityDate>2018-06-01</MaturityDate>
        </terms>
      </deal>
    </newDeals>
  </tradeRequest>
  <tradeRequest>
    <id>ID2</id>
    <newDeals size="1">
      <deal>
        <id>ID2</id>
        <terms>
          <id>ID2</id>
          <MaturityDate>2018-03-25</MaturityDate>
        </terms>
      </deal>
    </newDeals>
  </tradeRequest>
</tradeRequests>
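
For comparison, the same per-row nesting can probably be achieved with the standard library alone. The sketch below is not part of the original answer: it assumes the sample CSV from the question (inlined here so it runs standalone) and uses a hypothetical helper name, build_trade_requests. The key difference from the code in the question is that every element is created inside the loop, so each row gets its own tradeRequest subtree.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Inline copy of the sample CSV from the question, so the sketch runs standalone.
SAMPLE_CSV = """\
ID,Type,Currency,Notional,Underlying,Maturity Date,Representation Type
ID1,COMMIT,EUR,100,,2018-06-01,Bond
ID2,COMMIT,AUD,110,,2018-03-25,Stock
"""


def build_trade_requests(csvfile):
    """Build one <tradeRequest> subtree per CSV row (hypothetical helper)."""
    root = ET.Element("tradeRequests")
    for row in csv.DictReader(csvfile):
        # Create the nested elements inside the loop so each row gets its own.
        trade_request = ET.SubElement(root, "tradeRequest")
        ET.SubElement(trade_request, "id").text = row["ID"]
        new_deals = ET.SubElement(trade_request, "newDeals", {"size": "1"})
        deal = ET.SubElement(new_deals, "deal")
        ET.SubElement(deal, "id").text = row["ID"]
        terms = ET.SubElement(deal, "terms")
        ET.SubElement(terms, "id").text = row["ID"]
        ET.SubElement(terms, "MaturityDate").text = row["Maturity Date"]
    return root


root = build_trade_requests(io.StringIO(SAMPLE_CSV))
print(ET.tostring(root, encoding="unicode"))
```

To read from disk instead, pass an open file object for TestCase.csv in place of the io.StringIO wrapper.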

5 Comments

Thanks Nosklo, I'll give this a whirl, but it looks good. And thanks for the suggestion above. You've saved me a lot of pain.
@emmonsimbo if my answer is good, please mark it as accepted by clicking the green checkmark below the voting buttons.
Sorry to come back to you on this; if the protocol is to create a new post, I'm happy to do that. I have a new use case that requires a loop, and I've struggled to integrate it. I've posted it as a new answer.
@emmonsimbo it seems you can do the same thing: *(E.element(E.maturityDate(r['Maturity Date']), E.type(r['Type'])) for r in id2_rows). Look into itertools.groupby to group the rows by id, which makes it easier. If you still have problems, I suggest asking a new question.
Thanks, let me give that a whirl. I had actually been looking at the for logic, as I noticed you use it in your original post, so I just need to identify the id2_rows. Thanks again.
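
The itertools.groupby suggestion in the comments might look roughly like the sketch below. Several things here are assumptions rather than anything confirmed in the thread: the input is simplified to three hypothetical columns (ID, Maturity Date, Type), the output layout (a tradeRequest per ID with one element child per row) is only one reading of the expected result, build_grouped is a made-up helper name, and the standard library's ElementTree is used in place of lxml so the example runs without extra installs.

```python
import csv
import io
import itertools
import xml.etree.ElementTree as ET

# Hypothetical, simplified input: the follow-up rows reduced to three columns.
SAMPLE_CSV = """\
ID,Maturity Date,Type
ID1,2018-06-01,Bond
ID2,2018-03-26,A
ID2,2018-03-26,B
ID2,2018-03-26,C
"""


def build_grouped(csvfile):
    """Emit one <tradeRequest> per ID and one <element> per row in that group."""
    root = ET.Element("tradeRequests")
    rows = csv.DictReader(csvfile)
    # groupby only merges adjacent rows, so the input must already be
    # sorted (or at least clustered) by ID.
    for row_id, group in itertools.groupby(rows, key=lambda r: r["ID"]):
        request = ET.SubElement(root, "tradeRequest")
        ET.SubElement(request, "id").text = row_id
        for r in group:
            element = ET.SubElement(request, "element")
            ET.SubElement(element, "maturityDate").text = r["Maturity Date"]
            ET.SubElement(element, "type").text = r["Type"]
    return root


grouped = build_grouped(io.StringIO(SAMPLE_CSV))
print(ET.tostring(grouped, encoding="unicode"))
```

If the rows for one ID are not adjacent in the file, sorting them by ID first (for example with sorted(rows, key=...)) before grouping avoids getting several separate groups for the same ID.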