0

I have the following XML structure and I am trying to convert it into a csv in python:

<FIXML><Batch>
<PosRpt RptID="34868232064" ReqID="C905EOD20160427" SetSesID="EOD" MtchStat="0" PriSetPx="326.6" SetPx="328.3" SetPxTyp="1" SettlCcy="USD" ReqTyp="1" MsgEvtSrc="REG" BizDt="2016-04-27" SettlDt="2016-07-14" SettlCurrFxRt="1"><Pty ID="CME" R="21"></Pty><Pty ID="905" R="4"></Pty><Pty ID="CBT" R="22"></Pty><Pty ID="905" R="38"><Sub ID="1" Typ="26"/></Pty><Pty ID="905" R="1"></Pty><Instrmt ID="06" Desc="SOYBEAN MEAL FUTURES" CFI="FCAPSO" SecTyp="FUT" Src="H" MMY="201607" MatDt="2016-07-14" Mult="100" Exch="CBT" UOM="tn" UOMQty="100" PxUOM="TON" PxUOMQty="1" ValMeth="FUT" Fctr="1" PxQteCcy="USD" FnlSettlCcy="USD"></Instrmt><Qty Long="2038" Short="1354" Typ="ETR"/><Qty Long="1289" Short="1436" Typ="ALC"/><Qty Long="0" Short="10" Typ="TRF"/><Qty Long="4122" Short="8098" Typ="SOD"/><Qty Long="3957" Short="7406" Typ="FIN"/><Qty Long="937" Short="6325" Typ="IES"/><Qty Long="35" Short="55" Typ="IAS"/><Amt Typ="SMTM" Amt="-675920" Ccy="USD"/><Amt Typ="TVAR" Amt="-325070.33" Ccy="USD"/><Amt Typ="FMTM" Amt="-1000990.33" Ccy="USD"/></PosRpt>
<TrdCaptRpt RptID="21195360680" TrdTyp="0" TrdSubTyp="5" ExecID="85271320160426220810TN0002521" TrdDt="2016-04-27" BizDt="2016-04-27" MLegRptTyp="1" MtchStat="0" MsgEvtSrc="REG" TrdID="106695" LastQty="1" LastPx="323.5" TxnTm="2016-04-27T01:10:25-05:00" SettlCcy="USD" SettlDt="2016-07-14" PxSubTyp="1" VenueTyp="E" VenuTyp="E" OfstInst="0"><Instrmt ID="06" Desc="SOYBEAN MEAL FUTURES" CFI="FCAPSO" SecTyp="FUT" MMY="201607" MatDt="2016-07-14" Mult="100" Exch="CBT" UOM="tn" UOMQty="100" PxUOM="TON" PxUOMQty="1" ValMeth="FUT" Fctr="1" PxQteCcy="USD"></Instrmt><Amt Typ="TVAR" Amt="480" Ccy="USD"/><RptSide Side="1" ClOrdID="25245816" CustCpcty="4" OrdTyp="M" SesID="EOD" SesSub="E" AllocInd="1" AgrsrInd="Y"><Pty ID="CME" R="21"></Pty><Pty ID="905" R="4"></Pty><Pty ID="CBT" R="22"></Pty><Pty ID="905" R="1"></Pty><Pty ID="434GU400" R="24"><Sub ID="1" Typ="26"/></Pty><Pty ID="4QOL" R="12"></Pty><Pty ID="685" R="17"></Pty><Pty ID="4QOL" R="37"></Pty><Pty ID="905" R="38"><Sub ID="1" Typ="26"/></Pty><Pty ID="905" R="7"></Pty><RegTrdID ID="FECC1544943BFEC0302D5F8342" Src="1010000023" Typ="0" Evnt="2"/></RptSide></TrdCaptRpt>
<TrdCaptRpt RptID="21196531008" TrdTyp="0" TrdSubTyp="5" ExecID="88421020160427065733TN0007200" TrdDt="2016-04-27" BizDt="2016-04-27" MLegRptTyp="1" MtchStat="0" MsgEvtSrc="REG" TrdID="115357" LastQty="2" LastPx="325.7" TxnTm="2016-04-27T07:00:12-05:00" SettlCcy="USD" SettlDt="2016-07-14" PxSubTyp="1" VenueTyp="E" VenuTyp="E" OfstInst="0"><Instrmt ID="06" Desc="SOYBEAN MEAL FUTURES" CFI="FCAPSO" SecTyp="FUT" MMY="201607" MatDt="2016-07-14" Mult="100" Exch="CBT" UOM="tn" UOMQty="100" PxUOM="TON" PxUOMQty="1" ValMeth="FUT" Fctr="1" PxQteCcy="USD"></Instrmt><Amt Typ="TVAR" Amt="-520" Ccy="USD"/><RptSide Side="2" ClOrdID="25246712" CustCpcty="4" OrdTyp="M" SesID="EOD" SesSub="E" AllocInd="1" AgrsrInd="Y"><Pty ID="CME" R="21"></Pty><Pty ID="905" R="4"></Pty><Pty ID="CBT" R="22"></Pty><Pty ID="905" R="1"></Pty><Pty ID="434GU400" R="24"><Sub ID="1" Typ="26"/></Pty><Pty ID="4QOL" R="12"></Pty><Pty ID="685" R="17"></Pty><Pty ID="4QOL" R="37"></Pty><Pty ID="905" R="38"><Sub ID="1" Typ="26"/></Pty><Pty ID="905" R="7"></Pty><RegTrdID ID="FECC1544943BFEC0302D64A564" Src="1010000023" Typ="0" Evnt="2"/></RptSide></TrdCaptRpt>
<PosRpt RptID="34868266266" ReqID="C905EOD20160427" SetSesID="EOD" MtchStat="0" PriSetPx="136" SetPx="136" SetPxTyp="1" SettlCcy="USD" ReqTyp="1" MsgEvtSrc="REG" BizDt="2016-04-27" SettlDt="2016-12-28" SettlCurrFxRt="1"><Pty ID="CME" R="21"></Pty><Pty ID="905" R="4"></Pty><Pty ID="CBT" R="22"></Pty><Pty ID="99106105" R="38"><Sub ID="2" Typ="26"/></Pty><Pty ID="905" R="1"></Pty><Instrmt ID="UFU" Desc="UAN FOB NOLA SWAP" CFI="FCACSO" SecTyp="FUT" Src="H" MMY="201612" MatDt="2016-12-28" Mult="100" Exch="CBT" UOM="tn" UOMQty="100" PxUOM="TON" PxUOMQty="1" ValMeth="FUT" Fctr="1" PxQteCcy="USD" FnlSettlCcy="USD"></Instrmt><Qty Long="30" Short="0" Typ="SOD"/><Qty Long="30" Short="0" Typ="FIN"/><Qty Long="30" Short="0" Typ="IES"/><Amt Typ="SMTM" Amt="0" Ccy="USD"/><Amt Typ="TVAR" Amt="0" Ccy="USD"/><Amt Typ="FMTM" Amt="0" Ccy="USD"/><RegTrdID ID="PSC152CEF79387P0203D81FA" Src="1010000023" Typ="0" Evnt="2"/></PosRpt>
<PosRpt RptID="34868372999" ReqID="C905EOD20160427" SetSesID="EOD" MtchStat="0" PriSetPx="675.25" SetPx="669.25" SetPxTyp="1" SettlCcy="USD" ReqTyp="1" MsgEvtSrc="REG" BizDt="2016-04-27" SettlDt="2016-06-30" SettlCurrFxRt="1"><Pty ID="CME" R="21"></Pty><Pty ID="905" R="4"></Pty><Pty ID="CME" R="22"></Pty><Pty ID="98812736" R="38"><Sub ID="2" Typ="26"/></Pty><Pty ID="905" R="1"></Pty><Instrmt ID="CPC" Desc="MALAYSIAN CRUDE PALM OIL CAL S" CFI="FCACSO" SecTyp="FUT" Src="H" MMY="201606" MatDt="2016-06-30" Mult="25" Exch="CME" UOMQty="25" PxUOM="MTONS" PxUOMQty="1" ValMeth="FUT" Fctr="1" PxQteCcy="USD" FnlSettlCcy="USD"></Instrmt><Qty Long="0" Short="200" Typ="SOD"/><Qty Long="0" Short="200" Typ="FIN"/><Qty Long="0" Short="200" Typ="IES"/><Amt Typ="SMTM" Amt="30000" Ccy="USD"/><Amt Typ="TVAR" Amt="0" Ccy="USD"/><Amt Typ="FMTM" Amt="30000" Ccy="USD"/><RegTrdID ID="PSC154373D5298P0302DFC70" Src="1010000023" Typ="0" Evnt="2"/></PosRpt>
<PosRpt RptID="34868373000" ReqID="C905EOD20160427" SetSesID="EOD" MtchStat="0" PriSetPx="665.75" SetPx="661.5" SetPxTyp="1" SettlCcy="USD" ReqTyp="1" MsgEvtSrc="REG" BizDt="2016-04-27" SettlDt="2016-11-30" SettlCurrFxRt="1"><Pty ID="CME" R="21"></Pty><Pty ID="905" R="4"></Pty><Pty ID="CME" R="22"></Pty><Pty ID="98812736" R="38"><Sub ID="2" Typ="26"/></Pty><Pty ID="905" R="1"></Pty><Instrmt ID="CPC" Desc="MALAYSIAN CRUDE PALM OIL CAL S" CFI="FCACSO" SecTyp="FUT" Src="H" MMY="201611" MatDt="2016-11-30" Mult="25" Exch="CME" UOMQty="25" PxUOM="MTONS" PxUOMQty="1" ValMeth="FUT" Fctr="1" PxQteCcy="USD" FnlSettlCcy="USD"></Instrmt><Qty Long="0" Short="400" Typ="SOD"/><Qty Long="0" Short="400" Typ="FIN"/><Qty Long="0" Short="400" Typ="IES"/><Amt Typ="SMTM" Amt="42500" Ccy="USD"/><Amt Typ="TVAR" Amt="0" Ccy="USD"/><Amt Typ="FMTM" Amt="42500" Ccy="USD"/><RegTrdID ID="PSC1540E0A7EA6P0302DFB8E" Src="1010000023" Typ="0" Evnt="2"/></PosRpt>
<TrdCaptRpt RptID="21202575211" TrdTyp="0" TrdDt="2016-04-27" BizDt="2016-04-27" MLegRptTyp="2" MtchStat="1" MsgEvtSrc="REG" TrdID="000991" LastQty="100" LastPx="0.31" TxnTm="2016-04-27T12:33:54-05:00" SettlCcy="USD" SettlDt="2016-08-03" OrigTrdID="15457C3D779LEB0202D1BC6" PxSubTyp="1" VenueTyp="P" VenuTyp="P"><Instrmt ID="DA" Desc="CLASS III MILK OPTIONS" CFI="OCAXPS" SecTyp="OOF" MMY="201607" MatDt="2016-08-03" StrkPx="13.75" Mult="2000" Exch="CME" UOM="lbs" UOMQty="200000" PxUOM="LBS" PxUOMQty="100" PutCall="1" ValMeth="EQTY" Fctr="1" PxQteCcy="USD"></Instrmt><Undly CFI="FCACSO" Desc="CLASS III MILK FUTURES" ID="DA" Src="H" MMY="201607" SecTyp="FUT" Exch="CME"></Undly><Amt Typ="PREM" Amt="62000" Ccy="USD"/><RptSide Side="2" ClOrdID="660" CustCpcty="4" OrdTyp="L" SesID="EOD" SesSub="P" TmBkt="V" AllocInd="1" AgrsrInd="Y"><Pty ID="CME" R="21"></Pty><Pty ID="905" R="4"></Pty><Pty ID="CME" R="22"></Pty><Pty ID="905" R="1"></Pty><Pty ID="77040322" R="24"><Sub ID="1" Typ="26"/></Pty><Pty ID="BKF" R="12"></Pty><Pty ID="826" R="17"></Pty><Pty ID="BLT" R="37"></Pty><Pty ID="905" R="38"><Sub ID="1" Typ="26"/></Pty><Pty ID="905" R="7"></Pty><RegTrdID ID="FECC15457C3D779LEB0202D1BC8" Src="1010000023" Typ="0" Evnt="2"/></RptSide></TrdCaptRpt>
<TrdCaptRpt RptID="21206412158" TrdTyp="0" TrdSubTyp="5" TrdDt="2016-04-27" BizDt="2016-04-27" MLegRptTyp="1" MtchStat="1" MsgEvtSrc="REG" TrdID="124710" LastQty="5" LastPx="0.13" SettlCcy="USD" SettlDt="2016-08-31" PxSubTyp="1" VenueTyp="P" VenuTyp="P" OfstInst="0"><Instrmt ID="DA" Desc="CLASS III MILK OPTIONS" CFI="OPAXPS" SecTyp="OOF" MMY="201608" MatDt="2016-08-31" StrkPx="13" Mult="2000" Exch="CME" UOM="lbs" UOMQty="200000" PxUOM="LBS" PxUOMQty="100" PutCall="0" ValMeth="EQTY" Fctr="1" PxQteCcy="USD"></Instrmt><Undly CFI="FCACSO" Desc="CLASS III MILK FUTURES" ID="DA" Src="H" MMY="201608" SecTyp="FUT" Exch="CME"></Undly><Amt Typ="PREM" Amt="1300" Ccy="USD" SettlDt="2016-04-27"/><RptSide Side="2" ClOrdID="726" CustCpcty="4" OrdTyp="L" SesID="EOD" SesSub="P" TmBkt="M" AllocInd="1" AgrsrInd="Y"><Pty ID="CME" R="21"></Pty><Pty ID="905" R="4"></Pty><Pty ID="CME" R="22"></Pty><Pty ID="905" R="1"></Pty><Pty ID="7704038A" R="24"><Sub ID="1" Typ="26"/></Pty><Pty ID="GRTY" R="12"></Pty><Pty ID="888" R="17"></Pty><Pty ID="GRTY" R="37"></Pty><Pty ID="905" R="38"><Sub ID="1" Typ="26"/></Pty><Pty ID="905" R="7"></Pty><RegTrdID ID="FECC1544943BFEC0302D9028AE" Src="1010000023" Typ="0" Evnt="2"/></RptSide></TrdCaptRpt>
</Batch></FIXML>

And I am trying to convert this into a csv file. I have tried it with the following code but I can;t get the right output:

from xml.etree import ElementTree
import csv

my_list = []

with open('/Users/testuser/Desktop/CMEREG1.XML', 'rt') as f:
    tree = ElementTree.parse(f)

for node in tree.iter('TrdCaptRpt'):
    RptID = node.attrib.get('RptID')
    TrdTyp = node.attrib.get('TrdTyp')
    TrdSubTyp = node.attrib.get('TrdSubTyp')
    TrdDt = node.attrib.get('TrdDt')
    BizDt = node.attrib.get('BizDt')
    MLegRptTyp = node.attrib.get('MLegRptTyp')
    MtchStat = node.attrib.get('MtchStat')
    MsgEvtSrc = node.attrib.get('MsgEvtSrc')
    TrdID = node.attrib.get('TrdID')
    LastQty = node.attrib.get('LastQty')
    LastPx = node.attrib.get('LastPx')
    TxnTm = node.attrib.get('TxnTm')
    SettlCcy = node.attrib.get('SettlCcy')
    SettlDt = node.attrib.get('SettlDt')
    PxSubTyp = node.attrib.get('PxSubTyp')
    VenueTyp = node.attrib.get('VenueTyp')
    VenuTyp = node.attrib.get('VenuTyp')
    OfstInst = node.attrib.get('OfstInst')

    my_list.append[node.attrib.get('RptID')]

    print RptID, TrdTyp, TrdSubTyp, TrdDt, BizDt, MLegRptTyp, MtchStat, MsgEvtSrc, TrdID, LastQty, LastPx, TxnTm, SettlCcy, SettlDt, PxSubTyp, VenueTyp, VenuTyp, OfstInst

with open('/Users/anantsangar/Desktop/output.csv', 'w') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(my_list)

I can't get every tag into the csv. Is there a easy way to export this to a CSV ?

Thank you

4
  • 2
    What is expected output? It seems, you care only about TrdCapRpt elements, do you? Commented May 4, 2016 at 12:43
  • well the format of this piece of xml is a little bit ugly..and yes, since there are several types of elements, despite that data were kept in attributes, do you only care this TrdCaptRpt, sounds like some trade capital report? Commented May 4, 2016 at 12:54
  • @JanVlcinsky yes TrdCapRpt but also PosRpt and Amt. Commented May 4, 2016 at 12:58
  • man, TrdCapRpt / PosRpt / Amt are in different format, column count, level, you could have a better look of your xml by format it first, import xml.dom.minidom; xml_root = xml.dom.minidom.parseString(str_xml); print xml_root.toprettyxml(), wheter you'd put them in one file or seperate one Commented May 4, 2016 at 13:08

2 Answers 2

2

Use csv.DictWriter, get values from node.attrib dictionary

Your elements named TrdCapRpt have attributes, if you have such node, its attribute node.attrib holds a dictionary with key/value for each attribute.

csv.DictWriter allows writing data taken from dictionary.

First some imports (I always use lxml as it is very fast and provides extra features):

from lxml import etree
import csv

Configure file names and fields to use in each record:

xml_fname = "data.xml"
csv_fname = "data.csv"

fields = [
    "RptID", "TrdTyp", "TrdSubTyp", "ExecID", "TrdDt", "BizDt", "MLegRptTyp",
    "MtchStat" "MsgEvtSrc", "TrdID", "LastQty", "LastPx", "TxnTm", "SettlCcy",
    "SettlDt", "PxSubTyp", "VenueTyp", "VenuTyp", "OfstInst"]

Read the XML:

xml = etree.parse(xml_fname)

Iterate over elements "TrdCapRpt", write attribute values to CSV file:

with open(csv_fname, "w") as f:

    writer = csv.DictWriter(f, fields, delimiter=";", extrasaction="ignore")
    writer.writeheader()
    for node in xml.iter("TrdCaptRpt"):
        writer.writerow(node.attrib)

If you prefer using stdlib xml.etree.ElementTree, you shall manage easily as you do now, because the node.attrib is present there too.

Reading from multiple element names

In your comments, you noted, that you want to export attributes from more element names. This is also possible. To do this, I will modify the example to use xpath (which will probably work only with lxml) and add extra column "elm_name" to track, from which element is the record created:

fields = [
    "elm_name",

    "RptID", "TrdTyp", "TrdSubTyp", "ExecID", "TrdDt", "BizDt", "MLegRptTyp",
    "MtchStat" "MsgEvtSrc", "TrdID", "LastQty", "LastPx", "TxnTm", "SettlCcy",
    "SettlDt", "PxSubTyp", "VenueTyp", "VenuTyp", "OfstInst",

    "Typ", "Amt", "Ccy"
]

xml = etree.parse(xml_fname)

with open(csv_fname, "w") as f:

    writer = csv.DictWriter(f, fields, delimiter=";", extrasaction="ignore")
    writer.writeheader()
    for node in xml.xpath("//*[self::TrdCaptRpt or self::PosRpt or self::Amt]"):
        atts = node.attrib
        atts["elm_name"] = node.tag
        writer.writerow(node.attrib)

The modifications are:

  • fields got extra "elm_name" field and fields from other elements (feel free to remove those you are not interested at).
  • iterate over elements using xml.xpath. The XPath expression is more complex so I am not sure, if stdlib ElementTree supports that.
  • before writing the record, I add name of the element into atts dictionary to provide name of the element.

Warning: the element Amt is nested inside PosRpt and this tree structure is not possible to support in CSV. The records are written, but do not hold information about where they come from (apart from following the record for parent element).

Sign up to request clarification or add additional context in comments.

3 Comments

@Tezirg Added solution for reading from multiple element names.
this is so helpful. One last thing how would I get the header in the csv file ? So i want to have all the tag as headers and then the data
@AnantSangar Added one line with writer.writeheader(). The csv.DictWriter is really nicely designed, isn't it?.
0

You should first push each line with all your tags into a list.

for node in tree.iter('TrdCaptRpt'):

    .....

    my_list.push([RptID, TrdTyp, TrdSubTyp, TrdDt, BizDt, 
                  MLegRptTyp, MtchStat, MsgEvtSrc, TrdID, 
                  LastQty, LastPx, TxnTm, SettlCcy, SettlDt, 
                  PxSubTyp, VenueTyp, VenuTyp, OfstInst])

Then write each line to file :

with open('/Users/anantsangar/Desktop/output.csv', 'w') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
for row in my_list:
    spamwriter.writerow(row)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.