
I would appreciate some assistance or a push in the right direction. I have a pandas dataframe, read from a txt file, and would like to insert it into an xml doc I'm making. I can set up the xml doc, and I can convert my dataframe to xml using the approach from "How do convert a pandas/dataframe to XML?", but I just can't seem to insert the converted dataframe xml into the xml doc.

So far, my code is:

import pandas as pd
from xml.dom.minidom import Document
from xml.dom.minidom import parseString 


colnamesRBR = ['TIMESTAMP','A']
df = pd.read_table('test_data.txt',sep = ',',header=0,names=colnamesRBR,parse_dates={'datetime':['TIMESTAMP']},index_col='datetime')

doc = Document()
base = doc.createElement('Timeseries')
doc.appendChild(base)

entry = doc.createElement('Series')
base.appendChild(entry)

entry1 = doc.createElement('Header')
entry.appendChild(entry1)

type = doc.createElement('type')
type_content = doc.createTextNode('instantaneous')
type.appendChild(type_content)
entry1.appendChild(type)

timeStepElem = doc.createElement('timeStep')
timeStepElem.setAttribute ('unit','minute')
timeStepElem.setAttribute ('multiplier','5')
entry1.appendChild(timeStepElem)

startDateElem = doc.createElement('startDate')
startDateElem.setAttribute ('time','13:30:00')
startDateElem.setAttribute ('date','2015-06-24')
entry1.appendChild(startDateElem)

eventElem = doc.createElement('event')
eventElem.setAttribute ('time','endDate')
eventElem.setAttribute ('date','2015-06-25')
eventElem.setAttribute ('value','2015-06-25')
entry.appendChild(eventElem)

def to_xml(df, filename=None, mode='w'):
    def row_to_xml(row):
        xml = []
        for i, col_name in enumerate(row.index):
            xml.append('  <event date="{0}" time="{1}" value="{1}"/>'.format(col_name, row.iloc[i]))
        return '\n'.join(xml)
    res = '\n'.join(df.apply(row_to_xml, axis=0))

    if filename is None:
        return res
    with open(filename, mode) as f:
        f.write(res)

series = parseString(to_xml(df)).childNodes[0]
entry.appendChild(series)

pd.DataFrame.to_xml = to_xml
print df.to_xml()

f = open("test.xml","w")
doc.writexml(f, indent = "   ", addindent="   ",newl="\n")
f.close()

The xml saved output file looks good:

<?xml version="1.0" ?>
   <Timeseries>
      <Series>
         <Header>
            <type>instantaneous</type>
            <timeStep multiplier="5" unit="minute"/>
            <startDate date="2015-06-24" time="13:30:00"/>
         </Header>
         <event date="2015-06-25" time="endDate" value="2015-06-25"/>
      </Series>
   </Timeseries>

and the pandas dataframe converted xml is good:

<event date="2015-03-09 15:40:00" time="52.2885" value="52.2885"/>
  <event date="2015-03-09 15:50:00" time="52.3277" value="52.3277"/>
  <event date="2015-03-09 16:00:00" time="52.5045" value="52.5045"/>
  <event date="2015-03-09 16:10:00" time="52.5702" value="52.5702"/>
  <event date="2015-03-09 16:20:00" time="52.5608" value="52.5608"/>

I just can't seem to get the above inserted into the xml doc, under the Series element, where I have manually added one event. I've been trying for a while and can't seem to get it into the element/attribute functions. At this point I'm starting to wonder if I shouldn't just parse the txt directly to xml, but I like the pandas option for now.

Just some sample data if it helps:

TIMESTAMP,A
2015/03/09 15:40,52.2885
2015/03/09 15:50,52.3277
2015/03/09 16:00,52.5045
2015/03/09 16:10,52.5702
2015/03/09 16:20,52.5608

The error currently is:

File "<ipython-input-10-906277431901>", line 1, in <module>
    runfile('C:/Users/clinton.chrystal/Documents/Python Scripts/Clint/Text_changes/from_data_to_xml_for SO.py', wdir='C:/Users/clinton.chrystal/Documents/Python Scripts/Clint/Text_changes')

  File "C:\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
    execfile(filename, namespace)

  File "C:\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "C:/Users/clinton.chrystal/Documents/Python Scripts/Clint/Text_changes/from_data_to_xml_for SO.py", line 60, in <module>
    series = parseString(to_xml(df)).childNodes[0]

  File "C:\Anaconda\lib\xml\dom\minidom.py", line 1928, in parseString
    return expatbuilder.parseString(string)

  File "C:\Anaconda\lib\xml\dom\expatbuilder.py", line 940, in parseString
    return builder.parseString(string)

  File "C:\Anaconda\lib\xml\dom\expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)

ExpatError: junk after document element: line 2, column 2

1 Answer

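First of all, get rid of the Series tags in your to_xml method and build each event from the row's values. This version assumes TIMESTAMP is read as a plain string column (that is, without parse_dates or index_col), since it calls split() on it: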

def to_xml(df, filename=None, mode='w'):
    def row_to_xml(row):
        date = row.TIMESTAMP.split()[0]
        time = row.TIMESTAMP.split()[1]
        value = row.A
        xml = '<event date="{0}" time="{1}" value="{2}"></event>'.format(date, time, value)
        return xml
    res = ' '.join(df.apply(row_to_xml, axis=1))

    if filename is None:
        return res
    with open(filename, mode) as f:
        f.write(res)
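
For context on the ExpatError in the question: an XML document must have exactly one root element, while the string returned by to_xml is a run of sibling event elements with no parent. A minimal sketch of the difference, using two hard-coded events as a stand-in for the to_xml(df) output (the `events` string and the variable names are illustrative only):

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# Stand-in for the string returned by to_xml(df): sibling elements, no root.
events = ('<event date="2015/03/09" time="15:40" value="52.2885"></event> '
          '<event date="2015/03/09" time="15:50" value="52.3277"></event>')

try:
    parseString(events)
    parsed_bare = True
except ExpatError as exc:
    # The parser rejects the second top-level element.
    parsed_bare = False
    error_message = str(exc)

# Wrapping the fragment in a single parent element makes it well-formed.
series = parseString('<Series>' + events + '</Series>').documentElement
event_nodes = series.getElementsByTagName('event')
print(parsed_bare, len(event_nodes))
```

This is why the tree-building code wraps the fragment in a Series element before parsing it.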

Then you can create your XML tree like this:

from xml.dom.minidom import Document, parseString

doc = Document()
base = doc.createElement('Timeseries')
doc.appendChild(base)
series = parseString('<Series>' + to_xml(df) + '</Series>').childNodes[0]
base.appendChild(series)

header = doc.createElement('Header')
series.appendChild(header)

type = doc.createElement('type')
type_content = doc.createTextNode('instantaneous')
type.appendChild(type_content)
header.appendChild(type)

timeStepElem = doc.createElement('timeStep')
timeStepElem.setAttribute ('unit','minute')
timeStepElem.setAttribute ('multiplier','5')
header.appendChild(timeStepElem)

startDateElem = doc.createElement('startDate')
startDateElem.setAttribute ('time','13:30:00')
startDateElem.setAttribute ('date','2015-06-24')
header.appendChild(startDateElem)
print(doc.toprettyxml())

Output:

<?xml version="1.0" ?>
<Timeseries>
        <Series>
                <event date="2015/03/09" time="15:40" value="52.2885"/>

                <event date="2015/03/09" time="15:50" value="52.3277"/>

                <event date="2015/03/09" time="16:00" value="52.5045"/>

                <event date="2015/03/09" time="16:10" value="52.5702"/>

                <event date="2015/03/09" time="16:20" value="52.5608"/>
                <Header>
                        <type>instantaneous</type>
                        <timeStep multiplier="5" unit="minute"/>
                        <startDate date="2015-06-24" time="13:30:00"/>
                </Header>
        </Series>
</Timeseries>
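
In the output above the Header ends up after the events, because it is appended to a Series element that already contains them. If the Header should come first, one possible alternative is to skip the string round trip and create the event elements directly on the document. The sketch below is only an illustration under assumptions: `rows` is a hypothetical stand-in for the DataFrame contents, and with TIMESTAMP loaded as a string column something like `zip(df['TIMESTAMP'], df['A'])` could supply the same pairs.

```python
from xml.dom.minidom import Document

# Hypothetical stand-in for the DataFrame rows: (TIMESTAMP string, A value).
rows = [
    ('2015/03/09 15:40', 52.2885),
    ('2015/03/09 15:50', 52.3277),
]

doc = Document()
base = doc.createElement('Timeseries')
doc.appendChild(base)
series = doc.createElement('Series')
base.appendChild(series)

# Append the header first so it precedes the events in the output.
header = doc.createElement('Header')
series.appendChild(header)
type_elem = doc.createElement('type')
type_elem.appendChild(doc.createTextNode('instantaneous'))
header.appendChild(type_elem)

# One <event> element per row, with the timestamp split into date and time.
for timestamp, value in rows:
    date, time = timestamp.split()
    event = doc.createElement('event')
    event.setAttribute('date', date)
    event.setAttribute('time', time)
    event.setAttribute('value', str(value))
    series.appendChild(event)

xml_text = doc.toprettyxml(indent='   ')
print(xml_text)
```

Because no string is parsed here, there are no whitespace text nodes between the events, so the pretty-printed output should not contain the blank lines seen above.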

7 Comments

Thanks for taking the time to help. I've tried the suggested addition, and a few variations, but I can't seem to shake the error: ExpatError: junk after document element: line 2, column 2. I've read into it, but no luck; it's still not adding into the doc created. Anything I could try?
Hm, which line is throwing the error? The code works for me with the example data you've provided.
I've edited the Q to show the error output. Maybe I've inserted the 'add in' syntax you provided in the wrong place? Does your output have the 5 lines from the sample data below the manual <event date="2015-06-25" time="endDate" value="2015-06-25"/> in the xml doc?
No, you're right, the to_xml function is wrong. Notice that the for loop is looping through the columns (which include the timestamp), so it's putting the date in the time column. I'll change the function to what I think you want.
Also, it's not including </event>; I'll put that in as well.