I'm currently ingesting an XML file with lxml and then creating a pandas dataframe from the root element. I'm essentially using this example. I'm doing this so I can do some math and modelling on the data.
The next step I'd like to achieve is writing the data back to the XML document. In other places in my script I've used root.insert, since it lets me insert at a particular position index and keep the XML document neat and coherent.
Is there a way I can write out each row of the dataframe using something like root.insert(position, data) for each row in the dataframe, where the dataframe's column header is the tag?
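To illustrate the kind of positional insert I mean, here is a minimal sketch of lxml's insert(index, element) on a tiny in-memory tree. The element names A, B, C, D and the index 2 are made up purely for the illustration and are not from my real file.

```python
from lxml import etree

# Hypothetical miniature document, only for illustrating positional inserts
root = etree.fromstring("<Root_Data><A/><B/><D/></Root_Data>")

# Build a new child element and place it at index 2 (between <B/> and <D/>)
new_el = etree.Element("C")
new_el.text = "inserted"
root.insert(2, new_el)

# Collect the child tags in document order to see where the element landed
child_tags = [child.tag for child in root]
print(child_tags)
```

What I'm missing is how to build the inserted element from a dataframe row rather than by hand.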
Example XML
<Root_Data>
    <SomeData></SomeData>
    <SomeOtherData></SomeOtherData>
    <Weather>
        <WxId>1</WxId>
        <Temp>20</Temp>
        <WindSpeed>15</WindSpeed>
    </Weather>
    <!-- We will insert more weather here - I can find this position index. Assume it is 3. -->
    <SomeMoreData></SomeMoreData>
</Root_Data>
Pandas dataframe:
WxId  Temp  WindSpeed
2     25    30
3     30    15
4     15    25
I'd offer some code I've tried so far, but I've come up empty-handed on how to insert rows from a dataframe into the XML document without manually constructing the XML as strings myself. That isn't great, since the headers might change, which is why I'd like to use the column headers as the tags.
Expected Result
<Root_Data>
    <SomeData></SomeData>
    <SomeOtherData></SomeOtherData>
    <Weather>
        <WxId>1</WxId>
        <Temp>20</Temp>
        <WindSpeed>15</WindSpeed>
    </Weather>
    <Weather>
        <WxId>2</WxId>
        <Temp>25</Temp>
        <WindSpeed>30</WindSpeed>
    </Weather>
    <Weather>
        <WxId>3</WxId>
        <Temp>30</Temp>
        <WindSpeed>15</WindSpeed>
    </Weather>
    <Weather>
        <WxId>4</WxId>
        <Temp>15</Temp>
        <WindSpeed>25</WindSpeed>
    </Weather>
    <SomeMoreData></SomeMoreData>
</Root_Data>
Example code so far:
from lxml import etree
import pandas as pd

tree = etree.parse('example.xml')
root = tree.getroot()

# Load into dataframe
df_cols = ["WxId", "Temp", "WindSpeed"]
rows = []  # one dict per child of root
for node in root:
    # Each column is a child element of the node; findtext returns None when it is missing
    rows.append({col: node.findtext(col) for col in df_cols})

out_df = pd.DataFrame(rows, columns=df_cols)
out_df = out_df[~out_df['Temp'].isnull()]  # Proxy for good / bad data. Remove nulls.

# Now, write from data frame back to root so we can structure the XML before writing to file.
# ? Unknown method
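One direction that might work, as a sketch rather than something I've confirmed is the idiomatic way: build one Weather element per dataframe row with etree.SubElement, using each column header as the child tag, then call root.insert with an incrementing index. The helper name insert_rows, the inline XML string standing in for example.xml, the sample dataframe, and the starting index 3 are all assumptions for illustration.

```python
from lxml import etree
import pandas as pd


def insert_rows(root, df, position, row_tag="Weather"):
    """Insert one <row_tag> element per dataframe row, starting at `position`.

    Hypothetical helper: column headers become child tags, cell values become text.
    """
    for offset, (_, row) in enumerate(df.iterrows()):
        row_el = etree.Element(row_tag)
        for col in df.columns:
            child = etree.SubElement(row_el, str(col))
            child.text = str(row[col])
        # Increment the index so the rows keep their dataframe order
        root.insert(position + offset, row_el)


# Stand-in for the parsed file (assumed content, mirrors the example XML)
root = etree.fromstring(
    "<Root_Data><SomeData/><SomeOtherData/>"
    "<Weather><WxId>1</WxId><Temp>20</Temp><WindSpeed>15</WindSpeed></Weather>"
    "<SomeMoreData/></Root_Data>"
)

# Stand-in for the modelled data, with headers matching the desired tags
df = pd.DataFrame(
    {"WxId": [2, 3, 4], "Temp": [25, 30, 15], "WindSpeed": [30, 15, 25]}
)

insert_rows(root, df, position=3)
result = etree.tostring(root, pretty_print=True, encoding="unicode")
print(result)
```

Because the tags come from df.columns, a renamed header would flow through without touching the code. One caveat I'm aware of: iterrows can upcast values when a dataframe mixes integer and float columns, so the text might come out as 25.0; converting with df.astype(str) before the loop would be one way around that.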