Python / Pandas / XML - Write pandas dataframe rows back to LXML

Question

I'm currently ingesting an XML file with lxml and then creating a pandas dataframe from the root element. I'm essentially using this example. I'm doing this so I can do some math / undertake some modelling on the data.

The next step I'd like to achieve is being able to write the data back to the xml document. In other places in my script I've used root.insert since I can force inserting at a particular position index in order to keep the xml document neat and coherent.

Is there a way I can write out each row of the dataframe using something like root.insert(position, data) for each row in the dataframe, where the dataframe's column header is the tag?

Example XML

<Root_Data>

  <SomeData></SomeData>
  <SomeOtherData></SomeOtherData>   
   
  <Weather>
    <WxId>1</WxId>
    <Temp>20</Temp>
    <WindSpeed>15</WindSpeed>
  </Weather>

  # We will insert more weather here - I can find this position index. Assume it is 3.

  <SomeMoreData></SomeMoreData>
</Root_Data>

Pandas dataframe:

ID Temp Windspeed
2  25   30
3  30   15
4  15   25

I'd offer some code I've tried so far - but I've actually come up empty handed on how to insert rows from a dataframe to the xml document without manually constructing the XML as strings myself (not great - headers might change, which is why I'd like to use column headers as the tags.).

Expected Result

<Root_Data>

  <SomeData></SomeData>
  <SomeOtherData></SomeOtherData>   
   
  <Weather>
    <WxId>1</WxId>
    <Temp>20</Temp>
    <WindSpeed>15</WindSpeed>
  </Weather>
  <Weather>
    <WxId>2</WxId>
    <Temp>25</Temp>
    <WindSpeed>30</WindSpeed>
  </Weather>
  <Weather>
    <WxId>3</WxId>
    <Temp>30</Temp>
    <WindSpeed>15</WindSpeed>
  </Weather>
  <Weather>
    <WxId>4</WxId>
    <Temp>15</Temp>
    <WindSpeed>25</WindSpeed>
  </Weather>

  <SomeMoreData></SomeMoreData>
</Root_Data>

Example code so far:

from lxml import etree
import pandas as pd

tree = etree.parse('example.xml')
root = tree.getroot()

# Load into dataframe
df_cols = ["WxId", "Temp", "WindSpeed"]
rows = []
for node in root:
    # findtext returns None when the child element is missing
    rows.append({col: node.findtext(col) for col in df_cols})
out_df = pd.DataFrame(rows, columns=df_cols)
out_df = out_df[~out_df['Temp'].isnull()]  # Proxy for good / bad data. Remove nulls.

#Now, write from data frame back to root so we can structure the XML before writing to file. 
# ? Unknown method

Pratik V · Accepted Answer · 2021-08-07 06:10:58Z

1

Another approach, in case your columns are not fixed or may grow in the future:

import pandas as pd
from lxml import etree

df = pd.read_csv('./123.csv')

root = etree.Element("root")
for rows in range(0,df.shape[0]):
    Tag = etree.Element('weather')
    for cols in range(0,df.shape[1]):
        # The tag name is the column header (equivalent to df.columns[cols])
        etree.SubElement(Tag,df.iloc[rows:,cols].head().name).text = str(df.iloc[rows, cols])
    # Append Element "Tag" to the Main Root here
    root.append(Tag)

print(etree.tostring(root,encoding='unicode'))
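
The question asks for root.insert at a known index rather than root.append. A minimal sketch of that variation follows, assuming the insert position is 3 as in the question. The inline XML and the dataframe are hypothetical stand-ins for the asker's file and data, and insert_position is an illustrative name.

```python
import pandas as pd
from lxml import etree

# Hypothetical stand-ins for the question's XML file and dataframe
xml = b"""<Root_Data>
  <SomeData/>
  <SomeOtherData/>
  <Weather><WxId>1</WxId><Temp>20</Temp><WindSpeed>15</WindSpeed></Weather>
  <SomeMoreData/>
</Root_Data>"""
root = etree.fromstring(xml)
df = pd.DataFrame({"WxId": [2, 3, 4], "Temp": [25, 30, 15], "WindSpeed": [30, 15, 25]})

insert_position = 3  # assumed index of the first new <Weather> among root's children

for offset, record in enumerate(df.to_dict("records")):
    weather = etree.Element("Weather")
    for col, value in record.items():
        # Each column header becomes the child tag name
        etree.SubElement(weather, col).text = str(value)
    # Shift the index by one per row so the rows keep their dataframe order
    root.insert(insert_position + offset, weather)

print(etree.tostring(root, encoding="unicode"))
```

Incrementing the index per row keeps the new elements in dataframe order; inserting every row at the same index would reverse them.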


5 Comments

anakaine Over a year ago

Can I ask - your example works, but results in all elements being written on a single line without formatting. Is it possible to format the element being written so there are line breaks and indents like in the example? I assume it pertains to this portion of the code: etree.SubElement(Tag,df.iloc[rows:,cols].head().name).text

Pratik V Over a year ago

So, when you export that file to XML, use pretty_print for indentation. For printing to STDOUT, e.g. print(etree.tostring(root,encoding='Unicode',pretty_print=True))

anakaine Over a year ago

Yeah, doesn't seem to fix it, unfortunately.

outfile = 'test.xml'
tree.write(outfile, xml_declaration=True, standalone='yes', encoding='utf-8', pretty_print=True)

Pratik V Over a year ago

If you are modifying the existing structure a lot - check this stackoverflow.com/questions/7903759/… It requires "remove_blank_text=True"

anakaine Over a year ago

Thanks. I was looking at another similar post but wasn't getting anywhere. I have wound up writing the file with messy tags, then importing it again and using the parser, so tree --> file --> import file again --> parser --> tree --> write tree. I could not figure out how to go from tree --> parser --> write tree without going via a file first.
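
One way to avoid the intermediate file is to round-trip through an in-memory byte string instead. The sketch below assumes the tree already carries whitespace from the original document, which is what usually stops pretty_print from re-indenting. The sample XML and the variable names are illustrative only.

```python
from lxml import etree

# Hypothetical tree: parsed with its original whitespace, then extended in code
root = etree.fromstring(b"<Root_Data>\n  <Weather><WxId>1</WxId></Weather>\n</Root_Data>")
new = etree.SubElement(root, "Weather")
etree.SubElement(new, "WxId").text = "2"

# Serialize to bytes and re-parse while dropping whitespace-only text nodes
parser = etree.XMLParser(remove_blank_text=True)
clean_root = etree.fromstring(etree.tostring(root), parser)

# With the stray whitespace gone, pretty_print can indent the whole tree
pretty = etree.tostring(clean_root, pretty_print=True, encoding="unicode")
print(pretty)
```

On lxml 4.5 or later, etree.indent(root) is another option that re-indents the existing tree in place, without re-parsing.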

Corralien · Accepted Answer · 2021-08-07 05:48:42Z

1

You can use to_xml to convert your dataframe to xml:

xdata = (df.rename(columns={'ID': 'WxId'})
           .to_xml(index=False, root_name='Root_Data', row_name='Weather'))

>>> print(xdata)
<?xml version='1.0' encoding='utf-8'?>
<Root_Data>
  <Weather>
    <WxId>2</WxId>
    <Temp>25</Temp>
    <Windspeed>30</Windspeed>
  </Weather>
  <Weather>
    <WxId>3</WxId>
    <Temp>30</Temp>
    <Windspeed>15</Windspeed>
  </Weather>
  <Weather>
    <WxId>4</WxId>
    <Temp>15</Temp>
    <Windspeed>25</Windspeed>
  </Weather>
</Root_Data>

Now you can use lxml to insert data before the first child Weather and the last child Weather or insert your xdata somewhere in your original XML file.

FYI, you can use pd.read_xml to convert your xml to a dataframe.
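
A minimal sketch of the pd.read_xml route, assuming pandas 1.3 or later with lxml installed. The inline XML is a trimmed, hypothetical version of the question's file, and the xpath value is an assumption chosen to select only the Weather rows.

```python
from io import StringIO

import pandas as pd

# Hypothetical, trimmed version of the question's XML
xml = """<Root_Data>
  <SomeData></SomeData>
  <Weather>
    <WxId>1</WxId>
    <Temp>20</Temp>
    <WindSpeed>15</WindSpeed>
  </Weather>
</Root_Data>"""

# Select only the <Weather> elements; each child element becomes a column
df = pd.read_xml(StringIO(xml), xpath=".//Weather")
print(df)
```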


1 Comment

anakaine Over a year ago

So I'm trying the following two lines and getting an error:

xdata = out_df.to_xml(index=False, root_name='Root_Data', row_name='Weather')
root.insert(insertPosition, xdata)

Error: TypeError: Argument 'element' has incorrect type (expected lxml.etree._Element, got str). Any ideas?
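
The TypeError arises because to_xml returns a string, while root.insert expects an lxml element. One possible fix is to parse the string back into elements first and then insert each Weather child. In the sketch below, the small root tree and out_df are hypothetical stand-ins, and insertPosition is set to 2 only to suit this sample.

```python
import pandas as pd
from lxml import etree

# Hypothetical stand-ins for the existing tree and the modelled dataframe
root = etree.fromstring(
    b"<Root_Data><SomeData/><Weather><WxId>1</WxId></Weather><SomeMoreData/></Root_Data>"
)
out_df = pd.DataFrame({"WxId": [2, 3], "Temp": [25, 30], "WindSpeed": [30, 15]})
insertPosition = 2  # assumed index, directly after the existing <Weather>

xdata = out_df.to_xml(index=False, root_name="Root_Data", row_name="Weather")

# xdata is a str with an encoding declaration; lxml only parses that from bytes
fragment = etree.fromstring(xdata.encode("utf-8"))

# Copy the child list first, because insert() moves each element out of fragment
for offset, weather in enumerate(list(fragment)):
    root.insert(insertPosition + offset, weather)

print(etree.tostring(root, encoding="unicode"))
```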
