How to Parse XML saved in Excel cell

Question

I have an Excel file in which each cell of one column contains XML data. I want to parse the XML in each cell and save each result to a new file. Here is my code:

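from io import StringIO
import pandas as pd

file_path = r'C:\Users\user\Documents\datasets\sample.xlsx'
df = pd.read_excel(file_path)
# Parse the XML string in each non-empty cell of the 'XML' column
# and save each result to its own file (the output file name is a placeholder).
for i, xml_text in enumerate(df['XML'].dropna()):
    parsed = pd.read_xml(StringIO(xml_text))
    parsed.to_csv(f'output_{i}.csv', index=False)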

Here's a sample file, and here's the desired output.

Danny Meijer · Accepted Answer · 2022-03-02 08:55:15Z


Instead of pandas, you could also look at openpyxl, which might make it easier to carve out the data you need. You mention that you want to parse the XML, but you don't specify what you want to do with it. In any case, I would suggest the xmltodict library for parsing the XML.
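A minimal sketch of that suggestion, assuming the XML lives under a header named XML in the first row of the active sheet (as in the question's code) and that one JSON file per cell is an acceptable output. The function name export_xml_cells and the row_N.json file names are hypothetical, chosen only for illustration:

```python
import json
from pathlib import Path

import xmltodict
from openpyxl import load_workbook


def export_xml_cells(xlsx_path, out_dir, header="XML"):
    """Parse every XML cell under `header` and write one JSON file per cell."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    wb = load_workbook(xlsx_path)
    ws = wb.active
    rows = ws.iter_rows(values_only=True)
    # Locate the column by its header text in the first row (assumed layout).
    col = list(next(rows)).index(header)

    written = []
    for row_no, row in enumerate(rows, start=2):
        xml_text = row[col]
        if not xml_text:
            continue  # skip empty cells
        parsed = xmltodict.parse(xml_text)  # nested dict mirroring the XML
        target = out_dir / f"row_{row_no}.json"  # hypothetical naming scheme
        target.write_text(json.dumps(parsed, indent=2), encoding="utf-8")
        written.append(target)
    wb.close()
    return written
```

Calling export_xml_cells with the path to sample.xlsx and an output folder would return the list of files written. JSON is used here only because the desired output format is not stated in the question text; the parsed dict could just as well be reshaped into a table.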


2 Comments

tony michael Over a year ago

I want output as shown on the image

Arpad Horvath -- Слава Україні Over a year ago

The XML documents in the cells are complex structures. For example, the MEM_NAME entity appears in multiple ENQUIRES entities, so there are some things to decide before you can flatten the structure and save it as a table. For instance, will the columns be named MEM_NAME1, MEM_NAME2, and so on? As Danny mentioned, after extracting the cells with openpyxl you can convert them with the xmltodict.parse method if you want to work with Python data structures, or you can use the lxml package (or just the standard xml package) if you want to stay closer to the XML structure.
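One possible way to handle the naming decision, sketched with the standard xml package: repeated leaf tags get a numeric suffix, tags that occur once keep their name. The sample document below is made up for illustration (only MEM_NAME and ENQUIRES come from the real data; REPORT, ID and the values are assumptions), and the flatten helper is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def flatten(xml_text):
    """Flatten leaf elements into one dict, numbering repeated tags."""
    root = ET.fromstring(xml_text.strip())
    # Leaf elements are those without child elements.
    leaves = [el for el in root.iter() if len(el) == 0]
    totals = Counter(el.tag for el in leaves)
    seen = Counter()
    row = {}
    for el in leaves:
        seen[el.tag] += 1
        # Only tags that occur more than once get a numeric suffix.
        key = f"{el.tag}{seen[el.tag]}" if totals[el.tag] > 1 else el.tag
        row[key] = (el.text or "").strip()
    return row


# Made-up sample; the real cells are more complex.
sample = """
<REPORT>
  <ID>42</ID>
  <ENQUIRES><MEM_NAME>Bank A</MEM_NAME></ENQUIRES>
  <ENQUIRES><MEM_NAME>Bank B</MEM_NAME></ENQUIRES>
</REPORT>
"""

print(flatten(sample))
# {'ID': '42', 'MEM_NAME1': 'Bank A', 'MEM_NAME2': 'Bank B'}
```

A list of such dicts, one per cell, can be passed to pandas.DataFrame to get a table. Note that this scheme ignores which parent a leaf belongs to, so it only suits cases where position alone is enough to tell the repeated values apart.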
