I have an Excel file in which each cell of one column contains XML data. I want to parse the XML in each cell and save each result to a new file. Here is my code:

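import io
import pandas as pd

file_path = r'C:\Users\user\Documents\datasets\sample.xlsx'
df = pd.read_excel(file_path)
# count_row was never defined; iterate over the row positions instead
for i in range(len(df)):
    # recent pandas versions expect a file-like object rather than a literal XML string
    parsed = pd.read_xml(io.StringIO(df['XML'].iloc[i]))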

Here's a sample file, and here's the desired output.

1 Answer

Instead of pandas, you could also look at openpyxl, which might make it easier to carve out the data you need. You mention that you want to parse the XML but don't specify what you want to do with it; for the parsing itself, I would suggest the xmltodict library.
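As a rough sketch of the openpyxl route, something like the following might work. It assumes the first row of the sheet holds headers and that the column is named "XML", as in the question's df['XML']. The function names and the row_N.xml file naming are made up for illustration. For the parsing step it uses the standard library's xml.etree.ElementTree in place of xmltodict, only to check that each cell is well-formed before writing it out.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def iter_xml_cells(xlsx_path, header="XML"):
    """Yield the non-empty cell values found under the column named `header`."""
    import openpyxl  # third-party; imported here so the rest runs without it

    wb = openpyxl.load_workbook(xlsx_path)
    ws = wb.active
    rows = ws.iter_rows(values_only=True)
    col = list(next(rows)).index(header)  # assumption: first row holds headers
    for row in rows:
        if row[col]:
            yield str(row[col])


def save_xml_strings(xml_strings, out_dir):
    """Parse each string to check it is well-formed, then write it to its own file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, text in enumerate(xml_strings, start=1):
        root = ET.fromstring(text)  # raises ET.ParseError on malformed XML
        target = out_dir / f"row_{i}.xml"  # hypothetical naming scheme
        ET.ElementTree(root).write(target, encoding="utf-8", xml_declaration=True)
        written.append(target)
    return written
```

With the file from the question, the two pieces could be combined as save_xml_strings(iter_xml_cells(file_path), 'out'), which would leave one .xml file per non-empty row in the out folder.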


2 Comments

I want the output as shown in the image.
The XML in the cells has a complex structure. For example, the MEM_NAME entity appears inside multiple ENQUIRES entities, so there are some things to decide before you can flatten the structure and save it as a table: will the columns be named MEM_NAME1, MEM_NAME2, ...? As Danny mentioned, after extracting the cells with openpyxl you can convert them with the xmltodict.parse method if you want to work with Python data structures, or you can use the lxml package (or just the standard xml package) if you want to stay closer to the XML structure.
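To make the numbered-column idea concrete, here is a minimal sketch using the standard xml package. The sample document, the REPORT root element and the member names are invented, since the real cell contents are not shown here; only the ENQUIRES and MEM_NAME tags come from the comment above. The helper flatten_repeated is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real cells are likely more deeply nested.
SAMPLE = """
<REPORT>
  <ENQUIRES><MEM_NAME>Bank A</MEM_NAME></ENQUIRES>
  <ENQUIRES><MEM_NAME>Bank B</MEM_NAME></ENQUIRES>
</REPORT>
""".strip()


def flatten_repeated(xml_text, parent_tag, child_tag):
    """Return one numbered key per parent_tag element, e.g. MEM_NAME1, MEM_NAME2."""
    root = ET.fromstring(xml_text)
    row = {}
    for n, parent in enumerate(root.iter(parent_tag), start=1):
        # findtext returns None when the child is missing from this parent
        row[f"{child_tag}{n}"] = parent.findtext(child_tag)
    return row


row = flatten_repeated(SAMPLE, "ENQUIRES", "MEM_NAME")
print(row)  # {'MEM_NAME1': 'Bank A', 'MEM_NAME2': 'Bank B'}
```

A list of such dicts, one per Excel row, could then be passed to pd.DataFrame to get a table. Rows with fewer ENQUIRES entries would simply have empty values in the higher-numbered columns.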
