How to pass XML content to Pandas in Python

Question

I have many XML files and want to create a DataFrame with their content. How can I create a DataFrame that stores the "Fundo", "CNPJ" and "Quantidade" values from the sample below?

EDIT:

Here is a link to download an XML file like the ones I want to read:

https://fnet.bmfbovespa.com.br/fnet/publico/downloadDocumento?id=113925

Here is the code I have already tried:

import os

import pandas as pd
import xmltodict

DVD_XML = 'path/to/xml/folder'  # folder with the downloaded XML files (placeholder)


def informes_tri_xml_read():
    Dados = pd.DataFrame([])  # meant to hold the results (not filled yet)
    for file in os.listdir(DVD_XML):
        if not file.endswith('.xml'):
            continue
        try:
            with open(os.path.join(DVD_XML, file), encoding="utf8") as fd:
                doc = xmltodict.parse(fd.read())
            dados_gerais = doc['DadosEconomicoFinanceiros']['DadosGerais']
            fii = doc['DadosEconomicoFinanceiros']['InformeTrimestral']['InfoTipoAtivo']['AtivosFinanceiros']['FII']
            if 'Emissor' in fii:
                NomeFundo = dados_gerais['NomeFundo']
                CNPJFundo = dados_gerais['CNPJFundo']
                Mandato = dados_gerais['Autorregulacao']['Mandato']
                DataTri = dados_gerais['DataEncerTrimestre']

                FiiNome = fii['Emissor']['Fundo']
                CNPJ = fii['Emissor']['CNPJ']
                Quantidade = fii['Emissor']['Quantidade']
                print(CNPJ)
        except (KeyError, TypeError) as e:
            # report files whose structure differs from the expected one
            print(f'{file}: {e!r}')

Problem:

It prints only the first "CNPJ" element, not the whole list of "CNPJ" values below the "Emissor" elements under "FII".

How can I iterate through all the "Emissor" elements and print the data under each one?
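One likely cause: xmltodict represents a single `<Emissor>` as a dict but several sibling `<Emissor>` elements as a list of dicts, so code that indexes `['Emissor']['CNPJ']` only works in the single-element case. Below is a minimal sketch that normalizes both shapes before building rows. The `single` and `several` dictionaries are hand-written stand-ins for what `xmltodict.parse` would return for the `FII` node (values copied from the sample XML in the answer), and `as_list` / `emissor_rows` are hypothetical helper names.

```python
import pandas as pd


def as_list(value):
    """Return value as a list: xmltodict yields a dict for one element, a list for several."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def emissor_rows(fii, fields=('Fundo', 'CNPJ', 'Quantidade')):
    """Build one row per Emissor under an xmltodict-parsed FII node."""
    # An empty <FII/> element comes back from xmltodict as None, hence `fii or {}`.
    return [{f: em.get(f) for f in fields} for em in as_list((fii or {}).get('Emissor'))]


# Hand-written stand-ins for xmltodict output (illustrative, not real data):
single = {'Emissor': {'Fundo': 'REAL ESTATE FUND I', 'CNPJ': '11.839.09', 'Quantidade': '118650'}}
several = {'Emissor': [
    {'Fundo': 'REAL ESTATE FUND I', 'CNPJ': '11.839.09', 'Quantidade': '118650'},
    {'Fundo': 'X REAL ESTATE FUND I', 'CNPJ': 'X 11.839.09', 'Quantidade': 'X 118650'},
]}

df = pd.DataFrame(emissor_rows(single) + emissor_rows(several))
print(df)
```

With real files, `fii` would be `doc['DadosEconomicoFinanceiros']['InformeTrimestral']['InfoTipoAtivo']['AtivosFinanceiros']['FII']` as in the code above. Passing `force_list=('Emissor',)` to `xmltodict.parse` is another way to always get a list.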

There is a dedicated package for reading XML files: pypi.org/project/pandas-read-xml – Mohamed Thasin ah, Sep 4, 2020 at 12:37
@balderman I edited the question as you suggested; can you take a look? – guialmachado, Sep 4, 2020 at 13:04

balderman · Accepted Answer · 2020-09-04 12:45:58Z


Something like this, which finds every Emissor element and turns each one into a row:

import pandas as pd
import xml.etree.ElementTree as ET

xml = '''<r><Emissor>
    <Fundo>REAL ESTATE FUND I</Fundo>
    <CNPJ>11.839.09</CNPJ>
    <Quantidade>118650</Quantidade>
    <Valor>10443573</Valor>
</Emissor>
<Emissor>
    <Fundo>X REAL ESTATE FUND I</Fundo>
    <CNPJ>X 11.839.09</CNPJ>
    <Quantidade>X 118650</Quantidade>
    <Valor>X 10443573</Valor>
</Emissor></r>'''

root = ET.fromstring(xml)
data = []
for em in root.findall('.//Emissor'):
    entry = {}
    for prop in ['Fundo','CNPJ','Quantidade']:
        entry[prop] = em.find(prop).text
    data.append(entry)
df = pd.DataFrame(data)
print(df)

Output:

                  Fundo         CNPJ Quantidade
0    REAL ESTATE FUND I    11.839.09     118650
1  X REAL ESTATE FUND I  X 11.839.09   X 118650
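On pandas 1.3 or later, `pandas.read_xml` can do the same extraction directly from an XPath expression. A sketch reusing the sample XML from the answer; `parser='etree'` is chosen here only so the example does not need lxml (the default parser), and `StringIO` is used because recent pandas versions warn when a literal XML string is passed.

```python
from io import StringIO

import pandas as pd

xml = '''<r><Emissor>
    <Fundo>REAL ESTATE FUND I</Fundo>
    <CNPJ>11.839.09</CNPJ>
    <Quantidade>118650</Quantidade>
    <Valor>10443573</Valor>
</Emissor>
<Emissor>
    <Fundo>X REAL ESTATE FUND I</Fundo>
    <CNPJ>X 11.839.09</CNPJ>
    <Quantidade>X 118650</Quantidade>
    <Valor>X 10443573</Valor>
</Emissor></r>'''

# One row per <Emissor>; each child element becomes a column.
df = pd.read_xml(StringIO(xml), xpath='.//Emissor', parser='etree')
# Keep only the columns asked for in the question (drops Valor).
df = df[['Fundo', 'CNPJ', 'Quantidade']]
print(df)
```

For a file on disk, pass its path instead of the `StringIO` object. Note that `read_xml` applies type inference, so purely numeric columns may come back as numbers rather than strings.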


How should I pass the XML file instead of the XML written into the script? – guialmachado, over a year ago

ET.parse('file.xml') – balderman, over a year ago
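To read every file in a folder rather than a literal string, `ET.parse(path).getroot()` returns the same kind of root element as `ET.fromstring`. A sketch under these assumptions: the element paths are inferred from the xmltodict keys in the question (root `DadosEconomicoFinanceiros`, fund data under `DadosGerais`, holdings under `InformeTrimestral/InfoTipoAtivo/AtivosFinanceiros/FII/Emissor`), the files use no XML namespace, and `read_emissores` and the `file` column are hypothetical names. Using the explicit path instead of `.//Emissor` avoids picking up `Emissor` elements that might appear elsewhere in the document.

```python
import os
import xml.etree.ElementTree as ET

import pandas as pd

# Paths inferred from the xmltodict keys in the question (an assumption about the file layout).
EMISSOR_PATH = 'InformeTrimestral/InfoTipoAtivo/AtivosFinanceiros/FII/Emissor'
FIELDS = ['Fundo', 'CNPJ', 'Quantidade']


def read_emissores(folder):
    """Return one DataFrame row per <Emissor> across all .xml files in folder."""
    rows = []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith('.xml'):
            continue
        root = ET.parse(os.path.join(folder, name)).getroot()  # <DadosEconomicoFinanceiros>
        nome_fundo = root.findtext('DadosGerais/NomeFundo')
        cnpj_fundo = root.findtext('DadosGerais/CNPJFundo')
        for em in root.findall(EMISSOR_PATH):
            row = {'file': name, 'NomeFundo': nome_fundo, 'CNPJFundo': cnpj_fundo}
            # findtext returns None (instead of raising) if a child element is missing
            row.update({field: em.findtext(field) for field in FIELDS})
            rows.append(row)
    # Explicit columns keep the shape stable even when no Emissor is found.
    return pd.DataFrame(rows, columns=['file', 'NomeFundo', 'CNPJFundo'] + FIELDS)


# Usage (the folder path is a placeholder):
# df = read_emissores('path/to/xml/folder')
```

Collecting plain dicts and building the DataFrame once at the end is usually simpler and faster than appending to a DataFrame inside the loop.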
