I have many XML files and want to build a pandas DataFrame from their contents. How can I create a DataFrame that stores the "Fundo", "CNPJ" and "Quantidade" values from the sample below?

EDIT:

Here is a link to download an XML file like the ones I want to read:

https://fnet.bmfbovespa.com.br/fnet/publico/downloadDocumento?id=113925

Here is the code I have already tried:

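import os

import pandas as pd
import xmltodict

# DVD_XML is the path to the folder holding the XML files (defined elsewhere).
def informes_tri_xml_read():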
    Dados = pd.DataFrame([])
    folder = os.listdir(DVD_XML)
    for file in folder:
        #print(file)
        try:
            if file.endswith('.xml'):
                with open(os.path.join(DVD_XML, file), encoding="utf8") as fd:
                    doc = xmltodict.parse(fd.read())
                    if 'Emissor' in doc['DadosEconomicoFinanceiros']['InformeTrimestral']['InfoTipoAtivo']['AtivosFinanceiros']['FII'].keys():
                        NomeFundo = doc['DadosEconomicoFinanceiros']['DadosGerais']['NomeFundo']
                        CNPJFundo = doc['DadosEconomicoFinanceiros']['DadosGerais']['CNPJFundo']
                        Mandato = doc['DadosEconomicoFinanceiros']['DadosGerais']['Autorregulacao']['Mandato']
                        DataTri = doc['DadosEconomicoFinanceiros']['DadosGerais']['DataEncerTrimestre']

                        FiiNome = doc['DadosEconomicoFinanceiros']['InformeTrimestral']['InfoTipoAtivo']['AtivosFinanceiros'][
                            'FII']['Emissor']['Fundo']
                        CNPJ = doc['DadosEconomicoFinanceiros']['InformeTrimestral']['InfoTipoAtivo']['AtivosFinanceiros'][
                            'FII']['Emissor']['CNPJ']
                        Quantidade = doc['DadosEconomicoFinanceiros']['InformeTrimestral']['InfoTipoAtivo']['AtivosFinanceiros'][
                            'FII']['Emissor']['Quantidade']
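                    print(CNPJ)
        except Exception as exc:
            # Report which file failed instead of stopping the whole loop.
            print(f'{file}: {exc}')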

Problem:

It prints only the first "CNPJ", not the full list of "CNPJ" values found in the "Emissor" elements under "FII".

How can I iterate through all the "Emissor" elements and print the data under each one?

4 Comments

  • There is a special package for reading XML files: pypi.org/project/pandas-read-xml Commented Sep 4, 2020 at 12:37
  • 1) Upload a valid XML. 2) Share your current effort with us. Commented Sep 4, 2020 at 12:37
  • @balderman I edited the question as you suggested, can you take a look? Commented Sep 4, 2020 at 13:04
  • Did you look at my answer and try to adapt it? Commented Sep 4, 2020 at 13:08

1 Answer

Something like this should work. It collects every Emissor element rather than only the first one:

import pandas as pd
import xml.etree.ElementTree as ET

xml = '''<r><Emissor>
    <Fundo>REAL ESTATE FUND I</Fundo>
    <CNPJ>11.839.09</CNPJ>
    <Quantidade>118650</Quantidade>
    <Valor>10443573</Valor>
</Emissor>
<Emissor>
    <Fundo>X REAL ESTATE FUND I</Fundo>
    <CNPJ>X 11.839.09</CNPJ>
    <Quantidade>X 118650</Quantidade>
    <Valor>X 10443573</Valor>
</Emissor></r>'''

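root = ET.fromstring(xml)
data = []
# findall('.//Emissor') returns every Emissor element at any depth, not just the first
for em in root.findall('.//Emissor'):
    entry = {}
    for prop in ['Fundo', 'CNPJ', 'Quantidade']:
        entry[prop] = em.find(prop).text
    data.append(entry)
# One row per Emissor
df = pd.DataFrame(data)
print(df)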

output

                  Fundo         CNPJ Quantidade
0    REAL ESTATE FUND I    11.839.09     118650
1  X REAL ESTATE FUND I  X 11.839.09   X 118650
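
The code in the question also reads fund-level fields (NomeFundo, CNPJFundo) alongside each Emissor. Here is a minimal sketch of combining the two. It assumes the element nesting mirrors the dictionary keys used in the question. The sample values and the helper name `emissor_rows` are made up for illustration. If the real files declare an XML namespace, the paths would need to include it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: nesting assumed from the dictionary keys in the question.
SAMPLE = """<DadosEconomicoFinanceiros>
  <DadosGerais>
    <NomeFundo>EXAMPLE FUND</NomeFundo>
    <CNPJFundo>00.000.000/0001-00</CNPJFundo>
  </DadosGerais>
  <InformeTrimestral><InfoTipoAtivo><AtivosFinanceiros><FII>
    <Emissor><Fundo>FUND A</Fundo><CNPJ>11.111.111/0001-11</CNPJ><Quantidade>100</Quantidade></Emissor>
    <Emissor><Fundo>FUND B</Fundo><CNPJ>22.222.222/0001-22</CNPJ></Emissor>
  </FII></AtivosFinanceiros></InfoTipoAtivo></InformeTrimestral>
</DadosEconomicoFinanceiros>"""


def emissor_rows(root):
    """Return one dict per <Emissor>, tagged with the fund-level fields."""
    # findtext returns None instead of raising when an element is missing.
    general = {
        "NomeFundo": root.findtext("DadosGerais/NomeFundo"),
        "CNPJFundo": root.findtext("DadosGerais/CNPJFundo"),
    }
    rows = []
    for em in root.findall(".//FII/Emissor"):
        row = dict(general)  # copy so each row owns its own dict
        for prop in ("Fundo", "CNPJ", "Quantidade"):
            row[prop] = em.findtext(prop)
        rows.append(row)
    return rows


rows = emissor_rows(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows)` would give one row per Emissor, with the fund-level columns repeated on each row. Using `findtext` instead of `find(...).text` means a missing child shows up as an empty value rather than an AttributeError.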
2 Comments

How should I pass an XML file to Python instead of the XML string written in the code?
ET.parse('file.xml')
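
To apply this to a whole folder, one option is to call `ET.parse` on each file and collect the rows. This is a sketch that assumes the files end in `.xml`. The function name `read_folder` and its arguments are placeholders, not part of the original answer.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_folder(folder, props=("Fundo", "CNPJ", "Quantidade")):
    """Collect one dict per <Emissor> from every .xml file in `folder`."""
    rows = []
    for path in sorted(Path(folder).glob("*.xml")):
        try:
            # ET.parse returns an ElementTree; getroot() gives the same kind
            # of element that ET.fromstring returns for a string.
            root = ET.parse(path).getroot()
        except ET.ParseError as exc:
            # Skip malformed files but report which one failed.
            print(f"skipping {path.name}: {exc}")
            continue
        for em in root.iter("Emissor"):
            row = {"file": path.name}
            for prop in props:
                row[prop] = em.findtext(prop)  # None if the child is missing
            rows.append(row)
    return rows
```

With the folder from the question, `pd.DataFrame(read_folder(DVD_XML))` would then give one row per Emissor across all files, with a `file` column showing where each row came from.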
