I am trying to figure out why my code behaves differently under the debugger than in normal execution. I have seen these questions, but they don't match my case:

What to do, if debug behaviour differs from normal execution?

python2.7 using debug behave different then without debug

I'm parsing an XML document into a DataFrame so I can convert it into a CSV or Excel file. In normal execution, only the last "CPE" of each "LOCALIDADE" node ends up in the output.

This is a chunk of my XML file:

<DISTRITO xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <NOME_DISTRITO>BRAGANCA</NOME_DISTRITO>
  
  <CONCELHO>
    <NOME_CONCELHO>ALFANDEGA DA FE</NOME_CONCELHO>
    <FREGUESIA>
      <NOME_FREGUESIA>AGROBOM</NOME_FREGUESIA>
      <LOCALIDADE>
        <NOME_LOCALIDADE>AGROBOM</NOME_LOCALIDADE>
        <CODIGO_POSTAL>5350</CODIGO_POSTAL>
        <CPE>PT2000022152377DE</CPE>
        <CPE>PT2000022152388XX</CPE>
        <CPE>PT2000022152399XK</CPE>
        <CPE>PT2000022152402BR</CPE>
        <CPE>PT2000022152424NT</CPE>
      </LOCALIDADE>
    </FREGUESIA>

    <FREGUESIA>
      <NOME_FREGUESIA>ALFANDEGA DA FE</NOME_FREGUESIA>
      <LOCALIDADE>
        <NOME_LOCALIDADE>ALFANDEGA DA FE</NOME_LOCALIDADE>
        <CODIGO_POSTAL>5350</CODIGO_POSTAL>
        <CPE>PT2000022153052QF</CPE>
        <CPE>PT2000022153085VV</CPE>
        <CPE>PT2000022153108HV</CPE>
        <CPE>PT2000022153119LM</CPE>
      </LOCALIDADE>
    </FREGUESIA>
  </CONCELHO>
</DISTRITO>

This code works for me when I am debugging it:

import xml.etree.ElementTree as et
import pandas as pd

path = '/Path/toFile.xml'
data = []
for (ev,el) in et.iterparse(path):
    print (el.tag, el.text)
    if el.tag == 'NOME_DISTRITO': nome = el.text
    if el.tag == 'NOME_CONCELHO': nc = el.text
    if el.tag == 'NOME_FREGUESIA': nf = el.text
    if el.tag == 'NOME_LOCALIDADE': nl = el.text
    if el.tag == "LOCALIDADE":
        inner = {}
        inner['NOME_DISTRITO'] = nome
        inner['NOME_CONCELHO'] = nc
        inner['NOME_FREGUESIA'] = nf
        for i in el:
            print (i.tag,i.text)
            print(data)
            inner[i.tag] = i.text
            if inner.has_key('CPE'):
                data.append(inner)

df = pd.DataFrame(data)
df.to_csv('/Users/DanielMelo/Documents/Endesa/Portugal/CPE.csv',columns=['CPE','NOME_CONCELHO','NOME_FREGUESIA',
                                     'NOME_LOCALIDADE','CODIGO_POSTAL'])

But this is the result when I run it normally:

CPE                NOME_CONCELHO    NOME_FREGUESIA   NOME_LOCALIDADE  CODIGO_POSTAL
PT2000022152424NT  ALFANDEGA DA FE  AGROBOM          AGROBOM          5350
PT2000022152424NT  ALFANDEGA DA FE  AGROBOM          AGROBOM          5350
PT2000022152424NT  ALFANDEGA DA FE  AGROBOM          AGROBOM          5350
PT2000022152424NT  ALFANDEGA DA FE  AGROBOM          AGROBOM          5350
PT2000022152424NT  ALFANDEGA DA FE  AGROBOM          AGROBOM          5350
PT2000022153119LM  ALFANDEGA DA FE  ALFANDEGA DA FE  ALFANDEGA DA FE  5350
PT2000022153119LM  ALFANDEGA DA FE  ALFANDEGA DA FE  ALFANDEGA DA FE  5350
PT2000022153119LM  ALFANDEGA DA FE  ALFANDEGA DA FE  ALFANDEGA DA FE  5350
PT2000022153119LM  ALFANDEGA DA FE  ALFANDEGA DA FE  ALFANDEGA DA FE  5350

I don't know whether the problem is in how I append the dict to my list, or some kind of conflict during the CSV conversion (which I doubt).

But as I said, when I'm debugging it works and I get the result I want, so I can't see what the problem is.

1 Answer

You are repeatedly adding the same dictionary to the list. Python containers store references, not copies, so any alteration you make to that dictionary is going to be visible through all those references.
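One way to confirm this is to check object identity with `is`. The sketch below mimics the question's loop pattern with hard-coded values; the `cpes` list is an illustrative stand-in for the parsed `<CPE>` elements, not part of the original code:

```python
# Mimic the question's loop: one dict, mutated and appended repeatedly.
cpes = ["PT2000022152377DE", "PT2000022152388XX", "PT2000022152424NT"]

data = []
inner = {"NOME_LOCALIDADE": "AGROBOM"}
for cpe in cpes:
    inner["CPE"] = cpe
    data.append(inner)  # appends a reference to the same dict each time

# Every entry in the list is literally the same object...
print(all(row is inner for row in data))  # True
# ...so every row shows the last value written to it.
print([row["CPE"] for row in data])
```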

Yes, printing that dictionary before you alter it in the next loop iteration won't show the change you make in that iteration. You are not printing the dictionaries you already added, after all, so you don't see those references reflect the change.
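The timing effect is easy to see if you record what the dictionary holds right after each append and compare that with the list once the loop is done. A minimal sketch with hard-coded, illustrative values (the `snapshots` list plays the role of what you would see while stepping through in a debugger):

```python
cpes = ["PT2000022152377DE", "PT2000022152388XX", "PT2000022152424NT"]

data = []
inner = {}
snapshots = []  # what inspecting `inner` right after each append would show
for cpe in cpes:
    inner["CPE"] = cpe
    data.append(inner)
    snapshots.append(inner["CPE"])  # looks correct at every step

print(snapshots)                     # each distinct value, in order
print([row["CPE"] for row in data])  # the last value, repeated
```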

Add a copy of the dictionary instead:

if inner.has_key('CPE'):
    data.append(inner.copy())
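Putting the fix into the full loop might look like the sketch below. It assumes Python 3, where `dict.has_key` no longer exists, so it uses `'CPE' in inner` instead; it also parses a trimmed copy of the sample XML from an in-memory buffer rather than `/Path/toFile.xml`, and uses only the standard library:

```python
import io
import xml.etree.ElementTree as et

# A trimmed copy of the sample XML from the question.
XML = """<DISTRITO>
  <NOME_DISTRITO>BRAGANCA</NOME_DISTRITO>
  <CONCELHO>
    <NOME_CONCELHO>ALFANDEGA DA FE</NOME_CONCELHO>
    <FREGUESIA>
      <NOME_FREGUESIA>AGROBOM</NOME_FREGUESIA>
      <LOCALIDADE>
        <NOME_LOCALIDADE>AGROBOM</NOME_LOCALIDADE>
        <CODIGO_POSTAL>5350</CODIGO_POSTAL>
        <CPE>PT2000022152377DE</CPE>
        <CPE>PT2000022152388XX</CPE>
      </LOCALIDADE>
    </FREGUESIA>
  </CONCELHO>
</DISTRITO>"""

data = []
# iterparse accepts a file-like object; by default it reports "end" events,
# so LOCALIDADE is seen only after all of its children have been parsed.
for ev, el in et.iterparse(io.BytesIO(XML.encode("utf-8"))):
    if el.tag == "NOME_DISTRITO": nome = el.text
    if el.tag == "NOME_CONCELHO": nc = el.text
    if el.tag == "NOME_FREGUESIA": nf = el.text
    if el.tag == "LOCALIDADE":
        inner = {"NOME_DISTRITO": nome, "NOME_CONCELHO": nc, "NOME_FREGUESIA": nf}
        for child in el:  # NOME_LOCALIDADE, CODIGO_POSTAL and CPE come from here
            inner[child.tag] = child.text
            if "CPE" in inner:  # Python 3 replacement for inner.has_key('CPE')
                data.append(inner.copy())  # a snapshot, not a shared reference

for row in data:
    print(row["CPE"], row["NOME_LOCALIDADE"], row["CODIGO_POSTAL"])
```

The resulting `data` list can then be passed to `pd.DataFrame(data)` as in the question.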

You can easily reproduce your problem in an interactive session:

>>> data = []
>>> inner = {'foo': 'bar'}
>>> data.append(inner)
>>> data
[{'foo': 'bar'}]
>>> inner['foo'] = 'spam'
>>> inner
{'foo': 'spam'}
>>> data  # note that the data list *also* changed!
[{'foo': 'spam'}]
>>> data = []  # start anew
>>> inner = {'foo': 'bar'}
>>> data.append(inner.copy())  # add a (shallow) copy
>>> data
[{'foo': 'bar'}]
>>> inner['foo'] = 'spam'
>>> data
[{'foo': 'bar'}]
>>> data.append(inner.copy())
>>> data
[{'foo': 'bar'}, {'foo': 'spam'}]
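An alternative to copying is to never reuse one dictionary at all and build a fresh one for each row. This is a different technique from the `.copy()` call above, shown only as a sketch; `base` and `cpes` are hypothetical names holding hard-coded values from the sample XML:

```python
# Values shared by every CPE in one LOCALIDADE (hard-coded for illustration).
base = {"NOME_LOCALIDADE": "AGROBOM", "CODIGO_POSTAL": "5350"}
cpes = ["PT2000022152377DE", "PT2000022152388XX", "PT2000022152424NT"]

# dict(base, CPE=cpe) builds a brand-new dict each time, so no row is shared
# and `base` itself is never modified.
data = [dict(base, CPE=cpe) for cpe in cpes]

print([row["CPE"] for row in data])
```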

3 Comments

Aside: there's also a performance issue. Since the tag tests are mutually exclusive, most of those `if` statements could be `elif`, which avoids redundant comparisons and could make the loop faster.
So every time I want to append a dictionary to a list, I need to append a copy, not the same dictionary with a different value? Thanks for your help! It works :)
@JulianaRivera: if you don't create a copy, all you are doing is adding another reference; all references show the same dictionary data so you'll get the same data in the CSV output, repeated.
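The `elif` suggestion in the first comment works because each element has exactly one tag, so once one test matches the remaining ones can be skipped. A sketch of just the tag-dispatch part, assuming a hypothetical `events` list in place of what `iterparse` yields and a hypothetical `current` dict in place of the separate `nome`/`nc`/`nf`/`nl` variables:

```python
# Hypothetical stand-in for (tag, text) pairs produced by iterparse.
events = [
    ("NOME_DISTRITO", "BRAGANCA"),
    ("NOME_CONCELHO", "ALFANDEGA DA FE"),
    ("NOME_FREGUESIA", "AGROBOM"),
    ("NOME_LOCALIDADE", "AGROBOM"),
]

current = {}
for tag, text in events:
    # Tags are mutually exclusive, so elif stops testing after the first match.
    if tag == "NOME_DISTRITO":
        current["distrito"] = text
    elif tag == "NOME_CONCELHO":
        current["concelho"] = text
    elif tag == "NOME_FREGUESIA":
        current["freguesia"] = text
    elif tag == "NOME_LOCALIDADE":
        current["localidade"] = text

print(current)
```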