Debug behavior differs from normal execution in Python

Question

I am trying to figure out why my code behaves differently under the debugger than in normal execution. I have seen these questions, but they do not match my case:

What to do, if debug behaviour differs from normal execution?

python2.7 using debug behave different then without debug

I'm parsing an XML document into a DataFrame so that I can convert it to a CSV or Excel file. With normal execution, only the last "CPE" of each "LOCALIDADE" node ends up in the output.

This is a chunk of my XML file:

<DISTRITO xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <NOME_DISTRITO>BRAGANCA</NOME_DISTRITO>
  
  <CONCELHO>
    <NOME_CONCELHO>ALFANDEGA DA FE</NOME_CONCELHO>
    <FREGUESIA>
      <NOME_FREGUESIA>AGROBOM</NOME_FREGUESIA>
      <LOCALIDADE>
        <NOME_LOCALIDADE>AGROBOM</NOME_LOCALIDADE>
        <CODIGO_POSTAL>5350</CODIGO_POSTAL>
        <CPE>PT2000022152377DE</CPE>
        <CPE>PT2000022152388XX</CPE>
        <CPE>PT2000022152399XK</CPE>
        <CPE>PT2000022152402BR</CPE>
        <CPE>PT2000022152424NT</CPE>
      </LOCALIDADE>
    </FREGUESIA>

    <FREGUESIA>
      <NOME_FREGUESIA>ALFANDEGA DA FE</NOME_FREGUESIA>
      <LOCALIDADE>
        <NOME_LOCALIDADE>ALFANDEGA DA FE</NOME_LOCALIDADE>
        <CODIGO_POSTAL>5350</CODIGO_POSTAL>
        <CPE>PT2000022153052QF</CPE>
        <CPE>PT2000022153085VV</CPE>
        <CPE>PT2000022153108HV</CPE>
        <CPE>PT2000022153119LM</CPE>
      </LOCALIDADE>
    </FREGUESIA>
  </CONCELHO>
</DISTRITO>

This code works for me when I am debugging it:

import xml.etree.ElementTree as et
import pandas as pd

path = '/Path/toFile.xml'
data = []
for (ev, el) in et.iterparse(path):
    print(el.tag, el.text)
    if el.tag == 'NOME_DISTRITO': nome = el.text
    if el.tag == 'NOME_CONCELHO': nc = el.text
    if el.tag == 'NOME_FREGUESIA': nf = el.text
    if el.tag == 'NOME_LOCALIDADE': nl = el.text
    if el.tag == "LOCALIDADE":
        inner = {}
        inner['NOME_DISTRITO'] = nome
        inner['NOME_CONCELHO'] = nc
        inner['NOME_FREGUESIA'] = nf
        for i in el:
            print(i.tag, i.text)
            print(data)
            inner[i.tag] = i.text
            if inner.has_key('CPE'):
                data.append(inner)

df = pd.DataFrame(data)
df.to_csv('/Users/DanielMelo/Documents/Endesa/Portugal/CPE.csv',columns=['CPE','NOME_CONCELHO','NOME_FREGUESIA',
                                     'NOME_LOCALIDADE','CODIGO_POSTAL'])

But this is the result when I run with normal execution:

CPE                NOME_CONCELHO    NOME_FREGUESIA   NOME_LOCALIDADE  CODIGO_POSTAL
PT2000022152424NT  ALFANDEGA DA FE  AGROBOM          AGROBOM          5350
PT2000022152424NT  ALFANDEGA DA FE  AGROBOM          AGROBOM          5350
PT2000022152424NT  ALFANDEGA DA FE  AGROBOM          AGROBOM          5350
PT2000022152424NT  ALFANDEGA DA FE  AGROBOM          AGROBOM          5350
PT2000022152424NT  ALFANDEGA DA FE  AGROBOM          AGROBOM          5350
PT2000022153119LM  ALFANDEGA DA FE  ALFANDEGA DA FE  ALFANDEGA DA FE  5350
PT2000022153119LM  ALFANDEGA DA FE  ALFANDEGA DA FE  ALFANDEGA DA FE  5350
PT2000022153119LM  ALFANDEGA DA FE  ALFANDEGA DA FE  ALFANDEGA DA FE  5350
PT2000022153119LM  ALFANDEGA DA FE  ALFANDEGA DA FE  ALFANDEGA DA FE  5350

I don't know whether the problem is in how I append the dict to my list, or some kind of conflict when converting to CSV (which I don't think is the case).

But as I said, it works and I get the result I want when I am debugging, so I cannot see what the problem is.

Martijn Pieters · Accepted Answer · 2016-08-03 12:38:56Z

You are repeatedly adding the same dictionary to the list. Python containers store references, not copies, so any alteration you make to that dictionary is going to be visible through all those references.

Printing the dictionary before you alter it in the next loop iteration cannot show the change that the next iteration makes. After all, you are not printing the dictionaries you already added, so you never see those references reflect the change.
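A minimal sketch of the effect, using placeholder values rather than real CPE codes: the loop below appends the same inner dictionary three times, so every list entry shows whatever was written last.

```python
data = []
inner = {}
for cpe in ["A", "B", "C"]:  # placeholder values, not real CPE codes
    inner["CPE"] = cpe       # mutates the one and only dictionary
    data.append(inner)       # appends another reference to that same dictionary

print(data)
# [{'CPE': 'C'}, {'CPE': 'C'}, {'CPE': 'C'}]

# Every entry in the list is the very same object as inner.
print(all(entry is inner for entry in data))
# True
```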

Add a copy of the dictionary instead:

if 'CPE' in inner:  # same test as inner.has_key('CPE'), and also valid on Python 3
    data.append(inner.copy())

You can easily reproduce your problem in an interactive session:

>>> data = []
>>> inner = {'foo': 'bar'}
>>> data.append(inner)
>>> data
[{'foo': 'bar'}]
>>> inner['foo'] = 'spam'
>>> inner
{'foo': 'spam'}
>>> data  # note that the data list *also* changed!
[{'foo': 'spam'}]
>>> data = []  # start anew
>>> inner = {'foo': 'bar'}
>>> data.append(inner.copy())  # add a (shallow) copy
>>> data
[{'foo': 'bar'}]
>>> inner['foo'] = 'spam'
>>> data
[{'foo': 'bar'}]
>>> data.append(inner.copy())
>>> data
[{'foo': 'bar'}, {'foo': 'spam'}]
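Applied to the parsing loop from the question, the same idea might look like the sketch below. It is an illustration under assumptions rather than the asker's final code: it is written for Python 3, the parse_rows helper and the SAMPLE constant are hypothetical names, and SAMPLE is a shortened version of the XML shown in the question. Each CPE gets its own newly built dictionary, so later iterations cannot change rows that were already appended.

```python
import io
import xml.etree.ElementTree as ET

# Shortened version of the XML from the question (assumption: same structure).
SAMPLE = b"""<DISTRITO>
  <NOME_DISTRITO>BRAGANCA</NOME_DISTRITO>
  <CONCELHO>
    <NOME_CONCELHO>ALFANDEGA DA FE</NOME_CONCELHO>
    <FREGUESIA>
      <NOME_FREGUESIA>AGROBOM</NOME_FREGUESIA>
      <LOCALIDADE>
        <NOME_LOCALIDADE>AGROBOM</NOME_LOCALIDADE>
        <CODIGO_POSTAL>5350</CODIGO_POSTAL>
        <CPE>PT2000022152377DE</CPE>
        <CPE>PT2000022152388XX</CPE>
      </LOCALIDADE>
    </FREGUESIA>
  </CONCELHO>
</DISTRITO>"""


def parse_rows(source):
    """Return one independent dict per CPE element (hypothetical helper)."""
    rows = []
    context = {}  # most recently seen district / concelho / freguesia names
    for _event, el in ET.iterparse(source):  # default: 'end' events only
        if el.tag in ("NOME_DISTRITO", "NOME_CONCELHO", "NOME_FREGUESIA"):
            context[el.tag] = el.text
        elif el.tag == "LOCALIDADE":
            # Fields shared by every CPE of this LOCALIDADE.
            shared = dict(context)
            shared.update(
                (child.tag, child.text) for child in el if child.tag != "CPE"
            )
            for cpe in el.iter("CPE"):
                # dict(shared, CPE=...) builds a brand-new dict for each row.
                rows.append(dict(shared, CPE=cpe.text))
    return rows


rows = parse_rows(io.BytesIO(SAMPLE))
for row in rows:
    print(row["CPE"], row["NOME_LOCALIDADE"], row["CODIGO_POSTAL"])
```

The returned list can be passed to pd.DataFrame(rows) as in the question. ET.iterparse also accepts a file path, so the in-memory io.BytesIO object is only there to keep the sketch self-contained.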

edited Aug 3, 2016 at 12:38

answered Aug 3, 2016 at 12:26

Martijn Pieters


3 Comments

Jean-François Fabre Over a year ago

Aside, on performance: many of the if statements could be turned into elif, which could make this a lot faster.

Juliana Rivera Over a year ago

So every time I want to append a dictionary to a list, I need to append a copy? Not the same dictionary with a different value, but a copy? Thanks for your help! It works :)

Martijn Pieters Over a year ago

@JulianaRivera: if you don't create a copy, all you are doing is adding another reference; all references show the same dictionary data so you'll get the same data in the CSV output, repeated.
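To illustrate the point about references: what matters is that each list entry is a distinct dictionary object. Copying is one way to get that, and building a new dictionary inside the loop is another. A small sketch, assuming Python 3 and a hypothetical rows list, with CPE values taken from the question's XML:

```python
rows = []
for cpe in ("PT2000022152377DE", "PT2000022152388XX"):
    # A dict literal inside the loop creates a new object on every iteration,
    # so no explicit .copy() is needed here.
    row = {"NOME_LOCALIDADE": "AGROBOM", "CPE": cpe}
    rows.append(row)

print(rows[0] is rows[1])
# False

print([row["CPE"] for row in rows])
# ['PT2000022152377DE', 'PT2000022152388XX']
```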
