python pandas - Editing multiple DataFrames with a for loop

Question

Consider the following two lists, one of 3 dicts and one of 3 empty DataFrames:

import pandas as pd

dict0={'actual': {'2013-02-20 13:30:00': 0.93}}
dict1={'actual': {'2013-02-20 13:30:00': 0.85}}
dict2={'actual': {'2013-02-20 13:30:00': 0.98}}
dicts=[dict0, dict1, dict2]

df0=pd.DataFrame()
df1=pd.DataFrame()
df2=pd.DataFrame()
dfs=[df0, df1, df2]

I want to populate the 3 DataFrames inside a loop, using the following code:

for df, dikt in zip(dfs, dicts):
    df = df.from_dict(dikt, orient='columns', dtype=None)

However, when trying to retrieve for instance 1 of the df outside of the loop, it is still empty

print(df0)

will return

Empty DataFrame
Columns: []
Index: []

When printing the df from within the for loop, we can see the data is correctly appended though.

How can I write the loop so that the 3 DataFrames still show their changes when printed outside of the loop?

Blackecho · Accepted Answer · 2016-12-28 22:52:09Z

5

In your loop, df is just a temporary name bound to each list element in turn; assigning to it rebinds that name and leaves the list untouched. If you want to modify the list while iterating over it, you have to assign to the list by index. You can do that using Python's enumerate:

for i, (df, dikt) in enumerate(zip(dfs, dicts)):
    dfs[i] = df.from_dict(dikt, orient='columns', dtype=None)
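
To make the difference concrete, here is a minimal sketch in plain Python (no pandas needed). The names items and first are hypothetical and not from the question. It suggests that rebinding the loop variable leaves the list alone, while index assignment replaces the list element but does not touch an object that another name, such as df0, still refers to.

```python
# Three empty placeholder objects, standing in for the empty DataFrames.
items = [[], [], []]
first = items[0]  # plays the role of df0 in the question

# Rebinding the loop variable only changes what the local name points to.
for item in items:
    item = ["replaced"]

unchanged = items[0] == []  # the list still holds the empty objects

# Assigning by index replaces the element stored in the list.
for i, _ in enumerate(items):
    items[i] = ["replaced"]

replaced = items[0] == ["replaced"]

# A name bound to the old element still refers to the old, empty object.
old_name_still_empty = first == []

print(unchanged, replaced, old_name_still_empty)
```

So after the enumerate loop, dfs[0] holds the data, while df0 itself is still the original empty frame.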

piRSquared · Accepted Answer · 2016-12-29 00:12:23Z

3

This will get it done in place!!!
Please note the 3 exclamations

one liner

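# set_value(row, col, value) was deprecated in pandas 0.21 and removed in 1.0,
# so this one liner only runs on older versions
[dfs[i].set_value(r, c, v)
 for i, dn in enumerate(dicts)
 for c, dc in dn.items()
 for r, v in dc.items()];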

somewhat more intuitive

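for d, df in zip(dicts, dfs):
    temp = pd.DataFrame(d).stack()
    # Series.iteritems was removed in pandas 2.0; items() replaces it
    for (r, c), v in temp.items():
        # set_value was removed in pandas 1.0; .loc also enlarges the frame
        df.loc[r, c] = v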

df0

                     actual
2013-02-20 13:30:00    0.93

equivalent alternative
without the pd.DataFrame construction

for i, dn in enumerate(dicts):
    # outer keys of each dict are column labels, inner keys are row labels
    for c, dc in dn.items():
        for r, v in dc.items():
            dfs[i].loc[r, c] = v

Why is this different?
All the other answers, so far, reassign a new dataframe to the requisite position in the list of dataframes. They clobber the dataframe that was there. The original dataframe is left empty while a new non-empty one rests in the list.

This solution edits the dataframe in place ensuring the original dataframe is updated with new information.

Per OP:

However, when trying to retrieve for instance 1 of the df outside of the loop, it is still empty
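
As a rough illustration of in-place editing on current pandas, the sketch below fills the existing frames by assigning columns rather than calling set_value, which was deprecated in pandas 0.21 and removed in 1.0. Column assignment is a substitute chosen here, not the method used in this answer, and the two-dict setup is trimmed from the question for brevity.

```python
import pandas as pd

dicts = [{'actual': {'2013-02-20 13:30:00': 0.93}},
         {'actual': {'2013-02-20 13:30:00': 0.85}}]

df0, df1 = pd.DataFrame(), pd.DataFrame()
dfs = [df0, df1]

# Assigning a column mutates the object that df0/df1 already refer to,
# instead of building a new frame and storing it in the list.
for df, dikt in zip(dfs, dicts):
    for col, values in dikt.items():
        df[col] = pd.Series(values)

# The list element and the original name are still the same object.
same_object = dfs[0] is df0
value = float(df0.loc['2013-02-20 13:30:00', 'actual'])
print(same_object, value)
```

Because nothing in the list is replaced, printing df0 outside the loop shows the filled frame.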

timing
It's also considerably faster

setup

dict0={'actual': {'2013-02-20 13:30:00': 0.93}}
dict1={'actual': {'2013-02-20 13:30:00': 0.85}}
dict2={'actual': {'2013-02-20 13:30:00': 0.98}}
dicts=[dict0, dict1, dict2]

df0=pd.DataFrame()
df1=pd.DataFrame()
df2=pd.DataFrame()
dfs=[df0, df1, df2]

edited Dec 29, 2016 at 0:12

answered Dec 28, 2016 at 23:03

2 Comments

user4322543 Over a year ago

Your three for-loop solution needlessly deconstructs the existing dicts.

piRSquared Over a year ago

@fuzzyhedge no, it doesn't, I need to get at those keys and values in order to use set_value. Using set_value or pd.DataFrame.at or pd.DataFrame.loc are the only options I could think of to edit dataframe in place. In order to get at those row, column, value combinations, I had to iterate. I could have used a dataframe constructor just to iterate through it, but that was unnecessary.

YOBA · Accepted Answer · 2016-12-28 22:56:23Z

1

You need to keep the reference to the df objects, so you can try:

for idx, dikt in enumerate(dicts):
    dfs[idx] = dfs[idx].from_dict(dikt, orient='columns', dtype=None)

bouletta · Accepted Answer · 2016-12-28 22:52:59Z

0

I don't have an explanation for why that is so. However, a workaround is:

dict0={'actual': {'2013-02-20 13:30:00': 0.93}}
dict1={'actual': {'2013-02-20 13:30:00': 0.85}}
dict2={'actual': {'2013-02-20 13:30:00': 0.98}}
dicts=[dict0, dict1, dict2]

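dfs = []

for dikt in dicts:
    # from_dict is a classmethod, so call it on pd.DataFrame;
    # no df name exists yet at this point
    df = pd.DataFrame.from_dict(dikt, orient='columns', dtype=None)
    dfs.append(df)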

Now

dfs[0]

returns

                     actual
2013-02-20 13:30:00    0.93

1 Comment

bouletta Over a year ago

leaving this here but @Blackecho is much better

user4322543 · Accepted Answer · 2016-12-28 23:09:17Z

0

One liner.

>>> df_list = [df.from_dict(dikt, orient='columns', dtype=None) for (df, dikt) in zip(dfs, dicts)]

>>> df_list
[                     actual
2013-02-20 13:30:00    0.93,
                      actual
2013-02-20 13:30:00    0.85, 
                      actual
2013-02-20 13:30:00    0.98]

>>> df_list[0]
                     actual
2013-02-20 13:30:00    0.93
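
If the goal is for the names df0, df1 and df2 themselves to hold the filled frames, one option (a sketch, not part of this answer) is to unpack the comprehension straight back into those names. Because from_dict is a classmethod, it can be called on pd.DataFrame directly, and the empty frames are not needed at all. The values list is a hypothetical helper used only to check the result.

```python
import pandas as pd

dicts = [{'actual': {'2013-02-20 13:30:00': 0.93}},
         {'actual': {'2013-02-20 13:30:00': 0.85}},
         {'actual': {'2013-02-20 13:30:00': 0.98}}]

# Build one frame per dict and bind the results to the three names.
df0, df1, df2 = [pd.DataFrame.from_dict(d, orient='columns') for d in dicts]

# Pull the single stored value out of each frame as a plain float.
values = [float(df.loc['2013-02-20 13:30:00', 'actual'])
          for df in (df0, df1, df2)]
print(values)
```

This only works when the number of names matches the number of dicts; otherwise the unpacking raises a ValueError.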

nocibambi · Accepted Answer · 2019-08-28 12:11:51Z

0

You can also do this by putting the dataframes into a dictionary:

dfs = {
    'df0': df0,
    'df1': df1,
    'df2': df2
}

Then look up each entry and assign the result back to the dictionary in the for loop:

for dfname, dikt in zip(dfs.keys(), dicts):
    dfs[dfname] = dfs[dfname].from_dict(dikt, orient='columns', dtype=None)

This is useful if you still want to refer to the dataframes by name (instead of by an arbitrary index in a list):

dfs['df0']
