I am working with a dataframe inside a loop. Each iteration performs some operations on the dataframe's columns. At the end of each iteration, I need to store the dataframe in a dictionary, keyed by the iteration index.

For example:

import pandas as pd

df = pd.DataFrame(index=range(20))
dict = {}
for k in range(5):
    df['iter'] = k
    dict[k] = df

I expect 'dict' to hold 5 different dataframes. For key 1, the dataframe's 'iter' column should be all 1s; for key 2, it should be all 2s; and so on.

However, I find that every key holds the same dataframe: the 'iter' column is 4 in all of them.

I tried running the operations step by step instead of looping. I found that the correct dataframe is stored initially, but on the next iteration, when performing

df['iter'] = k

the dataframe already stored in the dictionary is updated as well.

How can I get around this problem? My actual dataframe is much bigger and has many more operations that need to be performed within the loop.

2 Answers

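Each entry in dict (terrible name, BTW, as it is already the name of the built-in type) needs to be a copy of df. Assigning df itself stores a reference to the same object, so every key ends up pointing at the one dataframe that the loop keeps modifying.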
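
To see why a copy is needed, here is a minimal sketch of what the question's loop actually stores. The names store, same_object and values_under_key_0 are made up for illustration, and the frame is shortened to 3 rows to keep it small.

```python
import pandas as pd

df = pd.DataFrame(index=range(3))
store = {}  # hypothetical name, used instead of shadowing the built-in dict

for k in range(2):
    df['iter'] = k   # modifies the one existing DataFrame in place
    store[k] = df    # stores a reference to that same object, not a snapshot

# Both keys refer to the single DataFrame that the loop kept mutating,
# so key 0 shows the value written on the last iteration.
same_object = store[0] is store[1]
values_under_key_0 = store[0]['iter'].tolist()
```

Since both keys hold the same object, any later change to df shows up under every key, which matches the behaviour described in the question.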

You need to store a copy of the data frame on each iteration. (dict is a terrible name: it is not a keyword, but it shadows the built-in type, so avoid it as a variable name. If you do need such a name, follow it with an underscore.)

import pandas as pd

df = pd.DataFrame(index=range(20))
dict_ = {}
for k in range(5):
    df['iter'] = k
    # copy() takes an independent snapshot, so later iterations don't alter it
    dict_[k] = df.copy()
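
If every iteration starts from the same base frame, as in the simplified example, another option is to never modify the original at all. The following is only a sketch under that assumption; base, frames, first_values and last_values are illustrative names, and it may not fit a real loop where changes have to carry over from one iteration to the next.

```python
import pandas as pd

base = pd.DataFrame(index=range(20))

# assign() returns a new DataFrame with the extra column,
# so base itself is never modified.
frames = {k: base.assign(iter=k) for k in range(5)}

# Each key now holds its own frame with its own 'iter' value.
first_values = frames[1]['iter'].unique().tolist()
last_values = frames[4]['iter'].unique().tolist()
```

When the loop does need to mutate df step by step, df.copy() as shown above remains the straightforward choice; its default is a deep copy, so the stored frame has its own data.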
