I am having trouble figuring out how to iterate over variables in a pandas DataFrame and perform the same arithmetic operation on each.

I have a DataFrame df that contains three numeric variables x1, x2 and x3. I want to create three new variables by multiplying each by 2. Here’s what I am doing:

existing = ['x1','x2','x3']
new = ['y1','y2','y3']

for i in existing:
    for j in new:
        df[j] = df[i]*2

The code above does create three new variables y1, y2 and y3 in the DataFrame, but the values of y1 and y2 are being overwritten by the values of y3, so all three variables end up with the same values. I am not sure what I am missing.

I would really appreciate any guidance or suggestions. Thanks.

3 Answers

Your nested loops run 9 times here: for each of the 3 existing columns, the inner loop assigns all 3 new columns, and each assignment overwrites the previous one. After the final pass, y1, y2 and y3 all hold x3*2.

You may want something like

for e, n in zip(existing,new):
    df[n] = df[e]*2

I would do something more generic:

# existing = ['x1', 'x2', 'x3']
existing = list(df.columns)  # assumes every column in df should be doubled
# Build each new name with a per-string replace ('x1' -> 'y1', ...)
new = [name.replace('x', 'y') for name in existing]

# zip yields the column names themselves, so use them directly as keys
for col_existing, col_new in zip(existing, new):
    df[col_new] = df[col_existing] * 2
# DataFrame.assign may offer a more elegant way to do the same thing
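
A minimal sketch of the assign variant hinted at in the last comment, using made-up sample data. DataFrame.assign accepts new column names as keyword arguments, so a dict of name-to-values pairs can be unpacked into it.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({'x1': [1, 2], 'x2': [3, 4], 'x3': [5, 6]})
existing = ['x1', 'x2', 'x3']
new = [name.replace('x', 'y') for name in existing]

# assign takes new column names as keyword arguments, so build a dict
# mapping each new name to its doubled source column and unpack it.
doubled = df.assign(**{n: df[e] * 2 for e, n in zip(existing, new)})
print(doubled)
```

Note that assign returns a new DataFrame and leaves df itself unchanged, so the result has to be bound to a name (or reassigned to df).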


You can concatenate the original DataFrame with the columns holding the doubled values:

cols_to_double = ['x0', 'x1', 'x2']
new_cols = list(df.columns) + [c.replace('x', 'y') for c in cols_to_double]

df = pd.concat([df, 2 * df[cols_to_double]], axis=1, copy=True)
df.columns = new_cols

So, if your input DataFrame df is:

   x0  x1  x2  other0  other1
0   0   1   2       3       4
1   0   1   2       3       4
2   0   1   2       3       4
3   0   1   2       3       4
4   0   1   2       3       4

after executing the previous lines, you get:

   x0  x1  x2  other0  other1  y0  y1  y2
0   0   1   2       3       4   0   2   4
1   0   1   2       3       4   0   2   4
2   0   1   2       3       4   0   2   4
3   0   1   2       3       4   0   2   4
4   0   1   2       3       4   0   2   4

Here is the code to create df:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    data=np.column_stack([np.full((5,), i) for i in range(5)]),
    columns=[f'x{i}' for i in range(3)] + [f'other{i}' for i in range(2)]
)
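
As a possible variation on the concat approach, the doubled columns can be renamed before concatenating, so the new names travel with the data rather than being assigned by position afterwards. This is only a sketch; the names doubled and result are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Same construction as in the answer: columns x0, x1, x2, other0, other1.
df = pd.DataFrame(
    data=np.column_stack([np.full((5,), i) for i in range(5)]),
    columns=[f'x{i}' for i in range(3)] + [f'other{i}' for i in range(2)]
)
cols_to_double = ['x0', 'x1', 'x2']

# Rename the doubled columns first ('x0' -> 'y0', ...), then concatenate
# them to the right of the original columns.
doubled = (2 * df[cols_to_double]).rename(columns=lambda c: c.replace('x', 'y'))
result = pd.concat([df, doubled], axis=1)
print(result)
```

Renaming first avoids rebuilding the full column list by hand, which matters if the column order of df changes later.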
