78

I have a DataFrame with a few columns. One column contains a symbol for the currency being used, for instance a euro or a dollar sign. Another column contains a budget value. So one row could mean a budget of 5000 in euros and the next row a budget of 2000 in dollars.

In pandas I would like to add an extra column to my DataFrame that normalizes the budgets to euros. For each row, the value in the new column should be the budget value * 1 if the symbol in the currency column is a euro sign, and the budget value * 0.78125 if the symbol in the currency column is a dollar sign.

I know how to add a column, fill it with values, copy values from another column etc. but not how to fill the new column conditionally based on the value of another column.

Any suggestions?

6 Answers

131

You probably want np.where (with numpy imported as np):

df['Normalized'] = np.where(df['Currency'] == '$', df['Budget'] * 0.78125, df['Budget'])
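
To make the one-liner above easier to try out, here is a minimal sketch on made-up data. The sample rows are an assumption for illustration; the column names and the 0.78125 rate come from the question and answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the answer above
df = pd.DataFrame({
    'Currency': ['€', '$', '$'],
    'Budget': [5000, 2000, 640],
})

# Where the condition is True take the converted value,
# otherwise keep the original budget
df['Normalized'] = np.where(
    df['Currency'] == '$',
    df['Budget'] * 0.78125,
    df['Budget'],
)

print(df)
```

np.where evaluates both branches for the whole column and then picks element-wise, so no explicit loop over rows is needed.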

2 Comments

Is it possible to do something like this but with words instead of numbers?

I don't know whether anyone needs this, but I'm posting it anyway:
df['Qnty'] = np.where(df['Quantity'].str.extract(r'([a-z]+)', expand=False) == 'g', df['Quantity'].str.extract(r'(\d+)', expand=False).astype(int) / 1000, df['Quantity'].str.extract(r'(\d+)', expand=False).astype(int))
25

An option that doesn't require an additional import for numpy:

df['Normalized'] = df['Budget'].where(df['Currency'] != '$', df['Budget'] * 0.78125)
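
One thing that is easy to get backwards: Series.where keeps the original value where the condition is True and substitutes the second argument where it is False, while Series.mask does the opposite. So to convert only the dollar rows, the condition given to where has to select the rows that stay unchanged. A small sketch on made-up data (the sample rows and the helper names is_dollar and converted are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'Currency': ['€', '$'], 'Budget': [5000, 2000]})

is_dollar = df['Currency'] == '$'
converted = df['Budget'] * 0.78125

# where: keep Budget where the condition is True, use `converted` elsewhere
via_where = df['Budget'].where(~is_dollar, converted)

# mask: replace Budget with `converted` where the condition is True
via_mask = df['Budget'].mask(is_dollar, converted)

print(via_where.tolist(), via_mask.tolist())
```

Both calls should give the same column; which one reads better is a matter of taste.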

22

A similar result, in a different style, is to write a function that performs the operation you want on a single row, using row['fieldname'] syntax to access individual values, and then pass that function to DataFrame.apply with axis=1.

This echoes the answer to the question linked here: pandas create new column based on values from other columns

def normalise_row(row):
    if row['Currency'] == '$':
        ...
    ...
    ...
    return result

df['Normalized'] = df.apply(lambda row : normalise_row(row), axis=1) 
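
The skeleton above leaves the body elided. One way it could be filled in for the currency question is sketched below; the function body, the DOLLAR_TO_EURO constant name and the sample data are assumptions based on the question, not the answerer's actual code.

```python
import pandas as pd

DOLLAR_TO_EURO = 0.78125  # rate taken from the question


def normalise_row(row):
    """Return the row's budget expressed in euros."""
    if row['Currency'] == '$':
        return row['Budget'] * DOLLAR_TO_EURO
    # Euro budgets pass through unchanged
    return row['Budget']


# Hypothetical sample data
df = pd.DataFrame({'Currency': ['€', '$'], 'Budget': [5000, 2000]})

# axis=1 passes each row to the function as a Series
df['Normalized'] = df.apply(normalise_row, axis=1)

print(df)
```

Here the function is passed to apply directly rather than wrapped in a lambda. Note that apply with axis=1 calls the function once per row in Python, so on large frames it is usually much slower than the vectorized alternatives.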

1 Comment

Should that be lambda row:normalise_row(row)? And couldn't you replace the whole thing with just normalise_row?
9

Taking Tom Kimber's suggestion one step further, you could use a dictionary of calculations keyed by condition, and look up the right calculation for each row. This solution expands the scope of the question.

I'm using an example from a personal application.

# write the dictionary
# (df_name is actually a single row here, because the function is called through apply with axis=1)

def applyCalculateSpend(df_name, cost_method_col, metric_col, rate_col, total_planned_col):
    calculations = {
        'CPMV': df_name[metric_col] / 1000 * df_name[rate_col],
        'Free': 0,
    }
    df_method = df_name[cost_method_col]
    # fall back to a marker string when the cost method has no entry
    return calculations.get(df_method, "not in dict")

# call the function inside a lambda

test_df['spend'] = test_df.apply(lambda row: applyCalculateSpend(
    row,
    cost_method_col='cost method',
    metric_col='metric',
    rate_col='rate',
    total_planned_col='total planned'), axis=1)

  cost method  metric  rate  total planned  spend
0        CPMV    2000   100           1000  200.0
1        CPMV    4000   100           1000  400.0
4        Free       1     2              3    0.0
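
Applied back to the original currency question, the dictionary idea can be sketched without apply at all, by mapping each currency symbol to a rate with Series.map. This is a different technique from the answer's row-wise lookup; the RATES_TO_EURO name, the pound row and the sample data are assumptions for illustration, and only the dollar rate comes from the question.

```python
import pandas as pd

# Hypothetical rate table; only the dollar rate comes from the question
RATES_TO_EURO = {'€': 1.0, '$': 0.78125}

# Hypothetical sample data, including a symbol with no known rate
df = pd.DataFrame({
    'Currency': ['€', '$', '£'],
    'Budget': [5000, 2000, 100],
})

# Symbols missing from the dictionary map to NaN,
# so their normalized budget is NaN as well
df['Normalized'] = df['Budget'] * df['Currency'].map(RATES_TO_EURO)

print(df)
```

Leaving unknown symbols as NaN, instead of a marker string, keeps the column numeric and makes missing rates easy to find with isna().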

4

pandas' loc can also be used, without importing numpy:

# First assign Budget to the entire Normalized column
df['Normalized'] = df['Budget']
# Then convert to euros where Currency equals the dollar sign
df.loc[df['Currency'] == '$', 'Normalized'] = df['Budget'] * 0.78125
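
A runnable version of the two-step loc approach, as a minimal sketch on made-up data. The sample rows and the is_dollar name are assumptions for illustration, and Budget is stored as floats here so that the second assignment does not have to change the column's dtype.

```python
import pandas as pd

# Hypothetical sample data; Budget holds floats on purpose
df = pd.DataFrame({
    'Currency': ['€', '$'],
    'Budget': [5000.0, 2000.0],
})

is_dollar = df['Currency'] == '$'

# Step 1: start from the unconverted budget for every row
df['Normalized'] = df['Budget']

# Step 2: overwrite only the dollar rows with the converted value
df.loc[is_dollar, 'Normalized'] = df.loc[is_dollar, 'Budget'] * 0.78125

print(df)
```

Assigning through df.loc with a boolean mask and a column label modifies the frame in place, which avoids the chained-indexing pitfalls of writing df[mask]['Normalized'] = ....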

1
df.loc[df['col1'].isnull(), 'col2'] = values
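
The line above sets col2 wherever col1 is missing. A minimal sketch of that pattern, assuming made-up data and a scalar replacement value (0.0) in place of values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing entry in col1
df = pd.DataFrame({
    'col1': [1.0, np.nan, 3.0],
    'col2': [10.0, 20.0, 30.0],
})

# Overwrite col2 only in the rows where col1 is null
df.loc[df['col1'].isnull(), 'col2'] = 0.0

print(df)
```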
