
I have a DataFrame with columns Age, Salary and others. If I use:

df['Age'] = df['Age'].apply(lambda x : x+100 if x>30 else 0)

Then I can modify the Age column with the if else condition. Also, if I used:

df[['Age', 'Salary']] = df[['Age', 'Salary']].apply(lambda x : x+100)

Then I can apply the lambda function to each column independently. But as soon as I use an if/else condition on both columns, as in:

df[['Age', 'Salary']] = df[['Age', 'Salary']].apply(lambda x : x+100 if x>30 else 0)

Then I get the following error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

So, how can I modify the Age, Salary and n columns applying the same if else or other lambda condition to each column independently?

I know two possible solutions:

  1. To use a for to call each column:

cols = ['Age', 'Salary']
for i in cols:
    df[i] = df[i].apply(lambda x : x+100 if x>30 else 0)

  2. To use apply on each column:

df['Age'] = df['Age'].apply(lambda x : x+100 if x>30 else 0)
df['Salary'] = df['Salary'].apply(lambda x : x+100 if x>30 else 0)

Is there a way to do the same but only on one line (such as the code that I tried) using apply or other function?

  • You don't use apply for this. You just do the comparison/arithmetic in one go, e.g. df[['Age', 'Salary']] > 30, and then fill the true/false values (e.g. with mask()). You can use pipe() to avoid retyping the column slice: df[['Age', 'Salary']].pipe(lambda x: x.mask(x > 30, x + 100)) Commented yesterday
  • @jqurious The comparison/arithmetic in one go is basically true, but pipe() is not necessary. Masks that have a subset of the column index apply to only that subset in the target frame. Commented yesterday

3 Answers


Use map:

df[['Age', 'Salary']] = df[['Age', 'Salary']].map(lambda x: x + 100 if x > 30 else 0)

It does exactly what your single-column apply does, but works for multiple columns in one line.
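As a minimal sketch of the element-wise approach on a small made-up frame (the sample values and the helper name `bump` are assumptions for illustration, not from the question), with a fallback to applymap for pandas versions that predate DataFrame.map:

```python
import pandas as pd

# Hypothetical sample data, chosen only to exercise both branches
df = pd.DataFrame({"Age": [25, 35, 45], "Salary": [20, 40, 10], "Other": [1, 2, 3]})
cols = ["Age", "Salary"]


def bump(x):
    """Same rule as the lambda in the question, applied to one scalar."""
    return x + 100 if x > 30 else 0


frame = df[cols]
# DataFrame.map is available from pandas 2.1; older versions only have applymap
elementwise = frame.map if hasattr(frame, "map") else frame.applymap
df[cols] = elementwise(bump)

print(df)
```

The function is called once per cell, so this stays readable but will likely be slower than a vectorised expression on large frames.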


3 Comments

No, don't use apply or applymap.
This is exactly what I need. According to the pandas documentation, the function was renamed from .applymap to .map in version 2.1.0, and it works.
It may function, but it's essentially guaranteed to be less efficient than the vectorised approach.

Don't use a lambda. Don't use apply or applymap. Use a simple boolean mask:

import numpy as np
import pandas as pd

rand = np.random.default_rng(seed=0)
df = pd.DataFrame(
    columns=('Age', 'Salary', 'Other'),
    data=rand.uniform(low=10, high=50, size=(20, 3)),
)

print(df)
mask = df[['Age', 'Salary']] > 30
df[mask] += 100
df[~mask] = 0
print(df)
          Age     Salary      Other
0   35.478467  20.791469  11.638941
1   10.661105  42.530810  46.510223
2   34.265431  39.179862  31.745000
3   47.402897  42.634142  10.109540
4   44.296171  11.343423  39.186218
5   17.026225  44.527157  31.658449
6   21.988476  26.907489  11.132787
7   14.971331  36.824977  35.887580
8   34.615404  25.347102  49.888397
9   49.233414  37.421679  36.018371
10  37.537869  25.556857  15.403860
11  38.859534  31.014173  22.409675
12  29.433414  45.579513  47.361741
13  24.311808  32.861193  22.874776
14  33.772001  23.516449  25.664760
15  45.610974  19.086304  34.927486
16  13.360614  43.305766  41.483932
17  19.574778  45.059369  12.342721
18  23.444682  16.011179  28.013575
19  41.852971  19.225688  12.080852

           Age      Salary      Other
0   135.478467    0.000000  11.638941
1     0.000000  142.530810  46.510223
2   134.265431  139.179862  31.745000
3   147.402897  142.634142  10.109540
4   144.296171    0.000000  39.186218
5     0.000000  144.527157  31.658449
6     0.000000    0.000000  11.132787
7     0.000000  136.824977  35.887580
8   134.615404    0.000000  49.888397
9   149.233414  137.421679  36.018371
10  137.537869    0.000000  15.403860
11  138.859534  131.014173  22.409675
12    0.000000  145.579513  47.361741
13    0.000000  132.861193  22.874776
14  133.772001    0.000000  25.664760
15  145.610974    0.000000  34.927486
16    0.000000  143.305766  41.483932
17    0.000000  145.059369  12.342721
18    0.000000    0.000000  28.013575
19  141.852971    0.000000  12.080852
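If a single statement is preferred, the same mask idea can probably be written with DataFrame.where. This is only a sketch on made-up values (the sample frame is an assumption), not a replacement for the code above:

```python
import pandas as pd

# Hypothetical sample data; 'Other' should be left untouched
df = pd.DataFrame({
    "Age": [25.0, 35.0, 45.0],
    "Salary": [20.0, 40.0, 10.0],
    "Other": [1.0, 2.0, 3.0],
})
cols = ["Age", "Salary"]

# Keep x + 100 where the original value exceeds 30, otherwise use 0.
# The right-hand side is fully evaluated before the assignment happens,
# so the comparison sees the original values.
df[cols] = (df[cols] + 100).where(df[cols] > 30, 0)

print(df)
```

Each column is compared against its own values, so this matches the per-column behaviour asked for in the question.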


If you iterate over a DataFrame with apply, you should refer to your columns explicitly inside the lambda function and, in your case, specify the axis, like this:

df.apply(lambda x : x["Salary"]+100 if x["Age"]>30 else 0, axis=1)

which will give you a Series with the modified values. Better still, use NumPy's where function:

df["Salary"] = np.where(df["Age"]>30, df["Salary"]+100, 0)

which will give you the desired DataFrame.
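Note that the two snippets above test Age to decide the new Salary. If each column should instead be tested against its own values, as in the question, np.where also seems to accept a multi-column slice. A rough sketch on made-up data (the sample values are an assumption):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({"Age": [25, 35, 45], "Salary": [20, 40, 10], "Other": [1, 2, 3]})
cols = ["Age", "Salary"]

# np.where returns a plain 2-D array with one column per selected column;
# assigning it back to df[cols] writes the values in the same column order
df[cols] = np.where(df[cols] > 30, df[cols] + 100, 0)

print(df)
```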

1 Comment

Don't use a lambda nor apply for this case.
