Pandas - Converting certain column values in Dataframe using Lambda Expressions

Question

Pardon my relative inexperience in Python, but this problem has kept me stuck for some time now:

I have a dataframe, df1 like this:

     ID  Hourly Rate Category
0  8900         2.99      Car
1  9904         9.99      Car
2  6381        19.99     Bike
3  5862         2.99     Bike
4  2270         2.99      Car

(0-4 are just row numbers.) Now I want to create df2 so that the values in the Category column are changed according to the following conditions:

if Category is Car: C
if Category is Bike: B
(There can be other categories as well.)

i.e. df2 would be as follows:

     ID  Hourly Rate Category
0  8900         2.99        C
1  9904         9.99        C
2  6381        19.99        B
3  5862         2.99        B
4  2270         2.99        C

I have used a fairly trivial approach with if conditions inside a function, but I would like to do this with a lambda function instead.

@shivsn: Thanks! Category can be any value, by the way; there is no correlation like that.
– Failed Scientist, commented Apr 20, 2018 at 10:04

iDrwish · Accepted Answer · 2018-04-20 10:05:34Z

1

If your values are categorical, I recommend using pandas' built-in Categorical data type.

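df2 = df.copy()
df2['Category'] = df2['Category'].astype('category')
print(df2['Category'].cat.categories)
#Prints: Index(['Bike', 'Car'], dtype='object')

#Define your own categories. New names are matched to the existing
#categories by position (sorted order: Bike, Car). Recent pandas versions
#no longer allow assigning to .categories directly, so rename instead.
df2['Category'] = df2['Category'].cat.rename_categories(['B', 'C'])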

Output

ID  Hourly  Rate    Category
0   0   8900    2.99    C
1   1   9904    9.99    C
2   2   6381    19.99   B
3   3   5862    2.99    B
4   4   2270    2.99    C
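
As a self-contained sketch of the same idea, the snippet below rebuilds the sample frame (the column names and the `sample` and `converted` variables are assumptions based on the question) and renames the categories with `Series.cat.rename_categories`. Passing a dict makes the old-to-new pairing explicit instead of relying on the sorted order of the categories.

```python
import pandas as pd

# Sample data reconstructed from the question; the column names are assumed.
sample = pd.DataFrame({
    "ID": [8900, 9904, 6381, 5862, 2270],
    "Hourly Rate": [2.99, 9.99, 19.99, 2.99, 2.99],
    "Category": ["Car", "Car", "Bike", "Bike", "Car"],
})

converted = sample.copy()
converted["Category"] = converted["Category"].astype("category")

# Categories inferred from string data are stored in sorted order.
original_categories = list(converted["Category"].cat.categories)

# A dict maps each old category to its new label explicitly.
converted["Category"] = converted["Category"].cat.rename_categories(
    {"Bike": "B", "Car": "C"}
)
print(converted)
```

Because the work is done on a copy, the original frame keeps its `Car` and `Bike` labels.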

edited Apr 20, 2018 at 10:05

answered Apr 20, 2018 at 9:59

iDrwish


2 Comments

iDrwish Over a year ago

No, Category is the column name in the df provided. I converted it using astype('category'), this is the working code.

Failed Scientist Over a year ago

Sorry I didn't check properly. It indeed works very well. It's applying the categories (ones you provided in last line) on df1's column values alphabetically, right?

jezrael · Accepted Answer · 2018-04-20 10:18:34Z

1

I think the best approach here is to use map with a dictionary that defines the categories:

df['Category'] = df['Category'].map({'Car':'C','Bike':'B'}).fillna('No match')
print (df)
   ID  Hourly   Rate Category
0   0    8900   2.99        C
1   1    9904   9.99        C
2   2    6381  19.99        B
3   3    5862   2.99        B
4   4    2270   2.99        C
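
A minimal sketch of how `map` behaves when a value is missing from the dictionary, assuming a hypothetical extra category `Truck` that does not appear in the question: unmatched values become `NaN`, which is why the `fillna` call is needed.

```python
import pandas as pd

# "Truck" is a hypothetical extra category added for illustration.
categories = pd.Series(["Car", "Bike", "Truck", "Car"])
mapping = {"Car": "C", "Bike": "B"}

mapped = categories.map(mapping)    # values absent from the dict become NaN
filled = mapped.fillna("No match")  # replace NaN with a placeholder label

print(filled.tolist())
```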

To reduce memory usage, you can also convert the result to a categorical:

df['Category'] = pd.Categorical(df['Category'].map({'Car':'C','Bike':'B'}).fillna('No match'))
print (df)
   ID  Hourly   Rate Category
0   0    8900   2.99        C
1   1    9904   9.99        C
2   2    6381  19.99        B
3   3    5862   2.99        B
4   4    2270   2.99        C
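
To see the memory effect, one can compare `memory_usage(deep=True)` for a plain string column and its categorical version. This is a rough sketch on synthetic data (the row count is an arbitrary assumption); the exact numbers depend on the pandas version and on the data.

```python
import pandas as pd

# Synthetic column with many repeated labels; the size is arbitrary.
labels = pd.Series(["C", "B", "No match"] * 100_000)
as_category = pd.Series(pd.Categorical(labels))

# deep=True includes the memory held by the string values themselves.
plain_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(plain_bytes, category_bytes)
```

The saving comes from storing each distinct label once and keeping only small integer codes per row, so it matters most when a column has few distinct values.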

If several values belong to the same category, you can define them in a dictionary of lists and then invert it into a flat lookup dictionary:

print (df)
   ID  Hourly   Rate Category
0   0    8900   2.99     Car1
1   1    9904   9.99     Car2
2   2    6381  19.99    Bike1
3   3    5862   2.99     Bike
4   4    2270   2.99      Car

d = {'C':['Car','Car1','Car2'], 'B':['Bike','Bike1','Bike2']}
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print (d1)
{'Car1': 'C', 'Bike': 'B', 'Bike2': 'B', 'Car2': 'C', 'Car': 'C', 'Bike1': 'B'}

df['Category'] = pd.Categorical(df['Category'].map(d1).fillna('No match'))
print (df)
   ID  Hourly   Rate Category
0   0    8900   2.99        C
1   1    9904   9.99        C
2   2    6381  19.99        B
3   3    5862   2.99        B
4   4    2270   2.99        C

EDIT:

If you need to define the values with explicit conditions, one possible solution is a custom function:

def f(x):
    if x == 'Car':
        return 'C'
    elif x == 'Bike':
        return 'B'
    else:
        return 'No match'

df['Category'] = df['Category'].apply(f)
print (df)
   ID  Hourly   Rate Category
0   0    8900   2.99        C
1   1    9904   9.99        C
2   2    6381  19.99        B
3   3    5862   2.99        B
4   4    2270   2.99        C

edited Apr 20, 2018 at 10:18

answered Apr 20, 2018 at 10:07

jezrael


7 Comments

Failed Scientist Over a year ago

Thanks a lot. Can we do the same by lambda expression (without map)?

jezrael Over a year ago

@TalhaIrfan - Sure, give me a sec

jpp Over a year ago

@TalhaIrfan, Why would you want to use lambda for this? It is unnecessary and inefficient.

Failed Scientist Over a year ago

@jpp: Thanks for suggestion!

jezrael Over a year ago

@TalhaIrfan - If you want a lambda: df['Category'] = df['Category'].apply(lambda x: 'C' if x == 'Car' else 'B'), but it is slow and only works when Car is the one category to single out, because every other value is set to B. I think you have more categories, though, so a possible (inefficient) solution is to repeat df['Category'] = df['Category'].apply(lambda x: 'C' if x == 'Car' else x) and df['Category'] = df['Category'].apply(lambda x: 'B' if x == 'Bike' else x), which is very slow and ugly.
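
If a lambda is still wanted, one sketch that handles several categories in a single pass is to look each value up in a dictionary with `dict.get`, which also supplies a default for unmatched values. The `Truck` row and the `mapping` name are hypothetical additions for illustration. As jpp notes above, a lambda is not needed for this, and `map` with a dictionary is likely the faster option.

```python
import pandas as pd

# "Truck" is a hypothetical category used to show the fallback value.
df = pd.DataFrame({"Category": ["Car", "Bike", "Truck", "Car"]})
mapping = {"Car": "C", "Bike": "B"}

# dict.get returns the second argument when the key is absent.
df["Category"] = df["Category"].apply(lambda x: mapping.get(x, "No match"))
print(df["Category"].tolist())
```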
