Pardon my relative inexperience in Python, but this problem has kept me stuck for some time now:

I have a dataframe, df1 like this:

ID      Hourly Rate    Category
0   8900    2.99    Car
1   9904    9.99    Car
2   6381    19.99   Bike
3   5862    2.99    Bike
4   2270    2.99    Car

(0-4 are just row numbers.) Now I want to build df2 so that the values in the Category column are replaced according to the following rules:

if Category is Car: C
if Category is Bike: B
(There can be other categories as well.)

i.e. df2 would be as follows:

ID      Hourly Rate    Category
0   8900    2.99    C
1   9904    9.99    C
2   6381    19.99   B
3   5862    2.99    B
4   2270    2.99    C

So far I have used a fairly trivial approach, a function with if conditions, but I would like to do it with a lambda function instead.

  • Do you want the category to be based on its first alphabet? Commented Apr 20, 2018 at 10:02
  • @shivsn: Thanks! Category can be any value by the way - no correlation like that. Commented Apr 20, 2018 at 10:04

2 Answers

If your values are categorical, I recommend using pandas' built-in Categorical data type.

df2 = df.copy()
df2.Category = df2.Category.astype('category')
print(df2.Category.values.categories)
#Prints: Index(['Bike', 'Car'], dtype='object')

# Define your own categories, in the same order as printed above
df2.Category = df2.Category.cat.rename_categories(['B', 'C'])

Output

ID  Hourly  Rate    Category
0   0   8900    2.99    C
1   1   9904    9.99    C
2   2   6381    19.99   B
3   3   5862    2.99    B
4   4   2270    2.99    C

2 Comments

No, Category is the column name in the df provided. I converted it using astype('category'), this is the working code.
Sorry I didn't check properly. It indeed works very well. It's applying the categories (ones you provided in last line) on df1's column values alphabetically, right?
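
On the ordering question in the comment above, here is a minimal sketch, assuming a frame built by hand to look like the question's df1. It prints the inferred categories, which pandas stores in sorted order for string values, and then renames them with a dict so the result does not depend on that order. The dict form of rename_categories is a swapped-in variation, not the answer's exact code.

```python
import pandas as pd

# Sample data modelled on the question's df1
df1 = pd.DataFrame({
    "ID": [8900, 9904, 6381, 5862, 2270],
    "Hourly Rate": [2.99, 9.99, 19.99, 2.99, 2.99],
    "Category": ["Car", "Car", "Bike", "Bike", "Car"],
})

df2 = df1.copy()
df2["Category"] = df2["Category"].astype("category")

# Categories inferred from string values are stored in sorted order
print(list(df2["Category"].cat.categories))

# A dict maps each old label to its new label explicitly,
# so the positional order of the categories no longer matters
df2["Category"] = df2["Category"].cat.rename_categories({"Car": "C", "Bike": "B"})
print(df2["Category"].tolist())
```

df1 is left untouched because the conversion runs on a copy.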

I think the best option here is to use map with a dictionary that defines the categories:

df['Category'] = df['Category'].map({'Car':'C','Bike':'B'}).fillna('No match')
print (df)
   ID  Hourly   Rate Category
0   0    8900   2.99        C
1   1    9904   9.99        C
2   2    6381  19.99        B
3   3    5862   2.99        B
4   4    2270   2.99        C
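
Since the question mentions that other categories can occur, here is a small self-contained sketch of how the map and fillna combination treats a value that is missing from the dictionary. The sixth row (ID 1111, category Truck) is a made-up example and not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [8900, 9904, 6381, 5862, 2270, 1111],  # 1111 is a hypothetical extra row
    "Hourly Rate": [2.99, 9.99, 19.99, 2.99, 2.99, 4.99],
    "Category": ["Car", "Car", "Bike", "Bike", "Car", "Truck"],
})

mapping = {"Car": "C", "Bike": "B"}

# Values that are not keys of the dict come back as NaN from map ...
mapped = df["Category"].map(mapping)

# ... and fillna replaces those NaN values with a fallback label
df["Category"] = mapped.fillna("No match")
print(df)
```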

To reduce memory usage, it is also possible to store the result as a Categorical:

df['Category'] = pd.Categorical(df['Category'].map({'Car':'C','Bike':'B'}).fillna('No match'))
print (df)
   ID  Hourly   Rate Category
0   0    8900   2.99        C
1   1    9904   9.99        C
2   2    6381  19.99        B
3   3    5862   2.99        B
4   4    2270   2.99        C
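
The memory claim can be checked with Series.memory_usage(deep=True). This is only a rough sketch on made-up data (100,000 repeated labels); the exact byte counts depend on the pandas version and the string dtype in use, so none are quoted here.

```python
import pandas as pd

# Made-up column with a few distinct labels repeated many times
labels = pd.Series(["C", "B", "C", "B", "No match"] * 20_000)
as_category = pd.Series(pd.Categorical(labels))

# deep=True also counts the memory held by the string values themselves
plain_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(plain_bytes, category_bytes)
```

The saving comes from storing each distinct label once and keeping only small integer codes per row, so it shrinks as the number of distinct values approaches the number of rows.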

If each category covers multiple values, it is possible to define them in a dict of lists and then invert it:

print (df)
   ID  Hourly   Rate Category
0   0    8900   2.99     Car1
1   1    9904   9.99     Car2
2   2    6381  19.99    Bike1
3   3    5862   2.99     Bike
4   4    2270   2.99      Car

d = {'C':['Car','Car1','Car2'], 'B':['Bike','Bike1','Bike2']}
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print (d1)
{'Car1': 'C', 'Bike': 'B', 'Bike2': 'B', 'Car2': 'C', 'Car': 'C', 'Bike1': 'B'}

df['Category'] = pd.Categorical(df['Category'].map(d1).fillna('No match'))
print (df)
   ID  Hourly   Rate Category
0   0    8900   2.99        C
1   1    9904   9.99        C
2   2    6381  19.99        B
3   3    5862   2.99        B
4   4    2270   2.99        C

EDIT:

If you need to define the values with explicit conditions, one possible solution is a custom function:

def f(x):
    if x == 'Car':
        return 'C'
    elif x == 'Bike':
        return 'B'
    else:
        return 'No match'

df['Category'] = df['Category'].apply(f)
print (df)
   ID  Hourly   Rate Category
0   0    8900   2.99        C
1   1    9904   9.99        C
2   2    6381  19.99        B
3   3    5862   2.99        B
4   4    2270   2.99        C

7 Comments

Thanks a lot. Can we do the same by lambda expression (without map)?
@TalhaIrfan - Sure, give me a sec
@TalhaIrfan, Why would you want to use lambda for this? It is unnecessary and inefficient.
@jpp: Thanks for suggestion!
@TalhaIrfan - If you want a lambda: df['Category'] = df['Category'].apply(lambda x: 'C' if x == 'Car' else 'B'). But apply is always slow, and this only works if Car maps to C and every other value maps to B. I think you have more categories, so the (inefficient) alternative is to repeat df['Category'] = df['Category'].apply(lambda x: 'C' if x == 'Car' else x) and df['Category'] = df['Category'].apply(lambda x: 'B' if x == 'Bike' else x) for each category, which is very slow and ugly.
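
If a lambda is still preferred, one way to avoid repeating apply for every category is to let the lambda look values up in a dict. This is only a sketch: the codes dict and the Truck value are assumptions for illustration, and a plain map with the same dict remains the faster choice.

```python
import pandas as pd

df = pd.DataFrame({"Category": ["Car", "Bike", "Truck"]})  # Truck is a made-up unmatched value

codes = {"Car": "C", "Bike": "B"}

# dict.get returns the fallback when the key is missing,
# so a single lambda covers any number of categories
df["Category"] = df["Category"].apply(lambda x: codes.get(x, "No match"))
print(df["Category"].tolist())
```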