1

I have a data frame that looks like this:

     Col1     Col2    
0     22     Apple
1     43     Carrot 
2     54     Orange
3     74     Spinach
4     14     Cucumber 

I need to add a new column with the category "Fruit", "Vegetable" or "Leaf". I created a list for each category:

Fru = {'Apple','Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber','Carrot','Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

And the result should look like this

    Col1      Col2     Category 
0     22     Apple      Fruit
1     43     Carrot     Vegetable 
2     54     Orange     Fruit
3     74     Spinach    Leaf
4     14     Cucumber   Vegetable

I tried np.where and contains, yet both give: 'in <string>' requires string as left operand, not set

3 Answers

2

That's because you did not create a list; you created a set, as your error shows. .isin() accepts a set directly, so you can pass each set as its argument:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Col1':[22,43,54,74,14],'Col2':['Apple','Carrot','Orange','Spinach','Cucumber']})

Fru = {'Apple','Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber','Carrot','Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

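# np.where needs both a "true" and a "false" value at every level;
# 'Unknown' is a placeholder label for items found in none of the sets
df['Category'] = np.where(df['Col2'].isin(Fru), 'Fruit',
  np.where(df['Col2'].isin(Veg), 'Vegetable',
  np.where(df['Col2'].isin(Leaf), 'Leaf', 'Unknown')))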
print(df)

Output:

   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable
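
To see where the message in the question comes from, here is a minimal sketch. It assumes the operands of `in` ended up reversed in the original attempt (the set on the left, a string on the right), which is a guess, since the failing code was not posted.

```python
Fru = {'Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry'}

# Correct direction: is the string an element of the set?
is_fruit = 'Apple' in Fru
print(is_fruit)

# Reversed direction: Python treats this as a substring test on 'Apple',
# and a set cannot be a substring, so it raises TypeError.
try:
    Fru in 'Apple'
except TypeError as exc:
    message = str(exc)
    print(message)
```

.isin() avoids the problem because it tests each value of the column against the collection, element by element.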

2 Comments

Yes, I assumed there is no other possibility, but I will edit the answer to account for it, thanks.
Yes, you are right, I just tested it and list() is not necessary.
1

Use Series.map with a new dictionary d1 that maps each item to its category:

Fru = {'Apple','Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber','Carrot','Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

d = {'Fruit':Fru, 'Vegetable':Veg,'Leaf':Leaf}

# invert the dict: each item becomes a key, its category the value
# http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print(d1)

df['Category'] = df['Col2'].map(d1)
print (df)
   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable
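
A small sketch of what happens when a value is in none of the sets: Series.map returns NaN for keys missing from the dictionary, so a fallback can be supplied with fillna. The 'Mango' row and the 'Unknown' label are made up for illustration and are not part of the original data.

```python
import pandas as pd

# 'Mango' is a hypothetical item that belongs to none of the sets
df = pd.DataFrame({'Col1': [22, 43, 99],
                   'Col2': ['Apple', 'Carrot', 'Mango']})

Fru = {'Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber', 'Carrot', 'Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

d = {'Fruit': Fru, 'Vegetable': Veg, 'Leaf': Leaf}
# invert the dict so each item points at its category
d1 = {item: category for category, items in d.items() for item in items}

# unmatched items map to NaN; fillna replaces them with a placeholder label
df['Category'] = df['Col2'].map(d1).fillna('Unknown')
print(df)
```

Dropping the fillna call keeps the NaN, which may be preferable if missing categories should stay visible as missing.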

Or use numpy.select:

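# default is the label for rows matching no condition; 'Unknown' is a placeholder
# (without it np.select falls back to 0, which does not mix well with string labels)
df['Category'] = np.select([df['Col2'].isin(Fru), df['Col2'].isin(Veg), df['Col2'].isin(Leaf)],
                           ['Fruit', 'Vegetable', 'Leaf'], default='Unknown')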
print (df)

   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable
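
With many categories, writing each condition by hand gets tedious. As a sketch, the conditions and labels for numpy.select can be built from a single dictionary of category to items; the dictionary name d and the 'Unknown' default are illustrative choices, not something the approach requires.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [22, 43, 54, 74, 14],
                   'Col2': ['Apple', 'Carrot', 'Orange', 'Spinach', 'Cucumber']})

# one entry per category; add more entries here as the list of categories grows
d = {
    'Fruit': {'Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry'},
    'Vegetable': {'Cucumber', 'Carrot', 'Broccoli', 'Onion'},
    'Leaf': {'Lettuce', 'Kale', 'Spinach'},
}

labels = list(d)
# one boolean mask per category, in the same order as labels
conditions = [df['Col2'].isin(d[label]) for label in labels]

df['Category'] = np.select(conditions, labels, default='Unknown')
print(df)
```

If an item appears in more than one set, np.select uses the first matching condition, so the order of the dictionary decides the result.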

10 Comments

These are the times for our answers! Celius: 0.000997304916381836, Jezrael: 0.0009975433349609375 Go 500k! +3
@CeliusStingher - thank you, but what is the number of rows for the test?
I tested for 1k, 10k and 100k rows, and they are virtually the same performance-wise
It's not as flexible or robust... but in this case given it's a low number of categories... a direct d1 = {**dict.fromkeys(Fru, 'Fruit'), **dict.fromkeys(Veg, 'Vegetable'), **dict.fromkeys(Leaf, 'Leaf')} is also an option. Some may find it more obvious as to what it's doing reading wise... just throwing it out there.
Yes, of course it is dynamic, that's the great thing about the answer
1

Another approach you can try is a plain for loop:

import pandas as pd

df = pd.DataFrame({'Col1': [22,43,54,74,14], 'Col2': ['Apple','Carrot','Orange','Spinach','Cucumber']})

Fruit = ['Apple','Orange', 'Grape', 'Blueberry', 'Strawberry']
Vegetable = ['Cucumber','Carrot','Broccoli', 'Onion']
Leaf = ['Lettuce', 'Kale', 'Spinach']

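mylist = []
for i in df['Col2']:
    if i in Fruit:
        mylist.append('Fruit')
    elif i in Vegetable:
        mylist.append('Vegetable')
    elif i in Leaf:
        mylist.append('Leaf')
    else:
        # keep mylist the same length as df; 'Unknown' is a placeholder label
        mylist.append('Unknown')

df['Category'] = mylist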

print(df)
   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable

2 Comments

I like your approach and I tried it, but with contains (that was my mistake). However, the original data has more than 200 items in 15 categories, so doing it this way would not be efficient
Yes, other solutions by experts like jezrael and Celius Stingher are more efficient. I wanted to post my solution anyway, as I spent some time writing this code :)
