I would like to loop over each row of a data frame and, if a column value matches a string from a list, add a value in a new column. In this example I want to add a new column that categorizes the products: if a row's Product matches an entry in one of the lists, the category should be either 'Drinks' or 'Food', and if there is no match the category should be 'Other'.

import pandas as pd

list_drinks = {'Water', 'Juice', 'Tea'}
list_food = {'Apple', 'Orange'}
data = pd.DataFrame({'Price': ['1', '5', '3'], 'Product': ['Juice', 'book', 'Pen']})

# Check each row's product against both lists and set that row's category
for index in data.index:
    product = data.loc[index, 'Product']
    if any(item in product for item in list_food):
        data.loc[index, 'Category'] = 'Food'
    elif any(item in product for item in list_drinks):
        data.loc[index, 'Category'] = 'drinks'
    else:
        data.loc[index, 'Category'] = 'Other'

The expected output would be:

Price  Product Category
 1      Juice    drinks
 5      book     Other
 3      Pen      Other

My main problem is that I don't know how to match the patterns between the lists and the rows. I also tried str.contains, but it did not work.

2 Answers

No need to loop. You can use .isin() with np.select() to return results based on conditions. See the code below:

import pandas as pd
import numpy as np
list_drinks=['Water','Juice','Tea']
list_food=['Apple','Orange']
data = {'Price': ['1', '5', '3'],
        'Product': ['Juice', 'book', 'Pen']}
df = pd.DataFrame(data)
df['Category'] = np.select([df['Product'].isin(list_drinks),
                            df['Product'].isin(list_food)],
                           ['drinks', 'food'],
                           'Other')
df
Out[1]: 
  Price Product Category
0     1   Juice   drinks
1     5    book    Other
2     3     Pen    Other
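
As a side note, here is a minimal sketch of how .isin() behaves, using made-up product names rather than the data above. It compares whole values, so a longer string that merely contains a list entry, or an entry in different case, is not treated as a match:

```python
import pandas as pd

list_drinks = ['Water', 'Juice', 'Tea']
# Illustrative values only: an exact match, a superstring, and a lowercase variant
products = pd.Series(['Juice', 'green Juice', 'juice'])

# isin() tests each whole value for membership in the list (case-sensitive)
exact = products.isin(list_drinks)
print(exact.tolist())  # [True, False, False]
```

This is why a column of longer product descriptions can end up entirely in the fallback category.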

Below, I break the code down in more detail so you can see how it works. I have also changed it slightly in response to your comment: I check whether a value from the list is a substring of a value in the dataframe, using a list comprehension and in. To increase the match rate, I also compare the strings in lowercase with .lower():

import pandas as pd
import numpy as np
list_drinks=['Water','Juice','Tea']
list_food=['Apple','Orange']
data = {'Price':  ['1', '5','3'],
    'Product': ['green Juice','book','oRange you gonna say banana']}
df = pd.DataFrame(data)
c1 = (df['Product'].apply(lambda x: len([y for y in list_drinks if y.lower() in x.lower()]) > 0))
c2 = (df['Product'].apply(lambda x: len([y for y in list_food if y.lower() in x.lower()]) > 0))
r1 = 'drinks'
r2 = 'food'

conditions = [c1,c2]
results= [r1,r2]

df['Category'] = np.select(conditions, results, 'Other')
df
Out[1]: 
  Price                      Product Category
0     1                  green Juice   drinks
1     5                         book    Other
2     3  oRange you gonna say banana     food
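
A possible vectorized variant of the same substring check, sketched with str.contains instead of apply. The helper name build_pattern and the extra row with a missing product are assumptions added for illustration. Passing na=False should keep each mask strictly boolean even when the column has missing values, which np.select needs:

```python
import re
import numpy as np
import pandas as pd

list_drinks = ['Water', 'Juice', 'Tea']
list_food = ['Apple', 'Orange']
# The last row (missing product) is an added assumption to show na=False
df = pd.DataFrame({'Price': ['1', '5', '3', '2'],
                   'Product': ['green Juice', 'book', 'oRange you gonna say banana', None]})

def build_pattern(words):
    # Hypothetical helper: escape each word and join them into one alternation regex
    return '|'.join(re.escape(w) for w in words)

# case=False ignores case; na=False treats missing values as "no match"
is_drink = df['Product'].str.contains(build_pattern(list_drinks), case=False, na=False)
is_food = df['Product'].str.contains(build_pattern(list_food), case=False, na=False)

df['Category'] = np.select([is_drink, is_food], ['drinks', 'food'], default='Other')
print(df['Category'].tolist())  # ['drinks', 'Other', 'food', 'Other']
```

As with the apply version, conditions are evaluated in order, so a product matching both lists would be labelled by the first condition.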

5 Comments

Hi @David, thanks for your answer. I tried it on a bigger data frame, but all the rows are being categorized as Other. When you use "isin", does it match part of the strings in the list, or does it have to be a perfect match? For example, if my data frame has the product "green Juice", would that be matched as "Juice" or categorized as "Other"?
@thephoenix check the new code. Use the second block, not the first.
Hi @David, I have an additional question. I used the code on a different data frame and got this error from the np.select statement: "TypeError: invalid entry 0 in condlist: should be boolean ndarray." Do you know what the problem could be?
Please see: stackoverflow.com/questions/57316346/… . df['Product'].fillna('')
I tried df['Product'].fillna('').apply(lambda x: len([y for y in list_food if y.lower() in x.lower()]) > 0) but it's still not working @David

Here's an alternative:

import pandas as pd

list_drinks={'Water','Juice','Tea'}
list_food={'Apple','Orange'}
data = pd.DataFrame({'Price':  ['1', '5','3'], 'Product': ['Juice','book', 'Pen']})
category = list()
for prod in data['Product']: 
    if prod in list_food:
        category.append("Food")
    elif prod in list_drinks:
        category.append("drinks")
    else:
        category.append("Other")
data['Category']= category
print(data)

Output:

Price  Product Category
 1      Juice    drinks
 5      book     Other
 3      Pen      Other
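
The same exact-match logic could also be sketched without an explicit loop by building a lookup dictionary and using Series.map. The name lookup is an assumption introduced for this illustration:

```python
import pandas as pd

list_drinks = {'Water', 'Juice', 'Tea'}
list_food = {'Apple', 'Orange'}
data = pd.DataFrame({'Price': ['1', '5', '3'], 'Product': ['Juice', 'book', 'Pen']})

# Build one product -> category lookup table from the two sets
lookup = {name: 'Food' for name in list_food}
lookup.update({name: 'drinks' for name in list_drinks})

# map() yields NaN for products missing from the lookup; fillna labels them 'Other'
data['Category'] = data['Product'].map(lookup).fillna('Other')
print(data['Category'].tolist())  # ['drinks', 'Other', 'Other']
```

Like the loop, this only matches whole product names exactly, so it would not catch a value such as "green Juice".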
