I have 2 dataframes:

DF1

ID Name Category
1 Apple Fruit
2 Orange Fruit
3 brocolli Vegetable
4 Spinach Vegetable

DF2

UserID Date UserName Description
111 01/01/2020 AAA Ordered 1 Box Apples
111 01/02/2021 AAA Ordered 1KG spinach
222 15/03/2021 BBB Ordered 3 boxes of Orange

Can anyone help me match the "Description" in DF2 against the "Name" strings in DF1, and add the corresponding "Category" as a new column in DF2?

Desired Output:

UserID Date UserName Description Category
111 01/01/2020 AAA Ordered 1 Box Apples Fruit
111 01/02/2021 AAA Ordered 1KG spinach Vegetable
222 15/03/2021 BBB Ordered 3 boxes of Orange Fruit
2 Answers

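You can try str.extract followed by map:

import re

# Build one alternation pattern from the names; re.escape guards against regex metacharacters
c = '(' + '|'.join(map(re.escape, df1.Name)) + ')'

# expand=True returns a DataFrame on every pandas version, so [0] is the captured name as a Series
df2['new'] = (df2.Description.str.extract(c, flags=re.IGNORECASE, expand=True)[0]
                 .str.upper()
                 .map(dict(zip(df1.Name.str.upper(), df1.Category))))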

0        Fruit
1    Vegetable
2        Fruit
Name: 0, dtype: object
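
For reference, here is a self-contained sketch of the extract-then-map idea using the sample data from the question. The result column name `Category`, the helper names `pattern`, `lookup` and `matched`, and the explicit `expand=False` are choices made for this illustration, not part of the original answer.

```python
import re
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Name": ["Apple", "Orange", "brocolli", "Spinach"],
    "Category": ["Fruit", "Fruit", "Vegetable", "Vegetable"],
})
df2 = pd.DataFrame({
    "UserID": [111, 111, 222],
    "Date": ["01/01/2020", "01/02/2021", "15/03/2021"],
    "UserName": ["AAA", "AAA", "BBB"],
    "Description": [
        "Ordered 1 Box Apples",
        "Ordered 1KG spinach",
        "Ordered 3 boxes of Orange",
    ],
})

# One capture group that matches any of the (escaped) names
pattern = "(" + "|".join(re.escape(name) for name in df1["Name"]) + ")"

# Upper-cased name -> category, so the lookup is case-insensitive
lookup = dict(zip(df1["Name"].str.upper(), df1["Category"]))

# expand=False with a single group yields a Series of the first match per row
matched = df2["Description"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
df2["Category"] = matched.str.upper().map(lookup)

print(df2[["Description", "Category"]])
```

Rows whose description contains none of the names end up with a missing value in `Category`, and only the first matching name in each description is used.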

2 Comments

Thanks @BENY. Error encountered:

C:\Program Files\Anaconda3\lib\site-packages\ipykernel_launcher.py:3: FutureWarning: currently extract(expand=None) means expand=False (return Index/Series/DataFrame) but in a future version of pandas this will be changed to expand=True (return DataFrame)
  This is separate from the ipykernel package so we can avoid doing imports until
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-1d38e6ad3392> in <module>()
      1 c = '('+'|'.join(df1.Name.tolist())+')'
      2
----> 3 df2['new'] = df2.Description.str.extract(c,flags=re.IGNORECASE)[0].str.upper().map(dict(zip(df1.Name.str.upper(),df1.Category)))

AttributeError: 'str' object has no attribute 'str'

Edit: a second solution is added below, as per the OP's comments.

First solution: this code uses merge to do the same task.

import pandas as pd

# Input Data
df1 = pd.DataFrame({'Name':['Apple','Orange','Brocolli','Spinach'], 'Category':['Fruit', 'Fruit','Vegetable','Vegetable']})
df2 = pd.DataFrame({'Date':['01/01/2020','02/02/2021','03/03/2022'], 'Description':['Ordered 1 Box Apple', 'Ordered 1 KG spinach','Ordered 3 Box Orange']})

# Data Processing
# Join on the fourth word of each description (index 3), lower-cased.
# This only works when the name is always the fourth word.
result = pd.merge(df2, df1, left_on = df2['Description'].str.lower().str.split(' ', expand=True)[3], right_on = df1['Name'].str.lower(), how='left' ).drop('key_0', axis=1)
print(result)


Second Solution

Updating code as per OP comments below

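# Map each name in df1 to its category
# (use 'NAME' and 'CATEGORY' here if your columns are upper-cased, as in the OP's comments)
fruit_cat_mapping = dict(zip(df1['Name'], df1['Category']))

def mapper_func(x):
    # x is a lower-cased description; return the category of the first name found in it
    for key, category in fruit_cat_mapping.items():
        if key.lower() in x:
            return category
    # No name found in this description
    return None

df2['Category'] = df2['Description'].str.lower().apply(mapper_func)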
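
The comments below mention free-text descriptions that may contain special characters and more than one name. A minimal sketch of one way to handle that, assuming every matching category should be collected and joined with commas: the sample descriptions, the `lookup` and `pattern` names, and the `categories_for` helper are all hypothetical and not part of the original answer.

```python
import re
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Apple", "Orange", "Brocolli", "Spinach"],
    "Category": ["Fruit", "Fruit", "Vegetable", "Vegetable"],
})

# Hypothetical free-text descriptions, including one with no match at all
df2 = pd.DataFrame({
    "Description": [
        "Ordered 1 box apples + 2kg SPINACH!",
        "Ordered 3 boxes of Orange",
        "Assessment completed",
    ]
})

# Lower-cased name -> category
lookup = {name.lower(): cat for name, cat in zip(df1["Name"], df1["Category"])}

# Alternation of escaped names, matched case-insensitively
pattern = re.compile("|".join(re.escape(name) for name in lookup), flags=re.IGNORECASE)

def categories_for(description):
    """Return the distinct categories of all names found, in order of appearance."""
    found = []
    for match in pattern.findall(description):
        category = lookup[match.lower()]
        if category not in found:
            found.append(category)
    return ",".join(found)

df2["Category"] = df2["Description"].apply(categories_for)
print(df2)
```

Here a description with no known name gets an empty string rather than a missing value; returning `None` instead would be an equally reasonable choice.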

17 Comments

Thanks Abhishek, this works well. However, the use case is a bit more complicated: the description in dataframe 2 is free text and may contain anything. It may be a string with special characters and multiple fruit names in it. I was trying to use contains and apply.
Thanks @Abhishek.

def matcher(x):
    print(x)
    res = df1.loc[df1['NAME'].str.contains(x, regex=False, case=False), 'CATEGORY']
    return ','.join(res.astype(str))

df2['Category'] = df2['Description'].apply(matcher)

The problem with this solution is that contains checks whether the description is contained in the Name of df1; I am looking for the reverse.
Check the edited code. I first created a dictionary from NAME and CATEGORY, then used it with apply and find inside a lambda function. Hope this helps.
This 'AssignmentCompleted Assessment Assessment' value was not available in the other column.
I was assuming that every row in df2 has a match in the other column. Use this function if that is not the case:

def mapper_func(x):
    for key in fruit_cat_mapping.keys():
        if x.find(key.lower()) > -1:
            res = fruit_cat_mapping[key]
            return res