How do I check a pandas Column for a list of strings?

Question

I am looking for a way to search the Parent Task column for keywords, then add a new column holding a category name. For example, any row whose Parent Task is one of the keywords (My_projects, Learning a skill, Business) should get the tag My project in a column called Category.

Current DataFrame (CSV file):

Start Date,Task Name,Duration (hours),Parent Task
01/02/2021,Sleeping ,1.33639,
02/02/2021,Sleeping ,6.43167,
02/02/2021,coding,0.78028,Learning a skill
02/02/2021,Commute,0.22694,
02/02/2021,reading,1.14778,My_projects
02/02/2021,Commute,0.56139,
02/02/2021,Prep,0.37611,

Desired DataFrame (CSV file):

Start Date,Task Name,Duration (hours),Parent Task,Category
01/02/2021,Sleeping ,1.33639,,Sleeping
02/02/2021,Sleeping ,6.43167,,Sleeping
02/02/2021,coding,0.78028,Learning a skill,My project
02/02/2021,Commute,0.22694,,Commute
02/02/2021,reading,1.14778,My_projects,My project
02/02/2021,Commute,0.56139,,Commute
02/02/2021,Prep,0.37611,,Prep

I have been trying to apply this method:

My_projects_tasks = '|'.join(['My_projects', 'Learning a skill', 'Business'])
if df['Parent Task'].str.contains( My_projects_tasks , na=False):
    df['Category'] = 'My_project'

But I am getting this error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
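
The error comes from the if statement: str.contains returns a boolean Series with one value per row, while if needs a single True/False. A minimal sketch of the vectorized alternative, assuming a small hand-built sample of the question's data, is to use that Series as a row mask with .loc instead of branching on it:

```python
import pandas as pd

# Small sample modeled on the question's data (Task Name / Parent Task only).
df = pd.DataFrame({
    "Task Name": ["Sleeping", "coding", "Commute", "reading"],
    "Parent Task": [None, "Learning a skill", None, "My_projects"],
})

my_projects_tasks = "|".join(["My_projects", "Learning a skill", "Business"])

# str.contains gives one boolean per row; missing Parent Task values count as False.
mask = df["Parent Task"].str.contains(my_projects_tasks, na=False)

# Assign the category only to the rows where the mask is True;
# all other rows get NaN in the new column.
df.loc[mask, "Category"] = "My_project"
print(df)
```

Because the pattern is a regex alternation, this is a substring match: a Parent Task such as "Business trip" would also be tagged.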

Is there a more efficient way to go about this? I have multiple categories to add and a lot of rows. Afterwards I want to sum the durations for each category per day and write that to a separate CSV file, but I haven't gotten that far yet. Thanks
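
For the follow-up goal (total duration per category per day), a groupby on the date and category columns is one option. This is a sketch that assumes the Category column already exists (filled here with the values from the desired output) and groups on the date strings exactly as they appear; converting them with pd.to_datetime first would require knowing the day/month order, which the question does not state.

```python
import pandas as pd

# Durations and categories copied from the question's desired output.
df = pd.DataFrame({
    "Start Date": ["01/02/2021", "02/02/2021", "02/02/2021", "02/02/2021",
                   "02/02/2021", "02/02/2021", "02/02/2021"],
    "Duration (hours)": [1.33639, 6.43167, 0.78028, 0.22694,
                         1.14778, 0.56139, 0.37611],
    "Category": ["Sleeping", "Sleeping", "My project", "Commute",
                 "My project", "Commute", "Prep"],
})

# One row per (day, category) pair with the summed hours.
daily = (
    df.groupby(["Start Date", "Category"], as_index=False)["Duration (hours)"]
      .sum()
)
print(daily)

# To write it out (hypothetical filename):
# daily.to_csv("daily_totals.csv", index=False)
```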

Please include your df in text so it can be copied easily to reproduce your problem. Also post your desired df, because it's quite hard to understand what your goal is. – DSteman, Jun 21, 2021 at 13:07

Anurag Dabas · Accepted Answer · 2021-06-21 13:34:21Z

1

IIUC, try fillna() to fall back to Task Name where Parent Task is empty, then replace() to map parent tasks onto categories:

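# map parent tasks to a category; values not in d are left unchanged
d = {'Learning a skill': 'My_projects', 'Business': 'My_projects'}
# rows with no Parent Task use their Task Name as the category
df['Category'] = df['Parent Task'].fillna(df['Task Name']).replace(d)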
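
A runnable check of this answer on a few of the question's rows (built by hand here, with the trailing space in "Sleeping " dropped for readability) shows both steps: fillna copies the task name into empty parent tasks, and replace only rewrites values that appear as keys in d.

```python
import pandas as pd

df = pd.DataFrame({
    "Task Name": ["Sleeping", "coding", "Commute", "reading", "Prep"],
    "Parent Task": [None, "Learning a skill", None, "My_projects", None],
})

d = {"Learning a skill": "My_projects", "Business": "My_projects"}

# 1) fillna: rows without a parent task take their own task name.
# 2) replace: exact matches of the keys in d become "My_projects";
#    everything else (e.g. "My_projects", "Commute") passes through.
df["Category"] = df["Parent Task"].fillna(df["Task Name"]).replace(d)
print(df["Category"].tolist())
# ['Sleeping', 'My_projects', 'Commute', 'My_projects', 'Prep']
```

Adding more categories only means adding more entries to d.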

answered Jun 21, 2021 at 13:34

Anurag Dabas


gustavo martinis · Accepted Answer · 2021-06-21 13:23:54Z

0

You can generate a boolean Series and then apply a function that turns True into My_project, something like:

from numpy import nan

# isin is True where Parent Task exactly matches one of the listed values
df['Category'] = df['Parent Task'].isin(['My_projects', 'Learning a skill', 'Business']).apply(lambda x: 'My_project' if x else nan)
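
The apply(lambda ...) step calls a Python function once per row. A vectorized alternative, swapped in here and not part of the original answer, is numpy.where on the same boolean Series. The sketch below uses None as the fallback, because mixing a string with np.nan inside np.where can promote the whole result to strings and turn NaN into the text 'nan'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Parent Task": [None, "Learning a skill", None, "My_projects"],
})

# Exact, whole-value matching; missing values are simply False.
is_project = df["Parent Task"].isin(["My_projects", "Learning a skill", "Business"])

# Choose per row in one call: the category where True, None (missing) otherwise.
df["Category"] = np.where(is_project, "My_project", None)
print(df)
```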

answered Jun 21, 2021 at 13:23

gustavo martinis


Prince_persia22 Over a year ago

This does work, but how do I add multiple categories now? Written as an if statement it could be extended, but I'm not sure how to write it as one.
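
Instead of a chain of if/elif statements, one approach for several categories is a lookup table: list the keywords per category, invert that into a keyword-to-category dict, and map the Parent Task column through it, falling back to Task Name as in the accepted answer. In this sketch the "Work" category and the "Admin"/"Meetings" parent tasks are hypothetical, added only to show a second category.

```python
import pandas as pd

# Sample rows; the "Admin" parent task is hypothetical.
df = pd.DataFrame({
    "Task Name": ["Sleeping", "coding", "Commute", "reading", "emails"],
    "Parent Task": [None, "Learning a skill", None, "My_projects", "Admin"],
})

# Hypothetical category -> parent-task keywords table; one entry per category.
categories = {
    "My_project": ["My_projects", "Learning a skill", "Business"],
    "Work": ["Admin", "Meetings"],
}

# Invert it so each keyword points at its category.
keyword_to_category = {kw: cat for cat, kws in categories.items() for kw in kws}

# map() does an exact lookup per row (unknown or missing values become NaN);
# fillna() then falls back to the task name.
df["Category"] = df["Parent Task"].map(keyword_to_category).fillna(df["Task Name"])
print(df["Category"].tolist())
# ['Sleeping', 'My_project', 'Commute', 'My_project', 'Work']
```

If a keyword were listed under two categories, the later entry in the dict would win.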
