1

The DataFrame looks like this (screenshot omitted; the same data is given as CSV text below).

I am looking for a way to search the Parent Task column for keywords and then add a new column containing a category name for each row. For example, any row whose Parent Task contains one of the keywords (My projects, Learning a skill, Business) should get a My project tag in a column called Category.

Current DataFrame (CSV file):

Start Date,Task Name,Duration (hours),Parent Task
01/02/2021,Sleeping ,1.33639,
02/02/2021,Sleeping ,6.43167,
02/02/2021,coding,0.78028,Learning a skill
02/02/2021,Commute,0.22694,
02/02/2021,reading,1.14778,My_projects
02/02/2021,Commute,0.56139,
02/02/2021,Prep,0.37611,

Desired DataFrame (CSV file):

Start Date,Task Name,Duration (hours),Parent Task,Category
01/02/2021,Sleeping ,1.33639,,Sleeping
02/02/2021,Sleeping ,6.43167,,Sleeping
02/02/2021,coding,0.78028,Learning a skill,My project
02/02/2021,Commute,0.22694,,Commute
02/02/2021,reading,1.14778,My_projects,My project
02/02/2021,Commute,0.56139,,Commute
02/02/2021,Prep,0.37611,,Prep

I have been trying to apply this method:

My_projects_tasks = '|'.join(['My_projects', 'Learning a skill', 'Business'])
if df['Parent Task'].str.contains( My_projects_tasks , na=False):
    df['Category'] = 'My_project'

But I am getting this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
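
For context, the error comes from `if` needing a single True/False, while `str.contains` returns one boolean per row. A minimal sketch of one common way around it, assuming a small sample shaped like the data above, is to use that boolean Series as a mask with `.loc` so that only the matching rows are assigned. Falling back to a whitespace-stripped Task Name is an assumption based on the desired output, not something stated explicitly.

```python
import pandas as pd

# Small sample shaped like the CSV in the question
df = pd.DataFrame({
    "Task Name": ["Sleeping ", "coding", "reading", "Commute"],
    "Parent Task": [None, "Learning a skill", "My_projects", None],
})

# Assumption: rows without a matching Parent Task use the stripped Task Name
df["Category"] = df["Task Name"].str.strip()

# One boolean per row; na=False treats missing Parent Task values as "no match"
keywords = "|".join(["My_projects", "Learning a skill", "Business"])
mask = df["Parent Task"].str.contains(keywords, na=False)

# Assign only where the mask is True, instead of testing the Series in an `if`
df.loc[mask, "Category"] = "My_project"
```

The mask is evaluated for the whole column at once, so no explicit loop over rows is needed.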

Is there a more efficient way of going about this? I have multiple categories to add and a lot of rows. I would then like to sum the durations for each category per day and write the result to a separate CSV file, but I haven't gotten that far yet. Thanks.
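
For the per-day totals, one possible approach is `groupby` on the date and category followed by `sum`. This is only a sketch: it assumes the Category column has already been filled in, and the sample rows are copied from the desired output above.

```python
import pandas as pd

# Sample rows with Category already assigned (taken from the desired output)
df = pd.DataFrame({
    "Start Date": ["01/02/2021", "02/02/2021", "02/02/2021", "02/02/2021", "02/02/2021"],
    "Duration (hours)": [1.33639, 6.43167, 0.78028, 1.14778, 0.22694],
    "Category": ["Sleeping", "Sleeping", "My project", "My project", "Commute"],
})

# One row per (day, category) pair with the summed duration
daily = df.groupby(["Start Date", "Category"], as_index=False)["Duration (hours)"].sum()
```

The result could then be written out with `daily.to_csv("daily_totals.csv", index=False)`, where the file name is just a placeholder.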

1
  • Please include your df in text so it can be copied easily to reproduce your problem. Also post your desired df, because it's quite hard to understand what your goal is. Commented Jun 21, 2021 at 13:07

2 Answers

1

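IIUC, you can do this with fillna() and replace():

# Map each Parent Task keyword to its category; unmapped values pass through unchanged
d = {'Learning a skill': 'My_projects', 'Business': 'My_projects'}
# Rows with no Parent Task fall back to the Task Name
df['Category'] = df['Parent Task'].fillna(df['Task Name']).replace(d)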

0

You can generate a boolean Series and then apply a function to it to add My_project, something like:

from numpy import nan

df['Category'] = df['Parent Task'].isin(['My_projects', 'Learning a skill', 'Business']).apply(lambda x: 'My_project' if x else nan)
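
Note that `isin` only matches whole values, while `str.contains` in the question matches substrings. As a rough sketch of the same idea without `apply`, `Series.where` can blank out the non-matching rows. The "Business trip" row is a made-up value added here only to show the exact-match behaviour.

```python
import pandas as pd

# "Business trip" is a hypothetical value, included to show that isin is exact-match
df = pd.DataFrame({
    "Parent Task": [None, "Learning a skill", "My_projects", "Business trip"],
})

# True only where Parent Task equals one of the listed values exactly
is_project = df["Parent Task"].isin(["My_projects", "Learning a skill", "Business"])

# Start from a constant column, then keep it only where the mask is True
df["Category"] = pd.Series("My_project", index=df.index).where(is_project)
```

Rows where the mask is False end up as missing values, which matches what the `apply` version produces with `nan`.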

1 Comment

This does work, but how do I add multiple categories now? If this were written out as an if statement it could be extended, but I'm not sure how to write it as an if statement.
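
One way to extend this to several categories, sketched under assumptions, is to keep a dictionary from category name to keyword list and apply one mask per category. The "Health" category and the "Exercise" keyword are hypothetical and only there to show a second category; the fallback to Task Name follows the desired output in the question.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Task Name": ["Sleeping", "coding", "reading", "jogging"],
    "Parent Task": [None, "Learning a skill", "My_projects", "Exercise"],
})

# Hypothetical mapping; "Health" / "Exercise" are illustrative, not from the question
categories = {
    "My_project": ["My_projects", "Learning a skill", "Business"],
    "Health": ["Exercise"],
}

# Default: use the Task Name when no keyword matches
df["Category"] = df["Task Name"]

for category, words in categories.items():
    # Escape the keywords so they are matched literally inside the regex
    pattern = "|".join(re.escape(word) for word in words)
    mask = df["Parent Task"].str.contains(pattern, na=False)
    df.loc[mask, "Category"] = category
```

The loop runs once per category rather than once per row, so it should stay reasonably fast on a large file. If a row matches more than one category, the one listed last in the dictionary wins.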
