I am looking for a way to search the Parent Task column for keywords and then add a new column holding a category name for each row. For example, any row whose Parent Task contains one of the keywords (My_projects, Learning a skill, Business) should get the tag "My project" in a new column called Category.
Current DataFrame (CSV file):
Start Date,Task Name,Duration (hours),Parent Task
01/02/2021,Sleeping ,1.33639,
02/02/2021,Sleeping ,6.43167,
02/02/2021,coding,0.78028,Learning a skill
02/02/2021,Commute,0.22694,
02/02/2021,reading,1.14778,My_projects
02/02/2021,Commute,0.56139,
02/02/2021,Prep,0.37611,
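For anyone reproducing this, here is a minimal sketch of loading the sample above into pandas. It inlines the CSV text with `io.StringIO` so it runs on its own; with a real file you would pass the path to `pd.read_csv` instead (any file name is hypothetical). It also shows that the blank Parent Task cells come back as NaN, which is why `na=False` matters later.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
# With a real file: df = pd.read_csv("tasks.csv")  (hypothetical file name)
csv_text = """Start Date,Task Name,Duration (hours),Parent Task
01/02/2021,Sleeping ,1.33639,
02/02/2021,Sleeping ,6.43167,
02/02/2021,coding,0.78028,Learning a skill
02/02/2021,Commute,0.22694,
02/02/2021,reading,1.14778,My_projects
02/02/2021,Commute,0.56139,
02/02/2021,Prep,0.37611,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Empty "Parent Task" cells are parsed as NaN, not as empty strings.
print(df["Parent Task"].isna().sum())  # 5
```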
Desired DataFrame (CSV file):
Start Date,Task Name,Duration (hours),Parent Task,Category
01/02/2021,Sleeping ,1.33639,,Sleeping
02/02/2021,Sleeping ,6.43167,,Sleeping
02/02/2021,coding,0.78028,Learning a skill,My project
02/02/2021,Commute,0.22694,,Commute
02/02/2021,reading,1.14778,My_projects,My project
02/02/2021,Commute,0.56139,,Commute
02/02/2021,Prep,0.37611,,Prep
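One possible way to get that output, as a sketch: build a boolean mask from `str.contains`, start from the Task Name as the default category, and overwrite the matching rows with `Series.mask`. The fallback rule ("no matching Parent Task means Category = Task Name, with the trailing space stripped") is an assumption read off the desired output above, not something stated explicitly.

```python
import io
import re

import pandas as pd

csv_text = """Start Date,Task Name,Duration (hours),Parent Task
01/02/2021,Sleeping ,1.33639,
02/02/2021,Sleeping ,6.43167,
02/02/2021,coding,0.78028,Learning a skill
02/02/2021,Commute,0.22694,
02/02/2021,reading,1.14778,My_projects
02/02/2021,Commute,0.56139,
02/02/2021,Prep,0.37611,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Keywords from the question; re.escape guards against regex metacharacters.
keywords = ["My_projects", "Learning a skill", "Business"]
pattern = "|".join(re.escape(k) for k in keywords)

# One True/False per row; NaN (blank Parent Task) is treated as no match.
matches = df["Parent Task"].str.contains(pattern, na=False)

# Assumed default: the row's own Task Name, whitespace stripped.
# Series.mask replaces values where `matches` is True.
df["Category"] = df["Task Name"].str.strip().mask(matches, other="My project")

print(df["Category"].tolist())
```

Because everything is done with whole-column operations, there is no Python-level loop over rows.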
I have been trying to apply this method:
My_projects_tasks = '|'.join(['My_projects', 'Learning a skill', 'Business'])
if df['Parent Task'].str.contains(My_projects_tasks, na=False):
    df['Category'] = 'My_project'
But I get this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
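The error comes from the `if`: `str.contains` returns a whole Series of booleans (one per row), and Python's `if` needs a single True/False, which pandas refuses to guess. The sketch below reproduces that on a tiny stand-in DataFrame (only the Parent Task column, made-up rows) and then uses the mask with `.loc` to update just the matching rows instead of branching on it.

```python
import pandas as pd

# Tiny stand-in for the question's DataFrame (hypothetical rows).
df = pd.DataFrame({"Parent Task": ["Learning a skill", None, "My_projects", None]})

My_projects_tasks = "|".join(["My_projects", "Learning a skill", "Business"])
mask = df["Parent Task"].str.contains(My_projects_tasks, na=False)
print(mask.tolist())  # [True, False, True, False]

# Branching on the whole Series is what raises the ValueError.
raised = False
try:
    if mask:
        pass
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Use the mask as a row selector instead; non-matching rows get NaN.
df.loc[mask, "Category"] = "My_project"
print(df)
```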
Is there a more efficient way to do this? I have multiple categories to add and a lot of rows. After that, I want to sum the durations for each category per day and write the totals to a separate CSV file, but I haven't got that far yet. Thanks!
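A sketch for both follow-ups, under a few assumptions: categories live in a dict mapping category name to Parent Task keywords (only "My project" comes from the question; other entries are hypothetical), unmatched rows fall back to their stripped Task Name as in the desired output, and "per day" means grouping on the raw Start Date string, which sidesteps whether the dates are day-first. Each category costs one vectorised `str.contains` over the column, so the work grows with categories × rows but never loops over rows in Python.

```python
import io
import re

import pandas as pd

csv_text = """Start Date,Task Name,Duration (hours),Parent Task
01/02/2021,Sleeping ,1.33639,
02/02/2021,Sleeping ,6.43167,
02/02/2021,coding,0.78028,Learning a skill
02/02/2021,Commute,0.22694,
02/02/2021,reading,1.14778,My_projects
02/02/2021,Commute,0.56139,
02/02/2021,Prep,0.37611,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Hypothetical mapping: category name -> Parent Task keywords.
category_keywords = {
    "My project": ["My_projects", "Learning a skill", "Business"],
    # "Work": ["Meetings", "Email"],  # example of another (made-up) entry
}

# Assumed default: the row's own Task Name with surrounding spaces removed.
df["Category"] = df["Task Name"].str.strip()
for category, keywords in category_keywords.items():
    pattern = "|".join(re.escape(k) for k in keywords)
    mask = df["Parent Task"].str.contains(pattern, na=False)
    # If a row matches several categories, the later dict entry wins.
    df.loc[mask, "Category"] = category

# Total hours per day and category.
daily = df.groupby(["Start Date", "Category"], as_index=False)["Duration (hours)"].sum()

# to_csv without a path returns the CSV text; pass a path such as
# "daily_totals.csv" (hypothetical name) to write a file instead.
csv_out = daily.to_csv(index=False)
print(csv_out)
```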
