
I want to identify all rows in a csv file I have loaded with Pandas where the text in a specific column, in this case the 'Notes' column, mentions the word 'exercise'. Once the rows that contain the 'exercise' keyword in the 'Notes' column are identified, I want to create a new column called 'ExerciseDay' that holds a 1 if the 'exercise' condition was met and a 0 if it was not. I am having trouble because the 'Notes' column can contain long string values (e.g. 'Exercise, Morning Workout, Alcohol Consumed, Coffee Consumed'), and I still want 'exercise' to be identified even when it sits inside a longer string.

I tried the function below to flag every row whose 'Notes' text contains the word 'exercise'. No rows are selected when I use it, and I know that is likely because of the * wildcards, but I want to show the logic. There is probably a much more efficient way to do this, but I am still relatively new to programming and Python.

def IdentifyExercise(row):
    if row['Notes'] == '*exercise*':
        return 1
    elif row['Notes'] != '*exercise*':
        return 0


JoinedTables['ExerciseDay'] = JoinedTables.apply(lambda row : IdentifyExercise(row), axis=1) 

3 Answers

5

Convert the boolean Series returned by str.contains to int with astype:

JoinedTables['ExerciseDay'] = JoinedTables['Notes'].str.contains('exercise').astype(int)

For a case-insensitive match:

JoinedTables['ExerciseDay'] = (JoinedTables['Notes']
                               .str.contains('exercise', case=False)
                               .astype(int))
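
To make the difference between the two calls concrete, here is a minimal sketch on a small, made-up DataFrame. The sample rows are assumptions standing in for the asker's real JoinedTables data.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's JoinedTables.
JoinedTables = pd.DataFrame({
    'Notes': [
        'Exercise, Morning Workout, Alcohol Consumed, Coffee Consumed',
        'Coffee Consumed',
        'light exercise after lunch',
    ]
})

# Case-sensitive: 'Exercise' with a capital E is not matched.
case_sensitive = JoinedTables['Notes'].str.contains('exercise').astype(int)

# Case-insensitive: both 'Exercise' and 'exercise' are matched.
case_insensitive = (JoinedTables['Notes']
                    .str.contains('exercise', case=False)
                    .astype(int))

print(case_sensitive.tolist())    # [0, 0, 1]
print(case_insensitive.tolist())  # [1, 0, 1]
```

Since the example string in the question starts with a capital 'E', the case=False variant is probably the one that fits this data.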

5 Comments

Thanks jezrael. Regarding astype(int): some rows in my 'Notes' column are NaN (null). Those rows come out as null in the new column as well, and astype(int) throws an error on them. Any idea how to work around this?
Yes, pass the additional parameter na=False. NaNs are then treated as False, which becomes 0 after astype(int):
.str.contains('exercise', case=False, na=False)
Or: JoinedTables['Notes'].str.contains('exercise', case=False).fillna(0).astype(int)
Ah thanks, appreciate the help. I guess I'm too new to Stack Overflow to upvote, but this worked.
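
The na=False suggestion from the comments can be checked with a short sketch. The three sample values below are made up for illustration; only the str.contains call comes from the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical 'Notes' values, including one missing entry.
notes = pd.Series(['Exercise, Coffee Consumed', np.nan, 'Alcohol Consumed'])

# na=False makes missing values count as "no match", so the result is a
# plain boolean Series that astype(int) can convert without errors.
flags = notes.str.contains('exercise', case=False, na=False).astype(int)

print(flags.tolist())  # [1, 0, 0]
```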
1

You can also use np.where:

import numpy as np

JoinedTables['ExerciseDay'] = np.where(
    JoinedTables['Notes'].str.contains('exercise'), 1, 0)

2 Comments

Thanks COLDSPEED, this also seemed to do the trick. It did, however, assign a value of 1 to the rows that were null in the 'Notes' column, as well as to the rows where 'exercise' was found. Any idea how to ignore the null rows?
@DomB Welp, it seems you can upvote answers now that you have > 15 rep ;-)
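
On the null rows getting a 1: a likely explanation is that, for an object-dtype column, str.contains leaves NaN in the mask for missing values, and NaN is a non-zero float, so np.where treats it as true. A minimal sketch of one way around this, assuming made-up sample values, is to pass na=False so the mask is strictly boolean:

```python
import numpy as np
import pandas as pd

# Hypothetical 'Notes' values, including one missing entry.
notes = pd.Series(['Exercise, Coffee Consumed', np.nan, 'Alcohol Consumed'])

# na=False turns missing values into False before np.where sees the mask,
# so null rows end up as 0 rather than 1.
mask = notes.str.contains('exercise', case=False, na=False)
flags = np.where(mask, 1, 0)

print(flags.tolist())  # [1, 0, 0]
```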
0

Another way would be:

JoinedTables['ExerciseDay'] = [1 if "exercise" in x else 0 for x in JoinedTables['Notes']]

(Probably not the fastest solution)
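
Note that the comprehension above raises a TypeError on NaN entries, because `in` cannot be applied to a float. A hedged variant that skips non-string values and ignores case might look like the following; the sample values are assumptions, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical 'Notes' values, including one missing entry.
notes = pd.Series(['Exercise, Coffee Consumed', np.nan, 'light exercise'])

# isinstance guards against NaN (a float); lower() makes the test
# case-insensitive.
flags = [1 if isinstance(x, str) and 'exercise' in x.lower() else 0
         for x in notes]

print(flags)  # [1, 0, 1]
```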
