How to search for specific text within a Pandas dataframe column?

Question

I want to identify all rows in my pandas DataFrame (loaded from a CSV file) where the 'Notes' column mentions the word 'exercise'. Once those rows are identified, I want to create a new column called 'ExerciseDay' that contains 1 if the 'Notes' value mentions 'exercise' and 0 if it does not. I am having trouble because the 'Notes' column can contain long strings (e.g. 'Exercise, Morning Workout,Alcohol Consumed, Coffee Consumed'), and I still want 'exercise' to be identified when it appears inside a longer string.

I tried the function below to identify all text in the 'Notes' column that contains the word 'exercise'. No rows are matched when I use it. I suspect this is because of the * characters, which I meant as wildcards, but I want to show the logic. There is probably a much more efficient way to do this, but I am still relatively new to programming and Python.

def IdentifyExercise(row):
    if row['Notes'] == '*exercise*':
        return 1
    elif row['Notes'] != '*exercise*':
        return 0


JoinedTables['ExerciseDay'] = JoinedTables.apply(lambda row : IdentifyExercise(row), axis=1)
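A minimal sketch of why the function above never returns 1: `==` compares whole strings for exact equality, and `*` has no wildcard meaning in a Python string comparison. The same row-wise `apply` approach works if the test is a substring check with the `in` operator. The sample data, the lowercasing step, and the name `identify_exercise` are illustrative assumptions, not from the original post.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's JoinedTables.
JoinedTables = pd.DataFrame({
    "Notes": [
        "Exercise, Morning Workout,Alcohol Consumed, Coffee Consumed",
        "Coffee Consumed",
        "evening exercise",
    ]
})

# '*' is compared literally, so this is False even for a matching note.
print(JoinedTables.loc[0, "Notes"] == "*exercise*")  # False


def identify_exercise(row):
    # Substring test; lower() lets 'Exercise' match as well.
    return 1 if "exercise" in row["Notes"].lower() else 0


JoinedTables["ExerciseDay"] = JoinedTables.apply(identify_exercise, axis=1)
print(JoinedTables["ExerciseDay"].tolist())  # [1, 0, 1]
```

This row-wise version raises an AttributeError on missing (NaN) notes, because a float has no `lower()` method.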

jezrael · Accepted Answer · 2017-10-01 19:52:07Z

5

Convert the boolean Series returned by str.contains to integers with astype:

JoinedTables['ExerciseDay'] = JoinedTables['Notes'].str.contains('exercise').astype(int)

For a case-insensitive match:

JoinedTables['ExerciseDay'] = (JoinedTables['Notes'].str.contains('exercise', case=False)
                                                    .astype(int))
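A self-contained sketch of the case-insensitive version on a small, made-up DataFrame (the note values are assumptions for illustration). str.contains returns a boolean Series, and astype(int) maps True/False to 1/0:

```python
import pandas as pd

# Made-up notes; no missing values here.
JoinedTables = pd.DataFrame({
    "Notes": [
        "Exercise, Morning Workout,Alcohol Consumed, Coffee Consumed",
        "Coffee Consumed",
        "Evening exercise",
    ]
})

# Boolean Series: does each note contain 'exercise', ignoring case?
mask = JoinedTables["Notes"].str.contains("exercise", case=False)

# True/False -> 1/0
JoinedTables["ExerciseDay"] = mask.astype(int)
print(JoinedTables["ExerciseDay"].tolist())  # [1, 0, 1]
```

By default str.contains treats the pattern as a regular expression; for a plain word like 'exercise' that makes no difference, but regex=False can be passed when the search text contains regex metacharacters.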



5 Comments

DEB Over a year ago

Thanks jezrael. Regarding astype(int): some rows in the 'Notes' column are NaN (null). For those rows the new column also gets null values, and astype(int) throws an error on them. Any idea how to work around this?

jezrael Over a year ago

Yes, use the additional parameter na=False; then NaNs are treated as False and converted to 0

jezrael Over a year ago

.contains('exercise', case=False, na=False)

jezrael Over a year ago

Or JoinedTables['Notes'].str.contains('exercise', case=False).fillna(0).astype(int)
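A small sketch comparing the two fixes from the comments above, on made-up data that includes a missing note. Both map the NaN row to 0; na=False handles the missing value inside str.contains itself, while fillna(0) patches the result afterwards.

```python
import numpy as np
import pandas as pd

# Made-up notes including a missing value.
notes = pd.Series([
    "Exercise, Morning Workout,Alcohol Consumed, Coffee Consumed",
    np.nan,
    "Coffee Consumed",
])

# Option 1: treat missing notes as "no match" directly.
option1 = notes.str.contains("exercise", case=False, na=False).astype(int)

# Option 2: fill any missing results with 0 afterwards.
option2 = notes.str.contains("exercise", case=False).fillna(0).astype(int)

print(option1.tolist())  # [1, 0, 0]
print(option2.tolist())  # [1, 0, 0]
```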

DEB Over a year ago

Ah, thanks, I appreciate the help. I guess I'm too new to Stack Overflow to upvote, but this worked.

cs95 · Accepted Answer · 2017-10-01 19:57:18Z

1

You can also use np.where:

import numpy as np

JoinedTables['ExerciseDay'] = \
    np.where(JoinedTables['Notes'].str.contains('exercise'), 1, 0)


2 Comments

DEB Over a year ago

Thanks COLDSPEED, this also seemed to do the trick. However, it assigned 1 to the rows where 'Notes' was null, as well as to the rows where 'exercise' was found. Any idea how to ignore the null rows?

cs95 Over a year ago

@DomB Welp, it seems you can upvote answers now that you have > 15 rep ;-)
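A sketch of one way to handle the null rows with np.where, on made-up data: when str.contains leaves NaN in its result, np.where treats that NaN as truthy and produces 1. Passing na=False, as suggested in the comments on the first answer, turns missing notes into False before np.where sees them.

```python
import numpy as np
import pandas as pd

# Made-up notes including a missing value.
JoinedTables = pd.DataFrame({
    "Notes": [
        "Exercise, Morning Workout,Alcohol Consumed, Coffee Consumed",
        np.nan,
        "Coffee Consumed",
    ]
})

# na=False makes missing notes count as "no match" instead of NaN.
has_exercise = JoinedTables["Notes"].str.contains("exercise", case=False, na=False)
JoinedTables["ExerciseDay"] = np.where(has_exercise, 1, 0)
print(JoinedTables["ExerciseDay"].tolist())  # [1, 0, 0]
```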

JoseleMG · Accepted Answer · 2017-10-01 20:14:50Z

0

Another way would be:

JoinedTables['ExerciseDay'] = [1 if "exercise" in x else 0 for x in JoinedTables['Notes']]

(Probably not the fastest solution)
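The list comprehension above raises a TypeError on NaN notes, because `in` cannot be applied to a float. A minimal sketch that skips non-string values and matches case-insensitively; the isinstance guard and the lower() call are additions for illustration, and the data is made up:

```python
import numpy as np
import pandas as pd

# Made-up notes including a missing value.
JoinedTables = pd.DataFrame({
    "Notes": [
        "Exercise, Morning Workout,Alcohol Consumed, Coffee Consumed",
        np.nan,
        "evening exercise",
    ]
})

# isinstance() skips NaN (a float), which would otherwise raise TypeError
# on the `in` test; lower() makes the match case-insensitive.
JoinedTables["ExerciseDay"] = [
    1 if isinstance(x, str) and "exercise" in x.lower() else 0
    for x in JoinedTables["Notes"]
]
print(JoinedTables["ExerciseDay"].tolist())  # [1, 0, 1]
```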

