Filter a pandas column using regex within the header

Question

I'm reading an Excel file into a pandas DataFrame, but one of the column headers contains a lot of comment text. It includes the keyword 'Measure', which appears in only this one header. Using 'contains', how can I select any header that has the keyword 'Measure' somewhere in it?

The following code filters my DataFrame on three conditions, but for the third I want to identify the column by the text 'measure' rather than writing the header out in full as 'hereisalltherandomtextmeasure':

filtered = df[(df['Mode'].isin(mode_filter)) & (df['Level'].isin(level_filter)) & (df['hereisalltherandomtextmeasure'].isin(measure_filter))]

I'm trying to do this because I run the same code on multiple files, but the name of the 'measure' column changes from file to file.

First file:

Mode | Level | hereisalltherandomtextmeasure

Second file:

Mode | Level | hereismorerandomtextmeasure

The only constant is that the header contains the word 'measure', so ideally I'd like to identify the column by that word alone rather than by the full string.

Thanks.

Sorry, what are you asking here? To find the column or to filter it out?
– EdChum, Commented Sep 18, 2015 at 14:19
Sorry, I simply want to identify the column that contains the text 'Measure', which I then apply the filter measure_filter to using .isin.
– ashleh, Commented Sep 18, 2015 at 14:22
Then just df.columns[df.columns.str.contains('hereisall the random textMeasure')] will return you that column
– EdChum, Commented Sep 18, 2015 at 14:23
I want to ignore the text in front of 'Measure' because it is different for each file I load. So as long as the column header contains 'Measure', my code will filter on it.
– ashleh, Commented Sep 18, 2015 at 14:24
Can you provide some example strings and what exactly you want to match?
– Max, Commented Sep 18, 2015 at 14:25

EdChum · Accepted Answer · 2015-09-18 14:27:43Z


IIUC, you can use str.contains to check whether your search string appears anywhere in the column names:

In [7]:
import pandas as pd
df = pd.DataFrame(columns=['hereisall the random textMeasure', 'Measurement', 'asdasds'])
df.columns[df.columns.str.contains('Measure')]

Out[7]:
Index(['hereisall the random textMeasure', 'Measurement'], dtype='object')
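Note that str.contains is case-sensitive by default and returns every matching header, so 'Measurement' matches as well. A minimal sketch of narrowing this to a single column name follows. It assumes the header always ends in 'measure', as in the two example files in the question, and the variable names matches and measure_col are only illustrative:

```python
import pandas as pd

# Headers taken from the first example file in the question.
df = pd.DataFrame(columns=['Mode', 'Level', 'hereisalltherandomtextmeasure'])

# case=False ignores capitalisation ('Measure' vs 'measure').
# The trailing '$' anchors the regex to the end of the header; this is an
# assumption based on the example headers, not a general rule.
matches = df.columns[df.columns.str.contains('measure$', case=False, regex=True)]

# Guard against zero or several matching headers before using the name.
if len(matches) != 1:
    raise ValueError(f"expected exactly one 'measure' column, found {list(matches)}")

measure_col = matches[0]
print(measure_col)
```

If the keyword can appear in the middle of the header, dropping the '$' gives a plain substring match, and the length check then becomes more important.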


2 Comments

ashleh Over a year ago

Hi Ed, thanks for the answer. I have the following code that filters my dataframe on 3 filters:

filtered = df[(df['Mode'].isin(mode_filter)) & (df['Level'].isin(level_filter)) & (df['hereisall the random textMeasure'].isin(measure_filter))]

So how would the last part incorporate str.contains to search for just 'measure'?

EdChum Over a year ago

Rather than posting little snippets of additional information, can you edit your question to include all the necessary information? SO is a Q&A site, not a forum. Your question should contain enough information that we don't need to ask many questions to seek clarification.
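For the follow-up in the comments, one possible way to plug the column lookup into the three-way filter is sketched below. The sample rows and the contents of mode_filter, level_filter and measure_filter are made up for illustration, and the sketch assumes exactly one header contains 'measure':

```python
import pandas as pd

# Hypothetical data; only the header names come from the question.
df = pd.DataFrame({
    'Mode': ['car', 'bus', 'car', 'rail'],
    'Level': [1, 2, 2, 1],
    'hereisalltherandomtextmeasure': ['a', 'b', 'c', 'a'],
})

# Hypothetical filter lists.
mode_filter = ['car', 'rail']
level_filter = [1]
measure_filter = ['a']

# Look up the header once, ignoring case, and take the first (assumed only) match.
measure_col = df.columns[df.columns.str.contains('measure', case=False)][0]

# Use the looked-up name in place of the hard-coded header.
filtered = df[
    df['Mode'].isin(mode_filter)
    & df['Level'].isin(level_filter)
    & df[measure_col].isin(measure_filter)
]
print(filtered)
```

Because the header is resolved at run time, the same code should work for the second file, where the column is named 'hereismorerandomtextmeasure'.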
