I'm reading an Excel file into a Pandas data frame, but one of the column headers contains a lot of comment text. Somewhere in that text is the keyword 'Measure', which appears only in this one header. Using 'contains', how would I select whichever header has the keyword 'Measure' somewhere in it?

The following code filters my data frame on 3 conditions. For the third condition, I want the code to find the column whose header includes the text 'measure', instead of my having to write out the full name 'hereisalltherandomtextmeasure':

filtered = df[(df['Mode'].isin(mode_filter)) & (df['Level'].isin(level_filter)) & (df['hereisalltherandomtextmeasure'].isin(measure_filter))]

The reason I'm trying to do this is because I'm running the same code on multiple files but the 'measure' column changes for each file.

First file:

Mode | Level | hereisalltherandomtextmeasure

Second file:

Mode | Level | hereismorerandomtextmeasure

The only constant is that the header contains the word 'measure', so ideally I'd like to identify the column by that word rather than by its full name.

Thanks.

  • Sorry, what are you asking here? To find the column or to filter it out? Commented Sep 18, 2015 at 14:19
  • Sorry, I just want to identify the column that contains the text 'Measure', which I then apply measure_filter to using .isin. Commented Sep 18, 2015 at 14:22
  • Then just df.columns[df.columns.str.contains('hereisall the random textMeasure')] will return you that column Commented Sep 18, 2015 at 14:23
  • I want to ignore the text in front of 'Measure', because it is different for each file I load. As long as the column header contains 'Measure', my code should filter on it. Commented Sep 18, 2015 at 14:24
  • Can you provide some example strings and what exactly you want to match? Commented Sep 18, 2015 at 14:25
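The requirement in the comments (ignore everything before 'Measure' and match on the keyword alone) can be sketched as a small lookup that passes only the keyword to str.contains. This is a minimal sketch assuming pandas and assuming exactly one header per file contains the keyword; find_measure_column is a hypothetical helper name, not a pandas function. It matches case-insensitively because the question uses both 'Measure' and 'measure', and it fails loudly instead of silently picking the wrong column.

```python
import pandas as pd


def find_measure_column(df: pd.DataFrame, keyword: str = "measure") -> str:
    """Hypothetical helper: return the one header containing `keyword`.

    Matching is case-insensitive. Raises ValueError if zero or several
    headers contain the keyword (assumption: exactly one should).
    """
    # regex=False treats the keyword as a literal substring, not a regex
    mask = df.columns.str.contains(keyword, case=False, regex=False)
    matches = df.columns[mask]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one column containing {keyword!r}, found {list(matches)}"
        )
    return matches[0]


# Header layout of the first file from the question
df = pd.DataFrame(columns=["Mode", "Level", "hereisalltherandomtextmeasure"])
print(find_measure_column(df))  # hereisalltherandomtextmeasure
```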

1 Answer

If I understand correctly (IIUC), you can use str.contains to test whether your search string appears anywhere in the column names:

In [7]:
import pandas as pd
df = pd.DataFrame(columns=['hereisall the random textMeasure', 'Measurement', 'asdasds'])
df.columns[df.columns.str.contains('Measure')]

Out[7]:
Index(['hereisall the random textMeasure', 'Measurement'], dtype='object')
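Applied to the three-condition filter from the question, the matched header simply replaces the hard-coded column name. A minimal sketch, assuming exactly one header contains 'measure' (matched case-insensitively here); the toy data and the contents of mode_filter, level_filter and measure_filter are made up for illustration.

```python
import pandas as pd

# Toy data standing in for one Excel file; values are made up for illustration
df = pd.DataFrame({
    "Mode": ["Air", "Sea", "Air", "Road"],
    "Level": [1, 2, 1, 3],
    "hereismorerandomtextMeasure": ["A", "B", "C", "A"],
})

# Hypothetical filter values; in the question these are defined elsewhere
mode_filter = ["Air", "Road"]
level_filter = [1, 3]
measure_filter = ["A"]

# Look the measure column up by keyword instead of hard-coding the full header.
# Assumes exactly one header contains 'measure'; [0] takes the first match.
measure_col = df.columns[df.columns.str.contains("measure", case=False)][0]

filtered = df[
    df["Mode"].isin(mode_filter)
    & df["Level"].isin(level_filter)
    & df[measure_col].isin(measure_filter)
]
print(filtered)  # rows 0 and 3
```

Because the lookup runs on each file's own headers, the same code works for 'hereisalltherandomtextmeasure' and 'hereismorerandomtextmeasure' without changes.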

2 Comments

Hi Ed, thanks for the answer. I have the following code that filters my dataframe on 3 conditions: filtered = df[(df['Mode'].isin(mode_filter)) & (df['Level'].isin(level_filter)) & (df['hereisall the random textMeasure'].isin(measure_filter))]. How would I change the last part to use str.contains and search for just 'measure'?
Rather than posting little snippets of additional information, can you edit your question with all the necessary information? SO is a Q+A site, not a forum. Your question should contain enough information that we do not need to ask many questions to seek clarification.
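The answer's own output shows that a plain substring test also picks up 'Measurement'. If other headers can contain the keyword, an anchored regex narrows the match. This sketch assumes the keyword sits at the end of the header, as in both of the question's examples ('hereisalltherandomtextmeasure', 'hereismorerandomtextmeasure'); it uses DataFrame.filter, which selects columns by default.

```python
import pandas as pd

# Same column names as the answer's example
df = pd.DataFrame(columns=["hereisall the random textMeasure", "Measurement", "asdasds"])

# like= is a plain, case-sensitive substring test, so 'Measurement' matches too
print(df.filter(like="Measure").columns.tolist())
# ['hereisall the random textMeasure', 'Measurement']

# Assumption: the keyword ends the header. '(?i)' makes the match
# case-insensitive and '$' anchors 'measure' to the end of the name.
print(df.filter(regex=r"(?i)measure$").columns.tolist())
# ['hereisall the random textMeasure']
```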
