
I have a DataFrame whose columns are a MultiIndex and whose index is a list of names, i.e. index=['Andrew', 'Bob', 'Calvin',...].

I would like to write a function that returns all rows of the DataFrame whose index label is 'Bob', or starts with the letter 'A', or starts with a lowercase letter. How can this be done?

I looked into df.filter() with the regex argument, but it fails and I get:

df.filter(regex='a')
TypeError: expected string or buffer

or:

df.filter(regex=('a',1))
TypeError: first argument must be string or compiled pattern

I've tried other things such as passing re.compile('a') to no avail.

  • Same problems, nothing new Commented Feb 25, 2016 at 21:10
  • stackoverflow.com/questions/15325182/… Commented Feb 25, 2016 at 21:11
  • In that example they are filtering the column, the index defaults to [0,1,2,3]. My index is a list of names. Commented Feb 25, 2016 at 21:14

3 Answers


Part of my problem with filter was that I was using an outdated version of pandas. After updating, I no longer get the TypeError, and after some experimenting it looks like filter fits my needs. Here is what I found.

df.filter(regex='string') returns the columns whose labels match the regex. For a DataFrame this is the same as df.filter(regex='string', axis=1), because filter works on the columns by default.

To search the index instead, use df.filter(regex='string', axis=0).
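As a minimal sketch of the axis=0 form applied to the three cases from the question: the sample data and the helper name rows_matching are made up for illustration and are not part of pandas. Since filter matches the pattern anywhere in the label, the patterns are anchored with ^ and $.

```python
import pandas as pd

# Hypothetical sample data: MultiIndex columns, names as the row index.
columns = pd.MultiIndex.from_product([["x", "y"], ["a", "b"]])
df = pd.DataFrame(
    [[r * 4 + c for c in range(4)] for r in range(4)],
    index=["Andrew", "Bob", "Calvin", "yosef"],
    columns=columns,
)


def rows_matching(frame, pattern):
    """Return the rows whose index label matches the regex (hypothetical helper)."""
    return frame.filter(regex=pattern, axis=0)


bob = rows_matching(df, r"^Bob$")            # label is exactly 'Bob'
starts_with_a = rows_matching(df, r"^A")     # label starts with 'A'
starts_lower = rows_matching(df, r"^[a-z]")  # label starts with a lowercase ASCII letter
```

The column MultiIndex is untouched by the row filter, so each result keeps all four columns.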



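Maybe try a different approach using a list comprehension and .loc (the older .ix indexer was deprecated and then removed in pandas 1.0):

import pandas as pd

df = pd.DataFrame(range(4), index=['Andrew', 'Bob', 'Calvin', 'yosef'])

# rows whose label is exactly 'Bob'
df.loc[[x for x in df.index if x == 'Bob']]

# rows whose label starts with 'A'
df.loc[[x for x in df.index if x[0] == 'A']]

# rows whose label starts with a lowercase letter
df.loc[[x for x in df.index if x[0].islower()]]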

1 Comment

Thanks this answers what I was asking. Any idea if anyone uses df.filter? It would be nice to see some examples. This is nice, but then I need to separately handle searching the columns making my code less concise

How about using pandas.Series.str.contains()? The same method is available on both a Series and an Index, provided the labels are strings. For a non-string label the result is NaN rather than a Boolean.

import pandas as pd
df = pd.DataFrame(range(4), index=['Andrew', 'Bob', 'Calvin', 'yosef'])
mask = df.index.str.contains(r"^A")  # Boolean array, True where the label starts with 'A'
names = df.index[mask]  # the matching labels: only 'Andrew'
rows = df[mask]  # the matching rows
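If the index may hold a non-string label, the NaN in the mask can get in the way of Boolean indexing. A small sketch, assuming a made-up index that mixes names with the integer 7, passes na=False so that such labels simply count as non-matches:

```python
import pandas as pd

# Hypothetical index mixing string names with one non-string label (7).
df = pd.DataFrame(
    {"value": range(5)},
    index=["Andrew", "Bob", "Calvin", "yosef", 7],
)

# na=False fills the entries that would otherwise be NaN with False,
# so the mask is purely Boolean and can be used to select rows.
mask = df.index.str.contains(r"^A", na=False)
rows = df[mask]
```

Here rows holds only the 'Andrew' row; the integer label is treated as not matching.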
