Pandas - filter and regex search the index of DataFrame

Question

I have a DataFrame in which the columns are a MultiIndex and the index is a list of names, i.e. index=['Andrew', 'Bob', 'Calvin', ...].

I would like to create a function that returns all rows of the DataFrame whose name is 'Bob', or starts with the letter 'A', or starts with a lowercase letter. How can this be done?

I looked into df.filter() with the regex argument, but it fails and I get:

df.filter(regex='a')
TypeError: expected string or buffer

or:

df.filter(regex=('a',1))
TypeError: first argument must be string or compiled pattern

I've tried other things such as passing re.compile('a') to no avail.

In that example they are filtering the columns, and the index defaults to [0,1,2,3]. My index is a list of names.
– Shatnerz, Commented Feb 25, 2016 at 21:14

Shatnerz · Accepted Answer · 2016-03-01 15:42:30Z

10

So it looks like part of my problem with filter was that I was using an outdated version of pandas. After updating I no longer get the TypeError. After some playing around, it looks like I can use filter to fit my needs. Here is what I found out.

Simply calling df.filter(regex='string') returns the columns whose labels match the regex. This appears to be the same as df.filter(regex='string', axis=1).

To search the index, I simply need to do df.filter(regex='string', axis=0).
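As a minimal sketch of the calls above: the frame below is made up for illustration (the column names score_a and rank are not from the question), and it assumes a pandas version in which filter accepts axis. Since filter matches labels with re.search, the patterns are anchored with ^ and $ to express "starts with" and "exactly".

```python
import pandas as pd

# Hypothetical data: names as the index, two illustrative columns.
df = pd.DataFrame(
    {"score_a": [1, 2, 3, 4], "rank": [4, 3, 2, 1]},
    index=["Andrew", "Bob", "Calvin", "yosef"],
)

# With no axis given, a DataFrame is filtered on its column labels.
cols_default = df.filter(regex="^score")
cols_explicit = df.filter(regex="^score", axis=1)

# axis=0 applies the regex to the index labels instead.
exactly_bob = df.filter(regex="^Bob$", axis=0)
starts_with_a = df.filter(regex="^A", axis=0)
starts_lowercase = df.filter(regex="^[a-z]", axis=0)

print(exactly_bob.index.tolist())
print(starts_with_a.index.tolist())
print(starts_lowercase.index.tolist())
```

Because the same method handles both axes, one call pattern covers searching the rows and searching the columns.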

Ezer K · Accepted Answer · 2016-02-25 22:07:19Z

5

Maybe try a different approach by using a list comprehension and .loc:

import pandas as pd

df = pd.DataFrame(range(4), index=['Andrew', 'Bob', 'Calvin', 'yosef'])

# rows whose name is exactly 'Bob'
df.loc[[x for x in df.index if x == 'Bob']]

# rows whose name starts with 'A'
df.loc[[x for x in df.index if x[0] == 'A']]

# rows whose name starts with a lowercase letter
df.loc[[x for x in df.index if x[0].islower()]]
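The same three selections can also be written as Boolean masks built from the index, which avoids the Python-level loop. This is only a sketch of an alternative, not part of the original answer, and it assumes every index label is a non-empty string.

```python
import pandas as pd

df = pd.DataFrame(range(4), index=["Andrew", "Bob", "Calvin", "yosef"])

# Each mask is a Boolean array with one entry per row.
is_bob = df.index == "Bob"
starts_with_a = df.index.str.startswith("A")
starts_lowercase = df.index.str[0].str.islower()  # first character only

bob_rows = df.loc[is_bob]
a_rows = df.loc[starts_with_a]
lowercase_rows = df.loc[starts_lowercase]

print(bob_rows.index.tolist())
print(a_rows.index.tolist())
print(lowercase_rows.index.tolist())
```

Masks can be combined with | and & before indexing, for example df.loc[is_bob | starts_with_a].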

1 Comment

Shatnerz Over a year ago

Thanks, this answers what I was asking. Any idea if anyone uses df.filter? It would be nice to see some examples. This is nice, but then I need to handle searching the columns separately, which makes my code less concise.

MgAl2O4 · Accepted Answer · 2022-03-21 01:48:12Z

2

How about using pandas.Series.str.contains()? It works on both a Series and an Index, as long as the labels are strings. For a non-string label the result is NaN rather than a Boolean.

import pandas as pd
df = pd.DataFrame(range(4), index=['Andrew', 'Bob', 'Calvin', 'yosef'])
mask = df.index.str.contains(r"^A")  # True where the label starts with 'A'
names = df.index[mask]  # names = Index(['Andrew'], dtype='object')
rows = df.loc[mask]  # the matching rows of df
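To illustrate the point about non-string labels, here is a small sketch with a made-up index that mixes strings with one integer. Passing na=False to str.contains replaces the missing result for the integer label with False, so the mask stays purely Boolean and can be used for row selection.

```python
import pandas as pd

# Hypothetical index mixing strings with one integer label.
df = pd.DataFrame({"value": range(4)}, index=["Andrew", "Bob", 42, "yosef"])

# na=False: labels that cannot be searched count as "no match".
mask = df.index.str.contains(r"^A", na=False)
rows = df.loc[mask]

print(rows.index.tolist())
```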
