How to select dataframe rows based on index value and columns value criteria?

Question

I have a dataframe that is a stacked version of n=9 dataframes:

df
        f     a   config
0   0.491 0.368  old.000
1   0.369 0.333  old.000
2   0.372 0.276  old.000
3   0.346 0.300  old.000
4   0.213 0.161  old.000
..    ...   ...      ...
212 1.000 1.000  fin.111
213 1.000 1.000  fin.111
214 1.000 1.000  fin.111
215 1.000 1.000  fin.111
216 1.000 1.000  fin.111

[1953 rows x 3 columns]

Each "stacked" sub-dataframe corresponds to a different value of config, ie:

df['config'].unique()
array(['old.000', 'fin.000', 'fin.001', 'fin.010', 'fin.011', 'fin.100',
       'fin.101', 'fin.110', 'fin.111'], dtype=object)

I want to "filter" this dataframe by a criteria (composed of index and config value) given by a pd.Series:

ser_criteria
0      old.001
1      fin.101
2      fin.100
3      fin.101
4      fin.101
        ...   
212    fin.000
213    old.000
214    old.000
215    old.000
216    old.000
Length: 217, dtype: object

So, I would need my output to be given by:

df_filtered
        f     a   config
0   0.481 0.368  old.001
1   0.569 0.333  fin.101
2   0.672 0.276  fin.100
3   0.378 0.111  fin.101
4   0.987 0.213  fin.101
..    ...   ...   ...   
212 0.500 0.111  fin.000
213 1.000 1.000  old.000
214 0.765 0.123  old.000
215 0.000 1.000  old.000
216 0.333 0.123  old.000

[217 rows x 3 columns]

What is the more efficient way to do this? The only way I could find was to do this element by element (from index)...

jezrael · Accepted Answer · 2022-04-04 11:30:25Z

2

Convert index to index columns in both DataFrames and use right join:

df = (ser_criteria.rename('config')
                  .reset_index()
                  .merge(df.reset_index(), on=['index','config'],how='right')
                  .drop('index', axis=1))

edited Apr 4, 2022 at 11:30

answered Apr 4, 2022 at 10:38

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

baileythegreen Over a year ago

how='left' produces columns f and a full of NaN values because that dataframe is less complete. It needs to be how='right'.

baileythegreen · Accepted Answer · 2022-04-04 10:47:14Z

1

If the index values in your filter criteria are not important, you can do the following:

import pandas as pd

# Assign your dataframe to df (example)
df = pd.read_csv("Documents/dataframe.tsv", sep="\t")
df

f   a   config
0   0.491   0.368   old.000
1   0.369   0.333   old.000
2   0.372   0.276   old.000
3   0.346   0.300   old.000
4   0.213   0.161   old.000
212 1.000   1.000   fin.111
213 1.000   1.000   fin.111
214 1.000   1.000   fin.111
215 1.000   1.000   fin.111
216 1.000   1.000   fin.111


# Assign your filter series to filter (example)
filter = pd.Series(('old.001', 'fin.101', 'fin.100', 'fin.101', 'fin.101', 'fin.000', 'old.000', 'old.000', 'old.000', 'old.000'))
filter

0    old.001
1    fin.101
2    fin.100
3    fin.101
4    fin.101
5    fin.000
6    old.000
7    old.000
8    old.000
9    old.000
dtype: object

# Then subset the dataframe by which rows have a config value in your filter
df[df['config'].isin(filter)]

answered Apr 4, 2022 at 10:47

baileythegreen

1,2045 silver badges16 bronze badges

4 Comments

MrT77 Over a year ago

The index values are part of the criteria.

baileythegreen Over a year ago

Are the index values repeated in your input? It seemed like they were not important because the output has different information for the row with index 216 than the input did.

MrT77 Over a year ago

Yes, the index values are repeated. Probably I made an error while copying the values.

baileythegreen Over a year ago

Then, the strategy from the other answer by @jezrael is probably the simplest solution (noting my comment that it needs to be how='right' given the order of dataframes used). I am not coming up with a way to do this without getting a column from the index values. Indices are generally intended to be unique identifiers, and you're kind of trying to treat them as data, which makes it a hard problem.

Collectives™ on Stack Overflow

How to select dataframe rows based on index value and columns value criteria?

2 Answers 2

1 Comment

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related