0

I have a dataframe that is a stacked version of n=9 dataframes:

df
        f     a   config
0   0.491 0.368  old.000
1   0.369 0.333  old.000
2   0.372 0.276  old.000
3   0.346 0.300  old.000
4   0.213 0.161  old.000
..    ...   ...      ...
212 1.000 1.000  fin.111
213 1.000 1.000  fin.111
214 1.000 1.000  fin.111
215 1.000 1.000  fin.111
216 1.000 1.000  fin.111

[1953 rows x 3 columns]

Each "stacked" sub-dataframe corresponds to a different value of config, ie:

df['config'].unique()
array(['old.000', 'fin.000', 'fin.001', 'fin.010', 'fin.011', 'fin.100',
       'fin.101', 'fin.110', 'fin.111'], dtype=object)

I want to "filter" this dataframe by a criteria (composed of index and config value) given by a pd.Series:

ser_criteria
0      old.001
1      fin.101
2      fin.100
3      fin.101
4      fin.101
        ...   
212    fin.000
213    old.000
214    old.000
215    old.000
216    old.000
Length: 217, dtype: object

So, I would need my output to be given by:

df_filtered
        f     a   config
0   0.481 0.368  old.001
1   0.569 0.333  fin.101
2   0.672 0.276  fin.100
3   0.378 0.111  fin.101
4   0.987 0.213  fin.101
..    ...   ...   ...   
212 0.500 0.111  fin.000
213 1.000 1.000  old.000
214 0.765 0.123  old.000
215 0.000 1.000  old.000
216 0.333 0.123  old.000

[217 rows x 3 columns]

What is the more efficient way to do this? The only way I could find was to do this element by element (from index)...

2 Answers 2

2

Convert index to index columns in both DataFrames and use right join:

df = (ser_criteria.rename('config')
                  .reset_index()
                  .merge(df.reset_index(), on=['index','config'],how='right')
                  .drop('index', axis=1))
Sign up to request clarification or add additional context in comments.

1 Comment

how='left' produces columns f and a full of NaN values because that dataframe is less complete. It needs to be how='right'.
1

If the index values in your filter criteria are not important, you can do the following:

import pandas as pd

# Assign your dataframe to df (example)
df = pd.read_csv("Documents/dataframe.tsv", sep="\t")
df

f   a   config
0   0.491   0.368   old.000
1   0.369   0.333   old.000
2   0.372   0.276   old.000
3   0.346   0.300   old.000
4   0.213   0.161   old.000
212 1.000   1.000   fin.111
213 1.000   1.000   fin.111
214 1.000   1.000   fin.111
215 1.000   1.000   fin.111
216 1.000   1.000   fin.111


# Assign your filter series to filter (example)
filter = pd.Series(('old.001', 'fin.101', 'fin.100', 'fin.101', 'fin.101', 'fin.000', 'old.000', 'old.000', 'old.000', 'old.000'))
filter

0    old.001
1    fin.101
2    fin.100
3    fin.101
4    fin.101
5    fin.000
6    old.000
7    old.000
8    old.000
9    old.000
dtype: object

# Then subset the dataframe by which rows have a config value in your filter
df[df['config'].isin(filter)]

4 Comments

The index values are part of the criteria.
Are the index values repeated in your input? It seemed like they were not important because the output has different information for the row with index 216 than the input did.
Yes, the index values are repeated. Probably I made an error while copying the values.
Then, the strategy from the other answer by @jezrael is probably the simplest solution (noting my comment that it needs to be how='right' given the order of dataframes used). I am not coming up with a way to do this without getting a column from the index values. Indices are generally intended to be unique identifiers, and you're kind of trying to treat them as data, which makes it a hard problem.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.