I'm trying to find an efficient way to filter all entries under both top-level columns using a filter defined on only one of the top-level columns. This is best explained with the example and desired output below.

Example DataFrame

import pandas as pd
import numpy as np
info = ['price', 'year']
months = ['month0','month1','month2']
settlement_dates = ['2020-12-31', '2021-01-01']
Data = [[[2,4,5],[2020,2021,2022]],[[1,4,2],[2021,2022,2023]]]
Data = np.array(Data).reshape(len(settlement_dates), len(months) * len(info))
midx = pd.MultiIndex.from_product([info, months])
df = pd.DataFrame(Data, index=settlement_dates, columns=midx)
df

            price                 year              
           month0 month1 month2 month0 month1 month2
2020-12-31      2      4      5   2020   2021   2022
2021-01-01      1      4      2   2021   2022   2023

Create a filter for the MultiIndex DataFrame

idx_cols = pd.IndexSlice

df_filter = df.loc[:, idx_cols['year', :]]==2021

df[df_filter]


            price                  year               
           month0 month1 month2  month0  month1 month2
2020-12-31    NaN    NaN    NaN     NaN  2021.0    NaN
2021-01-01    NaN    NaN    NaN  2021.0     NaN    NaN

Desired output:

            price                  year               
           month0 month1 month2  month0  month1 month2
2020-12-31    NaN    4      NaN     NaN  2021.0    NaN
2021-01-01    1      NaN    NaN  2021.0     NaN    NaN

1 Answer

You can simplify the solution by reshaping with DataFrame.stack and then filtering with DataFrame.where:

df1 = df.stack()

df_filter = df1['year']==2021

df_filter = df1.where(df_filter).unstack()
print (df_filter)
            price                  year               
           month0 month1 month2  month0  month1 month2
2020-12-31    NaN    4.0    NaN     NaN  2021.0    NaN
2021-01-01    1.0    NaN    NaN  2021.0     NaN    NaN

Your solution is possible, but more complicated: the mask has to be reindexed to all columns and reshaped so that its missing values can be replaced by back and forward filling:

idx_cols = pd.IndexSlice

df_filter = df.loc[:, idx_cols['year', :]]==2021

df_filter = df_filter.reindex(df.columns, axis=1).stack(dropna=False).bfill(axis=1).ffill(axis=1).unstack()
print (df_filter)
            price                 year              
           month0 month1 month2 month0 month1 month2
2020-12-31  False   True  False  False   True  False
2021-01-01   True  False  False   True  False  False

print (df[df_filter])
            price                  year               
           month0 month1 month2  month0  month1 month2
2020-12-31    NaN    4.0    NaN     NaN  2021.0    NaN
2021-01-01    1.0    NaN    NaN  2021.0     NaN    NaN
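
As a further illustration, here is a minimal sketch of a third variant that avoids reshaping: it builds the mask on the 'year' block and repeats it under every top-level column with pd.concat. The names year_mask, full_mask and result are made up for this example, and it assumes every top-level column has the same second-level labels (month0 to month2), as in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
info = ['price', 'year']
months = ['month0', 'month1', 'month2']
settlement_dates = ['2020-12-31', '2021-01-01']
data = np.array([[2, 4, 5, 2020, 2021, 2022],
                 [1, 4, 2, 2021, 2022, 2023]])
df = pd.DataFrame(data, index=settlement_dates,
                  columns=pd.MultiIndex.from_product([info, months]))

# Boolean mask for the 'year' block only; its columns are month0..month2.
year_mask = df['year'] == 2021

# Repeat the same mask under each top-level key, then align to df's columns.
full_mask = pd.concat({key: year_mask for key in df.columns.levels[0]}, axis=1)
full_mask = full_mask.reindex(columns=df.columns)

# Keep values where the mask is True, NaN elsewhere.
result = df.where(full_mask)
print(result)
```

Because the mask is duplicated rather than filled, this should also work with more than two top-level columns, provided they all share the same second-level labels.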