I have data in a pandas DataFrame with a MultiIndex. Let's call the levels of my MultiIndex "Run", "Trigger", and "Cluster". Separately, I have pre-computed selection cuts, each supplied as an index of the entries that pass (passing entries tend to be sparse, so listing them is the most space-efficient representation). A cut may be only partially indexed, e.g. it may specify only "Run" values or only ("Run", "Trigger") pairs.
How do I efficiently apply these cuts, ideally without having to inspect them to find their levels?
For example, consider the following data:
import numpy as np
import pandas
index = pandas.MultiIndex.from_product([[0,1,2],[0,1,2],[0,1]], names=['Run','Trigger','Cluster'])
df = pandas.DataFrame(np.random.rand(len(index),3), index=index, columns=['a','b','c'])
print(df)
                            a         b         c
Run Trigger Cluster
0   0       0        0.789090  0.776966  0.764152
            1        0.196648  0.635954  0.479195
    1       0        0.007268  0.675339  0.966958
            1        0.055030  0.794982  0.660357
    2       0        0.987798  0.907868  0.583545
            1        0.114886  0.839434  0.070730
1   0       0        0.520827  0.626102  0.088976
            1        0.377423  0.934224  0.404226
    1       0        0.081669  0.485830  0.442296
            1        0.620439  0.537927  0.406362
    2       0        0.155784  0.243656  0.830895
            1        0.734176  0.997579  0.226272
2   0       0        0.867951  0.353823  0.541483
            1        0.615694  0.202370  0.229423
    1       0        0.912423  0.239199  0.406443
            1        0.188609  0.053396  0.222914
    2       0        0.698515  0.493518  0.201951
            1        0.415195  0.975365  0.687365
Selection criteria may take any of the following forms:
set1:
Int64Index([0], dtype='int64', name='Run')
set2:
MultiIndex([(0, 1),
            (1, 2)],
           names=['Run', 'Trigger'])
set3:
MultiIndex([(0, 0, 1),
            (1, 0, 1),
            (2, 1, 0)],
           names=['Run', 'Trigger', 'Cluster'])
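For reproducibility, the three selection indices shown above can be built roughly as follows. This is a sketch: the variable names `set1`, `set2`, and `set3` are just the labels used in this question, and on recent pandas versions `set1` prints as a plain `Index` rather than `Int64Index`.

```python
import pandas

# Cut on "Run" only: a flat, named Index.
set1 = pandas.Index([0], name='Run')

# Cut on ("Run", "Trigger") pairs: a two-level MultiIndex.
set2 = pandas.MultiIndex.from_tuples(
    [(0, 1), (1, 2)], names=['Run', 'Trigger']
)

# Fully specified cut: same depth as the DataFrame's index.
set3 = pandas.MultiIndex.from_tuples(
    [(0, 0, 1), (1, 0, 1), (2, 1, 0)], names=['Run', 'Trigger', 'Cluster']
)
```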
Application of these selection lists using a hypothetical select method would result in:
>>> print(df.select(set1))
                            a         b         c
Run Trigger Cluster
0   0       0        0.789090  0.776966  0.764152
            1        0.196648  0.635954  0.479195
    1       0        0.007268  0.675339  0.966958
            1        0.055030  0.794982  0.660357
    2       0        0.987798  0.907868  0.583545
            1        0.114886  0.839434  0.070730
>>> print(df.select(set2))
                            a         b         c
Run Trigger Cluster
0   1       0        0.007268  0.675339  0.966958
            1        0.055030  0.794982  0.660357
1   2       0        0.155784  0.243656  0.830895
            1        0.734176  0.997579  0.226272
>>> print(df.select(set3))
                            a         b         c
Run Trigger Cluster
0   0       1        0.196648  0.635954  0.479195
1   0       1        0.377423  0.934224  0.404226
2   1       0        0.912423  0.239199  0.406443
pandas can join these kinds of mixed-level indices easily, so it seems like this should be a straightforward operation, but I can't figure out the right calls. loc works for set3 because the indices are the same depth, but I need a general solution.
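To pin down the behavior being asked for, here is a minimal sketch of a `select` helper. The name `select` is hypothetical (it is not a pandas method), and the approach is an assumption about one way to do it: rebuild the DataFrame's index restricted to the levels the cut names, then mask with `isin`. It does read the cut's `names`, so it may not be the cleanest or fastest route.

```python
import numpy as np
import pandas


def select(df, passing):
    """Hypothetical helper: keep the rows of df whose index matches
    `passing` on whichever levels `passing` names."""
    names = list(passing.names)
    if len(names) == 1:
        # Flat Index: compare against the single matching level of df.
        key = df.index.get_level_values(names[0])
    else:
        # MultiIndex: df's index restricted to those levels, in the cut's order.
        key = pandas.MultiIndex.from_arrays(
            [df.index.get_level_values(n) for n in names]
        )
    # Boolean mask with one entry per row of df.
    return df[key.isin(passing)]


index = pandas.MultiIndex.from_product(
    [[0, 1, 2], [0, 1, 2], [0, 1]], names=['Run', 'Trigger', 'Cluster']
)
df = pandas.DataFrame(
    np.random.rand(len(index), 3), index=index, columns=['a', 'b', 'c']
)

set1 = pandas.Index([0], name='Run')
set2 = pandas.MultiIndex.from_tuples(
    [(0, 1), (1, 2)], names=['Run', 'Trigger']
)
set3 = pandas.MultiIndex.from_tuples(
    [(0, 0, 1), (1, 0, 1), (2, 1, 0)], names=['Run', 'Trigger', 'Cluster']
)

print(select(df, set1))
print(select(df, set2))
print(select(df, set3))
```

Because the levels are looked up by name, the cut's levels do not have to appear in the same order as the DataFrame's, but every name in the cut must exist in the DataFrame's index.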