Is there a better way to collect unique index values in pandas?

Question

I've got some data that looks like this:

>>> print(totals.sample(4))
                                                 start            end  \
time                    region_type                                     
2016-01-24 02:17:10.238 STACK GUARD        79940452352    79940665344   
2016-01-23 20:14:17.043 MALLOC metadata    64688259072    64688996352   
2016-01-22 23:20:53.752 IOKit              47857778688    47861174272   
2016-01-23 08:17:06.561 __DATA           3711964667904  3711979212800   

                                            vsize    rsdnt   dirty     swap  
time                    region_type                                          
2016-01-24 02:17:10.238 STACK GUARD        212992        0       0        0  
2016-01-23 20:14:17.043 MALLOC metadata    737280    81920   81920     8192  
2016-01-22 23:20:53.752 IOKit             3395584    24576   24576  3371008  
2016-01-23 08:17:06.561 __DATA           14544896  4907008  618496  4780032

I want to know the region_type of every row where dirty + swap is greater than 1e7.

This works, but it seems pretty verbose:

>>> print(totals[(totals.dirty + totals.swap) > 1e7]
...       .groupby(level='region_type')
...       .apply(lambda x: 'lol').index.tolist())

  ['MALLOC_NANO', 'MALLOC_SMALL']

Is there a better way?

I would have thought this would work, but it gives all the region_types in the data set, not the ones I selected:

totals[(totals.dirty + totals.swap) > 1e7].index.levels[1].tolist()

unutbu · Accepted Answer · 2016-02-05 20:17:26Z

Use index.get_level_values, which returns the level's value for each row actually present, rather than index.levels, which returns every value the index knows about, including values left unused after filtering:

mask = totals['dirty']+totals['swap'] > 1e7
result = mask.loc[mask]
region_types = result.index.get_level_values('region_type').unique()

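For example, using the four sample rows from the question and a lower threshold of 1e3 so that some of them match: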

In [243]: mask = totals['dirty']+totals['swap'] > 1e3; mask
Out[243]: 
time                     region_type    
2016-01-24 02:17:10.238  STACK GUARD        False
2016-01-23 20:14:17.043  MALLOC metadata     True
2016-01-22 23:20:53.752  IOKit               True
2016-01-23 08:17:06.561  __DATA              True
dtype: bool

In [244]: result = mask.loc[mask]; result
Out[244]: 
time                     region_type    
2016-01-23 20:14:17.043  MALLOC metadata    True
2016-01-22 23:20:53.752  IOKit              True
2016-01-23 08:17:06.561  __DATA             True
dtype: bool

In [245]: result.index.get_level_values('region_type').unique()
Out[245]: array(['MALLOC metadata', 'IOKit', '__DATA'], dtype=object)
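
To see the difference between index.levels and index.get_level_values side by side, here is a minimal self-contained sketch. The small `totals` frame is a hypothetical stand-in rebuilt from the four sample rows in the question (only the dirty and swap columns), and the 1e6 threshold is chosen just so that some rows are filtered out. The last line uses MultiIndex.remove_unused_levels, which is available in newer pandas versions than the one this answer was written against, as an alternative if you would rather keep reading from .levels.

```python
import pandas as pd

# Hypothetical stand-in for `totals`, rebuilt from the question's sample rows.
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2016-01-24 02:17:10.238"), "STACK GUARD"),
        (pd.Timestamp("2016-01-23 20:14:17.043"), "MALLOC metadata"),
        (pd.Timestamp("2016-01-22 23:20:53.752"), "IOKit"),
        (pd.Timestamp("2016-01-23 08:17:06.561"), "__DATA"),
    ],
    names=["time", "region_type"],
)
totals = pd.DataFrame(
    {"dirty": [0, 81920, 24576, 618496], "swap": [0, 8192, 3371008, 4780032]},
    index=index,
)

# Threshold picked for this toy data only; the question uses 1e7.
selected = totals[(totals["dirty"] + totals["swap"]) > 1e6]

# .levels still lists every label the original index was built with.
all_known = selected.index.levels[1].tolist()

# get_level_values reads the labels of the rows that survived the filter.
used = selected.index.get_level_values("region_type").unique().tolist()

# Alternative: drop the stale labels first, then .levels is safe to read.
pruned = selected.index.remove_unused_levels().levels[1].tolist()
```

With this toy data, `all_known` should still contain all four region types, while `used` and `pruned` contain only the two that pass the filter.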
