I've got some data that looks like this:

>>> print totals.sample(4)
                                                 start            end  \
time                    region_type                                     
2016-01-24 02:17:10.238 STACK GUARD        79940452352    79940665344   
2016-01-23 20:14:17.043 MALLOC metadata    64688259072    64688996352   
2016-01-22 23:20:53.752 IOKit              47857778688    47861174272   
2016-01-23 08:17:06.561 __DATA           3711964667904  3711979212800   

                                            vsize    rsdnt   dirty     swap  
time                    region_type                                          
2016-01-24 02:17:10.238 STACK GUARD        212992        0       0        0  
2016-01-23 20:14:17.043 MALLOC metadata    737280    81920   81920     8192  
2016-01-22 23:20:53.752 IOKit             3395584    24576   24576  3371008  
2016-01-23 08:17:06.561 __DATA           14544896  4907008  618496  4780032  

I want to know the region_type of every row where dirty+swap is greater than 1e7.

This works, but it seems pretty verbose:

>>> print totals[(totals.dirty + totals.swap) > 1e7].groupby(level='region_type').\
        apply(lambda x: 'lol').index.tolist()

  ['MALLOC_NANO', 'MALLOC_SMALL']

Is there a better way?

I would have thought this would work, but it gives all the region_types in the data set, not the ones I selected:

totals[(totals.dirty + totals.swap) > 1e7].index.levels[1].tolist()

1 Answer

Use index.get_level_values, which returns the label of each row actually present in the result, rather than index.levels, which lists every label the index knows about, including labels that no longer appear after filtering:

mask = totals['dirty']+totals['swap'] > 1e7
result = mask.loc[mask]
region_types = result.index.get_level_values('region_type').unique()

For example, using the four sample rows from the question and a lower threshold of 1e3 so that some rows match:

In [243]: mask = totals['dirty']+totals['swap'] > 1e3; mask
Out[243]: 
time                     region_type    
2016-01-24 02:17:10.238  STACK GUARD        False
2016-01-23 20:14:17.043  MALLOC metadata     True
2016-01-22 23:20:53.752  IOKit               True
2016-01-23 08:17:06.561  __DATA              True
dtype: bool

In [244]: result = mask.loc[mask]; result
Out[244]: 
time                     region_type    
2016-01-23 20:14:17.043  MALLOC metadata    True
2016-01-22 23:20:53.752  IOKit              True
2016-01-23 08:17:06.561  __DATA             True
dtype: bool

In [245]: result.index.get_level_values('region_type').unique()
Out[245]: array(['MALLOC metadata', 'IOKit', '__DATA'], dtype=object)
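
To see the difference between the two attributes side by side, here is a minimal, self-contained sketch. The frame is a hypothetical reconstruction of the four sample rows from the question, keeping only the dirty and swap columns, and the threshold is lowered to 1e6 (an assumption made so that some of the sample rows match). It also shows MultiIndex.remove_unused_levels as a possible alternative when you would rather keep working with .levels.

```python
import pandas as pd

# Hypothetical reconstruction of the four sample rows (dirty and swap only).
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2016-01-24 02:17:10.238"), "STACK GUARD"),
        (pd.Timestamp("2016-01-23 20:14:17.043"), "MALLOC metadata"),
        (pd.Timestamp("2016-01-22 23:20:53.752"), "IOKit"),
        (pd.Timestamp("2016-01-23 08:17:06.561"), "__DATA"),
    ],
    names=["time", "region_type"],
)
totals = pd.DataFrame(
    {"dirty": [0, 81920, 24576, 618496], "swap": [0, 8192, 3371008, 4780032]},
    index=index,
)

# Threshold lowered to 1e6 (assumption) so that some sample rows are selected.
selected = totals[(totals["dirty"] + totals["swap"]) > 1e6]

# .levels still lists every label the original index was built with.
all_known = selected.index.levels[1].tolist()

# get_level_values returns one label per remaining row, in row order.
used = selected.index.get_level_values("region_type").unique().tolist()

# Alternative: drop the unused labels first, then read .levels.
pruned = selected.index.remove_unused_levels().levels[1].tolist()

print(all_known)
print(used)
print(pruned)
```

Filtering keeps the original levels on the index, which is why index.levels reports every region_type in the data set; get_level_values and remove_unused_levels both reflect only the rows that survived the filter.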