What does the subset argument do in pandas.io.formats.style.Styler.format?

Question

The public documentation for pandas.io.formats.style.Styler.format says

subset : IndexSlice
An argument to DataFrame.loc that restricts which elements formatter is applied to.

But looking at the code, that's not quite true... what is this _non_reducing_slice stuff?

    if subset is None:
        row_locs = range(len(self.data))
        col_locs = range(len(self.data.columns))
    else:
        subset = _non_reducing_slice(subset)
        if len(subset) == 1:
            subset = subset, self.data.columns

        sub_df = self.data.loc[subset]

Use case: I want to format a particular row, but I get an error when I naively follow the documentation with something that works fine with .loc[]:

>>> import pandas as pd
>>>
>>> df = pd.DataFrame([dict(a=1,b=2,c=3),dict(a=3,b=5,c=4)])
>>> df = df.set_index('a')
>>> print df
   b  c
a
1  2  3
3  5  4
>>> def J(x):
...     return '!!!%s!!!' % x
...
>>> df.style.format(J, subset=[3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\io\formats\style.py", line 372, in format
    sub_df = self.data.loc[subset]
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1325, in __getitem__
    return self._getitem_tuple(key)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 841, in _getitem_tuple
    self._has_valid_tuple(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 189, in _has_valid_tuple
    if not self._has_valid_type(k, i):
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1418, in _has_valid_type
    (key, self.obj._get_axis_name(axis)))
KeyError: 'None of [[3]] are in the [columns]'
>>> df.loc[3]
b    5
c    4
Name: 3, dtype: int64
>>> df.loc[[3]]
   b  c
a
3  5  4

OK, I tried using IndexSlice and it seems flaky -- works in some cases, doesn't work in others, at least in Pandas 0.20.3:

Python 2.7.14 |Anaconda custom (64-bit)| (default, Oct 15 2017, 03:34:40) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> idx = pd.IndexSlice
>>> r = np.arange(16).astype(int)
>>> colors = 'red green blue yellow'.split()
>>> df = pd.DataFrame(dict(a=[colors[i] for i in r//4], b=r%4, c=r*100)).set_index(['a','b'])
>>> print df
             c
a      b
red    0     0
       1   100
       2   200
       3   300
green  0   400
       1   500
       2   600
       3   700
blue   0   800
       1   900
       2  1000
       3  1100
yellow 0  1200
       1  1300
       2  1400
       3  1500
>>> df.loc[idx['yellow']]
      c
b
0  1200
1  1300
2  1400
3  1500
>>> def J(x):
...     return '!!!%s!!!' % x
...
>>> df.style.format(J,idx['yellow'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\io\formats\style.py", line 372, in format
    sub_df = self.data.loc[subset]
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1325, in __getitem__
    return self._getitem_tuple(key)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 836, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 948, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1023, in _getitem_nested_tuple
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1541, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1081, in _getitem_iterable
    self._has_valid_type(key, axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1418, in _has_valid_type
    (key, self.obj._get_axis_name(axis)))
KeyError: "None of [['yellow']] are in the [columns]"
>>> pd.__version__
u'0.20.3'

In pandas 0.24.2 I get a similar, but slightly different, error:

>>> df.style.format(J,idx['yellow'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\io\formats\style.py", line 401, in format
    sub_df = self.data.loc[subset]
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1494, in __getitem__
    return self._getitem_tuple(key)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 868, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 969, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1048, in _getitem_nested_tuple
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1902, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1205, in _getitem_iterable
    raise_missing=False)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1161, in _get_listlike_indexer
    raise_missing=raise_missing)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1246, in _validate_read_indexer
    key=key, axis=self.obj._get_axis_name(axis)))
KeyError: u"None of [Index([u'yellow'], dtype='object')] are in the [columns]"
>>> pd.__version__
u'0.24.2'

Oh wait -- I wasn't specifying enough index information; this works:

df.style.format(J,idx['yellow',:])
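
One way to see why the bare idx['yellow'] fails: pd.IndexSlice[...] simply hands back whatever key it was given, so idx['yellow'] is just the string 'yellow', and the traceback above suggests that the styler then looks that string up among the columns. The sketch below uses only .loc (no Styler) and rebuilds the same frame without numpy. Selecting the rows with the list ['yellow'] is my assumption about what the styler ends up passing for the row part once both axes are given.

```python
import pandas as pd

idx = pd.IndexSlice

# Same frame as above, built without numpy.
colors = ["red", "green", "blue", "yellow"]
df = pd.DataFrame({
    "a": [colors[i // 4] for i in range(16)],
    "b": [i % 4 for i in range(16)],
    "c": [i * 100 for i in range(16)],
}).set_index(["a", "b"])

# IndexSlice returns its key unchanged.
bare = idx["yellow"]       # a plain str
full = idx["yellow", :]    # a (row, column) tuple

# Read as a column label, 'yellow' matches nothing: the only column is 'c'.
try:
    df.loc[:, [bare]]
    bare_found_column = True
except KeyError:
    bare_found_column = False

# Naming both axes selects the four 'yellow' rows and keeps a DataFrame.
yellow_rows = df.loc[[full[0]], full[1]]
print(type(bare).__name__, type(full).__name__, bare_found_column)
print(yellow_rows)
```

So the extra ", :" is what tells pandas that 'yellow' belongs to the row axis.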

I have used this argument to apply formatting to some, but not all, cells in a dataframe. Is there some case where its ultimate behavior is surprising you? For example, a case where calling format(..., subset=s) gives different results than df.loc[s]? I think that would qualify as a bug.
– NicholasM, Commented Dec 5, 2019 at 20:53
It says right there that [3] is not your column. What exactly are you trying to do?
– Quang Hoang, Commented Dec 5, 2019 at 20:59
I'm trying to apply my Formatter to a portion of the data frame, in particular one row, that is selected by .loc[], which is exactly what 3 or [3] does.
– Jason S, Commented Dec 5, 2019 at 21:00

NicholasM · Accepted Answer · 2019-12-05 21:18:07Z

1

I agree that the behavior you showed is not ideal.

>>> import pandas
>>> df = (pandas.DataFrame([dict(a=1,b=2,c=3),
...                         dict(a=3,b=5,c=4)])
...         .set_index('a'))
>>> df.loc[[3]]
   b  c
a
3  5  4
>>> df.style.format('{:.2f}', subset=[3])
Traceback (most recent call last):
  ...
KeyError: "None of [Int64Index([3], dtype='int64')] are in the [columns]"

You can work around this issue by passing a fully-formed pandas.IndexSlice as the subset argument:

>>> df.style.format('{:.2f}', subset=pandas.IndexSlice[[3], :])

Since you asked what _non_reducing_slice() is doing: its goal is reasonable (ensure that a subset does not reduce dimensionality to a Series or scalar), but its implementation treats a bare list as a sequence of column names:

From pandas/core/indexing.py:

def _non_reducing_slice(slice_):
    """
    Ensurse that a slice doesn't reduce to a Series or Scalar.

    Any user-paseed `subset` should have this called on it
    to make sure we're always working with DataFrames.
    """
    # default to column slice, like DataFrame
    # ['A', 'B'] -> IndexSlices[:, ['A', 'B']]
    kinds = (ABCSeries, np.ndarray, Index, list, str)
    if isinstance(slice_, kinds):
        slice_ = IndexSlice[:, slice_] 
    ...

I wonder if the documentation could be improved: in this case, the exception raised with subset=[3] matches the behavior of df[[3]] rather than df.loc[[3]].
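
The difference can be reproduced with plain .loc, without the Styler. In the sketch below, wrap_like_quoted_branch is a hypothetical helper that mirrors only the first branch of the _non_reducing_slice excerpt above (a list or str becomes a column selection). It is not the library function, and it ignores the elided remainder.

```python
import pandas as pd


def wrap_like_quoted_branch(subset):
    """Hypothetical stand-in for the quoted branch only: list/str -> columns."""
    if isinstance(subset, (list, str)):
        return pd.IndexSlice[:, subset]
    return subset


df = pd.DataFrame([dict(a=1, b=2, c=3), dict(a=3, b=5, c=4)]).set_index("a")

# A bare list becomes "all rows, columns labelled 3".
wrapped = wrap_like_quoted_branch([3])

try:
    df.loc[wrapped]
    outcome = "selected"
except KeyError:
    # The columns are 'b' and 'c', so label 3 is not found.
    outcome = "KeyError"

# Spelling out both axes selects the row labelled 3 as a one-row DataFrame.
rows = df.loc[pd.IndexSlice[[3], :]]
print(wrapped)
print(outcome)
print(rows)
```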

edited Dec 5, 2019 at 21:18

answered Dec 5, 2019 at 21:01

NicholasM


8 Comments

Jason S Over a year ago

OK. It would be great if the documentation matched the code. (such an example as you give would be really helpful, especially since selecting one or more rows is a rather common thing)

Jason S Over a year ago

so.... you answered the question of what I was trying to do, but not the question I asked... what exactly is subset really doing in the code?

Quang Hoang Over a year ago

But the document does say that subset: IndexSlice.

Jason S Over a year ago

GAH! How did I miss that? :(

Jason S Over a year ago

now I just have to figure out what IndexSlice does :/


Quang Hoang · Accepted Answer · 2019-12-05 20:53:57Z

1

It does indeed do what it is supposed to do.

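import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4,4))

df.style.background_gradient(subset=[0,1])

df.style.background_gradient()

give a gradient on columns 0 and 1 only, and a gradient on every column, respectively (rendered tables not shown).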

answered Dec 5, 2019 at 20:53

Quang Hoang


1 Comment

Jason S Over a year ago

In your case, it does. In general, it does not.
