5

The public documentation for pandas.io.formats.style.Styler.format says

subset : IndexSlice
An argument to DataFrame.loc that restricts which elements formatter is applied to.

But looking at the code, that's not quite true... what is this _non_reducing_slice stuff?

    if subset is None:
        row_locs = range(len(self.data))
        col_locs = range(len(self.data.columns))
    else:
        subset = _non_reducing_slice(subset)
        if len(subset) == 1:
            subset = subset, self.data.columns

        sub_df = self.data.loc[subset]

Use case: I want to format a particular row, but I get an error when I naively follow the documentation with something that works fine with .loc[]:

>>> import pandas as pd
>>>
>>> df = pd.DataFrame([dict(a=1,b=2,c=3),dict(a=3,b=5,c=4)])
>>> df = df.set_index('a')
>>> print df
   b  c
a
1  2  3
3  5  4
>>> def J(x):
...     return '!!!%s!!!' % x
...
>>> df.style.format(J, subset=[3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\io\formats\style.py", line 372, in format
    sub_df = self.data.loc[subset]
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1325, in __getitem__
    return self._getitem_tuple(key)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 841, in _getitem_tuple
    self._has_valid_tuple(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 189, in _has_valid_tuple
    if not self._has_valid_type(k, i):
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1418, in _has_valid_type
    (key, self.obj._get_axis_name(axis)))
KeyError: 'None of [[3]] are in the [columns]'
>>> df.loc[3]
b    5
c    4
Name: 3, dtype: int64
>>> df.loc[[3]]
   b  c
a
3  5  4

OK, I tried using IndexSlice and it seems flaky -- works in some cases, doesn't work in others, at least in Pandas 0.20.3:

Python 2.7.14 |Anaconda custom (64-bit)| (default, Oct 15 2017, 03:34:40) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> idx = pd.IndexSlice
>>> r = np.arange(16).astype(int)
>>> colors = 'red green blue yellow'.split()
>>> df = pd.DataFrame(dict(a=[colors[i] for i in r//4], b=r%4, c=r*100)).set_index(['a','b'])
>>> print df
             c
a      b
red    0     0
       1   100
       2   200
       3   300
green  0   400
       1   500
       2   600
       3   700
blue   0   800
       1   900
       2  1000
       3  1100
yellow 0  1200
       1  1300
       2  1400
       3  1500
>>> df.loc[idx['yellow']]
      c
b
0  1200
1  1300
2  1400
3  1500
>>> def J(x):
...     return '!!!%s!!!' % x
...
>>> df.style.format(J,idx['yellow'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\io\formats\style.py", line 372, in format
    sub_df = self.data.loc[subset]
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1325, in __getitem__
    return self._getitem_tuple(key)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 836, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 948, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1023, in _getitem_nested_tuple
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1541, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1081, in _getitem_iterable
    self._has_valid_type(key, axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1418, in _has_valid_type
    (key, self.obj._get_axis_name(axis)))
KeyError: "None of [['yellow']] are in the [columns]"
>>> pd.__version__
u'0.20.3'

In pandas 0.24.2 I get a similar error but slightly different:

>>> df.style.format(J,idx['yellow'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\io\formats\style.py", line 401, in format
    sub_df = self.data.loc[subset]
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1494, in __getitem__
    return self._getitem_tuple(key)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 868, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 969, in _getitem_lowerdim
    return self._getitem_nested_tuple(tup)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1048, in _getitem_nested_tuple
    obj = getattr(obj, self.name)._getitem_axis(key, axis=axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1902, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1205, in _getitem_iterable
    raise_missing=False)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1161, in _get_listlike_indexer
    raise_missing=raise_missing)
  File "c:\app\python\anaconda\2\lib\site-packages\pandas\core\indexing.py", line 1246, in _validate_read_indexer
    key=key, axis=self.obj._get_axis_name(axis)))
KeyError: u"None of [Index([u'yellow'], dtype='object')] are in the [columns]"
>>> pd.__version__
u'0.24.2'

Oh wait -- I wasn't specifying enough index information; this works:

df.style.format(J,idx['yellow',:])
4
  • I have used this argument to apply formatting to some, but not all, cells in a dataframe. Is there some case where its ultimate behavior is surprising you? For example, a case where calling format(..., subset=s) gives different results than df.loc[s]? I think that would qualify as a bug. Commented Dec 5, 2019 at 20:53
  • added use case that's causing me grief Commented Dec 5, 2019 at 20:57
  • It's says right there that [3] is not your column. What exactly are you trying to do? Commented Dec 5, 2019 at 20:59
  • I'm trying to apply my Formatter to a portion of the data frame, in particular one row, that is selected by .loc[], which is exactly what 3 or [3] does.. Commented Dec 5, 2019 at 21:00

2 Answers 2

1

I agree that the behavior you showed is not ideal.

>>> df = (pandas.DataFrame([dict(a=1,b=2,c=3),
                            dict(a=3,b=5,c=4)])
            .set_index('a'))
>>> df.loc[[3]]
   b  c
a      
3  5  4
>>> df.style.format('{:.2f}', subset=[3])
Traceback (most recent call last)
...
KeyError: "None of [Int64Index([3], dtype='int64')] are in the [columns]"

You can work around this issue by passing a fully-formed pandas.IndexSlice as the subset argument:

>>> df.style.format('{:.2f}', subset=pandas.IndexSlice[[3], :])

Since you asked what _non_reducing_slice() is doing, its goal is reasonable (ensure a subset does not reduce dimensionality to Series). Its implementation treats a list as a sequence of column names:

From pandas/core/indexing.py:

def _non_reducing_slice(slice_):
    """
    Ensurse that a slice doesn't reduce to a Series or Scalar.

    Any user-paseed `subset` should have this called on it
    to make sure we're always working with DataFrames.
    """
    # default to column slice, like DataFrame
    # ['A', 'B'] -> IndexSlices[:, ['A', 'B']]
    kinds = (ABCSeries, np.ndarray, Index, list, str)
    if isinstance(slice_, kinds):
        slice_ = IndexSlice[:, slice_] 
    ...

I wonder if the documentation could be improved: in this case, the exception raised with subset=[3] matches the behavior of df[[3]] rather than df.loc[[3]].

Sign up to request clarification or add additional context in comments.

8 Comments

OK. It would be great if the documentation matched the code. (such an example as you give would be really helpful, especially since selecting one or more rows is a rather common thing)
so.... you answered the question of what I was trying to do, but not the question I asked... what exactly is subset really doing in the code?
But the document does say that subset: IndexSlice.
GAH! How did I miss that? :(
now I just have to figure out what IndexSlice does :/
|
1

It indeed does what it supposed to do.

df = pd.DataFrame(np.arange(16).reshape(4,4))

df.style.background_gradient(subset=[0,1])

df.style.background_gradient()

gives:

enter image description here enter image description here

respectively.

1 Comment

In your case, it does. In general, it does not.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.