Pandas lambda function raised an indexing error

Question

I have a data frame df and would like to reassign the values in columns b through the last column. The logic is as follows: if the "b" column value is greater than or equal to the "a" column value in the previous row, reassign the "b" value as "green", otherwise "red". My code raises an indexing error: Too many indexers. I have no idea what's wrong with my code. Any help would be appreciated.

import pandas as pd

value = [[10, 95, 10, 32], [22, 12, 3, 15], [28, 25, 5, 29], [30, 11, 66, 16]]
df = pd.DataFrame(value, columns=['a', 'b', 'c', 'd'])

for j in range(2, len(df.columns)):
    df.iloc[:,j] = df.apply(lambda x: "green" if x.iloc[:,j] >= (x["b"].shift(periods = 1)) else "red", axis = 1)

The expected result is:

 a      b      c      d
10    nan    nan    nan
22  green    red  green
28  green    red  green
30    red  green    red

I'm a bit unclear on the logic. Are all columns compared to a? If so, wouldn't the last row of c be green as 66 >= 28? Can you elaborate on why the expected values are expected?
– Henry Ecker ♦, Commented Jul 11, 2021 at 4:56
Specifically the issue is that x is a Series and only has a single dimension. So it would be something like x.iloc[j] but then you'll have other errors raised.
– Henry Ecker ♦, Commented Jul 11, 2021 at 4:58
Thank you Henry. I don't understand why x here is a Series. I applied the lambda function on df, isn't x referring to the data frame df?
– M D, Commented Jul 11, 2021 at 5:16
So x is referring to a row (axis=1) of the DataFrame, which is a single dimension (Series), not two-dimensional data like a DataFrame.
– Henry Ecker ♦, Commented Jul 11, 2021 at 5:21
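
To make the point in the comments above concrete, here is a small sketch that reuses the question's data. It checks what type the lambda receives under axis=1 and reproduces the error outside of apply. The names row_types, row, error_name, error_message and third_value are illustrative only and do not come from the question.

```python
import pandas as pd

value = [[10, 95, 10, 32], [22, 12, 3, 15], [28, 25, 5, 29], [30, 11, 66, 16]]
df = pd.DataFrame(value, columns=['a', 'b', 'c', 'd'])

# With axis=1, apply passes each row to the lambda as a Series.
row_types = df.apply(lambda x: type(x).__name__, axis=1)

# The same kind of object the lambda receives: one row, one dimension.
row = df.iloc[1]

# Two indexers on a one-dimensional Series reproduce the error.
try:
    row.iloc[:, 2]
    error_name, error_message = None, None
except Exception as exc:  # expected to be pandas' IndexingError
    error_name, error_message = type(exc).__name__, str(exc)

# A single positional indexer is what a Series supports.
third_value = row.iloc[2]

print(row_types.tolist())
print(error_name, error_message)
print(third_value)
```

Switching to x.iloc[j] alone would not be enough: inside the lambda, x["b"] is a single scalar rather than a column, so it has no shift method to call.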

Henry Ecker · Accepted Answer · 2021-07-11 05:20:25Z

Let's try np.where, using DataFrame.ge to test whether each of the other columns is greater than or equal to the shifted values of a:

import numpy as np

df.iloc[:, 1:] = np.where(df.iloc[:, 1:].ge(df['a'].shift(), axis=0),
                          'green',
                          'red')
df.iloc[0, 1:] = np.nan

df:

    a      b      c      d
0  10    NaN    NaN    NaN
1  22  green    red  green
2  28  green    red  green
3  30    red  green    red

The second assignment, which puts the NaNs back in the first row, is necessary because any >= comparison involving NaN is False, so without it the first row would end up all red.
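
A minimal sketch of why the first row needs that extra step, using only columns a and b from the question. The names prev_a, is_ge and labels are illustrative.

```python
import numpy as np
import pandas as pd

a = pd.Series([10, 22, 28, 30], name='a')
b = pd.Series([95, 12, 25, 11], name='b')

# shift() moves every value down one row, so the first row has no
# previous value and becomes NaN.
prev_a = a.shift()

# Comparing against NaN yields False, so the first row is False even
# though 95 is the largest value in b.
is_ge = b.ge(prev_a)

# np.where has only two outcomes, so the False in row 0 becomes 'red'.
labels = np.where(is_ge, 'green', 'red')

print(prev_a.tolist())
print(is_ge.tolist())
print(labels.tolist())
```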

It may be beneficial to add the NaN values back based on where the shifted series is NaN (this allows for different types of shifting, but will not change the output from the above):

s = df['a'].shift()
df.iloc[:, 1:] = np.where(df.iloc[:, 1:].ge(s, axis=0), 'green', 'red')
df.iloc[s.isna(), 1:] = np.nan

A (slower) option with apply + map:

s = df['a'].shift()
df.iloc[:, 1:] = df.iloc[:, 1:].apply(
    lambda x: x.ge(s).map({True: 'green', False: 'red'})
)
df.iloc[s.isna(), 1:] = np.nan

Or apply + np.where:

s = df['a'].shift()
df.iloc[:, 1:] = df.iloc[:, 1:].apply(
    lambda x: np.where(x.ge(s), 'green', 'red')
)
df.iloc[s.isna(), 1:] = np.nan
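
As a variation on the np.where approach, the labels can be built in a new DataFrame instead of being assigned into the existing integer columns. This is only a sketch and not part of the original answer: the names prev_a, others, labels and result are illustrative, and the motivation is an assumption, namely that some newer pandas versions may warn when strings are written into integer columns.

```python
import numpy as np
import pandas as pd

value = [[10, 95, 10, 32], [22, 12, 3, 15], [28, 25, 5, 29], [30, 11, 66, 16]]
df = pd.DataFrame(value, columns=['a', 'b', 'c', 'd'])

prev_a = df['a'].shift()   # previous row's value of a
others = df.columns[1:]    # every column after a

# Row-wise comparison of each remaining column against the shifted a.
is_ge = df[others].ge(prev_a, axis=0)

# Build the labels in a fresh frame rather than writing into df.
labels = pd.DataFrame(
    np.where(is_ge.to_numpy(), 'green', 'red'),
    index=df.index,
    columns=others,
)

# Rows with no previous a value get NaN instead of a colour.
labels.loc[prev_a.isna()] = np.nan

# Keep a unchanged and attach the labelled columns next to it.
result = pd.concat([df[['a']], labels], axis=1)
print(result)
```

The original df is left untouched here, which makes it easy to compare the labels against the numbers they came from.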
