
I have a data frame df and would like to reassign the values in every column from "b" through the last column. The logic is as follows: if a value is greater than or equal to the value of column "a" in the previous row, replace it with "green"; otherwise replace it with "red". My code raises an indexing error: "Too many indexers". I have no idea what's wrong with it. Any help would be appreciated.

import pandas as pd

value = [[10, 95, 10, 32], [22, 12, 3, 15], [28, 25, 5, 29], [30, 11, 66, 16]]
df = pd.DataFrame(value, columns=['a', 'b', 'c', 'd'])

for j in range(2, len(df.columns)):
    df.iloc[:,j] = df.apply(lambda x: "green" if x.iloc[:,j] >= (x["b"].shift(periods = 1)) else "red", axis = 1)

The expected result is:

 a      b      c      d
10    nan    nan    nan
22  green    red  green
28  green    red  green
30    red  green    red
  • I'm a bit unclear on the logic. Are all columns compared to a? if so wouldn't the last row of C be green as 66 >= 28? Can you elaborate on why the expected values are expected? Commented Jul 11, 2021 at 4:56
  • Specifically the issue is that x is a Series and only has a single dimension. So it would be something like x.iloc[j] but then you'll have other errors raised. Commented Jul 11, 2021 at 4:58
  • Sorry, it was a typo error. I edited it. Commented Jul 11, 2021 at 5:06
  • Thank you Henry. I don't understand why x here is a Series. I applied lambda function on df, isn't x referring to the data frame df? Commented Jul 11, 2021 at 5:16
  • So x is referring to a row (axis=1) of the DataFrame. Which is a single dimension (Series), not two-dimensional data like a DataFrame. Commented Jul 11, 2021 at 5:21
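The point made in the comments can be checked directly. The following is a minimal sketch, not part of the original thread: it uses the question's data, and the names `row_types`, `first_row`, `message`, and `value_b` are illustrative. It shows that with axis=1 each row reaches the function as a Series, that two-dimensional indexing on that Series fails, and that a single positional index works.

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 95, 10, 32], [22, 12, 3, 15], [28, 25, 5, 29], [30, 11, 66, 16]],
    columns=["a", "b", "c", "d"],
)

# With axis=1 the function is called once per row; record what it receives.
row_types = df.apply(lambda x: type(x).__name__, axis=1)
print(row_types.unique())

# A row is one-dimensional, so a two-part indexer is one part too many.
first_row = df.iloc[0]
message = None
try:
    first_row.iloc[:, 1]
except Exception as exc:  # caught broadly; the exact class location varies by pandas version
    message = str(exc)
print(message)

# A single positional index is what a Series accepts.
value_b = first_row.iloc[1]
print(value_b)
```

Note that shift has the same kind of problem inside a row-wise apply: x["b"] is a single scalar there, so there is no previous row to shift to. That is why the comparison is easier to express on whole columns.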

1 Answer


Let's try np.where, using ge to test where each column is greater than or equal to the shift of a:

import numpy as np

df.iloc[:, 1:] = np.where(df.iloc[:, 1:].ge(df['a'].shift(), axis=0),
                          'green',
                          'red')
df.iloc[0, 1:] = np.nan

df:

    a      b      c      d
0  10    NaN    NaN    NaN
1  22  green    red  green
2  28  green    red  green
3  30    red  green    red

The second assignment, which puts the NaNs back in the first row, is necessary because value >= nan evaluates to False, so without it the first row would end up all red.
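To see why the first row needs that extra step, here is a small sketch on columns a and b alone. The Series names `a`, `b`, `prev_a`, `mask`, and `labels` are illustrative and not taken from the answer.

```python
import numpy as np
import pandas as pd

a = pd.Series([10, 22, 28, 30], name="a")
b = pd.Series([95, 12, 25, 11], name="b")

# shift() moves every value down one row, leaving NaN in the first position.
prev_a = a.shift()
print(prev_a.tolist())

# Any >= comparison against NaN is False, so the first row is never "green".
mask = b.ge(prev_a)
print(mask.tolist())

# np.where has only two outcomes, so the NaN row is labelled "red".
labels = np.where(mask, "green", "red")
print(labels.tolist())
```

The first element of labels comes out as "red" even though there was nothing to compare against, which is the case the second assignment corrects.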


It may be beneficial to add the NaN values back based on where the shifted series is NaN. This allows for different types of shifting, but will not change the output from the above:

s = df['a'].shift()
df.iloc[:, 1:] = np.where(df.iloc[:, 1:].ge(s, axis=0), 'green', 'red')
df.iloc[s.isna(), 1:] = np.nan

A (slower) option with apply + map:

s = df['a'].shift()
df.iloc[:, 1:] = df.iloc[:, 1:].apply(
    lambda x: x.ge(s).map({True: 'green', False: 'red'})
)
df.iloc[s.isna(), 1:] = np.nan

Or apply + np.where:

s = df['a'].shift()
df.iloc[:, 1:] = df.iloc[:, 1:].apply(
    lambda x: np.where(x.ge(s), 'green', 'red')
)
df.iloc[s.isna(), 1:] = np.nan
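
As a variation on the same idea, the labels can be built in a separate frame rather than written back into the numeric columns in place. This is only a sketch under assumptions: the names `prev_a`, `cols`, `labels`, and `result` are illustrative, and whether in-place assignment of strings into integer columns triggers a dtype warning depends on your pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[10, 95, 10, 32], [22, 12, 3, 15], [28, 25, 5, 29], [30, 11, 66, 16]],
    columns=["a", "b", "c", "d"],
)

prev_a = df["a"].shift()   # previous row's "a"; NaN for the first row
cols = df.columns[1:]      # every column after "a"

# Compare each column row-wise against prev_a and turn the mask into labels.
labels = pd.DataFrame(
    np.where(df[cols].ge(prev_a, axis=0), "green", "red"),
    index=df.index,
    columns=cols,
)

# Rows with no previous "a" value have nothing to compare against.
labels.loc[prev_a.isna(), :] = np.nan

# Keep "a" numeric and attach the label columns next to it.
result = pd.concat([df[["a"]], labels], axis=1)
print(result)
```

This leaves df untouched, which can be convenient if the original numbers are still needed afterwards.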