Highlight a column value based off another column value in pandas

Question

I have a function like this:

def highlight_otls(df):
    return ['background-color: yellow']

And a DataFrame like this:

price   outlier 
1.99       F,C
1.49       L,C
1.99         F
1.39         N

What I want to do is highlight a certain column in my df based off of this condition of another column:

data['outlier'].str.split(',').str.len() >= 2

So if a row's df['outlier'] entry contains 2 or more comma-separated values, I want to highlight the corresponding df['price'] cell. (So the first 2 prices should be highlighted in my dataframe above.)
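To make the condition concrete, here is a minimal sketch of what it evaluates to on the sample data. The frame is rebuilt by hand from the table above, and the name mask is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
data = pd.DataFrame({
    "price": [1.99, 1.49, 1.99, 1.39],
    "outlier": ["F,C", "L,C", "F", "N"],
})

# One boolean per row: True where outlier holds two or more flags
mask = data["outlier"].str.split(",").str.len() >= 2
print(mask.tolist())  # [True, True, False, False]
```

The condition yields one boolean per row, so any styling function has to map those row-level booleans onto the price cells.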

I attempted to do this by doing the following which gives me an error:

data['price'].apply(lambda x: highlight_otls(x) if (x['outlier'].str.split(',').str.len()) >= 2, axis=1)

Any idea on how to do this the proper way?

DieterDP · Accepted Answer · 2020-08-12 09:43:17Z

10

+50

Use Styler.apply. (To write the styled result to an .xlsx file, use the Styler's to_excel method.)

Suppose one's dataset is

  other  price outlier
0     X   1.99     F,C
1     X   1.49     L,C
2     X   1.99       F
3     X   1.39       N

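def hightlight_price(row):
    # one (initially empty) CSS string per column in the row
    ret = ["" for _ in row.index]
    # two or more comma-separated flags in outlier: highlight the price cell
    if len(row.outlier.split(",")) >= 2:
        ret[row.index.get_loc("price")] = "background-color: yellow"
    return ret

df.style.\
    apply(hightlight_price, axis=1).\
    to_excel('styled.xlsx', engine='openpyxl')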

From the documentation, "DataFrame.style attribute is a property that returns a Styler object."

We pass our styling function, hightlight_price, into Styler.apply and ask for it to be called row by row with axis=1. (Recall that we want to color the price cell in each row based on the outlier information in the same row.)

Our function hightlight_price generates the visual styling for each row. For each row, we first build a list of empty styles, one per column (other, price, outlier), that is ["", "", ""]. We then use row.index.get_loc("price") to find the position of the price column and modify only that entry, as in

ret[row.index.get_loc("price")] = "background-color: yellow"
# ret becomes ["", "background-color: yellow", ""]


edited Aug 12, 2020 at 9:43

DieterDP


answered Jun 13, 2018 at 3:15

Tai


5 Comments

Hana Over a year ago

thanks for your answer! I'm getting the error: 'AttributeError: ("'float' object has no attribute 'split'", 'occurred at index 0')' although the 'outlier' column is a str with some NaN values, any idea how to fix that?

Hana Over a year ago

can you also specify for the highlighting format to apply to the 'price' column? in my actual dataframe, there are other columns I do not want highlighted. Thanks!

Tai Over a year ago

@Hana For your first problem, I think that is because you have some missing data like NaN in the outlier column. Can you check it for me with df.outlier.isna().any()?

Tai Over a year ago

@Hana for your second problem, please see the update.

Tai Over a year ago

@Hana if your first problem is caused by something like NaN, use df["outlier"] = df["outlier"].fillna("") to fix it.
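As an alternative to filling the missing values, the row function itself can skip them. The following is only a sketch: the name highlight_price, the extra fifth row with a missing outlier, and the preview step are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

def highlight_price(row):
    """Return one CSS string per cell; only the price cell is ever styled."""
    styles = ["" for _ in row.index]
    flags = row["outlier"]
    # NaN is a float and None has no split(), so check the type first
    if isinstance(flags, str) and len(flags.split(",")) >= 2:
        styles[row.index.get_loc("price")] = "background-color: yellow"
    return styles

# Sample data from the answer plus one hypothetical row with a missing outlier
df = pd.DataFrame({
    "other": ["X", "X", "X", "X", "X"],
    "price": [1.99, 1.49, 1.99, 1.39, 1.59],
    "outlier": ["F,C", "L,C", "F", "N", None],
})

# DataFrame.apply is used here only to preview the CSS strings as a table;
# real styling would still go through df.style.apply(highlight_price, axis=1).
preview = df.apply(highlight_price, axis=1, result_type="expand")
preview.columns = df.columns
print(preview)
```

Because every entry other than price stays an empty string, the other columns are left unstyled, which also covers the question about limiting the highlight to the price column.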

twolffpiggott · Accepted Answer · 2018-06-06 16:08:47Z

2

Key points

You need to access values in the multiple columns for your lambda function, so apply to the whole dataframe instead of the price column only.
The above also solves the issue that apply for a series has no axis argument.
Add else x to fix the syntax error in the conditional logic for your lambda
When you index x in the lambda it is a value, no longer a series, so kill the str attribute calls and just call len on it.

So try:

data.apply(lambda x: highlight_otls(x) if len(x['outlier'].split(',')) >= 2 else x, axis=1)

Output

0    [background-color: yellow]
1    [background-color: yellow]
2                  [None, None]
3                  [None, None]
dtype: object

One way to deal with null outlier values as per your comment is to refactor the highlighting conditional logic into the highlight_otls function:

def highlight_otls(x):
    if len(x['outlier'].split(',')) >= 2:
        return ['background-color: yellow']
    else:
        return x

data.apply(lambda x: highlight_otls(x) if pd.notnull(x['outlier']) else x, axis=1)

By the way, you may want to return something like ['background-color: white'] instead of x when you don't want to apply highlighting.

edited Jun 6, 2018 at 16:08

answered Jun 6, 2018 at 15:49

twolffpiggott


2 Comments

Hana Over a year ago

Thanks for the clarifications! How can I add pd.notnull() to x['outlier']? There are some instances where there are no outliers in my full dataframe, so I'm getting this error - AttributeError: ("'float' object has no attribute 'str'", 'occurred at index 0')

Hana Over a year ago

So actually when I output the Excel file with this code, I don't see any yellow highlighted fields, any idea why?
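One likely explanation, sketched here on the assumption that the code above was used as written: DataFrame.apply returns ordinary data (a Series of lists), not a styled object, so nothing in it carries formatting into Excel. The styling strings only take effect when the function is passed to df.style.apply. The helper name styled is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "price": [1.99, 1.49, 1.99, 1.39],
    "outlier": ["F,C", "L,C", "F", "N"],
})

def highlight_otls(row):
    # One CSS string per column; only price is styled, and only when
    # the outlier cell holds two or more comma-separated flags
    n_flags = len(row["outlier"].split(","))
    return [
        "background-color: yellow" if (col == "price" and n_flags >= 2) else ""
        for col in row.index
    ]

# DataFrame.apply just collects the returned lists as data
plain = df.apply(highlight_otls, axis=1)
print(type(plain).__name__)  # Series

def styled(frame):
    # Styler.apply interprets the same strings as CSS for each cell;
    # the returned Styler can then be written with .to_excel(...)
    return frame.style.apply(highlight_otls, axis=1)
```

Calling styled(df).to_excel('styled.xlsx', engine='openpyxl') would then export the highlighted cells, provided openpyxl is installed.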

jezrael · Accepted Answer · 2018-06-13 10:50:49Z

2

I suggest using a custom function that returns a DataFrame of styles built from the condition, and then exporting the styled result to an Excel file:

import pandas as pd

def highlight_otls(x):
    c1 = 'background-color: yellow'
    c2 = ''

    # True where outlier holds two or more comma-separated flags
    mask = x['outlier'].str.split(',').str.len() >= 2
    # DataFrame of empty styles with the same index and columns as the input
    df1 = pd.DataFrame(c2, index=x.index, columns=x.columns)
    # modify values of the price column of df1 by boolean mask
    df1.loc[mask, 'price'] = c1
    return df1

# The styles DataFrame returned for the sample data looks like this:
#                       price outlier
# 0  background-color: yellow
# 1  background-color: yellow
# 2
# 3

df.style.apply(highlight_otls, axis=None).to_excel('styled.xlsx', engine='openpyxl')

edited Jun 13, 2018 at 10:50

answered Jun 13, 2018 at 10:45

jezrael


1 Comment

Hana Over a year ago

Hey this is giving me error: 'TypeError: Function <function highlight_otls at 0x1a274f60d0> must return a DataFrame when passed to Styler.apply with axis=None' any idea how to fix it?
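With axis=None, the function has to return a DataFrame whose index and columns match the input, so a function that ends without reaching a return statement (and therefore returns None) would trigger this kind of error. The sketch below is one way to check the return value before handing the function to Styler.apply; the sample frame is rebuilt by hand and the variable name styles is illustrative.

```python
import pandas as pd

def highlight_otls(x):
    # True where outlier holds two or more comma-separated flags
    mask = x["outlier"].str.split(",").str.len() >= 2
    # Start from empty styles shaped exactly like the input
    styles = pd.DataFrame("", index=x.index, columns=x.columns)
    styles.loc[mask, "price"] = "background-color: yellow"
    return styles

df = pd.DataFrame({
    "price": [1.99, 1.49, 1.99, 1.39],
    "outlier": ["F,C", "L,C", "F", "N"],
})

# Call the function directly and inspect what Styler.apply would receive
styles = highlight_otls(df)
print(isinstance(styles, pd.DataFrame))    # True
print(styles.index.equals(df.index))       # True
print(styles.columns.equals(df.columns))   # True
```

If all three checks print True, df.style.apply(highlight_otls, axis=None) should accept the function.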
