I have a function like this:

def highlight_otls(df):
    return ['background-color: yellow']

And a DataFrame like this:

price   outlier 
1.99       F,C
1.49       L,C
1.99         F
1.39         N

I want to highlight cells in one column of my df based on this condition on another column:

data['outlier'].str.split(',').str.len() >= 2

So if a row's df['outlier'] value contains two or more comma-separated entries, I want to highlight the corresponding df['price'] cell. (So the first 2 prices should be highlighted in my dataframe above.)
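
To make the goal concrete, here is a minimal sketch of that condition evaluated on the sample data. The DataFrame construction is an assumption based on the table above, and flagged_prices is just an illustrative name.

```python
import pandas as pd

# Sample data reconstructed from the table above (assumed, not the real dataset).
data = pd.DataFrame({
    "price": [1.99, 1.49, 1.99, 1.39],
    "outlier": ["F,C", "L,C", "F", "N"],
})

# Count the comma-separated codes in each row and flag rows with two or more.
mask = data["outlier"].str.split(",").str.len() >= 2

# Prices in the flagged rows are the ones that should end up highlighted.
flagged_prices = data.loc[mask, "price"].tolist()
```

Here mask is a boolean Series aligned with the rows of data, so it can be used to select the price cells that need a background colour.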

I attempted to do this by doing the following which gives me an error:

data['price'].apply(lambda x: highlight_otls(x) if (x['outlier'].str.split(',').str.len()) >= 2, axis=1)

Any idea on how to do this the proper way?

3 Answers

Use Styler.apply. (To write the styled result to xlsx, use the Styler's to_excel method.)

Suppose one's dataset is

   other  price outlier
0      X   1.99     F,C
1      X   1.49     L,C
2      X   1.99       F
3      X   1.39       N

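def hightlight_price(row):
    # Start with an empty style for every cell in the row.
    ret = ["" for _ in row.index]
    # Highlight only the price cell when outlier lists two or more codes.
    if len(row.outlier.split(",")) >= 2:
        ret[row.index.get_loc("price")] = "background-color: yellow"
    return ret

# Apply the function row-wise (axis=1), then export the styled result to Excel.
df.style.\
    apply(hightlight_price, axis=1).\
    to_excel('styled.xlsx', engine='openpyxl')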

From the documentation, "DataFrame.style attribute is a property that returns a Styler object."

We pass our styling function, hightlight_price, to Styler.apply and set axis=1 so that it is called once per row. (Recall that we want to color the price cell in each row based on the outlier value in the same row.)

The function hightlight_price returns the styling for one row. For each row, it first builds an empty style for the other, price, and outlier columns, i.e. ["", "", ""]. It then uses row.index.get_loc("price") to find the position of price in that list, so that only the price entry is modified, as in

ret[row.index.get_loc("price")] = "background-color: yellow"
# ret becomes ["", "background-color: yellow", ""]
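
To see what the function returns without writing an Excel file, it can be called on each row directly. This is only a sketch for inspection: the DataFrame below is assumed from the sample dataset above, and styles is an illustrative name.

```python
import pandas as pd

# Sample dataset from above, rebuilt by hand (assumed values).
df = pd.DataFrame({
    "other": ["X", "X", "X", "X"],
    "price": [1.99, 1.49, 1.99, 1.39],
    "outlier": ["F,C", "L,C", "F", "N"],
})

def hightlight_price(row):
    # One empty style per column, then fill in the price slot if needed.
    ret = ["" for _ in row.index]
    if len(row.outlier.split(",")) >= 2:
        ret[row.index.get_loc("price")] = "background-color: yellow"
    return ret

# Call the styling function once per row, as Styler.apply(axis=1) would.
styles = [hightlight_price(row) for _, row in df.iterrows()]
```

Each entry of styles is a list with one CSS string per column, in column order, which is the shape Styler.apply expects from a row-wise function.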

5 Comments

thanks for your answer! I'm getting the error: 'AttributeError: ("'float' object has no attribute 'split'", 'occurred at index 0')' although the 'outlier' column is a str with some NaN values, any idea how to fix that?
can you also specify for the highlighting format to apply to the 'price' column? in my actual dataframe, there are other columns I do not want highlighted. Thanks!
@Hana For your first problem, I think that is because you have some missing data like NaN in the outlier column. Can you check it for me with df.outlier.isna().any()?
@Hana for your second problem, please see the update.
@Hana if your first problem is caused by something like NaN, use df["outlier"] = df["outlier"].fillna("") to fix it.
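
For the NaN case discussed in these comments, an alternative to filling the column is to make the row function itself tolerate missing values. A minimal sketch, assuming that rows with a missing outlier value should simply not be highlighted; highlight_price_safe is a hypothetical name, not part of the answer above.

```python
import numpy as np
import pandas as pd

def highlight_price_safe(row):
    # One empty style per column.
    ret = ["" for _ in row.index]
    outlier = row["outlier"]
    # NaN is a float, so the isinstance check skips missing values
    # before split is ever called.
    if isinstance(outlier, str) and len(outlier.split(",")) >= 2:
        ret[row.index.get_loc("price")] = "background-color: yellow"
    return ret

# Small made-up frame with one missing outlier value (assumed data).
df = pd.DataFrame({
    "price": [1.99, 1.49, 1.39],
    "outlier": ["F,C", np.nan, "N"],
})

styles = [highlight_price_safe(row) for _, row in df.iterrows()]
```

The function has the same row-wise shape as the one in the answer, so it could be passed to df.style.apply(highlight_price_safe, axis=1) in the same way.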

Key points

  1. Your lambda needs values from more than one column, so apply it to the whole DataFrame instead of only the price column.
  2. This also solves the problem that Series.apply has no axis argument.
  3. Add else x to fix the syntax error in the conditional expression in your lambda.
  4. With axis=1, x['outlier'] inside the lambda is a single value, not a Series, so drop the .str accessor calls and call split and len on it directly.

So try:

data.apply(lambda x: highlight_otls(x) if len(x['outlier'].split(',')) >= 2 else x, axis=1)

Output

0    [background-color: yellow]
1    [background-color: yellow]
2                  [None, None]
3                  [None, None]
dtype: object

One way to deal with null outlier values as per your comment is to refactor the highlighting conditional logic into the highlight_otls function:

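def highlight_otls(x):
    # Highlight when the row's outlier value lists two or more codes.
    if len(x['outlier'].split(',')) >= 2:
        return ['background-color: yellow']
    else:
        return x

# Skip rows whose outlier value is null so that split is never called on NaN.
data.apply(lambda x: highlight_otls(x) if pd.notnull(x['outlier']) else x, axis=1)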

By the way, you may want to return something like ['background-color: white'] instead of x when you don't want to apply highlighting.
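
One possible reading of that suggestion is to return one style string per column in both branches, so every row produces a result of the same shape. The sketch below assumes white as the default background and a made-up sample frame with one missing outlier value; result is an illustrative name.

```python
import pandas as pd

def highlight_otls(row):
    # Default style for every column in the row (white is an assumed default).
    styles = ["background-color: white"] * len(row)
    outlier = row["outlier"]
    # pd.notnull guards against missing values before split is called.
    if pd.notnull(outlier) and len(outlier.split(",")) >= 2:
        styles[row.index.get_loc("price")] = "background-color: yellow"
    return styles

# Sample data assumed from the question, with one null outlier value added.
data = pd.DataFrame({
    "price": [1.99, 1.49, 1.99, 1.39],
    "outlier": ["F,C", "L,C", "F", None],
})

# result_type="expand" turns each returned list into one column per entry.
result = data.apply(highlight_otls, axis=1, result_type="expand")
```

Note that data.apply only computes the style strings. Because every row now yields one style per column, the same function could also be passed to data.style.apply(highlight_otls, axis=1) before calling to_excel, which is the step that writes colours into the file.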

2 Comments

Thanks for the clarifications! How can I add pd.notnull() to x['outlier']? There are some instances where there are no outliers in my full dataframe, so I'm getting this error - AttributeError: ("'float' object has no attribute 'str'", 'occurred at index 0')
So actually when I output the Excel file with this code, I don't see any yellow highlighted fields, any idea why?

I suggest using a custom function that returns a DataFrame of styles based on the condition, and then exporting the result to an Excel file:

def highlight_otls(x):
    c1 = 'background-color: yellow'
    c2 = ''

    # True for rows whose outlier value lists two or more codes
    mask = x['outlier'].str.split(',').str.len() >= 2
    # DataFrame of empty styles with the same index and columns as the input
    df1 = pd.DataFrame(c2, index=x.index, columns=x.columns)
    # modify values of df1 column by boolean mask
    df1.loc[mask, 'price'] = c1

    # check styled DataFrame
    print(df1)
    #                       price outlier
    # 0  background-color: yellow
    # 1  background-color: yellow
    # 2
    # 3
    return df1

df.style.apply(highlight_otls, axis=None).to_excel('styled.xlsx', engine='openpyxl')
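
As the error message in the comment below states, with axis=None the function must return a DataFrame. A minimal sketch for checking that before involving Styler at all: it calls a whole-frame style function directly on sample data and inspects the result. The data is assumed from the question (with one missing value added), and style_frame is a hypothetical name.

```python
import numpy as np
import pandas as pd

def style_frame(x):
    # Rows with two or more comma-separated codes; NaN rows compare as False.
    mask = x["outlier"].str.split(",").str.len() >= 2
    # Empty styles with exactly the same index and columns as the input.
    styles = pd.DataFrame("", index=x.index, columns=x.columns)
    styles.loc[mask, "price"] = "background-color: yellow"
    return styles

# Sample data assumed from the question, with one missing outlier value.
df = pd.DataFrame({
    "price": [1.99, 1.49, 1.99, 1.39],
    "outlier": ["F,C", "L,C", "F", np.nan],
})

# Call the function directly to inspect what Styler.apply would receive.
styles = style_frame(df)
same_labels = styles.index.equals(df.index) and styles.columns.equals(df.columns)
```

If styles is a DataFrame and same_labels is True, the function has the shape that df.style.apply(style_frame, axis=None) expects.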

1 Comment

Hey this is giving me error: 'TypeError: Function <function highlight_otls at 0x1a274f60d0> must return a DataFrame when passed to Styler.apply with axis=None' any idea how to fix it?
