Conditional formatting on Duplicates using pandas

Question

I have a dataFrame with 6 columns. And I want to do conditional formatting on two columns of them. So my dataFrame looks like this

And I want to highlight duplicate values from College and College_F2 columns. After which my dataframe will look like this

The code written for this looks like this:

dataFrame_file = pd.read_excel(util.comcastFile2Path(), sheet_name='Sheet1')

def dummy_color(x):
    c1 = 'background-color:red'
    c2 = ''
    cond = dataFrame_file.stack().duplicated(keep=False).unstack()
    df1 = pd.DataFrame(np.where(cond, c1, c2), columns=x.columns, index=x.index)
    return df1

dataFrame_file.style.apply(dummy_color,axis=None,subset=['College', 'College_F2']).to_excel(util.comcastFile2Path)

And the error this code is giving to me is

ValueError: Shape of passed values is (6, 6), indices imply (6, 2)

IDE that I am using is PyCharm. How to resolve this issue?

Thanks in advance.

Please provide (at least your initial) dataframe via df.head().to_dict() — rpanai
– rpanai, Commented Jan 12, 2020 at 4:20
@rpanai dataFrame_file = pd.read_excel(util.comcastFile2Path(), sheet_name='Sheet1') This is how I am providing the dataFrame. — Twinkle Lahariya
– Twinkle Lahariya, Commented Jan 12, 2020 at 4:43

jezrael · Accepted Answer · 2020-01-12 06:44:10Z

2

In solution is necessary working with all DataFrame, so omited subset parameter and in cond filtered columns for check duplicates and also added DataFrame.reindex for filling False to all another columns:

def dummy_color(x):
    c1 = 'background-color:red'
    c2 = ''
    cond = (x[['College', 'College_F2']].stack()
                                        .duplicated(keep=False)
                                        .unstack()
                                        .reindex(x.columns, axis=1, fill_value=False))
    df1 = pd.DataFrame(np.where(cond, c1, c2), columns=x.columns, index=x.index)
    return df1

dataFrame_file.style.apply(dummy_color,axis=None).to_excel(util.comcastFile2Path)

Like @anky_91 mentioned simplier is use subset parameter with x for cond variable, I think reason is x variable are only columns filtered by subset list:

def dummy_color(x):
    c1 = 'background-color:red'
    c2 = ''
    cond = x.stack().duplicated(keep=False).unstack()
    df1 = pd.DataFrame(np.where(cond, c1, c2), columns=x.columns, index=x.index)
    return df1

dataFrame_file.style.apply(dummy_color,axis=None, subset=['College', 'College_F2']).to_excel(util.comcastFile2Path)

edited Jan 12, 2020 at 6:44

answered Jan 12, 2020 at 6:24

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

anky Over a year ago

interesting , wonder why the solution didnot work , only except the cond line where dataFrame_file need to be replaced by x and it works for me , but this is a nice alternative :)

jlewkovich · Accepted Answer · 2020-01-12 06:20:48Z

0

def dummy_color(x):
    color = 'red' if (len(dataFrame_file[dataFrame_file['College'] == x]) + len(dataFrame_file[dataFrame_file['College_F2'] == x])) > 1 else ''
    return 'background-color: %s' % color

dataFrame_file.style.applymap(dummy_color, subset=['College', 'College_F2']).to_excel(util.comcastFile2Path)

answered Jan 12, 2020 at 6:20

jlewkovich

2,7752 gold badges36 silver badges50 bronze badges

Collectives™ on Stack Overflow

Conditional formatting on Duplicates using pandas

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related