4

I have a dataFrame with 6 columns. And I want to do conditional formatting on two columns of them. So my dataFrame looks like this

enter image description here

And I want to highlight duplicate values from College and College_F2 columns. After which my dataframe will look like this

enter image description here

The code written for this looks like this:
dataFrame_file = pd.read_excel(util.comcastFile2Path(), sheet_name='Sheet1')

def dummy_color(x):
    c1 = 'background-color:red'
    c2 = ''
    cond = dataFrame_file.stack().duplicated(keep=False).unstack()
    df1 = pd.DataFrame(np.where(cond, c1, c2), columns=x.columns, index=x.index)
    return df1

dataFrame_file.style.apply(dummy_color,axis=None,subset=['College', 'College_F2']).to_excel(util.comcastFile2Path)

And the error this code is giving to me is

ValueError: Shape of passed values is (6, 6), indices imply (6, 2)

IDE that I am using is PyCharm. How to resolve this issue?

Thanks in advance.

2
  • Please provide (at least your initial) dataframe via df.head().to_dict() Commented Jan 12, 2020 at 4:20
  • @rpanai dataFrame_file = pd.read_excel(util.comcastFile2Path(), sheet_name='Sheet1') This is how I am providing the dataFrame. Commented Jan 12, 2020 at 4:43

2 Answers 2

2

In solution is necessary working with all DataFrame, so omited subset parameter and in cond filtered columns for check duplicates and also added DataFrame.reindex for filling False to all another columns:

def dummy_color(x):
    c1 = 'background-color:red'
    c2 = ''
    cond = (x[['College', 'College_F2']].stack()
                                        .duplicated(keep=False)
                                        .unstack()
                                        .reindex(x.columns, axis=1, fill_value=False))
    df1 = pd.DataFrame(np.where(cond, c1, c2), columns=x.columns, index=x.index)
    return df1

dataFrame_file.style.apply(dummy_color,axis=None).to_excel(util.comcastFile2Path)

Like @anky_91 mentioned simplier is use subset parameter with x for cond variable, I think reason is x variable are only columns filtered by subset list:

def dummy_color(x):
    c1 = 'background-color:red'
    c2 = ''
    cond = x.stack().duplicated(keep=False).unstack()
    df1 = pd.DataFrame(np.where(cond, c1, c2), columns=x.columns, index=x.index)
    return df1

dataFrame_file.style.apply(dummy_color,axis=None, subset=['College', 'College_F2']).to_excel(util.comcastFile2Path)
Sign up to request clarification or add additional context in comments.

1 Comment

interesting , wonder why the solution didnot work , only except the cond line where dataFrame_file need to be replaced by x and it works for me , but this is a nice alternative :)
0
def dummy_color(x):
    color = 'red' if (len(dataFrame_file[dataFrame_file['College'] == x]) + len(dataFrame_file[dataFrame_file['College_F2'] == x])) > 1 else ''
    return 'background-color: %s' % color

dataFrame_file.style.applymap(dummy_color, subset=['College', 'College_F2']).to_excel(util.comcastFile2Path)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.