I am trying to write a pandas/Python script to do the following in a Jupyter notebook (see the Excel data for an example).

For each row, I need to take the number in column C and look up the value in column E of the same row. I then want to find the same number in column G and put the corresponding value from column E into column I.

If a value appears more than once in column C with different corresponding values in column E, flag those column C values so I can take a look.

For example, if column C contains 111 and column E has code "a" in that row, then "a" should be placed in column I wherever column G contains 111.

If the numbers do not match, highlight those values in column C in red.

I am having trouble figuring out how to code this up. If anyone can show me how, that would be greatly appreciated. Thanks.

  • So, if col C and col G have the same number, we need to place the col E value in col I without highlighting the cell; otherwise, we need to highlight it if C and G have different values. Is that what you are asking for? Commented Dec 23, 2019 at 19:47
  • Yes. So if col C had the value 333, wherever there is "333" in col G, put "c" in col I, since it corresponds to the 333. Commented Dec 23, 2019 at 19:53
  • Else put "c" in col I and highlight it, right? Commented Dec 23, 2019 at 19:56
  • Please make an attempt at a solution and come back with specific issues with a reproducible example (not images of sample data because...). Commented Dec 23, 2019 at 21:11
  • @Parfait Said exactly what I was going to. Commented Dec 23, 2019 at 23:31

1 Answer

Here is one way to do it:

import pandas as pd

# Sample data: column E holds the code that belongs to the number in column C
dct = {'C':[111,222,333,111,444],'E':['a','b','c','d','e'],'G':[111,123,333,111,444]}

df = pd.DataFrame(dct)

highlight = []
vals = []
for i in range(len(df)):
    if df['C'][i] == df['G'][i]:
        highlight.append(False)
        vals.append(df['E'][i])
    else:
        highlight.append(True)
        vals.append(None)

df['I'] = vals

def highlight_cells(x):
    c1 = 'background-color: red'
    c2 = ''

    # Start with an empty style for every cell, same shape as df
    df1 = pd.DataFrame(c2, index=df.index, columns=df.columns)

    # Set the red background on column C for rows flagged in highlight
    df1.loc[highlight, 'C'] = c1

    return df1

df.style.apply(highlight_cells, axis=None).to_excel('styled.xlsx', engine='openpyxl')

First, build a boolean list, highlight, that marks which rows of column C need to be highlighted (the rows where C and G differ). The function highlight_cells then uses this list to build a DataFrame of CSS style strings with the same index and columns as df, and df.style.apply() applies those styles to df before it is written to Excel.
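The same row-by-row comparison can probably be written without an explicit loop. The following is a minimal sketch, not part of the original answer: the names mismatch and highlight_mismatches are made up for illustration, and unmatched rows in column I are left as NaN rather than None.

```python
import pandas as pd

df = pd.DataFrame({
    "C": [111, 222, 333, 111, 444],
    "E": ["a", "b", "c", "d", "e"],
    "G": [111, 123, 333, 111, 444],
})

# True where C and G differ in the same row; these rows get highlighted.
mismatch = df["C"].ne(df["G"])

# Keep the E value where the row matches; mismatched rows become NaN.
df["I"] = df["E"].where(~mismatch)


def highlight_mismatches(data):
    """Return a DataFrame of CSS strings shaped like `data` (hypothetical helper)."""
    styles = pd.DataFrame("", index=data.index, columns=data.columns)
    styles.loc[mismatch, "C"] = "background-color: red"
    return styles


# Build the style table directly so the result can be inspected without Excel.
styles = highlight_mismatches(df)
```

To write the styled sheet, highlight_mismatches can be passed to df.style.apply(..., axis=None) and exported with to_excel, exactly as in the loop-based version.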

Output: [image of the styled spreadsheet]
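Note that the code above compares C and G within the same row. The question also describes a lookup: find each G value anywhere in column C and copy over its E code, flagging C values that carry more than one code. A rough sketch of that reading follows. It assumes conflicting C values should be excluded from the lookup and left for manual review, which the question does not state explicitly; the names codes_per_value, conflicting and lookup are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "C": [111, 222, 333, 111, 444],
    "E": ["a", "b", "c", "d", "e"],
    "G": [111, 123, 333, 111, 444],
})

# Number of distinct E codes seen for each C value.
codes_per_value = df.groupby("C")["E"].nunique()

# C values that map to more than one code need a manual look.
conflicting = codes_per_value[codes_per_value > 1].index.tolist()

# Assumption: build the lookup only from unambiguous C values.
unambiguous = df[~df["C"].isin(conflicting)]
lookup = unambiguous.drop_duplicates("C").set_index("C")["E"]

# Fill I by looking up each G value; G values with no usable match stay NaN.
df["I"] = df["G"].map(lookup)
```

With the sample data, 111 appears in column C with both "a" and "d", so it ends up in conflicting and its rows in column I stay empty. The conflicting list could then drive the same kind of red highlighting on column C.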
