2

I want to find matches between two columns and highlight the results of both columns, in a pandas dataframes, so I did this code:

df=pd.DataFrame({'Name':['Zea mays','Zea mays subsp. mexicana','Zea mays subsp. parviglumis'],'ID':[1,2,3],'type':[1.1,1.2,1.3],
                 'Name.1':['Zea mays subsp. huehuetenangensis','Zea mays subsp. mays','Zea mays'],'ID.1':[1,2,3],'type.1':[1.1,1.2,1.3],
                 'Name.2':['Zea nicaraguensis','Zea luxurians','Zea perennis'],'ID.2':[1,2,3],'type.2':[1.1,1.2,1.3],
                 'Name.3':['Capsicum annuum','Capsicum frutescens','Capsicum chinense'],'ID.3':[1,2,3],'type.3':[1.1,1.2,1.3]})

def in_statements(s):
    color = 'yellow'
    if np.where(str(s.iloc[4]) == str(s.iloc[8])):
        color = 'yellow'
    else:
        color = 'black'
    return 'background-color: %s' % color
df.style.applymap(in_statements)

However, it gives me this error: " ("'str' object has no attribute 'iloc'", 'occurred at index Samples')"

This is an example of the input:

enter image description here

This is how it should lool: enter image description here

Can someone point me in the right direction? Thanks

2
  • provide an example of the input dataframe, along with ur expected output. Commented Feb 5, 2020 at 11:39
  • I edited my question to have the desired input Commented Feb 5, 2020 at 11:55

1 Answer 1

1

If want highlight duplicated values in Names columns use Styler.apply with mask creted by DataFrame.filter for columns with Name, reshaped by DataFrame.stack, get duplicates by Series.duplicated and reshaped back by Series.unstack:

df=pd.DataFrame({'Name':['Dog','Dog.1','Dog.3'],'ID':[1,2,3],'type':[1.1,1.2,1.3],
                 'Name.1':['Dog','cat','Dog.3'],'ID.1':[1,2,3],'type.1':[1.1,1.2,1.3],
                 'Name.2':['cat','cat.12','Dog.1'],'ID.2':[1,2,3],'type.2':[1.1,1.2,1.3],
                 'Name.3':['cat.7','cat.13','Dog.3'],'ID.3':[1,2,3],'type.3':[1.1,1.2,1.3]})

print (df)
    Name  ID  type Name.1  ID.1  type.1  Name.2  ID.2  type.2  Name.3  ID.3  \
0    Dog   1   1.1    Dog     1     1.1     cat     1     1.1   cat.7     1   
1  Dog.1   2   1.2    cat     2     1.2  cat.12     2     1.2  cat.13     2   
2  Dog.3   3   1.3  Dog.3     3     1.3   Dog.1     3     1.3   Dog.3     3   

   type.3  
0     1.1  
1     1.2  
2     1.3  

def color(x):
    c1 = 'background-color: yellow'
    c = '' 
    #compare columns
    names = x.filter(like='Name')
    mask = names.stack().duplicated(keep=False).unstack()
    #DataFrame with same index and columns names as original filled empty strings
    df1 =  pd.DataFrame(c, index=x.index, columns=x.columns)
    #modify values of df1 column by boolean mask
    df1 = names.mask(mask, c1).reindex(df1.columns, axis=1, fill_value='')
    return df1


df.style.apply(color, axis=None).to_excel('df.xlsx', engine='openpyxl', index=False)
Sign up to request clarification or add additional context in comments.

7 Comments

Can you also write the output to an excell file? Because when I try to do it, I get this error: "Result of must have identical index and columns as the input"
@Sofia - I try understand green highlight values, but failed. Can you explain it? Also Please don't post images of code/data (or links to them)
So dog is repeated in the same three columns, and then cat is repeated as well. Thats wayt it is repeated... Basically the green color matched the same word repeated in the other columns.
@jezrael- I just added the dataframe in pandas, thanks for the help :)
I edited the pandas dataframe, to show you my real situation. Cause I get this error: bad operand type for unary ~: 'float'. Can you see if you get this error? I'm so sorry for the trouble.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.