Filtering pandas dataframe based on repeated column values - Python

Question

So, I have a data frame of this type:

               Name   1   2   3   4   5  
               Alex  10  40  20  11  50
               Alex  10  60  20  11  60
                Sam  30  15  50  15  60
                Sam  30  12  50  15  43 
               John  50  18 100   8  32
               John  50  15 100   8  21

I am trying to keep only the columns that have repeated values for all unique row values. For example, in this case, I want to keep columns 1,3,4 because they have repeated values for each 'duplicate' row. But I want to keep the column only if the values are repeated for EACH pair of names - so, the whole column should consist of pairs of same values. Any ideas of how to do that?

Nuri Taş · Accepted Answer · 2022-11-17 11:46:27Z

1

Using a simple list inside agg:

cond = df.groupby('Name').agg(list).applymap(lambda x: len(x) != len(set(x)))

dupe_cols = cond.columns[cond.all()]

answered Nov 17, 2022 at 11:46

Nuri Taş

3,8552 gold badges8 silver badges22 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

willwrighteng · Accepted Answer · 2022-11-17 11:47:36Z

0

this is the easiest way I can think of

from collections import Counter

import pandas as pd

data = [[   'Name',   1,   2,   3,   4,   5],
[   'Alex',  10,  40,  20,  11,  50],
[   'Alex',  10,  60,  20,  11,  60],
[    'Sam',  30,  15,  50,  15,  60],
[    'Sam',  30,  12,  50,  15,  43],
[   'John',  50,  18, 100,   8,  32],
[   'John',  50,  15, 100,   8,  21]]

df = pd.DataFrame(data)

vals = []
for row in range(0,len(df)):
    tmp = Counter(df.iloc[row])
    if 2 not in tmp.values():
        vals.append(row)
        
ndf = df.iloc[vals]
ndf.drop_duplicates(subset='Name',keep='first')

returns


Name    1   2   3   4   5
1   Alex    10  40  20  11  50
4   Sam 30  12  50  15  43
5   John    50  18  100 8   32

edited Nov 17, 2022 at 11:47

answered Nov 17, 2022 at 11:39

willwrighteng

3,1912 gold badges22 silver badges36 bronze badges

2 Comments

mariant Over a year ago

Wait, I wanted to drop columns, not rows; sorry if that was unclear! So, I will keep all the rows with duplicates, I just want to drop the columns where we don't have value duplicates for all pairs

willwrighteng Over a year ago

oops, fixing now -- okay added not to the conditional and dropped duplicates

Collectives™ on Stack Overflow

Filtering pandas dataframe based on repeated column values - Python

2 Answers 2

Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related