How to subset a DataFrame by only a column having multiple entries?

Question

I have a pandas DataFrame df that looks like this:

I wish to keep only those rows of df whose value in column 1 appears more than once, the desired output being:

How do I do this?

related: stackoverflow.com/questions/11528078/…

– EdChum, Commented Jan 23, 2017 at 13:20

jezrael · Accepted Answer · 2017-01-23 13:27:58Z

I think you need boolean indexing with a mask created by DataFrame.duplicated; keep=False marks every occurrence of a duplicated value as True:

If the column names are strings:

print(df.columns)
Index(['0', '1'], dtype='object')

# mark every row whose value in column '1' occurs more than once
mask = df.duplicated('1', keep=False)
# alternative using Series.duplicated
# mask = df['1'].duplicated(keep=False)

print(mask)
0     True
1     True
2     True
3    False
4     True
5     True
6    False
dtype: bool

print(df[mask])
    0   1
0  C1  V1
1  C2  V1
2  C3  V1
4  C5  V3
5  C6  V3
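
Since the original frame isn't shown, here is a minimal runnable sketch of the string-column case. The values "V2" and "V4" in rows 3 and 6 (and "C4", "C7" in column '0') are placeholders I made up; the output above only tells us that those two rows are not duplicated in column '1'.

```python
import pandas as pd

# Hypothetical data: rows 3 and 6 are placeholders, chosen only so that
# their values in column '1' are unique, matching the mask printed above.
df = pd.DataFrame({
    "0": ["C1", "C2", "C3", "C4", "C5", "C6", "C7"],
    "1": ["V1", "V1", "V1", "V2", "V3", "V3", "V4"],
})

# keep=False flags every member of a repeated group, not only the later repeats
mask = df.duplicated("1", keep=False)

# boolean indexing keeps the rows where the mask is True
result = df[mask]
print(result)
```

With the default keep='first', the first row of each group would be marked False and so dropped from the result, which is why keep=False is needed here.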

If the column names are integers:

print(df.columns)
Int64Index([0, 1], dtype='int64')

# same approach, with the integer label 1 instead of the string '1'
mask = df.duplicated(1, keep=False)
# alternative using Series.duplicated
# mask = df[1].duplicated(keep=False)

print(mask)
0     True
1     True
2     True
3    False
4     True
5     True
6    False
dtype: bool

print(df[mask])
    0   1
0  C1  V1
1  C2  V1
2  C3  V1
4  C5  V3
5  C6  V3
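
As a cross-check, the same rows can be selected by counting group sizes instead of using duplicated. This is a different technique, not part of the answer above, sketched here on made-up data with integer column names ("C4", "C7", "V2" and "V4" are placeholders, since the original frame isn't shown).

```python
import pandas as pd

# Hypothetical data with integer column labels; rows 3 and 6 are placeholders.
df = pd.DataFrame({
    0: ["C1", "C2", "C3", "C4", "C5", "C6", "C7"],
    1: ["V1", "V1", "V1", "V2", "V3", "V3", "V4"],
})

# Size of each row's group in column 1, aligned back to the original index
counts = df.groupby(1)[1].transform("size")

# Keep the rows whose value in column 1 occurs more than once
alt = df[counts > 1]

# Compare against the duplicated-based selection
expected = df[df.duplicated(1, keep=False)]
same = alt.equals(expected)
print(alt)
print(same)
```

The duplicated version is shorter and is probably the better default. One thing to be aware of is that the two approaches may treat missing values in column 1 differently, so it is worth checking on your own data if NaN can occur there.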

edited Jan 23, 2017 at 13:27

answered Jan 23, 2017 at 13:21
