
Here is an example of the data:

import pandas as pd
df = pd.DataFrame({
    'file': ['file1','file2','file1','file2','file3','file3','file4','file5','file4','file5'],
    'prop1': ['True','False','True','False','False','False','False','True','False','False'],
    'prop2': ['False','False','False','False','True','False','True','False','True','False'],
    'prop3': ['False','True','False','True','False','True','False','False','False','True']
})

file    prop1   prop2   prop3
0   file1   True    False   False
1   file2   False   False   True
2   file1   True    False   False
3   file2   False   False   True
4   file3   False   True    False
5   file3   False   False   True
6   file4   False   True    False
7   file5   True    False   False
8   file4   False   True    False
9   file5   False   False   True

I need to move rows with duplicated prop values into another dataframe and cut them off the original one.
So the other dataframe should look like this (duplicated rows should not repeat):

file    prop1   prop2   prop3
0   file1   True    False   False
3   file2   False   False   True
8   file4   False   True    False

df = df.drop_duplicates() only drops rows that are exact duplicates, but it keeps rows whose prop values repeat across different files, like this:

    file    prop1   prop2   prop3
0   file1   True    False   False
1   file2   False   False   True
4   file3   False   True    False
5   file3   False   False   True
6   file4   False   True    False
7   file5   True    False   False
9   file5   False   False   True
  • Have you tried drop_duplicates? Commented Oct 7, 2019 at 14:12
  • df.drop_duplicates() Commented Oct 7, 2019 at 14:13
  • try: new_df = df.loc[df.duplicated()].copy() to store duplicated values into a new dataframe Commented Oct 7, 2019 at 14:13
  • Not sure there's a simple way to get the exact indices you show in your expected output. But would suffice to do df.drop_duplicates(subset=[f'prop{i}' for i in range(1,4)]) Commented Oct 7, 2019 at 14:15
  • Yes, drop_duplicates works, but I also need to cut the duplicated rows off the dataframe Commented Oct 7, 2019 at 14:16
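Putting the comment suggestions together: df.duplicated() marks every later occurrence of a combination, so one boolean mask gives both halves of the split. A minimal sketch, restricted to the prop columns (since the file column differs between the duplicated rows):

```python
import pandas as pd

df = pd.DataFrame({
    'file': ['file1','file2','file1','file2','file3','file3','file4','file5','file4','file5'],
    'prop1': ['True','False','True','False','False','False','False','True','False','False'],
    'prop2': ['False','False','False','False','True','False','True','False','True','False'],
    'prop3': ['False','True','False','True','False','True','False','False','False','True']
})

props = ['prop1', 'prop2', 'prop3']
mask = df.duplicated(subset=props)  # True for every repeat of a prop combination
dupes = df.loc[mask].copy()         # the rows to move out
df = df.loc[~mask]                  # original frame with the repeats cut off
```

After this, df keeps only the first occurrence of each prop combination (rows 0, 1 and 4), and dupes holds everything that was cut off.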

2 Answers

uniques = df.drop_duplicates()
# Label-based selection: the index labels dropped by drop_duplicates are the duplicates
duplicates = df.loc[df.index.difference(uniques.index)]

You can first use the pandas method drop_duplicates() to create a dataframe with only the unique rows. Then compare the index of the original dataframe with the index of the deduplicated frame: the 'dropped' labels are your duplicate rows, which you can select again from the original dataframe. You now have the unique rows and the duplicated rows separated.
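Applied to the question's data, a sketch of this split (note that drop_duplicates() here compares all columns, including file, so only the exact repeats are dropped):

```python
import pandas as pd

df = pd.DataFrame({
    'file': ['file1','file2','file1','file2','file3','file3','file4','file5','file4','file5'],
    'prop1': ['True','False','True','False','False','False','False','True','False','False'],
    'prop2': ['False','False','False','False','True','False','True','False','True','False'],
    'prop3': ['False','True','False','True','False','True','False','False','False','True']
})

uniques = df.drop_duplicates()
# Index.difference gives the labels that drop_duplicates removed
duplicates = df.loc[df.index.difference(uniques.index)]
```

Here rows 2, 3 and 8 are exact duplicates of earlier rows, so they end up in duplicates while the other seven rows stay in uniques.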




Use DataFrame.drop_duplicates and specify the column names by selecting all columns except the first:

df = df.drop_duplicates(df.columns[1:])

Or select the columns with prop in their names:

df = df.drop_duplicates(df.filter(like='prop').columns)

print (df)
    file  prop1  prop2  prop3
0  file1   True  False  False
1  file2  False  False   True
4  file3  False   True  False
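The expected output in the question additionally wants only one representative row per duplicated prop combination. Building on the same subset idea, a sketch that first keeps only the repeats and then deduplicates those (note the surviving indices are the first repeats, not the exact 0/3/8 shown in the question):

```python
import pandas as pd

df = pd.DataFrame({
    'file': ['file1','file2','file1','file2','file3','file3','file4','file5','file4','file5'],
    'prop1': ['True','False','True','False','False','False','False','True','False','False'],
    'prop2': ['False','False','False','False','True','False','True','False','True','False'],
    'prop3': ['False','True','False','True','False','True','False','False','False','True']
})

props = [c for c in df.columns if c.startswith('prop')]
repeats = df[df.duplicated(subset=props)]              # every later occurrence
one_per_group = repeats.drop_duplicates(subset=props)  # one row per duplicated combo
```

one_per_group then contains one row for each prop combination that occurred more than once, matching the combinations in the question's expected output.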

