Pandas groupby: remove duplicates

Question

Input: (CSV file)

name    subject internal_1_marks    internal_2_marks    final_marks
abc     python      45                      50              47
pqr     java        45                      46              46
pqr     python      40                      33              37
xyz     java        45                      43              49
xyz     node        40                      30              35
xyz     ruby        50                      45              47

Expected output: (CSV file)


name    subject internal_1_marks    internal_2_marks    final_marks
abc     python      45                      50              47
pqr     java        45                      46              46
        python      40                      33              37
xyz     java        45                      43              49
        node        40                      30              35
        ruby        50                      45              47

I've tried this:

import pandas as pd

df = pd.read_csv("student_info.csv")
df.groupby(['name', 'subject']).sum().to_csv("output.csv")

but it gives duplicates in the first column, as shown below.


name    subject internal_1_marks    internal_2_marks    final_marks
abc     python      45                      50              47
pqr     java        45                      46              46
pqr     python      40                      33              37
xyz     java        45                      43              49
xyz     node        40                      30              35
xyz     ruby        50                      45              47

I need to remove the duplicates in the first column, as shown in the expected output.

Thanks.

Yes, the input file is a CSV and the expected output is also a CSV file.
– codewarrior, Commented Nov 12, 2020 at 3:51

cfort · Accepted Answer · 2020-11-12 03:54:27Z

3

Similar answer here

# Flag every repeat of a name after its first occurrence, then blank those cells.
mask = df['name'].duplicated()
df.loc[mask, 'name'] = ''

  name subject  internal_1_marks  internal_2_marks  final_marks
0  abc  python                45                50           47
1  pqr    java                45                46           46
2       python                40                33           37
3  xyz    java                45                43           49
4         node                40                30           35
5         ruby                50                45           47
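
To check this approach end to end, here is a minimal sketch. It assumes the question's data is comma-separated and builds it inline with `io.StringIO` instead of reading student_info.csv; `csv_text` and `out_text` are illustrative names. It also assumes rows with the same name are already adjacent, as they are in the question.

```python
import io
import pandas as pd

# Inline stand-in for student_info.csv (assumed comma-separated).
csv_text = """name,subject,internal_1_marks,internal_2_marks,final_marks
abc,python,45,50,47
pqr,java,45,46,46
pqr,python,40,33,37
xyz,java,45,43,49
xyz,node,40,30,35
xyz,ruby,50,45,47
"""
df = pd.read_csv(io.StringIO(csv_text))

# True for every row whose name already appeared in an earlier row.
mask = df['name'].duplicated()
df.loc[mask, 'name'] = ''

# index=False keeps the 0..5 row labels out of the CSV text.
out_text = df.to_csv(index=False)
print(out_text)
```

Passing a file path to `to_csv` instead of capturing the returned string would write the same content to disk.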

Kenan · Accepted Answer · 2020-11-12 03:56:06Z

1

You can blank out the duplicate names after the groupby:

df.groupby(['name', 'subject']).sum().reset_index().assign(name=lambda x: x['name'].where(~x['name'].duplicated(), '')).to_csv('filename.csv', index=False)
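
The chained call above can be easier to follow when split into steps. The sketch below is one possible unrolling, assuming comma-separated input built inline with `io.StringIO`; the names `grouped`, `first_seen`, and `result_text` are illustrative and not part of the original answer.

```python
import io
import pandas as pd

# Inline stand-in for the input file (assumed comma-separated).
csv_text = """name,subject,internal_1_marks,internal_2_marks,final_marks
abc,python,45,50,47
pqr,java,45,46,46
pqr,python,40,33,37
xyz,java,45,43,49
xyz,node,40,30,35
xyz,ruby,50,45,47
"""
df = pd.read_csv(io.StringIO(csv_text))

# Aggregate, then turn the (name, subject) index back into ordinary columns.
grouped = df.groupby(['name', 'subject']).sum().reset_index()

# Keep a name only on the first row where it appears; blank later repeats.
first_seen = ~grouped['name'].duplicated()
grouped['name'] = grouped['name'].where(first_seen, '')

# index=False avoids writing the new 0..n-1 row labels.
result_text = grouped.to_csv(index=False)
print(result_text)
```

The `reset_index()` step matters here: while `name` is still an index level, `to_csv` writes its label on every row, which is the repetition the question describes.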

Also, when reading the file, you can pass index_col so that the name column becomes the index:

df = pd.read_csv('test.csv', index_col=[0])
