
Input (CSV file):

name    subject internal_1_marks    internal_2_marks    final_marks
abc     python      45                      50              47
pqr     java        45                      46              46
pqr     python      40                      33              37
xyz     java        45                      43              49
xyz     node        40                      30              35
xyz     ruby        50                      45              47

Expected output (CSV file):


name    subject internal_1_marks    internal_2_marks    final_marks
abc     python      45                      50              47
pqr     java        45                      46              46
        python      40                      33              37
xyz     java        45                      43              49
        node        40                      30              35
        ruby        50                      45              47

I've tried this:

import pandas as pd

df = pd.read_csv("student_info.csv")
df.groupby(['name', 'subject']).sum().to_csv("output.csv")

but it repeats the name in the first column, as shown below:


name    subject internal_1_marks    internal_2_marks    final_marks
abc     python      45                      50              47
pqr     java        45                      46              46
pqr     python      40                      33              37
xyz     java        45                      43              49
xyz     node        40                      30              35
xyz     ruby        50                      45              47
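For reference, the behaviour can be reproduced without the file. This is a minimal sketch, assuming student_info.csv is comma-separated (the table above is shown space-aligned), with the sample rows inlined via io.StringIO. DataFrame.to_csv writes the full (name, subject) index on every row; printing the grouped DataFrame only hides the repeated names on screen.

```python
import io

import pandas as pd

# Sample rows from the question, inlined as comma-separated text
# (assumption: the real student_info.csv is comma-separated).
CSV_TEXT = """name,subject,internal_1_marks,internal_2_marks,final_marks
abc,python,45,50,47
pqr,java,45,46,46
pqr,python,40,33,37
xyz,java,45,43,49
xyz,node,40,30,35
xyz,ruby,50,45,47
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
grouped = df.groupby(['name', 'subject']).sum()

# to_csv() with no path returns the CSV text; each row carries its
# complete (name, subject) index tuple, so 'pqr' and 'xyz' repeat.
written = grouped.to_csv()
print(written)
```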

I need to remove the duplicate names in the first column, as shown in the expected output.

Thanks.

  • do you want blanks "" in the csv file? Commented Nov 12, 2020 at 3:50
  • yes, the input file is a CSV and the expected output should also be a CSV file. Commented Nov 12, 2020 at 3:51

2 Answers


Similar answer here

mask = df['name'].duplicated()   # True for every repeat of a name already seen above
df.loc[mask, 'name'] = ''        # replace those repeats with an empty string

  name subject  internal_1_marks  internal_2_marks  final_marks
0  abc  python                45                50           47
1  pqr    java                45                46           46
2       python                40                33           37
3  xyz    java                45                43           49
4         node                40                30           35
5         ruby                50                45           47
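Note that duplicated() flags every repeat of a value, not only repeats on adjacent rows, so this relies on all rows for a name being next to each other (as they are in the question's file). The sketch below uses a small hypothetical frame in which 'pqr' reappears after 'xyz' to show the problem and two ways around it: sort first, or compare each name with the row directly above using shift().

```python
import pandas as pd

# Hypothetical unsorted data: 'pqr' reappears after 'xyz'.
df = pd.DataFrame({
    'name': ['pqr', 'pqr', 'xyz', 'pqr'],
    'subject': ['java', 'python', 'node', 'ruby'],
    'final_marks': [46, 37, 35, 47],
})

# Plain duplicated() also blanks the last 'pqr', so that row now
# looks as if it belongs to 'xyz'.
naive = df.copy()
naive.loc[naive['name'].duplicated(), 'name'] = ''

# Option 1: sort so each name's rows are contiguous, then blank repeats.
by_name = df.sort_values(['name', 'subject'], ignore_index=True)
by_name.loc[by_name['name'].duplicated(), 'name'] = ''

# Option 2: keep the original row order and blank a name only when it
# equals the name on the row directly above (shift() moves values down one row).
kept_order = df.copy()
kept_order.loc[df['name'].eq(df['name'].shift()), 'name'] = ''

print(naive, by_name, kept_order, sep='\n\n')
```

Sorting matches the layout in the question's expected output; the shift() comparison is useful when the original row order has to be preserved.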

You can blank out the duplicates after the groupby:

out = df.groupby(['name', 'subject']).sum().reset_index()
out.assign(name=out['name'].where(~out['name'].duplicated(), '')).to_csv('filename.csv', index=False)
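To check that this chain produces the blank cells asked for in the comments, here it is run end to end on the sample rows, again assuming a comma-separated input inlined with io.StringIO and writing the CSV to a string instead of filename.csv. blank_repeated_names is a hypothetical helper name, not part of pandas.

```python
import io

import pandas as pd

# Sample rows from the question (assumption: comma-separated input).
CSV_TEXT = """name,subject,internal_1_marks,internal_2_marks,final_marks
abc,python,45,50,47
pqr,java,45,46,46
pqr,python,40,33,37
xyz,java,45,43,49
xyz,node,40,30,35
xyz,ruby,50,45,47
"""


def blank_repeated_names(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep the first occurrence of each name, '' elsewhere."""
    return frame.assign(name=frame['name'].where(~frame['name'].duplicated(), ''))


df = pd.read_csv(io.StringIO(CSV_TEXT))

# reset_index() turns the (name, subject) group keys back into ordinary columns.
summed = df.groupby(['name', 'subject']).sum().reset_index()

# index=False drops the 0..n-1 row labels from the output.
out_text = blank_repeated_names(summed).to_csv(index=False)
print(out_text)
```

Empty strings are written as empty fields, so each row with a repeated name starts with a comma (for example ,python,40,33,37).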

Also, when reading the file, you can pass index_col so that the first column (the one with the duplicates) becomes the index:

df = pd.read_csv('test.csv', index_col=[0])
