
If I have a DataFrame like the one below:

|  Column A  |  Column B  |  Column C  |  Column D  |  Column E  |
|:-----------|:-----------|:-----------|:-----------|:-----------|
| 1          | 7          | 1          | 13         | 13         |
| 2          | 8          | 2          | 14         | 13         |
| 3          | 9          | 3          | 15         | 13         |
| 4          | 10         | 4          | 16         | 13         |
| NA         | 11         | NA         | 17         | 13         |
| 6          | 12         | 6          | 1          | 13         |

I'd like to remove one of the duplicate columns, A or C (they contain identical values), ignoring the fact that Column E repeats the same value in every row, and ignoring the column headers.
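To make the question reproducible, the example table can be rebuilt in pandas roughly as follows. This is an assumed reconstruction: the `NA` cells are represented as `np.nan`, which makes Column A and Column C float64 columns while the other columns stay integer.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example table; "NA" is stored as np.nan,
# so Column A and Column C become float64 while B, D and E stay integer.
df = pd.DataFrame({
    "Column A": [1, 2, 3, 4, np.nan, 6],
    "Column B": [7, 8, 9, 10, 11, 12],
    "Column C": [1, 2, 3, 4, np.nan, 6],
    "Column D": [13, 14, 15, 16, 17, 1],
    "Column E": [13, 13, 13, 13, 13, 13],
})
print(df)
print(df.dtypes)
```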

  This has been marked as a duplicate question, but it isn't a duplicate of the other one: that question is about column names, while this one is about the values in the columns. Commented Sep 18, 2022 at 13:01
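The distinction the comment draws can be made concrete on the example frame. In the sketch below (the frame is rebuilt so the snippet runs on its own), a name-based check such as `df.columns.duplicated()` flags nothing because all five headers are distinct, while a value-based check on the transposed frame flags Column C.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Column A": [1, 2, 3, 4, np.nan, 6],
    "Column B": [7, 8, 9, 10, 11, 12],
    "Column C": [1, 2, 3, 4, np.nan, 6],
    "Column D": [13, 14, 15, 16, 17, 1],
    "Column E": [13, 13, 13, 13, 13, 13],
})

# Name-based: compares the header labels only; every label differs here.
by_name = df.columns.duplicated()

# Value-based: each original column is a row of df.T, and duplicated()
# marks rows that repeat an earlier row (NaNs in matching positions match).
by_value = df.T.duplicated().to_numpy()

print(list(df.columns[by_name]))   # []
print(list(df.columns[by_value]))  # ['Column C']
```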

2 Answers


`drop_duplicates` works on rows, so you can transpose, drop the duplicate rows, and then transpose back:

```python
df.T.drop_duplicates().T
```
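A runnable version on the example frame is sketched below (the frame is rebuilt so it runs on its own). One side effect of the round trip is worth knowing: transposing a frame that mixes int64 and float64 columns upcasts everything to a common dtype, so the surviving integer columns come back as float64. The optional `astype` step, which maps each surviving column back to its original dtype, is an addition for illustration and not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Column A": [1, 2, 3, 4, np.nan, 6],
    "Column B": [7, 8, 9, 10, 11, 12],
    "Column C": [1, 2, 3, 4, np.nan, 6],
    "Column D": [13, 14, 15, 16, 17, 1],
    "Column E": [13, 13, 13, 13, 13, 13],
})

# df.T turns each column into a row; drop_duplicates() removes repeated rows,
# keeping the first occurrence by default; the final .T flips back.
result = df.T.drop_duplicates().T
print(result.columns.tolist())  # ['Column A', 'Column B', 'Column D', 'Column E']

# The mixed int64/float64 frame was upcast to float64 by the transpose.
print(result.dtypes)

# Optional: restore each surviving column's original dtype.
restored = result.astype(df.dtypes[result.columns].to_dict())
print(restored.dtypes)
```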

You can do that with `DataFrame.duplicated` applied to the transposed frame; use the `keep` parameter to choose whether the first or the last of the duplicated columns is kept:

```python
df.loc[:, ~df.T.duplicated(keep='first')]
```

```
   Column A  Column B  Column D  Column E
0       1.0         7        13        13
1       2.0         8        14        13
2       3.0         9        15        13
3       4.0        10        16        13
4       NaN        11        17        13
5       6.0        12         1        13
```
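The `keep` argument decides which member of a duplicate group survives. In the sketch below, `drop_duplicate_columns` is a hypothetical helper name wrapping the one-liner above, and the frame is rebuilt so the snippet runs on its own. With `keep='last'` Column C is retained instead of Column A, and `keep=False` drops every column that has a duplicate. Because the columns are selected with `df.loc` on the original frame rather than transposed back, the original dtypes are preserved.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Column A": [1, 2, 3, 4, np.nan, 6],
    "Column B": [7, 8, 9, 10, 11, 12],
    "Column C": [1, 2, 3, 4, np.nan, 6],
    "Column D": [13, 14, 15, 16, 17, 1],
    "Column E": [13, 13, 13, 13, 13, 13],
})


def drop_duplicate_columns(frame, keep="first"):
    """Hypothetical helper: drop columns whose values duplicate another column.

    frame.T.duplicated(keep=...) yields one boolean per original column,
    indexed by column label; negating it gives the columns to keep.
    """
    return frame.loc[:, ~frame.T.duplicated(keep=keep)]


print(drop_duplicate_columns(df).columns.tolist())
# ['Column A', 'Column B', 'Column D', 'Column E']
print(drop_duplicate_columns(df, keep="last").columns.tolist())
# ['Column B', 'Column C', 'Column D', 'Column E']
print(drop_duplicate_columns(df, keep=False).columns.tolist())
# ['Column B', 'Column D', 'Column E']
```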
