Removing duplicate columns with same column name in pandas

Question

The problem is the following. I have a dataframe:

   a  a  b  a  b
0  0  1  2  1  2
1  3  4  5  4  5

For each column name, I want to remove its duplicate columns, that is, any column whose name and values both repeat an earlier column. The resulting dataframe should be:

   a  a  b
0  0  1  2
1  3  4  5

I have achieved this by calling drop_duplicates() on the transpose of df[['column_name']] for each column name, but it's too slow.
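The per-name approach described above might look roughly like the following. This is a reconstruction, since the asker's actual code isn't shown; `parts` and `result` are illustrative names. It does one transpose and one drop_duplicates() per distinct name, and it regroups the output by name (all `a` columns, then all `b` columns), which only matches the original order when names aren't interleaved.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 1, 2], [3, 4, 5, 4, 5]], columns=list("aabab"))

# For each distinct name (in order of first appearance): select every column
# with that name, transpose so columns become rows, drop duplicate rows,
# then transpose back.
parts = [df[[name]].T.drop_duplicates().T for name in df.columns.unique()]

# Glue the per-name pieces back together side by side.
result = pd.concat(parts, axis=1)
print(result)
```

Note that transposing a frame with mixed dtypes yields object columns, so this sketch is most natural when all columns share one dtype.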

I am wondering if there is a faster way to solve it.

Somewhat related: do you have to use duplicate column names? That needlessly complicates subsequent analysis.
– Peter Leimbigler, Commented Mar 4, 2020 at 15:49
Yes. In the real data the column names are date strings ('2020-02-03'), and I want to remove the duplicate columns for the same date. That's why I have duplicate column names.
– rgralma, Commented Mar 5, 2020 at 10:04

BENY · Accepted Answer · 2020-03-04 15:51:14Z


IIUC

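import pandas as pd

df = pd.DataFrame([[0, 1, 2, 1, 2], [3, 4, 5, 4, 5]], columns=list('aabab'))
# Drop a column if its values repeat an earlier column and its name repeats an earlier name
df = df.loc[:, ~(df.T.duplicated() & df.columns.duplicated())]
print(df)

   a  a  b
0  0  1  2
1  3  4  5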
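To see why the one-liner works on this example, the two masks can be inspected separately. A minimal sketch; the names `values_dup`, `names_dup`, `keep` and `result` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 1, 2], [3, 4, 5, 4, 5]], columns=list("aabab"))

# True where a column's values repeat some earlier column (regardless of name)
values_dup = df.T.duplicated().to_numpy()
# True where a column's name repeats some earlier column name (regardless of values)
names_dup = df.columns.duplicated()
# Keep a column unless both conditions hold
keep = ~(values_dup & names_dup)

print(values_dup.tolist())  # [False, False, False, True, True]
print(names_dup.tolist())   # [False, True, False, True, True]
print(keep.tolist())        # [True, True, True, False, False]

result = df.loc[:, keep]
print(result)
```

Because the two conditions are computed independently, a column is dropped whenever its values match some earlier column and its name matches some earlier column, even if those are two different earlier columns.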



1 Comment

That actually doesn't work if an a column and a b column have the same values.
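One way to address this (not part of the accepted answer) is to test the name and the values together, so a column is dropped only when an earlier column has the same name and the same values. A minimal sketch, assuming pandas; `drop_duplicate_columns` is a hypothetical helper name, and because `reset_index` inserts the names as a field called "index", it assumes no row label of the dataframe is the string "index". Date-string column names such as '2020-02-03' are handled the same way as 'a' and 'b'.

```python
import pandas as pd


def drop_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop a column only if an earlier column has the same name AND the same values."""
    # After transposing, each row is one original column; reset_index turns the
    # column name into an ordinary field, so duplicated() compares (name, values)
    # as a single key instead of checking them separately.
    dup = df.T.reset_index().duplicated().to_numpy()
    return df.loc[:, ~dup]


# Case from the comment: the second "a" has the same values as "b"
df = pd.DataFrame([[1, 2, 2], [4, 5, 5]], columns=list("aba"))

accepted = df.loc[:, ~(df.T.duplicated() & df.columns.duplicated())]
fixed = drop_duplicate_columns(df)

print(list(accepted.columns))  # ['a', 'b']
print(list(fixed.columns))     # ['a', 'b', 'a']
```

On the frame from the question, the helper gives the same a, a, b result as the accepted answer.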