The problem is the following. I have a dataframe:

   a  a  b  a  b
0  0  1  2  1  2
1  3  4  5  4  5

For each column name, I want to remove its duplicate columns, that is, columns that share both the same name and the same values. It is difficult to explain. The resulting dataframe should be:

   a  a  b
0  0  1  2
1  3  4  5

I have achieved this by calling drop_duplicates() on the transpose of df[['column_name']] for each column name, but it's too slow.

I am wondering if there is a faster way to solve it.

  • Somewhat related: do you have to use duplicate column names? That needlessly complicates subsequent analysis. Commented Mar 4, 2020 at 15:49
  • Check out kalu's answer here: stackoverflow.com/a/32961145 Commented Mar 4, 2020 at 15:52
  • Why do you have duplicate columns? Commented Mar 4, 2020 at 15:59
  • Your resulting dataframe still has two columns named a. Commented Mar 4, 2020 at 16:32
  • Yes. The real column names are date strings ('2020-02-03'), and I want to remove the duplicate columns of the same date. That's the reason why I want duplicate column names. Commented Mar 5, 2020 at 10:04

1 Answer

IIUC

df = df.loc[:, ~(df.T.duplicated() & df.columns.duplicated())]
Out[184]: 
   a  a  b
0  0  1  2
1  3  4  5
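
To see what the mask does on the example, it can help to look at its two parts separately. The following is only a sketch: the way the dataframe is constructed is an assumption, and the names same_values, same_name and drop are introduced here for illustration.

```python
import pandas as pd

# Assumed reconstruction of the dataframe from the question.
df = pd.DataFrame([[0, 1, 2, 1, 2], [3, 4, 5, 4, 5]], columns=list("aabab"))

# True where a column's values repeat those of ANY earlier column.
same_values = df.T.duplicated().to_numpy()   # [False, False, False, True, True]

# True where a column's name repeats an earlier column name.
same_name = df.columns.duplicated()          # [False, True, False, True, True]

# A column is dropped only when both conditions hold.
drop = same_values & same_name               # [False, False, False, True, True]
result = df.loc[:, ~drop]

print(result)
```

Note that the two conditions are checked independently: the earlier column with the same values does not have to be the one with the same name.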

1 Comment

That actually doesn't work if a and b have the same column values.
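
One possible way to handle that case is to treat the column name as part of the key, so that a column is only dropped when an earlier column has the same name and the same values. This is a sketch rather than a tested solution for the real data: the helper name drop_same_name_duplicates is hypothetical, it assumes the cell values are hashable, and it has not been benchmarked against the per-column drop_duplicates() approach.

```python
import pandas as pd


def drop_same_name_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns that repeat both the name and the values of an earlier column.

    Hypothetical helper, written for illustration only.
    """
    # After transposing, each original column is a row; reset_index() moves
    # the column name into an ordinary column so it takes part in the comparison.
    keyed = df.T.reset_index()
    # duplicated() compares whole rows, i.e. (name, value_0, value_1, ...).
    keep = ~keyed.duplicated().to_numpy()
    return df.loc[:, keep]


# The example from the question (assumed construction).
df = pd.DataFrame([[0, 1, 2, 1, 2], [3, 4, 5, 4, 5]], columns=list("aabab"))
result = drop_same_name_duplicates(df)

# The case from the comment: the second "a" has the same values as "b",
# but differs from the first "a", so it should be kept.
df2 = pd.DataFrame([[0, 1, 1], [3, 4, 4]], columns=list("aba"))
result2 = drop_same_name_duplicates(df2)

print(result)
print(result2)
```

On the question's example this keeps the columns a, a, b, and on the second example it keeps all three columns, whereas the one-line mask above would drop the last one.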
