I am trying to delete columns that contain duplicate data in pandas. For example, in the following DataFrame the columns 'one' and 'three' hold the same data under different names:

df1 = pd.DataFrame({'one': [1, 2, 3, 4], 'two': ['a', 'b', 'c', 'd'], 'three': [1, 2, 3, 4]})
   one two  three
0    1   a      1
1    2   b      2
2    3   c      3
3    4   d      4

I hope to get this result:

  one two
0   1   a
1   2   b
2   3   c
3   4   d

The method I use now is:

df2 = df1.T.drop_duplicates().T

But this is too inefficient. Is there a better way?

Thanks in advance for your help.

1 Answer

I tried to improve the efficiency a little by splitting the columns by dtype and transposing only the integer ones:

In [935]: df_int = df1.select_dtypes(include=['int'])
In [933]: df_other = df1.select_dtypes(exclude=['int'])

In [949]: if df_int.T.drop_duplicates().shape[0] == 1:
     ...:     res = pd.concat([df_int.iloc[:,0], df_other], axis=1)
     ...: 

In [950]: res
Out[950]: 
   one two
0    1   a
1    2   b
2    3   c
3    4   d
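
One likely reason the full transpose is costly (an assumption on my part, not something I measured) is that transposing a frame with mixed dtypes forces every value into object dtype, while an integer-only subset stays numeric. A minimal sketch on the question's data; the names mixed_dtypes and numeric_dtypes are only for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'one': [1, 2, 3, 4],
                    'two': ['a', 'b', 'c', 'd'],
                    'three': [1, 2, 3, 4]})

# After transposing the mixed frame, each column holds both ints and strings,
# so pandas has to fall back to object dtype.
mixed_dtypes = df1.T.dtypes

# Transposing only the numeric columns keeps a numeric dtype.
numeric_dtypes = df1.select_dtypes(include='number').T.dtypes

print(mixed_dtypes.unique())
print(numeric_dtypes.unique())
```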

To avoid the transpose entirely, you can compare adjacent integer columns with np.diff. Note that this only checks whether all of the integer columns are identical:

In [995]: import numpy as np
In [997]: if (np.diff(df_int.values, axis=1) == 0).all():
     ...:     res = pd.concat([df_int.iloc[:,0], df_other], axis=1)
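
If the frame has more than one group of duplicated columns, the check above is not enough. A more general sketch, again without any transpose, is to hash each column and confirm hash matches with an exact comparison. The helper name drop_duplicate_columns is hypothetical, and I am assuming that columns with different dtypes (for example int 1 and float 1.0) should count as different:

```python
import pandas as pd

def drop_duplicate_columns(df):
    """Keep the first of each set of columns holding identical values."""
    kept = []      # positions of the columns to keep
    buckets = {}   # (dtype, hash) -> positions of kept columns with that key
    for pos in range(df.shape[1]):
        col = df.iloc[:, pos]
        # Row-wise hashes of the values only (index ignored), reduced to one key.
        row_hashes = pd.util.hash_pandas_object(col, index=False)
        key = (str(col.dtype), hash(row_hashes.to_numpy().tobytes()))
        candidates = buckets.setdefault(key, [])
        # Equal keys are only a hint; Series.equals confirms the values match.
        if any(col.equals(df.iloc[:, p]) for p in candidates):
            continue
        candidates.append(pos)
        kept.append(pos)
    return df.iloc[:, kept]

df1 = pd.DataFrame({'one': [1, 2, 3, 4],
                    'two': ['a', 'b', 'c', 'd'],
                    'three': [1, 2, 3, 4]})
res = drop_duplicate_columns(df1)
print(res.columns.tolist())  # ['one', 'two']
```

This makes one pass over the columns and only does exact comparisons between columns whose keys collide. I have not benchmarked it against your data, so please time it on a sample first.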

2 Comments

Thank you for your help. This does improve efficiency, but the data I need to process is very large and the transpose takes too much time. If possible, I would like to avoid transposing altogether.
I've updated my answer so that it no longer uses a transpose at all. Let me know if this helps.
