Python Dataframe Duplicated Columns while Merging multiple times

Question

I have a main dataframe and a sub dataframe. I want to merge each column of the sub dataframe into the main dataframe, using the main dataframe's column as the reference. I get the values I want, except that the reference column of the main dataframe is duplicated. My present and expected outputs are below.

Present solution:

import pandas as pd

df = pd.DataFrame({'Ref':[1,2,3,4]})
df1 = pd.DataFrame({'A':[2,3],'Z':[1,2]})

df = [df.merge(df1[col_name],left_on='Ref',right_on=col_name,how='left') for col_name in df1.columns]
df = pd.concat(df,axis=1)
df = 
   Ref    A   Ref    Z
0    1  NaN     1  1.0
1    2  2.0     2  2.0
2    3  3.0     3  NaN
3    4  NaN     4  NaN

Expected Answer:

df = 
   Ref    A    Z
0    1  NaN  1.0
1    2  2.0  2.0
2    3  3.0  NaN
3    4  NaN  NaN

Corralien · Accepted Answer · 2022-04-28 20:02:21Z

2

Update

Use columns.duplicated() to drop the repeated columns, keeping the first occurrence of each name:

>>> df.loc[:, ~df.columns.duplicated()]
   Ref    A    Z
0    1  NaN  1.0
1    2  2.0  2.0
2    3  3.0  NaN
3    4  NaN  NaN
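
A minimal, self-contained sketch of how the mask works, reusing the question's data. The names parts, wide, mask and deduped are illustrative and not from the original post. Keeping the first column per name is safe here only because every copy of Ref holds the same values.

```python
import pandas as pd

df = pd.DataFrame({'Ref': [1, 2, 3, 4]})
df1 = pd.DataFrame({'A': [2, 3], 'Z': [1, 2]})

# Reproduce the question's result: one merged frame per df1 column,
# glued side by side, so 'Ref' appears once per frame.
parts = [
    df.merge(df1[col], left_on='Ref', right_on=col, how='left')
    for col in df1.columns
]
wide = pd.concat(parts, axis=1)

# True for every column name that has already appeared further left.
mask = wide.columns.duplicated()

# Keep only the columns that are not flagged as repeats.
deduped = wide.loc[:, ~mask]
print(deduped)
```

Since the mask is positional, it works even though selecting wide['Ref'] by name would return both columns.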

Old answer

You can use:

# Your code
...
df = pd.concat(df, axis=1)

# pop removes every 'Ref' column and returns them together as a DataFrame;
# insert then puts the first of them back at position 0
df.insert(0, 'Ref', df.pop('Ref').iloc[:, 0])

Output:

>>> df
   Ref    A    Z
0    1  NaN  1.0
1    2  2.0  2.0
2    3  3.0  NaN
3    4  NaN  NaN

edited Apr 28, 2022 at 20:02

answered Apr 28, 2022 at 19:58


3 Comments

Mainland Over a year ago

Nice one. The answer here deletes the duplicates and keeps the first one, correct? I thought about it, but it needs one extra line of code. Do you think it can be done by tweaking the existing code?

Corralien Over a year ago

Yes, you are correct. The option with set_index is a good alternative; check the answer of @AndreaIerardi. I updated my answer with a simpler method.

Corralien Over a year ago

You can also replace your whole code by:

df.merge(df1.melt(), left_on='Ref', right_on='value', how='left').pivot_table('value', 'Ref', 'variable').reindex(df['Ref']).rename_axis(columns=None).reset_index()
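
The one-liner above is dense, so here is a sketch of the same chain split into named steps, using the question's data. The intermediate names (long, merged, wide, out) are illustrative, and the index name is set explicitly in rename_axis rather than relying on reindex to carry it over.

```python
import pandas as pd

df = pd.DataFrame({'Ref': [1, 2, 3, 4]})
df1 = pd.DataFrame({'A': [2, 3], 'Z': [1, 2]})

# Long format: one row per (column name, value) pair from df1.
# melt() names the two columns 'variable' and 'value' by default.
long = df1.melt()

# Attach each Ref to the df1 entries whose value equals it.
merged = df.merge(long, left_on='Ref', right_on='value', how='left')

# Back to wide format: one column per original df1 column.
wide = merged.pivot_table(values='value', index='Ref', columns='variable')

# Ref values without any match (4 here) are dropped by the pivot,
# so reindex restores them as all-NaN rows.
out = (
    wide.reindex(df['Ref'])
        .rename_axis(index='Ref', columns=None)
        .reset_index()
)
print(out)
```

Note that pivot_table aggregates with the mean by default, so this variant assumes each value occurs at most once per df1 column.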

Andrea Ierardi · Accepted Answer · 2022-04-28 19:55:53Z

1

What about setting the 'Ref' column as the index while building the list of dataframes, and then resetting the index so that you get Ref back as a column?

import pandas as pd

df = pd.DataFrame({'Ref':[1,2,3,4]})
df1 = pd.DataFrame({'A':[2,3],'Z':[1,2]})

# Index each merged frame by 'Ref' so concat aligns on it instead of repeating it
df = [df.merge(df1[col_name],left_on='Ref',right_on=col_name,how='left').set_index('Ref') for col_name in df1.columns]
df = pd.concat(df,axis=1)

df = df.reset_index()

   Ref    A    Z
0    1  NaN  1.0
1    2  2.0  2.0
2    3  3.0  NaN
3    4  NaN  NaN
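
The reason this works is that concat with axis=1 aligns frames on their index, and an index is not a column, so it cannot be repeated. A small sketch isolating that behaviour, with two hand-built frames (a and z are hypothetical stand-ins for the two merge results):

```python
import pandas as pd

# Stand-ins for the two merged frames, already indexed by 'Ref'.
a = pd.DataFrame({'Ref': [1, 2, 3, 4], 'A': [None, 2.0, 3.0, None]}).set_index('Ref')
z = pd.DataFrame({'Ref': [1, 2, 3, 4], 'Z': [1.0, 2.0, None, None]}).set_index('Ref')

# The shared index is used for alignment; only 'A' and 'Z' are columns.
wide = pd.concat([a, z], axis=1)

# Turn the index back into an ordinary 'Ref' column.
out = wide.reset_index()
print(out)
```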

answered Apr 28, 2022 at 19:55


Onyambu · Accepted Answer · 2022-04-28 19:56:58Z

1

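This is a reduction: each merge should build on the result of the previous one rather than start again from the original dataframe. Instead of the list comprehension, use a for loop, or even reduce: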

from functools import reduce

reduce(lambda x, y : x.merge(df1[y],left_on='Ref',right_on=y,how='left'), df1.columns, df)

   Ref    A    Z
0    1  NaN  1.0
1    2  2.0  2.0
2    3  3.0  NaN
3    4  NaN  NaN

The above is similar to:

for y in df1.columns:
    df = df.merge(df1[y],left_on='Ref',right_on=y,how='left')
df
 
   Ref    A    Z
0    1  NaN  1.0
1    2  2.0  2.0
2    3  3.0  NaN
3    4  NaN  NaN
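
As a self-contained sketch, the loop can be wrapped in a small helper that leaves the input untouched. The function name merge_each_column and its ref parameter are hypothetical, introduced only for illustration.

```python
import pandas as pd


def merge_each_column(main, sub, ref='Ref'):
    """Left-merge every column of `sub` onto `main`, matching on `ref`."""
    out = main
    for col in sub.columns:
        # Each merge starts from the previous result, so `ref` is
        # carried along once instead of being added again.
        out = out.merge(sub[col], left_on=ref, right_on=col, how='left')
    return out


df = pd.DataFrame({'Ref': [1, 2, 3, 4]})
df1 = pd.DataFrame({'A': [2, 3], 'Z': [1, 2]})

result = merge_each_column(df, df1)
print(result)
```

One caveat: a left merge repeats a row for every matching key on the right, so this assumes the values within each column of the sub dataframe are unique.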

answered Apr 28, 2022 at 19:56
