0

I have a main dataframe and a sub dataframe. I want to merge each column of the sub dataframe into the main dataframe, using the main dataframe's column as the reference. My code produces the values I want, except that the main dataframe's column is duplicated in the result. My present and expected outputs are below.

Present solution:

import pandas as pd

df = pd.DataFrame({'Ref':[1,2,3,4]})
df1 = pd.DataFrame({'A':[2,3],'Z':[1,2]})

df = [df.merge(df1[col_name],left_on='Ref',right_on=col_name,how='left') for col_name in df1.columns]
df = pd.concat(df,axis=1)
df = 
   Ref    A   Ref    Z
0    1  NaN     1  1.0
1    2  2.0     2  2.0
2    3  3.0     3  NaN
3    4  NaN     4  NaN

Expected Answer:

df = 
   Ref    A    Z
0    1  NaN  1.0
1    2  2.0  2.0
2    3  3.0  NaN
3    4  NaN  NaN

3 Answers

2

Update

Use columns.duplicated() to keep only the first occurrence of each column label:

>>> df.loc[:, ~df.columns.duplicated()]
   Ref    A    Z
0    1  NaN  1.0
1    2  2.0  2.0
2    3  3.0  NaN
3    4  NaN  NaN
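
To see why this works, the sketch below rebuilds the concatenated frame from the question and looks at the boolean mask that columns.duplicated() produces. The names parts, wide, mask and deduped are illustrative only and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame({'Ref': [1, 2, 3, 4]})
df1 = pd.DataFrame({'A': [2, 3], 'Z': [1, 2]})

# One left merge per column of df1, as in the question
parts = [df.merge(df1[col], left_on='Ref', right_on=col, how='left')
         for col in df1.columns]
wide = pd.concat(parts, axis=1)   # columns: Ref, A, Ref, Z

# duplicated() flags every repeat of a label after its first occurrence
mask = wide.columns.duplicated()  # False, False, True, False

# Keep only the columns that are not repeats
deduped = wide.loc[:, ~mask]
print(deduped.columns.tolist())   # ['Ref', 'A', 'Z']
```

The mask compares column labels only, not values. That is safe here because every copy of Ref comes from the same left-hand frame.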

Old answer

You can use:

# Your code
...
df = pd.concat(df, axis=1)

# pop removes both 'Ref' columns; insert puts the first one back at position 0
df.insert(0, 'Ref', df.pop('Ref').iloc[:, 0])

Output:

>>> df
   Ref    A    Z
0    1  NaN  1.0
1    2  2.0  2.0
2    3  3.0  NaN
3    4  NaN  NaN
3 Comments

Nice one. This keeps the first of the duplicated columns and drops the rest, correct? I thought about it, but it adds one more line of code. Do you think it can be done by tweaking the existing code instead?
Yes, you are correct. The option with set_index is a good alternative. Check the answer of @AndreaIerardi. I updated my answer with a simpler method.
You can also replace your whole code by: df.merge(df1.melt(), left_on='Ref', right_on='value', how='left').pivot_table('value', 'Ref', 'variable').reindex(df['Ref']).rename_axis(columns=None).reset_index()
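
The one-liner in the last comment can be unpacked step by step. This is only a sketch: the intermediate names long, merged, wide and result are illustrative, and the index is rebuilt with an explicit name (pd.Index(..., name='Ref')) rather than relying on reindex to pick it up from the Series.

```python
import pandas as pd

df = pd.DataFrame({'Ref': [1, 2, 3, 4]})
df1 = pd.DataFrame({'A': [2, 3], 'Z': [1, 2]})

# Stack df1 into long form: one row per (column name, value) pair
long = df1.melt()                      # columns: variable, value

# Attach each value to the matching Ref; unmatched Refs get NaN
merged = df.merge(long, left_on='Ref', right_on='value', how='left')

# Spread the column names back out, one column per original df1 column
wide = merged.pivot_table(values='value', index='Ref', columns='variable')

# pivot_table drops Ref values that matched nothing (here 4), so restore them
result = (wide.reindex(pd.Index(df['Ref'], name='Ref'))
              .rename_axis(columns=None)
              .reset_index())
print(result)
```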
1

What about setting the 'Ref' column as the index while building the list of dataframes, then resetting the index afterwards so that Ref comes back as a column?

df = pd.DataFrame({'Ref':[1,2,3,4]}) 
df1 = pd.DataFrame({'A':[2,3],'Z':[1,2]})


df = [df.merge(df1[col_name],left_on='Ref',right_on=col_name,how='left').set_index('Ref') for col_name in df1.columns]
df = pd.concat(df,axis=1)

df = df.reset_index()

   Ref    A    Z
0    1  NaN  1.0
1    2  2.0  2.0
2    3  3.0  NaN
3    4  NaN  NaN

1

This is a reduction: each merge takes the result of the previous merge as its left frame. Instead of the list comprehension, use a for loop, or even reduce:

from functools import reduce

reduce(lambda x, y : x.merge(df1[y],left_on='Ref',right_on=y,how='left'), df1.columns, df)

   Ref    A    Z
0    1  NaN  1.0
1    2  2.0  2.0
2    3  3.0  NaN
3    4  NaN  NaN

The above is similar to:

for y in df1.columns:
    df = df.merge(df1[y],left_on='Ref',right_on=y,how='left')
df
 
   Ref    A    Z
0    1  NaN  1.0
1    2  2.0  2.0
2    3  3.0  NaN
3    4  NaN  NaN
   
