
I have a list of dataframes. The list might contain 3 or more dataframes, and the number of rows can differ from one dataframe to the next. The dataframes look like the following.

[image: sample dataframes]

I have to loop through the list and create a single dataframe out of these based on the following conditions.

  • If both Line and Point of two dataframes match, create a new Res column that holds the Res value of the second dataframe, placed in the same row.
  • If Line and Point of two dataframes do not both match, insert a new row, with the Res value of the second dataframe in the new Res column.

For the above mentioned sample dataframes, the resultant dataframe should look like the following:

[image: expected resultant dataframe]

I have tried merging, among other things, but I am not able to do this in pandas. Please help me solve this. Thanks.

  • Did you try merge? Commented Jan 3, 2018 at 17:31

2 Answers


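Data input:

import functools
import pandas as pd

df1 = pd.DataFrame({'Line': [1, 1, 1], 'Point': [2, 3, 4], 'Res': [1, 2, 3]})

df2 = pd.DataFrame({'Line': [1, 2, 3], 'Point': [2, 3, 4], 'Res': [1, 2, 3]})

df3 = pd.DataFrame({'Line': [3, 3, 3], 'Point': [4, 3, 4], 'Res': [1, 2, 3]})

Use functools.reduce to chain outer merges on Line and Point across the whole list:

l = [df1, df2, df3]

# Merge pairwise, left to right, keeping keys that appear in either frame
functools.reduce(lambda left, right: pd.merge(left, right, on=['Line', 'Point'], how='outer'), l)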
Out[228]: 
   Line  Point  Res_x  Res_y  Res
0     1      2    1.0    1.0  NaN
1     1      3    2.0    NaN  NaN
2     1      4    3.0    NaN  NaN
3     2      3    NaN    2.0  NaN
4     3      4    NaN    3.0  1.0
5     3      4    NaN    3.0  3.0
6     3      3    NaN    NaN  2.0
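The columns above come out as Res_x, Res_y, and Res because pandas only adds suffixes when two column names collide in a single merge. A minimal sketch, assuming the same three sample frames, that renames each Res column before merging so the result carries Res1, Res2, Res3 instead; the names frames, renamed, and merged are illustrative and not part of the answer:

```python
import functools
import pandas as pd

df1 = pd.DataFrame({'Line': [1, 1, 1], 'Point': [2, 3, 4], 'Res': [1, 2, 3]})
df2 = pd.DataFrame({'Line': [1, 2, 3], 'Point': [2, 3, 4], 'Res': [1, 2, 3]})
df3 = pd.DataFrame({'Line': [3, 3, 3], 'Point': [4, 3, 4], 'Res': [1, 2, 3]})
frames = [df1, df2, df3]

# Give each frame its own Res column name (Res1, Res2, ...) before merging,
# so no automatic _x/_y suffixes are needed
renamed = [
    df.rename(columns={'Res': 'Res{}'.format(n)})
    for n, df in enumerate(frames, 1)
]

# Chain outer merges on the shared key columns
merged = functools.reduce(
    lambda left, right: pd.merge(left, right, on=['Line', 'Point'], how='outer'),
    renamed,
)
print(merged)
```

This also scales to longer lists, where repeated merges would otherwise raise or produce ambiguous names once Res_x and Res_y already exist.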

2 Comments

@cᴏʟᴅsᴘᴇᴇᴅ just make up some :-)
Thanks a lot. Worked like a charm.

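Using Wen's data, first rename each Res column so it is unique per dataframe, then concatenate with pd.concat, group by Line and Point, and take the max to collapse each key into a single row.

df_list = [df1, df2, df3]

# Rename Res to Res1, Res2, ... so each dataframe keeps its own column
df_list = [df.rename(columns={'Res' : 'Res{}'.format(i)}) for i, df in enumerate(df_list, 1)]

# Stack the renamed frames, then keep one row per (Line, Point) key
df = pd.concat(df_list)\
       .groupby(['Line', 'Point'], sort=False)\
       .max()\
       .reset_index()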

df

   Line  Point  Res1  Res2  Res3
0     1      2   1.0   1.0   NaN
1     1      3   2.0   NaN   NaN
2     1      4   3.0   NaN   NaN
3     2      3   NaN   2.0   NaN
4     3      4   NaN   3.0   3.0
5     3      3   NaN   NaN   2.0
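Note that this result has six rows while the merge-based result has seven. In the sample data, df3 contains the key (Line=3, Point=4) twice, and max() collapses those two rows into one, keeping the larger Res. A small sketch, assuming Wen's df3, for checking whether a frame has repeated keys before choosing an aggregation; key_counts, duplicated_keys, and collapsed are illustrative names:

```python
import pandas as pd

df3 = pd.DataFrame({'Line': [3, 3, 3], 'Point': [4, 3, 4], 'Res': [1, 2, 3]})

# Count how often each (Line, Point) key occurs within this one frame
key_counts = df3.groupby(['Line', 'Point']).size()
duplicated_keys = key_counts[key_counts > 1]
print(duplicated_keys)

# max() keeps a single value per key: the Res values 1 and 3 for (3, 4)
# are reduced to 3
collapsed = df3.groupby(['Line', 'Point'], sort=False)['Res'].max()
print(collapsed)
```

If every row must be preserved, the outer-merge approach keeps both; if one row per key is wanted, an aggregation such as max(), min(), or first() has to be chosen deliberately.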
