Comparing elements between two dataframes and adding columns in case of equality

Question

Considering two dataframes as follows:

import pandas as pd

df_rp = pd.DataFrame({'id':[1,2,3,4,5,6,7,8], 'res': ['a','b','c','d','e','f','g','h']})

df_cdr = pd.DataFrame({'id':[1,2,5,6,7,1,2,3,8,9,3,4,8], 
                       'LATITUDE':[-22.98, -22.97, -22.92, -22.87, -22.89, -22.84, -22.98, 
                                   -22.14, -22.28, -22.42, -22.56, -22.70, -22.13], 
                       'LONGITUDE':[-43.19, -43.39, -43.24, -43.28, -43.67, -43.11, -43.22,
                                   -43.33, -43.44, -43.55, -43.66, -43.77, -43.88]})

What I have to do:

Compare each df_rp['id'] element with each df_cdr['id'] element;
If they match, I need to store, in some data structure (list, Series, etc.), the latitudes and longitudes found on the same row as that id, without repeating the id.

Below is an example of how I need the data to be grouped:

1:[-22.98,-43.19],[-22.84,-43.11] 
2:[-22.97,-43.39],[-22.98,-43.22]
3:[-22.14,-43.33],[-22.56,-43.66]
4:[-22.70,-43.77]
5:[-22.92,-43.24]
6:[-22.87,-43.28]
7:[-22.89,-43.67]
8:[-22.28,-43.44],[-22.13,-43.88]

I'm having a hard time choosing which data structure is best for this situation (my example looks like a dictionary, but there would be several dictionaries) and working out how to add the latitude and longitude pairs without repeating the id. I appreciate any help.

Hope this might help you: stackoverflow.com/questions/45436938/…
– Sudharsana Rajasekaran, Commented Nov 21, 2019 at 22:26

BENY · Accepted Answer · 2019-11-21 22:22:53Z

2

We need to aggregate the second dataframe by id, reindex the result on the ids of df_rp, and assign it back as a new column:

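# Keyword arguments are used because drop('id', 1) with a positional axis fails on pandas 2.0+
df_rp['L$L'] = df_cdr.drop(columns='id').apply(tuple, axis=1).groupby(df_cdr.id).agg(list).reindex(df_rp.id).to_numpy()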
df_rp
Out[59]: 
   id res                                   L$L
0   1   a  [(-22.98, -43.19), (-22.84, -43.11)]
1   2   b  [(-22.97, -43.39), (-22.98, -43.22)]
2   3   c  [(-22.14, -43.33), (-22.56, -43.66)]
3   4   d                     [(-22.7, -43.77)]
4   5   e                    [(-22.92, -43.24)]
5   6   f                    [(-22.87, -43.28)]
6   7   g                    [(-22.89, -43.67)]
7   8   h  [(-22.28, -43.44), (-22.13, -43.88)]
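
For readers who find the one-liner dense, here is a minimal step-by-step sketch of the same idea. It is not the answer's exact code: it builds the tuples with zip and looks them up with Series.map instead of reindex, and the intermediate names pairs and pairs_by_id are made up for illustration.

```python
import pandas as pd

df_rp = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8],
                      'res': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})

df_cdr = pd.DataFrame({'id': [1, 2, 5, 6, 7, 1, 2, 3, 8, 9, 3, 4, 8],
                       'LATITUDE': [-22.98, -22.97, -22.92, -22.87, -22.89, -22.84, -22.98,
                                    -22.14, -22.28, -22.42, -22.56, -22.70, -22.13],
                       'LONGITUDE': [-43.19, -43.39, -43.24, -43.28, -43.67, -43.11, -43.22,
                                     -43.33, -43.44, -43.55, -43.66, -43.77, -43.88]})

# Step 1: one (latitude, longitude) tuple per row of df_cdr
# (pairs is a hypothetical helper name)
pairs = pd.Series(list(zip(df_cdr['LATITUDE'], df_cdr['LONGITUDE'])),
                  index=df_cdr.index)

# Step 2: collect the tuples that share an id into one list per id
# (pairs_by_id is a hypothetical helper name)
pairs_by_id = pairs.groupby(df_cdr['id']).agg(list)

# Step 3: look up the list for each id in df_rp
df_rp['L$L'] = df_rp['id'].map(pairs_by_id)
```

With map, an id that exists in df_rp but not in df_cdr would end up as NaN in the new column rather than raising an error.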


Sudharsana Rajasekaran · Accepted Answer · 2019-11-21 22:33:47Z

2

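# Pack each row's coordinates into a single [latitude, longitude] list
df_cdr['lat_long'] = df_cdr.apply(lambda x: [x['LATITUDE'], x['LONGITUDE']], axis=1)

# Drop the original coordinate columns (columns= already implies axis=1)
df_cdr = df_cdr.drop(columns=['LATITUDE', 'LONGITUDE'])

# Collect the coordinate pairs of each id into one list
df_cdr = df_cdr.groupby('id').agg(lambda x: x.tolist())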

Output

                                lat_long
id                                      
1   [[-22.98, -43.19], [-22.84, -43.11]]
2   [[-22.97, -43.39], [-22.98, -43.22]]
3   [[-22.14, -43.33], [-22.56, -43.66]]
4                      [[-22.7, -43.77]]
5                     [[-22.92, -43.24]]
6                     [[-22.87, -43.28]]
7                     [[-22.89, -43.67]]
8   [[-22.28, -43.44], [-22.13, -43.88]]
9                     [[-22.42, -43.55]]
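
Note that the output above still contains id 9, which does not exist in df_rp. A possible variant, assuming only ids present in df_rp should be kept, filters with isin first and leaves df_cdr unmodified. The names matched and lat_long are illustrative, not part of the answer.

```python
import pandas as pd

df_rp = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8],
                      'res': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})

df_cdr = pd.DataFrame({'id': [1, 2, 5, 6, 7, 1, 2, 3, 8, 9, 3, 4, 8],
                       'LATITUDE': [-22.98, -22.97, -22.92, -22.87, -22.89, -22.84, -22.98,
                                    -22.14, -22.28, -22.42, -22.56, -22.70, -22.13],
                       'LONGITUDE': [-43.19, -43.39, -43.24, -43.28, -43.67, -43.11, -43.22,
                                     -43.33, -43.44, -43.55, -43.66, -43.77, -43.88]})

# Keep only the rows whose id also appears in df_rp (this drops id 9 here)
matched = df_cdr[df_cdr['id'].isin(df_rp['id'])]

# One [latitude, longitude] list per remaining row, aligned on the row index
row_pairs = pd.Series(matched[['LATITUDE', 'LONGITUDE']].to_numpy().tolist(),
                      index=matched.index)

# Collect the pairs of each id into one list; df_cdr itself is not modified
lat_long = row_pairs.groupby(matched['id']).agg(list).rename('lat_long')
```

Filtering before grouping keeps the result limited to the ids the question asks about, at the cost of one extra boolean mask.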


Andy L. · Accepted Answer · 2019-11-21 23:53:35Z

1

This assumes df_rp.id is unique and sorted, as in your sample. The solution uses set_index and loc to drop the ids that are in df_cdr but not in df_rp, then calls groupby with a lambda that returns each group's rows as a NumPy array.

s = (df_cdr.set_index('id')
           .loc[df_rp.id]
           .groupby(level=0)
           .apply(lambda x: x.to_numpy()))

Out[709]:
id
1    [[-22.98, -43.19], [-22.84, -43.11]]
2    [[-22.97, -43.39], [-22.98, -43.22]]
3    [[-22.14, -43.33], [-22.56, -43.66]]
4                       [[-22.7, -43.77]]
5                      [[-22.92, -43.24]]
6                      [[-22.87, -43.28]]
7                      [[-22.89, -43.67]]
8    [[-22.28, -43.44], [-22.13, -43.88]]
dtype: object
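
If a plain dictionary is closer to the structure sketched in the question, the same set_index and loc selection can feed a dict comprehension. This is only a sketch: selected and coords_by_id are hypothetical names, and like the code above it assumes every id in df_rp occurs in df_cdr (otherwise loc raises a KeyError).

```python
import pandas as pd

df_rp = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8],
                      'res': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})

df_cdr = pd.DataFrame({'id': [1, 2, 5, 6, 7, 1, 2, 3, 8, 9, 3, 4, 8],
                       'LATITUDE': [-22.98, -22.97, -22.92, -22.87, -22.89, -22.84, -22.98,
                                    -22.14, -22.28, -22.42, -22.56, -22.70, -22.13],
                       'LONGITUDE': [-43.19, -43.39, -43.24, -43.28, -43.67, -43.11, -43.22,
                                     -43.33, -43.44, -43.55, -43.66, -43.77, -43.88]})

# Select only the rows of df_cdr whose id is listed in df_rp
selected = df_cdr.set_index('id').loc[df_rp['id']]

# Build {id: [[latitude, longitude], ...]} with plain Python lists as values
coords_by_id = {
    int(key): group.to_numpy().tolist()
    for key, group in selected.groupby(level=0)
}
```

Each id then appears exactly once as a key, and coords_by_id[1] gives every coordinate pair recorded for id 1.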

