Consider the following two dataframes:

import pandas as pd

df_rp = pd.DataFrame({'id':[1,2,3,4,5,6,7,8], 'res': ['a','b','c','d','e','f','g','h']})

df_cdr = pd.DataFrame({'id':[1,2,5,6,7,1,2,3,8,9,3,4,8], 
                       'LATITUDE':[-22.98, -22.97, -22.92, -22.87, -22.89, -22.84, -22.98, 
                                   -22.14, -22.28, -22.42, -22.56, -22.70, -22.13], 
                       'LONGITUDE':[-43.19, -43.39, -43.24, -43.28, -43.67, -43.11, -43.22,
                                   -43.33, -43.44, -43.55, -43.66, -43.77, -43.88]})

What I have to do:

  • Compare each df_rp['id'] element with each df_cdr['id'] element;
  • If they match, collect in a data structure (list, Series, etc.) the latitudes and longitudes that are on the same row as that id, without repeating the id.

Below is an example of how I need the data to be grouped:

1:[-22.98,-43.19],[-22.84,-43.11] 
2:[-22.97,-43.39],[-22.98,-43.22]
3:[-22.14,-43.33],[-22.56,-43.66]
4:[-22.70,-43.77]
5:[-22.92,-43.24]
6:[-22.87,-43.28]
7:[-22.89,-43.67]
8:[-22.28,-43.44],[-22.13,-43.88]

I'm having a hard time deciding which data structure suits this situation best (my example looks like a dictionary, but there would be several dictionaries) and how to group the latitude and longitude pairs without repeating the id. I appreciate any help.

3 Answers

We need to aggregate the second dataframe by id, then reindex the result on df_rp.id and assign it back:

df_rp['L$L'] = df_cdr.drop(columns='id').apply(tuple, axis=1).groupby(df_cdr.id).agg(list).reindex(df_rp.id).to_numpy()
df_rp
Out[59]: 
   id res                                   L$L
0   1   a  [(-22.98, -43.19), (-22.84, -43.11)]
1   2   b  [(-22.97, -43.39), (-22.98, -43.22)]
2   3   c  [(-22.14, -43.33), (-22.56, -43.66)]
3   4   d                     [(-22.7, -43.77)]
4   5   e                    [(-22.92, -43.24)]
5   6   f                    [(-22.87, -43.28)]
6   7   g                    [(-22.89, -43.67)]
7   8   h  [(-22.28, -43.44), (-22.13, -43.88)]
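
The one-liner above packs several steps together. A step-by-step sketch of the same idea might look like the following, under the assumption that a lookup with map is an acceptable substitute for reindex. The names pairs, pairs_by_id and coords are illustrative and do not come from the original answer:

```python
import pandas as pd

df_rp = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8],
                      'res': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})

df_cdr = pd.DataFrame({'id': [1, 2, 5, 6, 7, 1, 2, 3, 8, 9, 3, 4, 8],
                       'LATITUDE': [-22.98, -22.97, -22.92, -22.87, -22.89, -22.84, -22.98,
                                    -22.14, -22.28, -22.42, -22.56, -22.70, -22.13],
                       'LONGITUDE': [-43.19, -43.39, -43.24, -43.28, -43.67, -43.11, -43.22,
                                     -43.33, -43.44, -43.55, -43.66, -43.77, -43.88]})

# Pair each latitude with the longitude on the same row.
pairs = pd.Series(list(zip(df_cdr['LATITUDE'], df_cdr['LONGITUDE'])),
                  index=df_cdr.index)

# Collect the pairs per id; row order is preserved within each group.
pairs_by_id = pairs.groupby(df_cdr['id']).agg(list)

# Look up each df_rp id; an id missing from df_cdr would get NaN.
df_rp['coords'] = df_rp['id'].map(pairs_by_id)
```

Because the lookup is driven by df_rp['id'], ids that exist only in df_cdr (such as 9) never reach the result.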

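Build a lat_long column holding each row's [LATITUDE, LONGITUDE] pair, drop the original columns, then group by id and collect the pairs into a list:

df_cdr['lat_long'] = df_cdr.apply(lambda x: [x['LATITUDE'], x['LONGITUDE']], axis=1)

df_cdr = df_cdr.drop(columns=['LATITUDE', 'LONGITUDE'])

df_cdr = df_cdr.groupby('id').agg(lambda x: x.tolist())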

Output

                                lat_long
id                                      
1   [[-22.98, -43.19], [-22.84, -43.11]]
2   [[-22.97, -43.39], [-22.98, -43.22]]
3   [[-22.14, -43.33], [-22.56, -43.66]]
4                      [[-22.7, -43.77]]
5                     [[-22.92, -43.24]]
6                     [[-22.87, -43.28]]
7                     [[-22.89, -43.67]]
8   [[-22.28, -43.44], [-22.13, -43.88]]
9                     [[-22.42, -43.55]]
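
Note that the output above still contains id 9, which does not exist in df_rp. If only the ids present in df_rp are wanted, one possible sketch is to filter with isin before grouping. The names matched and lat_long are illustrative here, and the pairs are built with to_numpy().tolist() instead of apply:

```python
import pandas as pd

df_rp = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8],
                      'res': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})

df_cdr = pd.DataFrame({'id': [1, 2, 5, 6, 7, 1, 2, 3, 8, 9, 3, 4, 8],
                       'LATITUDE': [-22.98, -22.97, -22.92, -22.87, -22.89, -22.84, -22.98,
                                    -22.14, -22.28, -22.42, -22.56, -22.70, -22.13],
                       'LONGITUDE': [-43.19, -43.39, -43.24, -43.28, -43.67, -43.11, -43.22,
                                     -43.33, -43.44, -43.55, -43.66, -43.77, -43.88]})

# Keep only the rows whose id also appears in df_rp (this drops id 9).
matched = df_cdr[df_cdr['id'].isin(df_rp['id'])]

# One [lat, lon] list per row, aligned with the filtered rows.
pairs = pd.Series(matched[['LATITUDE', 'LONGITUDE']].to_numpy().tolist(),
                  index=matched.index)

# Collect the pairs per id.
lat_long = pairs.groupby(matched['id']).agg(list)
```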


This assumes df_rp.id is unique and sorted, as in your sample. My solution uses set_index and loc to drop the ids that appear in df_cdr but not in df_rp, then calls groupby with a lambda that returns each group as a NumPy array:

s = (df_cdr.set_index('id')
           .loc[df_rp.id]
           .groupby(level=0)
           .apply(lambda x: x.to_numpy()))

Out[709]:
id
1    [[-22.98, -43.19], [-22.84, -43.11]]
2    [[-22.97, -43.39], [-22.98, -43.22]]
3    [[-22.14, -43.33], [-22.56, -43.66]]
4                       [[-22.7, -43.77]]
5                      [[-22.92, -43.24]]
6                      [[-22.87, -43.28]]
7                      [[-22.89, -43.67]]
8    [[-22.28, -43.44], [-22.13, -43.88]]
dtype: object
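
Since the question leans toward a dictionary, here is a minimal sketch that builds a single plain dict mapping each id to its list of [latitude, longitude] pairs, skipping ids absent from df_rp. It uses an ordinary loop rather than groupby, and coords_by_id and wanted are hypothetical names:

```python
from collections import defaultdict

import pandas as pd

df_rp = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8],
                      'res': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})

df_cdr = pd.DataFrame({'id': [1, 2, 5, 6, 7, 1, 2, 3, 8, 9, 3, 4, 8],
                       'LATITUDE': [-22.98, -22.97, -22.92, -22.87, -22.89, -22.84, -22.98,
                                    -22.14, -22.28, -22.42, -22.56, -22.70, -22.13],
                       'LONGITUDE': [-43.19, -43.39, -43.24, -43.28, -43.67, -43.11, -43.22,
                                     -43.33, -43.44, -43.55, -43.66, -43.77, -43.88]})

# Ids we care about, as a set for fast membership tests.
wanted = set(df_rp['id'])

# One dictionary: each id appears once as a key, with all its pairs as the value.
coords_by_id = defaultdict(list)
for id_, lat, lon in zip(df_cdr['id'], df_cdr['LATITUDE'], df_cdr['LONGITUDE']):
    if id_ in wanted:
        coords_by_id[id_].append([lat, lon])

# Convert to a regular dict so missing keys raise KeyError instead of being created.
coords_by_id = dict(coords_by_id)
```

A single dictionary is enough here: the id is the key, so it is never repeated, and the value grows as more rows for that id are found.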
