1

I have a dataframe df1 of coordinate values like this:

    lat         lon         E               N
0   48.010258   -6.156909   90089.518711    -201738.245555
1   48.021648   -6.105887   93961.324059    -200676.766517
2   48.033028   -6.054801   97836.321204    -199614.270439
... ...         ...         ...             ...

and another dataframe df2 that associates a climatic value to each (lat, lon) pair:

    lat         lon        val
0   48.010258   -6.156909  17.11
1   48.021648   -6.105887  22.23
2   48.033028   -6.054801  39.86
... ...         ...        ...

I want to create a new column, df1['corr_pos'], where each row is given the index of df2 corresponding to the (lat, lon) pair in df1. It is like using VLOOKUP in Excel, but using two values to identify the correct index instead of using only one. The two values are the coordinate pair.

The output would be:

    lat         lon         E               N               corr_pos
0   48.010258   -6.156909   90089.518711    -201738.245555  0
1   48.021648   -6.105887   93961.324059    -200676.766517  3
2   48.033028   -6.054801   97836.321204    -199614.270439  8
... ...         ...         ...             ...             ...

The dataframes df1 and df2 do not have the same order. How could I implement this in pandas?

1 Answer 1

1

I think you need merge with reset_index to create a new column from index:

print (df2)
          lat       lon    val
7   48.010258 -6.156909  17.11
10  48.021648 -6.105887  22.23
12  48.033028 -6.054801  39.86
df = pd.merge(df1, 
              df2.reset_index().drop('val', axis=1).rename(columns={'index':'corr_pos'}), 
              on=['lat','lon'], 
              how='left')
print (df)
         lat       lon             E              N  corr_pos
0  48.010258 -6.156909  90089.518711 -201738.245555         7
1  48.021648 -6.105887  93961.324059 -200676.766517        10
2  48.033028 -6.054801  97836.321204 -199614.270439        12

If df2 has many columns, it is better to use subset as merge will delete them:

df = pd.merge(df1, 
              df2.reset_index()[['lat','lon', 'index']].rename(columns={'index':'corr_pos'}),
              on=['lat','lon'], 
              how='left')
print (df)
         lat       lon             E              N  corr_pos
0  48.010258 -6.156909  90089.518711 -201738.245555         7
1  48.021648 -6.105887  93961.324059 -200676.766517        10
2  48.033028 -6.054801  97836.321204 -199614.270439        12
Sign up to request clarification or add additional context in comments.

6 Comments

What if I want to add the corr_pos column to df1, instead of creating a new dataframe? Of course I don't want to lose the correspondences.
Hmmm, there is problem you need join by 2 columns, so cannot use map solution. I have no idea it is possible without merge, maybe join - all methods return new dataframe..
But I can create a new df and then use map to add the new column to df1.
Yes, but map need one column which is mapped to another values. If check this solution there is only one column in on parameter, so map can be used. But if have more as 1 joined columns on=['val1','val2',...] then map solution is impossible.
On second thoughts this doesn't work. It creates a corr_pos_x and corr_pos_y that I do not want.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.