1

I have 2 dataFrames.

df1:

index ID City         Region 2City
1     23 Moscow       Msk    
2     34 Obninsk      Msk    Msk
3     56                     Spb
4     17 Tula         Spb

df2:

index City   Office
1     Msk    Msk
2     Spb    Spb
3     Tula   Msk
4     Moscow Msk

I would like to get the following df:

index ID City         Region 2City Office
1     23 Moscow       Msk          Msk
2     34 Obninsk      Msk    Msk   Msk
3     56                     Spb   Spb
4     17 Tula         Spb          Msk

So it checks which 'office' from df2 matches 'City', 'Region', '2City' from df1.

Also if I found 'office' for 'City', I stop to search. So 'City'column have the priority, then comes 'Region', then comes '2City'.

I know how to do this with 3 for, but I hope there is a better option.

0

4 Answers 4

1

One generally does a VLOOKUP in Pandas using join (which defaults to a left join) or merge (where one must specify how='left').

In your case, you are trying to lookup a value based on three columns. One way to do this is by using or to take the first non-null value.

>>> (df1
     .assign(temp = [region or two_city or city 
                     for region, two_city, city in zip(df1.Region, df1['2City'], df1.City)])
     .merge(df2, how='left', right_on='City', left_on='temp', suffixes=['', '_'])
     .drop(['temp', 'City_'], axis=1))

   ID     City Region 2City Office
0  23   Moscow    Msk  None    Msk
1  34  Obninsk    Msk   Msk    Msk
2  56     None   None   Spb    Spb
3  17     Tula    Spb  None    Msk
Sign up to request clarification or add additional context in comments.

Comments

0
import pandas as pd
df1 = pd.DataFrame([[23, 'Moscow', 'Msk', ''],
                    [34, 'Obninsk', 'Msk', 'Msk'],
                    [56, '', '', 'Spb'],
                    [17, 'Tula', 'Spb', '']],
                   columns=['ID', 'City', 'Region', '2City'])
df2 = pd.DataFrame([['Msk', 'Msk'],
                    ['Spb', 'Spb'],
                    ['Tula', 'Msk'],
                    ['Moscow', 'Msk']],
                   columns=['City', 'Office'])

df = pd.concat([df1.loc[df1[x].isin(df2['City']), x] for x in ['City', 'Region', '2City']])
df1['Join'] = df.groupby(df.index).first()
output = df1.merge(df2, left_on='Join', right_on='City', how='right')

At this point, the pandas.DataFrame called "output" will contain a column called "Office" indexed appropriately to match with df1. You could do something like:

df1['Office'] = output['Office']

And this will give you the DataFrame you requested plus the additional column "Join", which you can drop via:

df1.drop('Join', axis=1, inplace=True)

The OP here wants basically to do three separate merges--one on each of the columns "City", "Region", and "2City" from df1. It is therefore not easy to do this using basic pandas.DataFrame operations without some gymnastics. I just hid the for loop in the list comprehension, but it's still there.

1 Comment

That one worked! But only on my test DF. Now, trying to do with my normal DF:)
0

You can merge the two DataFrames with pd.merge, but as I understand it, you actually want to merge on different columns in df1. A possible way to do this is by adding an extra column that takes the value of 'City' if available (otherwise'region' or '2City').

import pandas as pd

df1['Office'] = df1.City.fillna(df1.Region).fillna(df1['2City'])
df = pd.merge(df1, df2.reindex(columns='Office'), on='Office')

You don't specify if you have missing values in either DataFrame. If so, you can control their handling with the how argument to pd.merge.

Comments

0

Without a for loop:

cols = ['City', 'Region', '2City']

df1[cols].applymap(lambda x: df2.set_index('City')['Office'].get(x)) \
         .apply(lambda row: row[row.first_valid_index()], axis=1)

I set City as the index for df2 in order to lookup values with the get method. After looking up the values element-wise with applymap, I find the first valid answer with first_valid_index.

1 Comment

Tried it. Firstly it didn't want to deal with Nan, now I have the following mistake: KeyError: (<class 'object'>, 'occurred at index 0') and don't know what to do next

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.