I have two dataframes, df1 and df2 (df1 has significantly more rows). The first dataframe has five columns: Column 1 contains postcodes, and Columns 2 to 5 each contain a code. The second dataframe has two columns: Column 1 holds the raw code (Code) and Column 2 holds the new code (Code New). The idea is to take each code from Column 2 of the first dataframe, look it up in the Code column of the second dataframe, and replace it with the corresponding Code New value:

Postcode    Column 2   Column 3   Column 4   Column 5
BS105JJ         1         3          0         1
BS105JL         0         0          0         1
BS105JN         1         2          0         1
BS105JP         0         0          0         1
BS105JR         1         1          0         1
BS105JS         0         0          0         1
BS105JT         1         5          0         1


Code       Code New
  0            1
  1            3
  2            5
  3            7
  4            9
  5            3

Expected result:

Postcode    Column 2   Column 3   Column 4   Column 5
BS105JJ         3         3          0          1
BS105JL         1         0          0          1
BS105JN         3         2          0          1
BS105JP         1         0          0          1
BS105JR         3         1          0          1
BS105JS         1         0          0          1
BS105JT         3         5          0          1

I have searched for solutions, but so far I have only found ways to do a vlookup between dataframes of equal size. Ideally I would like a reusable function (each column will need its own vlookup against a different dataframe), but even just the method would be fine; I can work out how to wrap it in a function myself.

3 Answers

I think you are looking for map.

import pandas as pd

code = pd.DataFrame({'Code': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5}, 'Code New': {0: 1, 1: 3, 2: 5, 3: 7, 4: 9, 5: 3}})
df = pd.DataFrame({'Postcode': {0: 'BS105JJ', 1: 'BS105JL', 2: 'BS105JN', 3: 'BS105JP', 4: 'BS105JR', 5: 'BS105JS', 6: 'BS105JT'}, 'Column 2': {0: 1, 1: 0, 2: 1, 3: 0, 4: 1, 5: 0, 6: 1}, 'Column 3': {0: 3, 1: 0, 2: 2, 3: 0, 4: 1, 5: 0, 6: 5}, 'Column 4': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0}, 'Column 5': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1}})
df['Column 2'] = df['Column 2'].map(code.set_index('Code')['Code New'])
print(df)

# output
#   Postcode  Column 2  Column 3  Column 4  Column 5
# 0  BS105JJ         3         3         0         1
# 1  BS105JL         1         0         0         1
# 2  BS105JN         3         2         0         1
# 3  BS105JP         1         0         0         1
# 4  BS105JR         3         1         0         1
# 5  BS105JS         1         0         0         1
# 6  BS105JT         3         5         0         1
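
Since the question asks for something reusable, the same map call could be wrapped in a small helper. This is only a sketch: the function name lookup_column and its key/value parameters are my own invention, and keeping unmatched codes unchanged (rather than turning them into NaN) is an assumption about what you want.

```python
import pandas as pd


def lookup_column(df, column, lookup, key="Code", value="Code New"):
    """Return a copy of df with `column` translated through `lookup`.

    Hypothetical helper: codes with no match in lookup[key] are kept as-is.
    """
    mapping = lookup.set_index(key)[value]
    out = df.copy()
    # map() yields NaN for unmatched codes; fall back to the original value,
    # then restore the original dtype (NaN would have forced floats).
    out[column] = out[column].map(mapping).fillna(out[column]).astype(df[column].dtype)
    return out


code = pd.DataFrame({"Code": [0, 1, 2, 3, 4, 5], "Code New": [1, 3, 5, 7, 9, 3]})
df = pd.DataFrame({
    "Postcode": ["BS105JJ", "BS105JL", "BS105JN"],
    "Column 2": [1, 0, 1],
    "Column 3": [3, 0, 2],
})

result = lookup_column(df, "Column 2", code)
print(result["Column 2"].tolist())  # [3, 1, 3]
```

Each column can then be passed with its own lookup dataframe. If a lookup's key column might contain duplicates, dropping them first is probably wise, since the mapping assumes one new value per code.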

1 Comment

Instead of the dict comprehension, you can just set code as the index: df['Column 2'].map(code.set_index('Code')['Code New']).

First create a mapping of old to new values:

mapping = df2.set_index("Code")["Code New"].to_dict()

Then apply the mapping, for example like this:

df1["Column 2"] = [mapping[value] for value in df1["Column 2"]]

Or this:

df1["Column 2"] = df1["Column 2"].map(mapping)
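
The two variants behave differently when df1 contains a code that df2 does not list. A small sketch with made-up data (the value 7 is deliberately missing from the lookup):

```python
import pandas as pd

df2 = pd.DataFrame({"Code": [0, 1], "Code New": [1, 3]})
df1 = pd.DataFrame({"Column 2": [1, 0, 7]})  # 7 has no entry in df2

mapping = df2.set_index("Code")["Code New"].to_dict()

# .map() leaves unmatched codes as NaN, which also turns the column into floats.
mapped = df1["Column 2"].map(mapping)
print(mapped.isna().tolist())  # [False, False, True]

# Indexing the dict directly fails loudly instead of producing NaN.
try:
    [mapping[value] for value in df1["Column 2"]]
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```

Which one is preferable depends on whether a missing code should be treated as an error or as a gap to fill in later.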

1 Comment

Here too: this is superfluous. You can just do: df1["Column 2"].map(df2.set_index("Code")["Code New"]).

You could also use .replace():

value_map = df2.set_index("Code")["Code New"].to_dict()
df1 = df1.replace({"Column 2": value_map})
print(df1)

# Output
#   Postcode  Column 2  Column 3  Column 4  Column 5
# 0  BS105JJ         3         3         0         1
# 1  BS105JL         1         0         0         1
# 2  BS105JN         3         2         0         1
# 3  BS105JP         1         0         0         1
# 4  BS105JR         3         1         0         1
# 5  BS105JS         1         0         0         1
# 6  BS105JT         3         5         0         1

That would also easily allow you to apply this to all columns, like so:

df1 = df1.replace({"Column 2": value_map, "Column 3": value_map, ...})
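
If every column needs its own lookup dataframe, as the question mentions, the nested dict could be built from a column-to-lookup pairing. A sketch only: lookup_a, lookup_b and the lookups dict are invented names and data for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Postcode": ["BS105JJ", "BS105JL", "BS105JN"],
    "Column 2": [1, 0, 1],
    "Column 3": [3, 0, 2],
})
# Hypothetical lookup tables, one per column.
lookup_a = pd.DataFrame({"Code": [0, 1, 2, 3], "Code New": [1, 3, 5, 7]})
lookup_b = pd.DataFrame({"Code": [0, 2, 3], "Code New": [10, 20, 30]})

lookups = {"Column 2": lookup_a, "Column 3": lookup_b}

# Build {column: {old_code: new_code}} and replace all columns in one call.
replacements = {
    column: table.set_index("Code")["Code New"].to_dict()
    for column, table in lookups.items()
}
result = df1.replace(replacements)
print(result)
```

Unlike .map(), .replace() leaves values without a match untouched, so columns that contain unlisted codes keep them as they are.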
