I have the following dataframes:

df1:

ZIP code Other columns
1011AA ...
1011AA ...
2316XH ...
5815NE ...

df2:

starting value ZIP code range last value ZIP code range Province
1000 1200 North-Holland
1201 1500 South-Holland
1501 1570 North-Holland
1571 1600 Den Haag

I want to:

  1. Get the first four digits of df1["ZIP code"]
  2. Check whether these four digits fall within any range defined by df2["starting value ZIP code range"] and df2["last value ZIP code range"]
  3. If there is a match, get df2["Province"] and add this value to a column in df1.

The difficulty is that I need to compare against a range of values, and I can only use the first four digits of the string. Most examples I found on Stack Overflow compare against a single value. The desired result is:

ZIP code New column
1011AA North-Holland
1011AA North-Holland
2316XH Haarlem
5815NE Utrecht

Bonus points if you can do it using map. For example, df1["New column"] = df1["ZIP code"].str[:4].map(... ? ...). However, if the map method is a bad idea please suggest a better method.

  • df1['Province'] = pd.merge_asof(df1.assign(key=df1['ZIP code'].str.extract('(\d+)', expand=False).astype(int)), df2, left_on='key', right_on='starting value ZIP code range').query('key <= `last value ZIP code range`')['Province'] Commented Feb 28, 2023 at 10:14
  • The query does not look correct. Notice that the province is not unique in df2. That is, the ranges are split. Commented Feb 28, 2023 at 10:21
  • It shouldn't matter. Let me provide a full answer and then you can test it and give me a counter example if needed Commented Feb 28, 2023 at 10:22
  • Please check the answer below and report any incorrect behavior with a reproducible example Commented Feb 28, 2023 at 10:28

1 Answer

As your ranges are non-overlapping, you can use a merge_asof on the starting boundary and filter its output (for example with query) to ensure it's within the ending boundary:

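df1['Province'] = (
    pd.merge_asof(
        # key: leading digits of the ZIP code as integers (raw string avoids an invalid-escape warning)
        df1.assign(key=df1['ZIP code'].str.extract(r'(\d+)', expand=False).astype(int)),
        df2,
        left_on='key', right_on='starting value ZIP code range',
    )
    # keep only rows whose key does not exceed the end of the matched range
    .query('key <= `last value ZIP code range`')['Province']
)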

Output:

  ZIP code Other columns       Province
0   1011AA           ...  North-Holland
1   1011AA           ...  North-Holland
2   2316XH           ...            NaN
3   5815NE           ...            NaN
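Why checking only the last value is enough: with its default backward direction, merge_asof pairs each key with the last row of df2 whose starting value is less than or equal to the key, so the lower bound is already satisfied. Because the ranges do not overlap, that row is the only possible match, and only the upper bound remains to be tested. A minimal self-contained sketch of this, assuming df1 is sorted by ZIP code and df2 by starting value (merge_asof requires sorted keys); it uses where instead of query as an equivalent way to blank out-of-range rows:

```python
import pandas as pd

df1 = pd.DataFrame({"ZIP code": ["1011AA", "1011AA", "2316XH", "5815NE"]})
df2 = pd.DataFrame({
    "starting value ZIP code range": [1000, 1201, 1501, 1571],
    "last value ZIP code range": [1200, 1500, 1570, 1600],
    "Province": ["North-Holland", "South-Holland", "North-Holland", "Den Haag"],
})

# Leading digits of each ZIP code, cast to int64 to match df2's boundaries
key = df1["ZIP code"].str.extract(r"(\d+)", expand=False).astype("int64")

# Backward as-of merge: each key gets the last range whose start is <= key
merged = pd.merge_asof(
    df1.assign(key=key),
    df2,
    left_on="key",
    right_on="starting value ZIP code range",
)

# The lower bound holds by construction; only the upper bound needs a check
in_range = merged["key"] <= merged["last value ZIP code range"]
df1["Province"] = merged["Province"].where(in_range)
print(df1)
```

Here 2316 and 5815 are both paired with the 1571-1600 row by the merge, and the upper-bound check is what turns them into NaN.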

Another example

Let's add one more entry to df2:

# df2
   starting value ZIP code range  last value ZIP code range       Province
0                           1000                       1200  North-Holland
1                           1201                       1500  South-Holland
2                           1501                       1570  North-Holland
3                           1571                       1600       Den Haag
4                           5000                       6000        Utrecht


# output
  ZIP code Other columns       Province
0   1011AA           ...  North-Holland
1   1011AA           ...  North-Holland
2   2316XH           ...            NaN
3   5815NE           ...        Utrecht
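For comparison, the same lookup can be sketched with a pd.IntervalIndex, which is closer in spirit to the map idea from the question. This is an alternative technique, not the merge_asof approach of this answer, and it assumes the ranges are non-overlapping and closed on both ends; the variable names (intervals, pos) are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({"ZIP code": ["1011AA", "1011AA", "2316XH", "5815NE"]})
df2 = pd.DataFrame({
    "starting value ZIP code range": [1000, 1201, 1501, 1571, 5000],
    "last value ZIP code range": [1200, 1500, 1570, 1600, 6000],
    "Province": ["North-Holland", "South-Holland", "North-Holland",
                 "Den Haag", "Utrecht"],
})

# Leading digits of each ZIP code as integers
key = df1["ZIP code"].str.extract(r"(\d+)", expand=False).astype("int64")

# One closed interval per row of df2
intervals = pd.IntervalIndex.from_arrays(
    df2["starting value ZIP code range"],
    df2["last value ZIP code range"],
    closed="both",
)

# Position of the interval containing each key, or -1 if there is none
pos = intervals.get_indexer(key)

# Look up the province by position, then blank out the unmatched rows
province = df2["Province"].to_numpy()[pos]
df1["Province"] = pd.Series(province, index=df1.index).where(pos >= 0)
print(df1)
```

Unlike merge_asof, this does not require df1 to be sorted, but get_indexer raises an error if the intervals overlap.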

Ensuring the boundaries in df2 are numeric:

cols = ['starting value ZIP code range', 'last value ZIP code range']
df2[cols] = df2[cols].apply(pd.to_numeric, errors='coerce')
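merge_asof also requires both merge keys to have exactly the same dtype, which is where an int32 versus int64 mismatch can come from on platforms where astype(int) yields int32. A hedged sketch of aligning both sides explicitly on int64; the string-typed boundaries in df2 are an assumption made here to mimic columns read in as text:

```python
import pandas as pd

df1 = pd.DataFrame({"ZIP code": ["1011AA", "2316XH"]})
# Assumed for illustration: boundaries that arrived as strings (object dtype)
df2 = pd.DataFrame({
    "starting value ZIP code range": ["1000", "1201", "1501", "1571"],
    "last value ZIP code range": ["1200", "1500", "1570", "1600"],
    "Province": ["North-Holland", "South-Holland", "North-Holland", "Den Haag"],
})

cols = ["starting value ZIP code range", "last value ZIP code range"]

# Convert to numbers; unparseable entries become NaN
df2[cols] = df2[cols].apply(pd.to_numeric, errors="coerce")

# Drop rows with missing boundaries, then pin an explicit 64-bit dtype
df2 = df2.dropna(subset=cols).astype({c: "int64" for c in cols})

# merge_asof expects the right-hand key to be sorted
df2 = df2.sort_values(cols[0])

# Use the same explicit dtype for the left-hand key
key = df1["ZIP code"].str.extract(r"(\d+)", expand=False).astype("int64")
print(df2.dtypes, key.dtype)
```

Naming the dtype as "int64" on both sides avoids depending on the platform's default integer width.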

Comments

I get the following error: Incompatible merge dtype, dtype('O') and dtype('int32'), both sides must have numeric dtype
Ensure that the start/last columns in df2 are numeric. See the update
Almost there, the current error is that we are comparing int32 and int64: incompatible merge keys [0] dtype('int32') and dtype('int64'), must be the same type. Can I adjust the .astype(int)?
I solved it by using .astype(np.int64)
Could you maybe explain why it is sufficient to compare the key only to the last value? I can see that we merge on the starting value, but I am still a little bit puzzled.