Search and replace values between two different pandas dataframes

Question

I work with two different pandas dataframes:

dataframe1:

      Year          State    EMW
0     1968        Alabama   8.55
1     1968         Alaska  15.61
2     1968        Arizona   8.55
3     1968       Arkansas   8.55
4     1968     California  12.26
...    ...            ...    ...
2857  2020       Virginia   7.25
2858  2020     Washington  13.50
2859  2020  West Virginia   8.75
2860  2020      Wisconsin   7.25
2861  2020        Wyoming   7.25

and dataframe2:

                         NAME            STATUS    ISO ANSI1  ANSI2 USPS
0                     Alabama             State  US-AL    AL      1   AL
1                      Alaska             State  US-AK    AK      2   AK
2                     Arizona             State  US-AZ    AZ      4   AZ
3                    Arkansas             State  US-AR    AR      5   AR
4                  California             State  US-CA    CA      6   CA
5                    Colorado             State  US-CO    CO      8   CO
6                 Connecticut             State  US-CT    CT      9   CT
7                    Delaware             State  US-DE    DE     10   DE
8        District of Columbia  Federal district  US-DC    DC     11    q
9                     Florida             State  US-FL    FL     12   FL
...                       ...               ...    ...    ...    ...  ...

What I'm trying to do:

Replace all values in the 'State' column in dataframe1 with their equivalent ANSI1 code from dataframe2.

So basically, I'm looking to have a result such as this:

Alabama -> AL
Alaska -> AK
Arizona -> AZ

and so on.

For some reason, nothing I've tried so far works.

What I've tried:

A single-line for loop (list comprehension):

[dataframe1.replace({'State' : {dataframe2.loc[i]['NAME'] : dataframe2.loc[i]['ANSI1']}}, inplace = True) for i in range(0, len(dataframe2))]

An equivalent nested loop structure:

    for state_name in pd.unique(dataframe1['State']):
        for ansi_name in dataframe2['ANSI1']:
            if ansi_name == state_name :
                dataframe1.replace({'State' : { state_name : ansi_name }}, inplace = True)

Note: I suspected I might be comparing different types, so I tried:

dataframe1.replace({'State' : {'Alabama' : 'AL'}}, inplace=True)

and sure enough, it worked.

EDIT:

Creating a dictionary with

dState = dict(df2[['NAME', 'ANSI1']].values)

produces a dictionary where the values are as follows:

{'\xa0Alabama': 'AL', '\xa0Alaska': 'AK', '\xa0Arizona': 'AZ', '\xa0Arkansas': 'AR', '\xa0California': 'CA',
 '\xa0Colorado': 'CO', '\xa0Connecticut': 'CT', '\xa0Delaware': 'DE', ' District of Columbia': 'DC', ' Florida': 'FL',
 '\xa0Georgia': 'GA', '\xa0Hawaii': 'HI', '\xa0Idaho': 'ID', '\xa0Illinois': 'IL', '\xa0Indiana': 'IN',
 '\xa0Iowa': 'IA', '\xa0Kansas': 'KS', '\xa0Kentucky': 'KY', '\xa0Louisiana': 'LA', '\xa0Maine': 'ME',
 '\xa0Maryland': 'MD', '\xa0Massachusetts': 'MA', '\xa0Michigan': 'MI', '\xa0Minnesota': 'MN', '\xa0Mississippi': 'MS',
 '\xa0Missouri': 'MO', '\xa0Montana': 'MT', '\xa0Nebraska': 'NE', '\xa0Nevada': 'NV', '\xa0New Hampshire': 'NH',
 '\xa0New Jersey': 'NJ', '\xa0New Mexico': 'NM', '\xa0New York': 'NY', '\xa0North Carolina': 'NC', '\xa0North Dakota': 'ND',
 '\xa0Ohio': 'OH', '\xa0Oklahoma': 'OK', '\xa0Oregon': 'OR', '\xa0Pennsylvania': 'PA', '\xa0Rhode Island': 'RI',
 '\xa0South Carolina': 'SC', '\xa0South Dakota': 'SD', '\xa0Tennessee': 'TN', '\xa0Texas': 'TX', '\xa0Utah': 'UT',
 '\xa0Vermont': 'VT', '\xa0Virginia': 'VA', '\xa0Washington': 'WA', '\xa0West Virginia': 'WV', '\xa0Wisconsin': 'WI',
 '\xa0Wyoming': 'WY', ' Puerto Rico': 'PR', ' U.S. Virgin Islands': 'VI', ' Guam': 'GU',
 ' Northern Mariana Islands': 'MP', ' American Samoa': 'AS'}

So it makes sense now that I couldn't get anywhere by comparing these keys to the values from df1['State'].

I am now starting to suspect that I may have missed something in the encoding of the csv I import df2 from.

Create a dictionary from the two columns NAME and ANSI1 of DataFrame2, then map DataFrame1's State column through that dictionary. That will solve it. Let me write up the code and share it shortly.
– Joe Ferndz, Commented Feb 2, 2021 at 20:22
You can fix the import so that it removes the leading \xa0 and leading spaces, or you can do that as a post-processing step.
– Joe Ferndz, Commented Feb 2, 2021 at 20:59
The thing is, the leading \xa0 only shows up when I print the dictionary, not when I print the dataframe itself, so I'm not sure at which point I should sanitize it.
– Manos Manos, Commented Feb 2, 2021 at 21:03
Run this after you have created df2: df2['NAME'] = df2.NAME.str.replace(r'\xa0|^ ', '', regex=True). It removes every \xa0 and a leading space.
– Joe Ferndz, Commented Feb 2, 2021 at 21:46
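
On why the \xa0 is visible in the dictionary but not in the printed dataframe: the sketch below is one way to check for hidden leading characters. The two-row frame is a made-up stand-in for df2; only the column names come from the question. Printing a DataFrame renders a non-breaking space as blank padding, while the repr of the underlying Python strings shows the escape sequence.

```python
import pandas as pd

# Hypothetical two-row stand-in for df2: the first name starts with a
# non-breaking space (\xa0), the second with an ordinary space.
df2 = pd.DataFrame({"NAME": ["\xa0Alabama", " Florida"], "ANSI1": ["AL", "FL"]})

# Printing the frame shows both leading characters as blank padding.
print(df2)

# The repr of the plain Python strings exposes the escape sequence.
print(df2["NAME"].tolist())

# Collect every name that differs from its whitespace-stripped form.
needs_cleaning = [name for name in df2["NAME"] if name != name.strip()]
print(len(needs_cleaning), "names need cleaning")
```

If the list is non-empty, the names can be sanitized right after df2 is created and before any dictionary or merge is built from them.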

DanCor · Accepted Answer · 2021-02-02 20:48:57Z

1

This problem can be solved in a few simple steps using DataFrame operations:

extract the columns you need from df2
merge the dataframes
drop the columns you no longer need

In code, this looks something like the following.

Step 1:

df2_use = df2[['NAME','ANSI1']]

Step 2:

df1 = df1.merge(df2_use, how='left', right_on='NAME', left_on='State')

Step 3:

df1 = df1.drop(['NAME','State'], axis=1).rename(columns={'ANSI1': 'State'})

And you will have the dataframe you are looking for
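
The three steps can be sketched end to end as follows. This is a minimal example on made-up rows, and it assumes the names in df2 are already clean (no leading \xa0 or spaces); the `validate` argument is an optional addition that makes the merge fail loudly if a name appears more than once in df2.

```python
import pandas as pd

# Small made-up samples using the column names from the question.
df1 = pd.DataFrame({
    "Year": [1968, 1968, 2020],
    "State": ["Alabama", "Alaska", "Wyoming"],
    "EMW": [8.55, 15.61, 7.25],
})
df2 = pd.DataFrame({
    "NAME": ["Alabama", "Alaska", "Wyoming"],
    "ANSI1": ["AL", "AK", "WY"],
})

# Steps 1 and 2: keep only the lookup columns and left-merge on the state name.
merged = df1.merge(
    df2[["NAME", "ANSI1"]],
    how="left",
    left_on="State",
    right_on="NAME",
    validate="many_to_one",  # each state name may appear at most once in df2
)

# Step 3: drop the name columns and let the code take over the 'State' label.
result = merged.drop(columns=["NAME", "State"]).rename(columns={"ANSI1": "State"})
print(result)
```

A left merge keeps every row of df1, so a state without a match in df2 ends up with NaN in the new State column rather than disappearing.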


sophocles · Accepted Answer · 2021-02-02 20:55:51Z

1

How about building a dict with zip, and then using map (the sample below contains only a few states, hence the NaN values):

li = dict(zip(df2['NAME'], df2['ANSI1']))
df1['new_State'] = df1['State'].map(li)

print(df1)
   Year          State    EMW new_State
0  1968        Alabama  8.550        AL
1  1968         Alaska 15.610        AK
2  1968        Arizona  8.550        AZ
3  1968       Arkansas  8.550        AR
4  1968     California 12.260        CA
5  2020       Virginia  7.250       NaN
6  2020     Washington 13.500       NaN
7  2020  West Virginia  8.750       NaN
8  2020      Wisconsin  7.250       NaN
9  2020        Wyoming  7.250       NaN
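
The NaN rows above come from states that are missing from the lookup dictionary. If keeping the original name is preferable to NaN in that case, one possible variation is to fill the gaps from the original column. This is only a sketch on made-up data, and the column name `State_code` is hypothetical.

```python
import pandas as pd

# Made-up sample: the lookup deliberately lacks an entry for Wyoming.
df1 = pd.DataFrame({"State": ["Alabama", "Alaska", "Wyoming"]})
lookup = {"Alabama": "AL", "Alaska": "AK"}

# map() returns NaN for any value that is not a key of the dictionary.
mapped = df1["State"].map(lookup)

# fillna() with the original column keeps unmatched names unchanged.
df1["State_code"] = mapped.fillna(df1["State"])
print(df1)
```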


2 Comments

Joe Ferndz Over a year ago

It looks like the same answer I posted sometime back

Joe Ferndz Over a year ago

yes, it works. dict and map is the way to go.

Joe Ferndz · Accepted Answer · 2021-02-02 21:46:23Z

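Use the code below to remove the leading space or \xa0 from the names in your dataframe2. Passing regex=True is needed because recent pandas versions treat the pattern as a literal string by default.

df2['NAME'] = df2.NAME.str.replace(r'\xa0|^ ', '', regex=True)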
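
As an alternative to the regular expression, here is a minimal sketch that strips each name with Python's own `str.strip()`, which treats \xa0 (non-breaking space) as whitespace. The three-row frame is made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Made-up sample with the two kinds of leading characters seen in the question.
df2 = pd.DataFrame({
    "NAME": ["\xa0Alabama", " District of Columbia", "\xa0New York"],
    "ANSI1": ["AL", "DC", "NY"],
})

# Apply Python's str.strip to every name; it removes leading and trailing
# whitespace (including \xa0) and leaves the spaces inside a name alone.
df2["NAME"] = df2["NAME"].map(str.strip)

print(df2["NAME"].tolist())
```

Unlike the regular expression above, this also removes trailing whitespace, which is usually harmless for a lookup key.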

Then create a dictionary from the NAME and ANSI1 columns, and use map() to convert each State value to its ANSI1 code.

Step 1: Create a dictionary of NAME and ANSI1 using the below command.

dState = dict(df2[['NAME','ANSI1']].values)

Step 2: Map the State value in df1 using the dictionary. Use the below command.

df1['ANSI1'] = df1.State.map(dState)

This will give you the results you are looking for.

The code is:

dState = dict(df2[['NAME','ANSI1']].values)
df1['ANSI1'] = df1.State.map(dState)

Using the following sample data:

df2 (dataframe2 in the question):

                   NAME            STATUS    ISO ANSI1  ANSI2 USPS
0               Alabama             State  US-AL    AL      1   AL
1                Alaska             State  US-AK    AK      2   AK
2               Arizona             State  US-AZ    AZ      4   AZ
3              Arkansas             State  US-AR    AR      5   AR
4            California             State  US-CA    CA      6   CA
5              Colorado             State  US-CO    CO      8   CO
6           Connecticut             State  US-CT    CT      9   CT
7              Delaware             State  US-DE    DE     10   DE
8  District of Columbia  Federal district  US-DC    DC     11    q
9               Florida             State  US-FL    FL     12   FL

df1 (dataframe1 in the question):

   Year          State    EMW
0  1968        Alabama   8.55
1  1968         Alaska  15.61
2  1968        Arizona   8.55
3  1968       Arkansas   8.55
4  1968     California  12.26
5  2020       Virginia   7.25
6  2020     Washington  13.50
7  2020  West Virginia   8.75
8  2020      Wisconsin   7.25
9  2020        Wyoming   7.25

The intermediate dictionary that gets created for dState is:

{'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR', 'California': 'CA', 'Colorado': 'CO', 'Connecticut': 'CT', 'Delaware': 'DE', 'District of Columbia': 'DC', 'Florida': 'FL'}

Note: This dictionary does not have all the states.

The results by using map will give you:

   Year          State    EMW ANSI1
0  1968        Alabama   8.55    AL
1  1968         Alaska  15.61    AK
2  1968        Arizona   8.55    AZ
3  1968       Arkansas   8.55    AR
4  1968     California  12.26    CA
5  2020       Virginia   7.25   NaN
6  2020     Washington  13.50   NaN
7  2020  West Virginia   8.75   NaN
8  2020      Wisconsin   7.25   NaN
9  2020        Wyoming   7.25   NaN

Once you have all the states in the dictionary, your NaN values will go away.
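
To see which states still lack a dictionary entry, the unmatched names can be listed after the map. The sketch below uses a made-up subset of the data and a deliberately incomplete dictionary.

```python
import pandas as pd

# Made-up sample: the dictionary covers only two of the four states.
df1 = pd.DataFrame({
    "Year": [1968, 1968, 2020, 2020],
    "State": ["Alabama", "Alaska", "Virginia", "Wyoming"],
})
dState = {"Alabama": "AL", "Alaska": "AK"}

df1["ANSI1"] = df1["State"].map(dState)

# Rows where the map produced NaN point at names missing from the dictionary.
missing = sorted(df1.loc[df1["ANSI1"].isna(), "State"].unique())
print(missing)
```

An unexpectedly long list here is also a hint that the dictionary keys contain hidden characters such as a leading \xa0.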

I added a few more states. Here's the updated results:

   Year          State    EMW ANSI1
0  1968        Alabama   8.55    AL
1  1968         Alaska  15.61    AK
2  1968        Arizona   8.55    AZ
3  1968       Arkansas   8.55    AR
4  1968     California  12.26    CA
5  2020       Virginia   7.25    VA
6  2020     Washington  13.50    WA
7  2020  West Virginia   8.75    WV
8  2020      Wisconsin   7.25    WI
9  2020        Wyoming   7.25    WY

This is definitely helpful; however, it doesn't bring an end to my troubles. See the edit.
What kind of trouble are you having? Does your dataframe have leading or trailing spaces? Help me understand so I can help you.

Manos Manos · Accepted Answer · 2021-02-02 21:15:21Z

0

I solved my problem with:

dState = dict(df2[['NAME','ANSI1']].values)

for name in dState:
    for state_name in df1['State']:
        if state_name in name:
            df1.replace( {'State':{state_name : dState[name] }}, inplace = True)

Thanks guys.

