2

I work with two different pandas dataframes:

dataframe1:

      Year          State    EMW
0     1968        Alabama   8.55
1     1968         Alaska  15.61
2     1968        Arizona   8.55
3     1968       Arkansas   8.55
4     1968     California  12.26
...    ...            ...    ...
2857  2020       Virginia   7.25
2858  2020     Washington  13.50
2859  2020  West Virginia   8.75
2860  2020      Wisconsin   7.25
2861  2020        Wyoming   7.25

and dataframe2:

                         NAME            STATUS    ISO ANSI1  ANSI2 USPS
0                     Alabama             State  US-AL    AL      1   AL
1                      Alaska             State  US-AK    AK      2   AK
2                     Arizona             State  US-AZ    AZ      4   AZ
3                    Arkansas             State  US-AR    AR      5   AR
4                  California             State  US-CA    CA      6   CA
5                    Colorado             State  US-CO    CO      8   CO
6                 Connecticut             State  US-CT    CT      9   CT
7                    Delaware             State  US-DE    DE     10   DE
8        District of Columbia  Federal district  US-DC    DC     11    q
9                     Florida             State  US-FL    FL     12   FL
...                       ...               ...    ...    ...    ...  ...

What I'm Trying to do:

Replace all values in the 'State' column in dataframe1 with their equivalent ANSI1 code from dataframe2.

So basically, I'm looking to have a result such as this:

Alabama -> AL
Alaska -> AK
Arizona -> AZ

and so on.

For some reason, nothing I've tried so far works.

What I've tried:

  1. A single line for loop

[dataframe1.replace({'State' : {dataframe2.loc[i]['NAME'] : dataframe2.loc[i]['ANSI1']}},inplace = True) for i in range(0, len(dataframe2))]

  2. An equivalent nested loop structure:
    for state_name in pd.unique(dataframe1['State']):
        for ansi_name in dataframe2['ANSI1']:
            if ansi_name == state_name :
                dataframe1.replace({'State' : { state_name : ansi_name }}, inplace = True)

Note I suspected I might be trying to compare different types so I tried:

dataframe1.replace({'State' : {'Alabama' : 'AL'}}, inplace=True) 

and sure enough, it worked.

EDIT:

Creating a dictionary with

dState = dict(df2[['NAME', 'ANSI1']].values)

produces a dictionary where the values are as follows:

{'\xa0Alabama': 'AL', '\xa0Alaska': 'AK', '\xa0Arizona': 'AZ', '\xa0Arkansas': 'AR', '\xa0California': 'CA',
 '\xa0Colorado': 'CO', '\xa0Connecticut': 'CT', '\xa0Delaware': 'DE', ' District of Columbia': 'DC', ' Florida': 'FL',
 '\xa0Georgia': 'GA', '\xa0Hawaii': 'HI', '\xa0Idaho': 'ID', '\xa0Illinois': 'IL', '\xa0Indiana': 'IN',
 '\xa0Iowa': 'IA', '\xa0Kansas': 'KS', '\xa0Kentucky': 'KY', '\xa0Louisiana': 'LA', '\xa0Maine': 'ME',
 '\xa0Maryland': 'MD', '\xa0Massachusetts': 'MA', '\xa0Michigan': 'MI', '\xa0Minnesota': 'MN', '\xa0Mississippi': 'MS',
 '\xa0Missouri': 'MO', '\xa0Montana': 'MT', '\xa0Nebraska': 'NE', '\xa0Nevada': 'NV', '\xa0New Hampshire': 'NH',
 '\xa0New Jersey': 'NJ', '\xa0New Mexico': 'NM', '\xa0New York': 'NY', '\xa0North Carolina': 'NC', '\xa0North Dakota': 'ND',
 '\xa0Ohio': 'OH', '\xa0Oklahoma': 'OK', '\xa0Oregon': 'OR', '\xa0Pennsylvania': 'PA', '\xa0Rhode Island': 'RI',
 '\xa0South Carolina': 'SC', '\xa0South Dakota': 'SD', '\xa0Tennessee': 'TN', '\xa0Texas': 'TX', '\xa0Utah': 'UT',
 '\xa0Vermont': 'VT', '\xa0Virginia': 'VA', '\xa0Washington': 'WA', '\xa0West Virginia': 'WV', '\xa0Wisconsin': 'WI',
 '\xa0Wyoming': 'WY', ' Puerto Rico': 'PR', ' U.S. Virgin Islands': 'VI', ' Guam': 'GU', ' Northern Mariana Islands': 'MP',
 ' American Samoa': 'AS'}

So it makes sense now that I couldn't get anywhere by comparing these keys to the values in df1['State'].

I am now starting to suspect that I may have missed something in the encoding of the csv I import df2 from.
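
One way to check that suspicion: a DataFrame printout renders a non-breaking space (`\xa0`) like an ordinary blank, while the `repr()` of the raw strings shows the escape. Below is a minimal sketch, assuming a tiny hand-built `df2_sample` as a stand-in for the frame imported from the CSV (the name and the three sample rows are hypothetical, chosen to mimic the keys shown above).

```python
import pandas as pd

# Hypothetical stand-in for df2; the real frame comes from the CSV import.
df2_sample = pd.DataFrame({
    "NAME": ["\xa0Alabama", "\xa0Alaska", " Florida"],
    "ANSI1": ["AL", "AK", "FL"],
})

# Printing the frame hides the problem: \xa0 is displayed as a blank.
print(df2_sample)

# A plain Python list of the values is printed with repr(), which
# exposes the hidden characters.
raw_names = df2_sample["NAME"].tolist()
print(raw_names)

# Flag every name that differs from its whitespace-stripped form.
# str.strip() with no arguments also removes \xa0, since Python
# treats it as whitespace.
has_padding = df2_sample["NAME"] != df2_sample["NAME"].str.strip()
print(df2_sample.loc[has_padding, "NAME"].map(repr))
```

If `has_padding` is true for most rows, the names probably need cleaning before they can be matched against `df1['State']`.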

4
  • Create a dictionary from the two columns NAME and ANSI1 of DataFrame2, then map DataFrame1's State column through that dictionary. That will solve it. Let me write up the code and share shortly. Commented Feb 2, 2021 at 20:22
  • You can fix the import so that it removes the leading \xa0 and leading spaces, or you can do that as a post-processing step. Commented Feb 2, 2021 at 20:59
  • The thing is, the leading \xa0 only shows up when I print out the dictionary and not when I print out the dataframe itself, so I'm not sure as to at which point I should sanitize it. Commented Feb 2, 2021 at 21:03
  • Run this after you have created df2: df2['NAME'] = df2.NAME.str.replace(r'\xa0|^ ', '', regex=True). It will remove every \xa0 and a leading space. Commented Feb 2, 2021 at 21:46

4 Answers

1

This problem can be solved in a few simple steps with ordinary dataframe operations:

  1. Extract the columns you will use from df2
  2. Merge the dataframes
  3. Drop the columns you no longer need

In code, this looks something like the following.

Step 1:

df2_use = df2[['NAME','ANSI1']]

Step 2:

df1 = df1.merge(df2_use, how='left', right_on='NAME', left_on='State')

Step 3:

df1 = df1.drop(['NAME','State'], axis=1).rename(columns={'ANSI1': 'State'})

And you will have the dataframe you are looking for
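
The three steps can be put together as a runnable sketch. The two small frames below are hypothetical samples shaped like the ones in the question, and `merged` and `result` are illustrative names that do not appear in the original post.

```python
import pandas as pd

# Small hypothetical samples shaped like the frames in the question.
df1 = pd.DataFrame({
    "Year": [1968, 1968, 2020],
    "State": ["Alabama", "Alaska", "Wyoming"],
    "EMW": [8.55, 15.61, 7.25],
})
df2 = pd.DataFrame({
    "NAME": ["Alabama", "Alaska", "Wyoming"],
    "ANSI1": ["AL", "AK", "WY"],
    "USPS": ["AL", "AK", "WY"],
})

# Step 1: keep only the lookup columns.
df2_use = df2[["NAME", "ANSI1"]]

# Step 2: a left merge keeps every row of df1, matched or not.
merged = df1.merge(df2_use, how="left", left_on="State", right_on="NAME")

# Step 3: drop both name columns and let the code take over the 'State' label.
result = merged.drop(columns=["NAME", "State"]).rename(columns={"ANSI1": "State"})
print(result)
```

Because the merge is a left join, any state whose name has no exact match in `NAME` ends up with NaN in the new `State` column, so stray characters in the names would show up as missing codes.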


1

How about using dict with zip, and then map:

li = dict(zip(df2['NAME'],df2['ANSI1']))
df1['new_State'] = df1['State'].map(li)

print(df1)
   Year          State    EMW new_State
0  1968        Alabama  8.550        AL
1  1968         Alaska 15.610        AK
2  1968        Arizona  8.550        AZ
3  1968       Arkansas  8.550        AR
4  1968     California 12.260        CA
5  2020       Virginia  7.250       NaN
6  2020     Washington 13.500       NaN
7  2020  West Virginia  8.750       NaN
8  2020      Wisconsin  7.250       NaN
9  2020        Wyoming  7.250       NaN

2 Comments

It looks like the same answer I posted sometime back
yes, it works. dict and map is the way to go.
1

Please use the below code to remove the leading space or \xa0 from your dataframe2.

df2['NAME'] = df2.NAME.str.replace(r'\xa0|^ ', '', regex=True)

Then you can do the below:

First create a dictionary from the NAME and ANSI1 columns, then use map() to convert each State value to its ANSI1 code.

Step 1: Create a dictionary of NAME and ANSI1 using the below command.

dState = dict(df2[['NAME','ANSI1']].values)

Step 2: Map the State value in df1 using the dictionary. Use the below command.

df1['ANSI1'] = df1.State.map(dState)

This will give you the results you are looking for.

The code is:

dState = dict(df2[['NAME','ANSI1']].values)
df1['ANSI1'] = df1.State.map(dState)

I tested this with the following sample data.

dataframe2 (df2):

                   NAME            STATUS    ISO ANSI1  ANSI2 USPS
0               Alabama             State  US-AL    AL      1   AL
1                Alaska             State  US-AK    AK      2   AK
2               Arizona             State  US-AZ    AZ      4   AZ
3              Arkansas             State  US-AR    AR      5   AR
4            California             State  US-CA    CA      6   CA
5              Colorado             State  US-CO    CO      8   CO
6           Connecticut             State  US-CT    CT      9   CT
7              Delaware             State  US-DE    DE     10   DE
8  District of Columbia  Federal district  US-DC    DC     11    q
9               Florida             State  US-FL    FL     12   FL

dataframe1 (df1):

   Year          State    EMW
0  1968        Alabama   8.55
1  1968         Alaska  15.61
2  1968        Arizona   8.55
3  1968       Arkansas   8.55
4  1968     California  12.26
5  2020       Virginia   7.25
6  2020     Washington  13.50
7  2020  West Virginia   8.75
8  2020      Wisconsin   7.25
9  2020        Wyoming   7.25

The intermediate dictionary that gets created for dState is:

{'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR', 'California': 'CA', 'Colorado': 'CO', 'Connecticut': 'CT', 'Delaware': 'DE', 'District of Columbia': 'DC', 'Florida': 'FL'}

Note: This dictionary does not have all the states.

Using map with this dictionary gives:

   Year          State    EMW ANSI1
0  1968        Alabama   8.55    AL
1  1968         Alaska  15.61    AK
2  1968        Arizona   8.55    AZ
3  1968       Arkansas   8.55    AR
4  1968     California  12.26    CA
5  2020       Virginia   7.25   NaN
6  2020     Washington  13.50   NaN
7  2020  West Virginia   8.75   NaN
8  2020      Wisconsin   7.25   NaN
9  2020        Wyoming   7.25   NaN

Once you have all the states in the dictionary, your NaN values will go away.

I added a few more states. Here are the updated results:

   Year          State    EMW ANSI1
0  1968        Alabama   8.55    AL
1  1968         Alaska  15.61    AK
2  1968        Arizona   8.55    AZ
3  1968       Arkansas   8.55    AR
4  1968     California  12.26    CA
5  2020       Virginia   7.25    VA
6  2020     Washington  13.50    WA
7  2020  West Virginia   8.75    WV
8  2020      Wisconsin   7.25    WI
9  2020        Wyoming   7.25    WY
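
To see which names are still missing from the dictionary, one option is to list the states whose mapped value is NaN. This is a minimal sketch on a hypothetical four-row sample with a deliberately incomplete dictionary; `missing` is an illustrative name.

```python
import pandas as pd

# Hypothetical sample; the lookup dictionary is deliberately incomplete.
df1 = pd.DataFrame({
    "Year": [1968, 1968, 2020, 2020],
    "State": ["Alabama", "Alaska", "Virginia", "Wyoming"],
    "EMW": [8.55, 15.61, 7.25, 7.25],
})
dState = {"Alabama": "AL", "Alaska": "AK"}

df1["ANSI1"] = df1["State"].map(dState)

# Names for which map() found no key end up as NaN; list each one once.
missing = sorted(df1.loc[df1["ANSI1"].isna(), "State"].unique())
print(missing)
```

An empty list means every state name in df1 was found in the dictionary.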

3 Comments

This is definitely helpful, however it doesn't bring an end to my troubles. See edit.
What kind of trouble do you have? Does your dataframe have leading or trailing spaces? Help me understand so I can help you.
Thank you for your interest. I just updated the question.
0

I solved my problem with:

dState = dict(df2[['NAME','ANSI1']].values)

for name in dState:
    for state_name in df1['State']:
        if state_name in name:
            df1.replace( {'State':{state_name : dState[name] }}, inplace = True)

Thanks guys.
