I have two dataframes. The first dataframe is df_states and the second dataframe is state_lookup.

df_states

   state         code     score
0  Texas         0        0.753549
1  Pennsylvania  0        0.998119
2  California    1        0.125751
3  Texas         2        0.125751

state_lookup

   state         code_0    code_1   code_2
0  Texas         2014      2015     2019
1  Pennsylvania  2015      2016     2017
2  California    2014      2015     2019

I want to create a new column in df_states called 'year', determined by the 'code' column and the state_lookup table. For example, if Texas has code = 0, then according to state_lookup the year should be 2014; if Texas has code = 2, the year should be 2019.

This is what the end result should look like:

df_states

   state         code     score      year
0  Texas         0        0.753      2014
1  Pennsylvania  0        0.998      2015
2  California    1        0.125      2015
3  Texas         2        0.125      2019

I've tried using a for loop to iterate through each row, but I haven't been able to get it to work. How would you achieve this?

2 Answers

You can first use wide_to_long to reshape your state_lookup df so that each (state, code) pair becomes its own row, and then perform a merge:

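import pandas as pd

# Reshape the lookup from wide to long; raw string avoids an invalid-escape warning
s = pd.wide_to_long(state_lookup, stubnames="code", sep="_", i="state", j="year", suffix=r"\d+").reset_index()
# After reset_index the columns are (state, numeric suffix, looked-up value); rename them accordingly
s.columns = ["state", "code", "year"]

print(df_states.merge(s, on=["state", "code"], how="left"))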

          state  code     score  year
0         Texas     0  0.753549  2014
1  Pennsylvania     0  0.998119  2015
2    California     1  0.125751  2015
3         Texas     2  0.125751  2019
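
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same wide_to_long approach. The dataframes are rebuilt from the question; the Pennsylvania code_2 value is taken as 2017, which is an assumption, and the names long_lookup and result are just illustrative.

```python
import pandas as pd

# Data rebuilt from the question (Pennsylvania code_2 assumed to be 2017)
df_states = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California", "Texas"],
    "code": [0, 0, 1, 2],
    "score": [0.753549, 0.998119, 0.125751, 0.125751],
})
state_lookup = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California"],
    "code_0": [2014, 2015, 2014],
    "code_1": [2015, 2016, 2015],
    "code_2": [2019, 2017, 2019],
})

# Wide to long: the numeric suffix of each code_N column becomes its own column
long_lookup = (
    pd.wide_to_long(state_lookup, stubnames="code", sep="_",
                    i="state", j="suffix", suffix=r"\d+")
    .reset_index()
    # the stub column holds the years, the suffix holds the code
    .rename(columns={"code": "year", "suffix": "code"})
)

# Left merge keeps every row of df_states, matched on (state, code)
result = df_states.merge(long_lookup, on=["state", "code"], how="left")
print(long_lookup)
print(result)
```

Printing long_lookup before the merge is a quick way to confirm that the code column holds integers, since the merge keys must have matching types in both frames.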

Create the dataframes:

import pandas as pd

df_states = pd.DataFrame({'state':['Texas','Pennsylvania','California','Texas'],'code':[0,0,1,2], 'score':[0.753549,0.998119,0.125751,0.12575]})
state_lookup = pd.DataFrame({'state':['Texas','Pennsylvania','California'],'code_0': [2014,2015,2014],'code_1': [2015,2016,2017] , 'code_2': [2019,2017,2019]})

First, use melt to turn your code_ columns into rows:

melted_lookup = pd.melt(state_lookup,
                        id_vars=['state'],
                        value_vars=[col for col in state_lookup.columns if col.startswith('code_')], 
                        var_name='new_code',
                        value_name='year')

Then build a matching 'new_code' key in df_states and merge the two dataframes on it:

df_states['new_code'] = "code_"+ df_states.code.astype('str') 

df_states = pd.merge(df_states, melted_lookup, how = 'left', on =['new_code','state'])

#   state        code   score      new_code year
#0  Texas           0   0.753549    code_0  2014
#1  Pennsylvania    0   0.998119    code_0  2015
#2  California      1   0.125751    code_1  2017
#3  Texas           2   0.125750    code_2  2019
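
As a possible variation on the melt approach, the key can be normalised on the lookup side instead, so df_states needs no helper column. This is only a sketch using the same data as this answer; the name result is illustrative, and it assumes every non-state column in state_lookup is named code_N.

```python
import pandas as pd

df_states = pd.DataFrame({'state': ['Texas', 'Pennsylvania', 'California', 'Texas'],
                          'code': [0, 0, 1, 2],
                          'score': [0.753549, 0.998119, 0.125751, 0.12575]})
state_lookup = pd.DataFrame({'state': ['Texas', 'Pennsylvania', 'California'],
                             'code_0': [2014, 2015, 2014],
                             'code_1': [2015, 2016, 2017],
                             'code_2': [2019, 2017, 2019]})

# Melt every non-'state' column into rows; the old column name lands in 'code'
melted_lookup = state_lookup.melt(id_vars='state', var_name='code', value_name='year')

# Strip the 'code_' prefix and cast to int so the key matches df_states.code
melted_lookup['code'] = (melted_lookup['code']
                         .str.replace('code_', '', regex=False)
                         .astype(int))

# Merge on the original columns; no 'new_code' helper is left behind
result = pd.merge(df_states, melted_lookup, how='left', on=['state', 'code'])
print(result)
```

The trade-off is small: this keeps df_states untouched, while the version above keeps the lookup untouched.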

2 Comments

For some reason with this solution I'm getting a NaN for all the values in the year column
I have added the code to create df_states and state_lookup (in case it's a column type issue), along with the resulting dataframe. With that data I don't get NaN values.
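
One possible explanation for the all-NaN result, offered as a guess since the failing data isn't shown: if 'code' is stored as float (for example after reading from a file), the helper key becomes 'code_0.0' rather than 'code_0', so nothing matches. The sketch below reproduces that with small hypothetical frames and uses merge's indicator option to show which rows found no match.

```python
import pandas as pd

# Hypothetical reproduction: 'code' stored as float instead of int
df_states = pd.DataFrame({'state': ['Texas', 'California'], 'code': [0.0, 1.0]})
melted_lookup = pd.DataFrame({'state': ['Texas', 'California'],
                              'new_code': ['code_0', 'code_1'],
                              'year': [2014, 2015]})

# With floats the key is built as 'code_0.0' / 'code_1.0'
df_states['new_code'] = 'code_' + df_states['code'].astype(str)

# indicator=True adds a '_merge' column; 'left_only' means no match in the lookup
check = df_states.merge(melted_lookup, on=['state', 'new_code'],
                        how='left', indicator=True)
print(check[['new_code', 'year', '_merge']])

# Casting to int before building the key makes it line up with the lookup
df_states['new_code'] = 'code_' + df_states['code'].astype(int).astype(str)
fixed = df_states.merge(melted_lookup, on=['state', 'new_code'], how='left')
print(fixed)
```

Checking df_states.dtypes and printing a few values of the key column on both sides is usually enough to spot this kind of mismatch.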
