Using one pandas dataframe to populate new column in another pandas dataframe

Question

I have two dataframes. The first dataframe is df_states and the second dataframe is state_lookup.

df_states

   state         code     score
0  Texas         0        0.753549
1  Pennsylvania  0        0.998119
2  California    1        0.125751
3  Texas         2        0.125751

state_lookup

   state         code_0    code_1   code_2
0  Texas         2014      2015     2019
1  Pennsylvania  2015      2016     2017
2  California    2014      2015     2019

I want to create a new column in df_states called 'year'. Its value depends on the row's 'code', which selects the matching code_N column in state_lookup for that row's state. For example, if Texas has code = 0, the year should be 2014 (from code_0); if Texas has code = 2, the year should be 2019 (from code_2).

This is what the end result should look like:

df_states

   state         code     score      year
0  Texas         0        0.753      2014
1  Pennsylvania  0        0.998      2015
2  California    1        0.125      2015
3  Texas         2        0.125      2019

I've tried using a for loop to iterate through each row, but am unable to get it to work. How would you achieve this?

Henry Yik · Accepted Answer · 2020-03-11 03:56:16Z

2

You can first use wide_to_long on your state_lookup df so you can perform a merge:

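import pandas as pd

# Reshape the lookup to one row per (state, code suffix); the regex is a raw string
s = pd.wide_to_long(state_lookup, stubnames="code", sep="_", i="state", j="year", suffix=r"\d").reset_index()
# After reset_index the columns hold state, the suffix (0/1/2) and the value; rename accordingly
s.columns = ["state", "code", "year"]

print(df_states.merge(s, on=["state", "code"], how="left"))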

          state  code     score  year
0         Texas     0  0.753549  2014
1  Pennsylvania     0  0.998119  2015
2    California     1  0.125751  2015
3         Texas     2  0.125751  2019
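For readers who want to run the approach end to end, here is a minimal, self-contained sketch. The sample data is assumed from the question's tables (with 2017 taken as Pennsylvania's code_2 value, as in the other answer's setup code), and the names `long_lookup`, `suffix`, and `result` are illustrative, not part of the original answer. It names the suffix column explicitly and renames afterwards, which may be easier to follow than overwriting `s.columns`.

```python
import pandas as pd

# Sample data assumed from the question
df_states = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California", "Texas"],
    "code": [0, 0, 1, 2],
    "score": [0.753549, 0.998119, 0.125751, 0.125751],
})
state_lookup = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California"],
    "code_0": [2014, 2015, 2014],
    "code_1": [2015, 2016, 2015],
    "code_2": [2019, 2017, 2019],
})

# wide_to_long puts the numeric suffix in the column named by `j`
# and the cell values in a column named after the stub ("code").
long_lookup = (
    pd.wide_to_long(state_lookup, stubnames="code", sep="_",
                    i="state", j="suffix", suffix=r"\d+")
    .reset_index()
    .rename(columns={"suffix": "code", "code": "year"})
)

# Left merge keeps every row of df_states in its original order
result = df_states.merge(long_lookup, on=["state", "code"], how="left")
print(result)
```

Printing `long_lookup` before merging is a quick way to check that each (state, code) pair appears exactly once.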

answered Mar 11, 2020 at 3:56

fmarm · Accepted Answer · 2020-03-11 04:30:59Z

1

Load dataframes

import pandas as pd

df_states = pd.DataFrame({'state':['Texas','Pennsylvania','California','Texas'],'code':[0,0,1,2], 'score':[0.753549,0.998119,0.125751,0.12575]})
state_lookup = pd.DataFrame({'state':['Texas','Pennsylvania','California'],'code_0': [2014,2015,2014],'code_1': [2015,2016,2017] , 'code_2': [2019,2017,2019]})

First use melt to convert your code_ columns into rows

melted_lookup = pd.melt(state_lookup,
                        id_vars=['state'],
                        value_vars=[col for col in state_lookup.columns if col.startswith('code_')], 
                        var_name='new_code',
                        value_name='year')

Then merge the two dataframes:

df_states['new_code'] = "code_"+ df_states.code.astype('str') 

df_states = pd.merge(df_states, melted_lookup, how = 'left', on =['new_code','state'])

#   state        code   score      new_code year
#0  Texas           0   0.753549    code_0  2014
#1  Pennsylvania    0   0.998119    code_0  2015
#2  California      1   0.125751    code_1  2017
#3  Texas           2   0.125750    code_2  2019
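The merge above leaves the helper column new_code in the result. One possible variant, sketched here under the assumption that every lookup column is named code_<integer>, converts the melted column names back to integers so the merge can use the existing code column directly. It reuses this answer's sample data; the name `result` is illustrative.

```python
import pandas as pd

df_states = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California", "Texas"],
    "code": [0, 0, 1, 2],
    "score": [0.753549, 0.998119, 0.125751, 0.12575],
})
state_lookup = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California"],
    "code_0": [2014, 2015, 2014],
    "code_1": [2015, 2016, 2017],
    "code_2": [2019, 2017, 2019],
})

code_cols = [col for col in state_lookup.columns if col.startswith("code_")]
melted_lookup = state_lookup.melt(id_vars="state", value_vars=code_cols,
                                  var_name="code", value_name="year")

# Turn "code_0" into the integer 0 so it matches df_states["code"]
melted_lookup["code"] = (
    melted_lookup["code"].str.replace("code_", "", regex=False).astype(int)
)

result = df_states.merge(melted_lookup, how="left", on=["state", "code"])
print(result)
```

With this answer's data the year column comes out as 2014, 2015, 2017, 2019, the same values as the output shown above, without the extra column.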

edited Mar 11, 2020 at 4:30

answered Mar 11, 2020 at 3:58

2 Comments

prettypython Over a year ago

For some reason with this solution I'm getting a NaN for all the values in the year column

fmarm Over a year ago

I have added the code that creates df_states and state_lookup (in case it's a column type issue), along with the resulting dataframe. I don't get NaN values.
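The thread does not say what caused the NaN values. As one hypothetical illustration of the column type issue mentioned above, the sketch below stores code as floats, so the helper key becomes 'code_0.0' and nothing matches. The frame `df_float` and the names `bad` and `good` are made up for this example; `indicator=True` is a standard `merge` option that adds a `_merge` column showing which rows found a partner.

```python
import pandas as pd

state_lookup = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California"],
    "code_0": [2014, 2015, 2014],
    "code_1": [2015, 2016, 2017],
    "code_2": [2019, 2017, 2019],
})
melted_lookup = pd.melt(state_lookup, id_vars=["state"],
                        var_name="new_code", value_name="year")

# Hypothetical input where code was read in as float instead of int
df_float = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California", "Texas"],
    "code": [0.0, 0.0, 1.0, 2.0],
})

# 0.0 becomes the string "0.0", so the key is "code_0.0", not "code_0"
df_float["new_code"] = "code_" + df_float["code"].astype(str)
bad = df_float.merge(melted_lookup, how="left",
                     on=["new_code", "state"], indicator=True)
print(bad[["new_code", "year", "_merge"]])  # every row is left_only

# Casting to int before building the key restores the matches
df_float["new_code"] = "code_" + df_float["code"].astype(int).astype(str)
good = df_float.merge(melted_lookup, how="left", on=["new_code", "state"])
print(good)
```

Comparing `df_states.dtypes` with the dtypes of the lookup keys, and checking state names for stray whitespace, are other quick checks when a left merge returns only NaN.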
