I'm trying to map the results of a two-level aggregation back onto the original categorical feature and use the result as a new feature. I created the aggregation like this:
temp_df = pd.concat([X_train[['cat1', 'cont1', 'cat2']], X_test[['cat1', 'cont1', 'cat2']]])
temp_df = temp_df.groupby(['cat1', 'cat2'])['cont1'].agg(['mean']).reset_index().rename(columns={'mean': 'cat1_cont1/cat2_Mean'})
Then I built a MultiIndex from the values of the first and second categorical features, and finally converted the new aggregated column to a dict:
arrays = [list(temp_df['cat1']), list(temp_df['cat2'])]
temp_df.index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['cat1', 'cat2'])
temp_df = temp_df['cat1_cont1/cat2_Mean'].to_dict()
The dict keys are tuples taken from the MultiIndex: the first element of each tuple is a cat1 value and the second is a cat2 value.
{(1000, 'C'): 23.443,
(1001, 'H'): 50.0,
(1001, 'W'): 69.5,
(1002, 'H'): 60.0,
(1003, 'W'): 42.95,
(1004, 'H'): 51.0,
(1004, 'R'): 150.0,
(1004, 'W'): 226.0,
(1005, 'H'): 50.0}
When I try to map those values onto the original cat1 feature, every value becomes NaN. How can I do this properly?
X_train['cat1'].map(temp_df) # Produces a column of all NaNs
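A likely reason for the NaNs is that `map` looks up each single cat1 value (for example 1001) in the dict, while the dict keys are (cat1, cat2) tuples, so nothing matches. The lookup probably needs both columns. Below is a minimal sketch of one way to do that, not necessarily the best one: it keeps the aggregation as a Series indexed by (cat1, cat2) and joins it back on both columns. The toy `X_train` / `X_test` frames and the name `X_train_new` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the real X_train / X_test.
X_train = pd.DataFrame({
    "cat1": [1001, 1001, 1004, 1004, 1001],
    "cat2": ["H", "W", "H", "R", "H"],
    "cont1": [40.0, 69.5, 51.0, 150.0, 60.0],
})
X_test = pd.DataFrame({
    "cat1": [1001, 1004],
    "cat2": ["H", "W"],
    "cont1": [50.0, 226.0],
})

cols = ["cat1", "cont1", "cat2"]
temp_df = pd.concat([X_train[cols], X_test[cols]])

# The groupby result is already indexed by (cat1, cat2),
# so no MultiIndex has to be rebuilt by hand.
agg = (
    temp_df.groupby(["cat1", "cat2"])["cont1"]
    .mean()
    .rename("cat1_cont1/cat2_Mean")
)

# Join on both key columns; X_train's row order and index are preserved.
X_train_new = X_train.join(agg, on=["cat1", "cat2"])
```

With the real data, the same `join` call could be applied to `X_test` as well, since the aggregation was computed over both frames.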