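I have a sample DataFrame:
import numpy as np
import pandas as pd

# 6 rows of random integers in [1, 20) for columns A and B
sample_df = pd.DataFrame(np.random.randint(1,20,size=(6, 2)), columns=list('AB'))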
sample_df["A_cat"] = ["ind","sa","sa","sa","ind","ind"]
sample_df["B_cat"] = ["sa","ind","ind","sa","sa","sa"]
sample_df
Output:
    A   B A_cat B_cat
0  12   8   ind    sa
1  12  11    sa   ind
2   7  19    sa   ind
3   5  11    sa    sa
4  11   7   ind    sa
5   6  18   ind    sa
I have a second sample DataFrame, sample_df2, whose column values I am trying to replace based on a condition:
sample_df2 = pd.DataFrame()
sample_df2["A_cat"] = ["sa","ind","ind","sa","sa","ind"]
sample_df2["B_cat"] = ["ind","sa","sa","ind","sa","sa"]
sample_df2
Output:
  A_cat B_cat
0    sa   ind
1   ind    sa
2   ind    sa
3    sa   ind
4    sa    sa
5   ind    sa
Condition:
Each value in sample_df2 should be replaced by the group mean for that value, computed with a groupby on sample_df.
For example, sample_df2.loc[0, "A_cat"] is "sa", so it should be replaced by the "sa" entry of sample_df.groupby(["A_cat"])["A"].mean().
The expected output for column A_cat in sample_df2 after the conversion is:
sample_df2["A_cat"] = [8.0, 9.666667, 9.666667, 8.0, 8.0, 9.666667]
I have already solved this with a long for loop. Any suggestions for a more idiomatic pandas approach would be great!
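For reference, here is a minimal sketch of the kind of vectorized approach I have in mind. It computes the per-group means with groupby and then looks them up with Series.map. It assumes the A and B values are fixed to the ones printed above (the real data is random), and it assumes B_cat should be handled the same way as A_cat, using the mean of B grouped by B_cat. The names result, value_col, cat_col, and group_means are only illustrative.

```python
import pandas as pd

# Fixed copy of the sample data printed above (the original uses random integers).
sample_df = pd.DataFrame({
    "A": [12, 12, 7, 5, 11, 6],
    "B": [8, 11, 19, 11, 7, 18],
    "A_cat": ["ind", "sa", "sa", "sa", "ind", "ind"],
    "B_cat": ["sa", "ind", "ind", "sa", "sa", "sa"],
})
sample_df2 = pd.DataFrame({
    "A_cat": ["sa", "ind", "ind", "sa", "sa", "ind"],
    "B_cat": ["ind", "sa", "sa", "ind", "sa", "sa"],
})

# Work on a copy so the original categories in sample_df2 stay available.
result = sample_df2.copy()
for value_col in ["A", "B"]:
    cat_col = f"{value_col}_cat"
    # Series indexed by category ("ind", "sa") holding the mean of value_col.
    group_means = sample_df.groupby(cat_col)[value_col].mean()
    # Replace each category label with its group mean.
    # Assumption: every label in sample_df2 also occurs in sample_df;
    # labels that do not would become NaN.
    result[cat_col] = sample_df2[cat_col].map(group_means)

print(result)
```

With the fixed values above, the A_cat column of result should match the expected output I listed. The loop here only runs over the two column names, not over the rows.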