Write way to calculate weighted average with pandas Dataframe with mean or directly multiply

Question

I have the following pandas Dataframe:

data_df = pd.DataFrame({'id':['era','bb','cs','jd','ek','gtf','okg','huf','mji','loj','djjf','wloe','rfm','cok'],
                        'doc':[1050 ,580,170,8, 7, 220, 45155,305,458,201,48,78,256,358],
                        'dif':[1,1,1,3,3,2,2,3,4,5,8,7,9,10],
                        'usg':[102,258,7083,84,165,627,784,85,658,261,85,12,235,45],
                        'snp':[4.2,2.5,1.03,2.56,1.25,3.56,1.49,2.95,1.05,0.815,0.78,0.612,0.365,0.164]})
data_df

    id    doc      dif       usg         snp
0   era   1050      1        102        4.200
1   bb    580       1        258        2.500
2   cs    170       1        7083       1.030
3   jd    8         3        84         2.560
4   ek    7         3        165        1.250
5   gtf   220       2        627        3.560
6   okg   45155     2        784        1.490
7   huf   305       3        85         2.950
8   mji   458       4        658        1.050
9   loj   201       5        261        0.815
10  djjf  48        8        85         0.780
11  wloe  78        7        12         0.612
12  rfm   256       9        235        0.365
13  cok   358       10       45         0.164

I need to compute the weighted average of all the columns and finally need to make a rank column for the id. I'm planning to give 30%,20%,35%,15% weight for "doc", "dif,"usg" and "snp" columns. If "doc","usg" and "snp" is high it should get the higher weighted value. But for "dif" column if lower the grade it should be get the higher value. I'm actually struggling with that part. How to do that (lower grades get high marks). I simply multiply and get the weighted average. Is that the correct way? Is it possible to write a lambda function for this calculation? My calculation is like that

data_df["Weighted_Average"] = data_df["doc"]*0.3 + data_df["dif"]*.2+ data_df["usg"]*.35+ data_df["snp"]*.15
data_df

    id    doc      dif       usg     snp   Weighted_Average
0   era   1050     1         102    4.2     351.53
1   bb    580      1         258    2.5     264.875
2   cs    170      1         7083   1.03    2530.4045
3   jd    8        3         84     2.56    32.784
4   ek    7        3         165    1.25    60.6375
5   gtf   220      2         627    3.56    286.384
6   okg   45155    2         784    1.49    13821.5235
7   huf   305      3         85     2.95    122.2925
8   mji   458      4         658    1.05    368.6575
9   loj   201      5         261    0.815   152.77225
10  djjf  48       8         85     0.78    45.867
11  wloe  78       7         12     0.612   29.0918
12  rfm   256      9         235    0.365   160.90475
13  cok   358      10        45     0.164   125.1746

However, my calculation is wrong since I didn't consider higher marks to get lower grades. (data_df["dif"]*.2)

I saw several of them use mean to calculate to weighted average and some use directly like this too. Which is the correct way? Any help would be appriciate.

Corralien · Accepted Answer · 2021-04-26 16:55:36Z

Prepare data:

weights = {"doc": 0.3, "dif": 0.2, "usg": 0.35, "snp": 0.15}

df1 = data_df[weights.keys()]  # extract numeric columns
df2 = pd.DataFrame(weights, index=data_df.index)  # build weighted dataframe

df1 = df1.assign(dif=df1["dif"].max() + 1 - df1["dif"])  # reverse grade

Note: reverse grade according to your post. You can use 1/dif too.

Compute your weighted average:

data_df["Weighted_Average"] = df1.mul(df2).sum(axis="columns")

>>> data_df.sort_values("Weighted_Average", ascending=False)
      id    doc  dif   usg    snp  Weighted_Average
6    okg  45155    2   784  1.490       13822.92350
2     cs    170    1  7083  1.030        2532.20450
8    mji    458    4   658  1.050         369.25750
0    era   1050    1   102  4.200         353.33000
5    gtf    220    2   627  3.560         287.78400
1     bb    580    1   258  2.500         266.67500
12   rfm    256    9   235  0.365         159.50475
9    loj    201    5   261  0.815         152.97225
13   cok    358   10    45  0.164         123.37460
7    huf    305    3    85  2.950         123.29250
4     ek      7    3   165  1.250          61.63750
10  djjf     48    8    85  0.780          44.86700
3     jd      8    3    84  2.560          33.78400
11  wloe     78    7    12  0.612          28.49180

Collectives™ on Stack Overflow

Write way to calculate weighted average with pandas Dataframe with mean or directly multiply

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related