Multiply columns of dataframe together if some data missing

Question

I have 4 columns of data in a dataframe, where most of the time all 4 columns have data.

In some cases, there are empty cells in the dataframe.

How can I multiply the first 3 occurring values together regardless of their position? I have tried things like df.loc to no avail.

Data below.

{Input Dataframe}
    A        B        C        D
0   (empty)  58       44       62
1   42       34       (empty)  46
2   85       (empty)  81       91
3   54       43       42       56

{Output Dataframe}
0 158224
1 65688
2 626535
3 97524

After multiplying them together, I would like to add a weighting to the scores as well, e.g. the first value is 50% of the final score, the second value is 35%, and the third value is 15%. The result would be the sum of each value multiplied by its weighting factor.

Math for row 0 would result in the following: 58x0.5 + 44x0.35 + 62x0.15 = 53.7

{Weighted Sum Result}
0 53.7
1 39.8
2 84.5
3 48.35

Yes, it should be. I messed up the data in the copy / paste.
– DomCobb, Commented Jun 14, 2021 at 22:11

Henry Ecker · Accepted Answer · 2021-06-14 22:19:12Z

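One option is stack + groupby head + prod. The original one-liner ended in .prod(level=0), but the level argument of prod was deprecated in pandas 1.3 and removed in 2.0, so group by the index level again before taking the product. The explicit dropna() is a safeguard for pandas versions where stack does not drop missing values on its own:

df.stack().dropna().groupby(level=0).head(3).groupby(level=0).prod()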

Another option is apply on axis=1 with dropna + head + prod:

df.apply(lambda row: row.dropna().head(3).prod(), axis=1)

Both produce:

0    158224.0
1     65688.0
2    626535.0
3     97524.0
dtype: float64
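
For anyone who would rather not use head, here is a minimal self-contained sketch of the row-wise approach with positional slicing instead. The helper name first_n_product and its n parameter are made up for illustration; the data is the DataFrame from the question.

```python
import numpy as np
import pandas as pd

# Same data as the question; NaN marks the empty cells.
df = pd.DataFrame({'A': [np.nan, 42.0, 85.0, 54.0],
                   'B': [58.0, 34.0, np.nan, 43.0],
                   'C': [44.0, np.nan, 81.0, 42.0],
                   'D': [62, 46, 91, 56]})


def first_n_product(row, n=3):
    """Multiply the first n non-missing values of a row (hypothetical helper)."""
    # dropna() keeps column order, so iloc[:n] takes the first n present values.
    return row.dropna().iloc[:n].prod()


products = df.apply(first_n_product, axis=1)
print(products)
```

Slicing with iloc[:n] behaves like head(n) here; a row with fewer than n values is simply multiplied over whatever is present.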

Edit: to allow further processing of the individual values, reshape the first three values of each row into their own columns via pivot_table + dropna:

values = df.stack().reset_index()
values = values.pivot_table(index='level_0',
                            columns=values.groupby('level_0').cumcount(),
                            values=0).dropna(axis=1)

values:

            0     1     2
level_0                  
0        58.0  44.0  62.0
1        42.0  34.0  46.0
2        85.0  81.0  91.0
3        54.0  43.0  42.0

Then further operations can be performed, like weighting:

weights = [.5, .35, .15]
weighted_sum = (values * weights).sum(axis=1)

weighted_sum:

level_0
0    53.70
1    39.80
2    84.50
3    48.35
dtype: float64
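
As a rough alternative to the pivot_table reshape, the same weighted sum can be sketched in NumPy by pushing the missing values to the right of each row with a stable sort. This is only a sketch under the assumption that every row has at least three values; the names arr, order, left and weighted_np are illustrative.

```python
import numpy as np
import pandas as pd

# Same data as the question; NaN marks the empty cells.
df = pd.DataFrame({'A': [np.nan, 42.0, 85.0, 54.0],
                   'B': [58.0, 34.0, np.nan, 43.0],
                   'C': [44.0, np.nan, 81.0, 42.0],
                   'D': [62, 46, 91, 56]})

arr = df.to_numpy(dtype=float)

# Stable argsort on the NaN mask: False (present) sorts before True (missing),
# and the original left-to-right order is preserved within each group.
order = np.argsort(np.isnan(arr), axis=1, kind='stable')

# Reorder each row, then keep the first three columns.
left = np.take_along_axis(arr, order, axis=1)[:, :3]

weights = np.array([0.5, 0.35, 0.15])
weighted_np = pd.Series(left @ weights, index=df.index)
print(weighted_np)
```

If a row had fewer than three values, a NaN would end up in the first three columns and that row's result would be NaN, so such rows would need separate handling.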

DataFrame Used:

import pandas as pd
from numpy import nan

df = pd.DataFrame({'A': [nan, 42.0, 85.0, 54.0], 'B': [58.0, 34.0, nan, 43.0],
                   'C': [44.0, nan, 81.0, 42.0], 'D': [62, 46, 91, 56]})

edited Jun 14, 2021 at 22:19

answered Jun 14, 2021 at 21:53

Henry Ecker♦


1 Comment

DomCobb Over a year ago

Do you know of any alternatives to head? I should have clarified in the original question that I might want to do something like a weighted average to each of the cells. e.g. first value is 50% of the final score, second value is 35% of final score, third value is 15% of final score.
