I have 4 columns of data in a dataframe, where most of the time all 4 columns have data.

In some cases, there are empty cells in the dataframe.

How can I multiply the first 3 occurring values together regardless of their position? I have tried things like df.loc to no avail.

Data below.

{Input Dataframe}
    A        B        C        D
0   (empty)  58       44       62
1   42       34       (empty)  46
2   85       (empty)  81       91
3   54       43       42       56

{Output Dataframe}
0 158224
1 65688
2 626535
3 97524

After multiplying them together, I would also like to weight the scores, e.g. the first value is 50% of the final score, the second value is 35%, and the third value is 15%. The result would be the sum of each value multiplied by its weighting factor.

Math for row 0 would result in the following: 58x0.5 + 44x0.35 + 62x0.15 = 53.7

{Weighted Sum Result}
0 53.7
1 39.8
2 84.5
3 48.35
2 Comments
  • Row 1 should be 42 x 34 x 46 = 65688 no? Commented Jun 14, 2021 at 21:58
  • Yes, it should be. I messed up the data in the copy / paste. Commented Jun 14, 2021 at 22:11

1 Answer

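One option is stack + groupby head + groupby prod. (The original form of this line ended in prod(level=0); the level argument was deprecated in pandas 1.3 and removed in 2.0, so the final aggregation is written as a second groupby. The explicit dropna is harmless where stack already drops missing values and keeps the result the same where it does not.)

df.stack().dropna().groupby(level=0).head(3).groupby(level=0).prod()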

Another option is apply on axis=1 with dropna + head + prod:

df.apply(lambda row: row.dropna().head(3).prod(), axis=1)

Both produce:

0    158224.0
1     65688.0
2    626535.0
3     97524.0
dtype: float64

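Edit: to keep the first three values separate for further processing, reshape with pivot_table + dropna:

# Long format: one row per (row label, column label, value), missing values removed
values = df.stack().dropna().reset_index()
# Number each value within its original row (0, 1, 2, ...) and spread those
# positions across columns; dropna(axis=1) then removes any position that is
# not filled for every row (here the fourth one)
values = values.pivot_table(index='level_0',
                            columns=values.groupby('level_0').cumcount(),
                            values=0).dropna(axis=1)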

values:

            0     1     2
level_0                  
0        58.0  44.0  62.0
1        42.0  34.0  46.0
2        85.0  81.0  91.0
3        54.0  43.0  42.0

Then further operations can be performed, like weighting:

weights = [.5, .35, .15]
weighted_sum = (values * weights).sum(axis=1)

weighted_sum:

level_0
0    53.70
1    39.80
2    84.50
3    48.35
dtype: float64
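
The same weighting can also be sketched row by row with apply, skipping the reshape. This is only a sketch: the helper name weighted_first_three is made up here, and it assumes every row has at least three non-null values (a shorter row would raise a shape error in the dot product).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 42.0, 85.0, 54.0],
                   'B': [58.0, 34.0, np.nan, 43.0],
                   'C': [44.0, np.nan, 81.0, 42.0],
                   'D': [62, 46, 91, 56]})

weights = np.array([0.5, 0.35, 0.15])


def weighted_first_three(row):
    # Hypothetical helper: take the first three non-null values of the row,
    # in column order, and return their weighted sum.
    vals = row.dropna().head(3).to_numpy()
    return float(np.dot(vals, weights))


weighted_sum = df.apply(weighted_first_three, axis=1)
print(weighted_sum.round(2))
```

This keeps the original row index (0 to 3) rather than the level_0 index produced by the pivot, at the cost of a Python-level call per row.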

DataFrame Used:

import pandas as pd
from numpy import nan

df = pd.DataFrame({'A': [nan, 42.0, 85.0, 54.0], 'B': [58.0, 34.0, nan, 43.0],
                   'C': [44.0, nan, 81.0, 42.0], 'D': [62, 46, 91, 56]})

1 Comment

Do you know of any alternatives to head? I should have clarified in the original question that I might want to do something like a weighted average to each of the cells. e.g. first value is 50% of the final score, second value is 35% of final score, third value is 15% of final score.
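
One possible alternative to head, sketched with NumPy rather than taken from the answer above: a stable sort on the null mask pushes the missing cells to the right of each row while keeping the remaining values in their original order, so the first three columns can then be sliced off. The variable names are illustrative, and the sketch assumes every row has at least three non-null values; otherwise NaN would end up in the slice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 42.0, 85.0, 54.0],
                   'B': [58.0, 34.0, np.nan, 43.0],
                   'C': [44.0, np.nan, 81.0, 42.0],
                   'D': [62, 46, 91, 56]})

arr = df.to_numpy(dtype=float)

# False sorts before True, and a stable sort preserves the column order
# among the non-null entries, so NaN cells move to the end of each row.
order = np.argsort(np.isnan(arr), axis=1, kind='stable')
first_three = np.take_along_axis(arr, order, axis=1)[:, :3]

# Row-wise product of the first three non-null values
products = pd.Series(first_three.prod(axis=1), index=df.index)
print(products)
```

Because first_three is a plain 2-D array with one column per position, it can be multiplied by a list of three weights and summed along axis=1 in the same way as the pivoted frame.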
