How to create a DataFrame from two columns with a calculation?

Question

I am trying to create a DataFrame from the values of two columns, each taken from a different DataFrame.

I want the new DataFrame to split each value of one column (b) across all the values of the other column (a), in proportion to each value's contribution to the total of column (a).
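To make the intended calculation concrete, here is a minimal hand-checkable illustration with made-up inputs (a = [1, 3] and b = [1000, 2000], not taken from the question). Entry [i, j] of the result is a[i] / sum(a) * b[j], so each value of b is split across the rows in proportion to a's weights.

```python
import numpy as np
import pandas as pd

# Hypothetical small inputs, chosen so the result is easy to check by hand
a = pd.DataFrame([1, 3])
b = pd.DataFrame([1000, 2000])

# Each value of a as a fraction of a's total: [0.25, 0.75]
weight = a[0] / a[0].sum()

# Row i, column j holds weight[i] * b[j], i.e. b[j] split across a's entries
result = pd.DataFrame(
    [[w * v for v in b[0]] for w in weight],
    index=a.index,
    columns=b.index,
)
print(result)
```

Here the weights are 0.25 and 0.75, so the first column is 250 and 750 and the second is 500 and 1500; each column sums back to the corresponding value of b.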

At the moment I have solved it with a loop, but I was wondering whether there is a vectorized pandas function that could do this faster.

import numpy as np
import pandas as pd

a = pd.DataFrame(np.random.randint(1, 6, 700))
b = pd.DataFrame(np.random.randint(1, 6, 400))*1000

final_share = []

# Each value of a as a fraction of a's total
weight = a/a.sum()

# One column per value of b: that value split according to the weights
for index, value in b.iterrows():
    weighted_value = weight*value.item()
    final_share.append(weighted_value)

final_share = pd.concat(final_share, axis = 1)
final_share.columns = b.index

Josh Friedlander · Accepted Answer · 2019-03-25 11:59:49Z


This is straightforward with NumPy. You want the outer product:

import numpy as np
# np.outer flattens both inputs: shape (400, 700), transposed to (700, 400)
final_share = pd.DataFrame(np.outer(b, weight).T)

Because it avoids the Python-level loop, it should be much faster.

Credit to this anonymous SO user.
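One way to check that the outer product reproduces the loop from the question is to build both and compare them. The sketch below reuses the question's setup but swaps in a seeded `np.random.default_rng` generator (an assumption added for reproducibility) and times each approach with `timeit`; the absolute timings depend on the machine, so none are claimed here. Because `np.outer` flattens its inputs, the one-column DataFrames `b` and `weight` can be passed to it directly.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
a = pd.DataFrame(rng.integers(1, 6, 700))
b = pd.DataFrame(rng.integers(1, 6, 400)) * 1000
weight = a / a.sum()


def loop_version():
    # The question's approach: one weighted column per value of b
    parts = [weight * b.iloc[i].item() for i in range(len(b))]
    out = pd.concat(parts, axis=1)
    out.columns = b.index
    return out


def outer_version():
    # np.outer flattens both inputs, giving shape (400, 700); .T makes it (700, 400)
    return pd.DataFrame(np.outer(b, weight).T)


loop_result = loop_version()
outer_result = outer_version()
print(np.allclose(loop_result.to_numpy(), outer_result.to_numpy()))

print("loop: ", timeit.timeit(loop_version, number=3))
print("outer:", timeit.timeit(outer_version, number=3))
```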

edited Mar 25, 2019 at 11:59

answered Mar 25, 2019 at 11:51



jezrael · Accepted Answer · 2019-03-25 11:55:08Z


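Use NumPy:

arr = a[0].values
# Each value of a as a fraction of a's total
w = arr / arr.sum()

# w[:, None] has shape (700, 1); broadcasting against b's 400 values gives (700, 400)
c = pd.DataFrame(w[:, None] * b[0].values)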

Or:

c = pd.DataFrame(np.outer(w, b[0].values))
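The `w[:, None] * b[0].values` form relies on NumPy broadcasting: `w[:, None]` has shape (700, 1) and `b[0].values` has shape (400,), which broadcast together to (700, 400), the same table `np.outer(w, b[0].values)` builds. The sketch below checks that on data shaped like the question's (again assuming a seeded `default_rng` generator) and, as an optional extra, passes `index=` and `columns=` so the result keeps the row labels of `a` and the column labels of `b`, as the question's `final_share.columns = b.index` did.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only
a = pd.DataFrame(rng.integers(1, 6, 700))
b = pd.DataFrame(rng.integers(1, 6, 400)) * 1000

arr = a[0].to_numpy()
w = arr / arr.sum()  # shape (700,)

# (700, 1) * (400,) broadcasts to (700, 400): entry [i, j] is w[i] * b[j]
broadcast = w[:, None] * b[0].to_numpy()
outer = np.outer(w, b[0].to_numpy())

# Keep the original labels: rows from a, columns from b
c = pd.DataFrame(broadcast, index=a.index, columns=b.index)
```

Because the weights sum to 1, each column of `c` sums back to the corresponding value of `b`, which makes a convenient sanity check.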

answered Mar 25, 2019 at 11:55

