I am trying to create a DataFrame based on the values of two columns taken from two other DataFrames.

I want the new DataFrame to distribute each value of one column (b) across all the values of the other column (a), in proportion to each value's share of the total of column a.

At the moment I have solved it with a loop, but I was wondering whether there is a vectorized pandas function that could do this faster.
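To make the goal concrete, here is a tiny worked example with made-up numbers (the values `[1, 3]` and `[1000, 2000]` are illustrative assumptions, not the real data). Each column of the result is one value of b split across the rows of a according to their weights:

```python
import pandas as pd

# Hypothetical toy inputs, chosen only to make the arithmetic easy to follow.
a = pd.DataFrame([1, 3])
b = pd.DataFrame([1000, 2000])

# Each row's weight is its share of the column total: 1/4 and 3/4.
weight = a[0] / a[0].sum()

# One output column per value in b, each scaled by the weights.
share = pd.DataFrame({j: weight * value for j, value in b[0].items()})
print(share)
```

Here the first column splits 1000 into 250 and 750, and the second splits 2000 into 500 and 1500.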

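import numpy as np
import pandas as pd

a = pd.DataFrame(np.random.randint(1, 6, 700))
b = pd.DataFrame(np.random.randint(1, 6, 400))*1000

final_share = []

# Share of each row of a in the column total
weight = a/a.sum()

# Scale the weights by each value of b, one column per value
for index, value in b.iterrows():
    weighted_value = weight*b.iloc[index].item()
    final_share.append(weighted_value)

# Result: 700 rows (one per row of a) by 400 columns (one per row of b)
final_share = pd.concat(final_share, axis = 1)
final_share.columns = b.index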

2 Answers

2

This is elementary with NumPy. You want the outer product:

import numpy as np
final_share = pd.DataFrame(np.outer(b, weight).T)

It will be much faster than the loop, since it avoids building and concatenating 400 intermediate DataFrames.
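As a quick sanity check, the sketch below compares the loop from the question with the outer product on smaller inputs. The sizes (7 and 4) and the seed are arbitrary assumptions made so the check runs instantly:

```python
import numpy as np
import pandas as pd

# Smaller, seeded stand-ins for the question's a and b (sizes are arbitrary).
rng = np.random.default_rng(0)
a = pd.DataFrame(rng.integers(1, 6, 7))
b = pd.DataFrame(rng.integers(1, 6, 4)) * 1000

weight = a / a.sum()

# Loop version, following the question: one weighted column per value of b.
columns = [weight * b.iloc[i].item() for i in range(len(b))]
looped = pd.concat(columns, axis=1)
looped.columns = b.index

# Vectorized version: np.outer flattens both inputs, giving shape (4, 7),
# so the transpose restores the (7, 4) layout of the loop result.
vectorized = pd.DataFrame(np.outer(b, weight).T)

same = np.allclose(looped.to_numpy(), vectorized.to_numpy())
print(same)
```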

Credit to this anonymous SO user.

0

Use numpy:

arr = a[0].values
w = arr / arr.sum()

c = pd.DataFrame(w[:, None] * b[0].values)

Or:

c = pd.DataFrame(np.outer(w, b[0].values))
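The two forms above should give the same array. A minimal sketch with made-up inputs (the values `[2, 3, 5]` and `[1000, 4000]` are assumptions for illustration) shows this, and also checks that each column adds back up to the corresponding value of b:

```python
import numpy as np
import pandas as pd

# Hypothetical toy inputs.
a = pd.DataFrame([2, 3, 5])
b = pd.DataFrame([1000, 4000])

arr = a[0].to_numpy()
w = arr / arr.sum()                      # weights 0.2, 0.3, 0.5

# Broadcasting: a (3, 1) column times a (2,) row gives a (3, 2) array.
by_broadcast = w[:, None] * b[0].to_numpy()

# The outer product builds the same (3, 2) array explicitly.
by_outer = np.outer(w, b[0].to_numpy())

# Because the weights sum to 1, each column sums to its value of b.
column_totals = by_broadcast.sum(axis=0)
print(np.allclose(by_broadcast, by_outer))
print(column_totals)
```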
