1

I know similar questions on this topic have been asked before, but I'm still struggling to make any headway with my problem.

Basically, I have three dataframes (of sizes 402 x 402, 402 x 3142, and 1 x 402) and I'm combining elements from them into a calculation. I then write the calculation to another dataframe - see code below using dummy data. Each calculation takes between 0.3-0.8 ms, but there are (402 x 3142)^2 total calculations, which obviously takes a long time!

Since none of the calculations is dependent on any other, this is ripe for parallelization, but I'm really having a hard time figuring out how to do this - sorry the code is probably pretty ugly, very new to python, and parallel computing.

One additional thing to note is that the non-vector matrices are sparse (0.4 and 0.3, respectively), so could be changed to coordinate or compressed row/column format so that not all of the possible combinations of calculations need to be made. This might reduce the time by half.

import pandas as pd

A = pd.DataFrame(np.random.choice([0, 1], size=(402,402), p=[0.6,0.4]))
B = pd.DataFrame(np.random.choice([0, 1], size=(402,3142), p=[0.7,0.3]))
x = A.sum(axis = 1)

col_names = ["R", "I", "S", "J","value"]
results = pd.DataFrame(columns = col_names)

row = 0
for r in B.columns:
    for s in B.columns:
        for i in A.index:
            for j in A.columns:
                results.loc[row,"R"] = r
                results.loc[row,"I"] = i
                results.loc[row,"S"] = s
                results.loc[row,"J"] = j
                results.loc[row, "value"] = A.loc[i,j]*B.loc[j,s]*B.loc[i,r]/x[i]
                row = row + 1


0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.