NumPy dot product with 3D array

Question

I've got two arrays:

data of shape (2466, 2498, 9), where the dimensions are (asset, date, returns).
correlation_matrix of shape (2466, 2466) (with 0's on the diagonal)

I want to get the dot product that equates to the expected returns, which is the returns of each asset multiplied by the correlation_matrix. It should give a shape the same as data.

I've tried:

data.transpose([1, 2, 0]) @ correlation_matrix

but this just hangs my PC (been going 10 minutes and counting).

I also tried:

np.einsum('ijk,lm->ijk', data, correlation_matrix)

but I'm less familiar with einsum, and this also hangs.

What am I doing wrong?

Think you can just do data*correlation_matrix.sum(), assuming your einsum works.
– Divakar, Commented Jun 25, 2020 at 18:34
If things are hanging or taking too long, step back and test something smaller. Make sure the code is doing what you want with small arrays before stressing memory with something large.
– hpaulj, Commented Jun 25, 2020 at 18:39
Your einsum just sums all values of correlation_matrix and multiplies data by the resulting scalar. That's probably not what you want.
– hpaulj, Commented Jun 25, 2020 at 22:16
Have you looked at your task manager or htop to see if your computer has enough RAM to do the operation without swapping to secondary memory (hard drive)?
– hobs, Commented Apr 27, 2024 at 20:57
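
hpaulj's point about the einsum attempt can be checked on small arrays. This is a minimal sketch: the shapes (4 assets, 5 dates, 3 return columns) and the random values are made up for illustration, and only the subscript string comes from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((4, 5, 3))             # hypothetical small (asset, date, returns)
correlation_matrix = rng.random((4, 4))  # hypothetical small stand-in

# 'l' and 'm' do not appear in the output 'ijk', so both are summed away:
# the whole matrix collapses to one scalar that scales every entry of data.
out = np.einsum('ijk,lm->ijk', data, correlation_matrix)
same = np.allclose(out, data * correlation_matrix.sum())
print(out.shape, same)
```

Because no index is shared between the two operands, nothing is contracted between data and correlation_matrix, which is why the result is not the product the question asks for.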

Mateen Ulhaq · Accepted Answer · 2020-06-25 20:19:59Z


With your data transposed via .transpose((1, 2, 0)), the correct einsum subscript string is:

"ijs,sk"  # -> ijk

This follows because, for tensors A and B, we can write:

C_{ijk} = Σ_s A_{ijs} * B_{sk}
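
To make the summation concrete, here is a small sketch that evaluates the formula with explicit loops and compares it against einsum and the @ operator. The shapes and random integers are arbitrary choices for illustration, not values from the question.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(0, 10, size=(3, 2, 4))   # indices i, j, s
B = rng.integers(0, 10, size=(4, 5))      # indices s, k

# Direct translation of C_ijk = sum over s of A_ijs * B_sk.
C_loop = np.zeros((3, 2, 5), dtype=A.dtype)
for i in range(3):
    for j in range(2):
        for k in range(5):
            for s in range(4):
                C_loop[i, j, k] += A[i, j, s] * B[s, k]

C_einsum = np.einsum("ijs,sk->ijk", A, B)
C_matmul = A @ B                          # matmul contracts the last axis of A
print(np.array_equal(C_loop, C_einsum), np.array_equal(C_loop, C_matmul))
```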

If you want to avoid transposing your data beforehand, you can just permute the indices:

"sij,sk"  # -> ijk

To verify:

import numpy as np

p, q, r = 2466, 2498, 9

a = np.random.randint(255, size=(p, q, r))
b = np.random.randint(255, size=(p, p))

c1 = a.transpose((1, 2, 0)) @ b
c2 = np.einsum("sij,sk", a, b)

>>> np.all(c1 == c2)
True
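
Note that both forms return an array of shape (date, returns, asset), while the question asks for a result shaped like data. A minimal sketch, using small stand-in sizes instead of the real ones, that names the output order explicitly so the asset axis comes first:

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, r = 6, 5, 3                        # small stand-ins for 2466, 2498, 9
a = rng.integers(0, 255, size=(p, q, r))
b = rng.integers(0, 255, size=(p, p))

c_ijk = np.einsum("sij,sk->ijk", a, b)   # shape (q, r, p), as in the answer
c_kij = np.einsum("sij,sk->kij", a, b)   # shape (p, q, r), same as `a`

# Moving the last axis of the first result to the front gives the second.
matches = np.array_equal(np.moveaxis(c_ijk, -1, 0), c_kij)
print(c_ijk.shape, c_kij.shape, matches)
```

Only the order of the output letters changes; the contraction over s is the same in both calls.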

The number of multiplications needed for data of shape (p, q, r) is p * np.prod(c1.shape) == p * (q * r * p) == p**2 * q * r. In your case, that is 136_716_549_192 multiplications. You also need approximately the same number of additions, which gives roughly 270 billion operations in total. If you want more speed, you could consider running the computation on a GPU via cupy.
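
The arithmetic can be reproduced in plain Python. The output-size estimate at the end is an addition for context and assumes 8-byte elements (int64 or float64); the answer itself does not state it.

```python
p, q, r = 2466, 2498, 9

# Each of the p*q*r output entries is a sum of p products.
multiplications = p**2 * q * r
additions = multiplications              # approximation used in the text
total_ops = multiplications + additions

output_elements = p * q * r
output_bytes = output_elements * 8       # assumption: 8 bytes per element

print(f"{multiplications:_} multiplications")
print(f"{total_ops / 1e9:.1f} billion operations")
print(f"{output_bytes / 1e6:.1f} MB for the result array")
```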

import numpy as np
import cupy as cp
from timeit import timeit

def with_np():
    p, q, r = 2466, 2498, 9
    a = np.random.randint(255, size=(p, q, r))
    b = np.random.randint(255, size=(p, p))
    c1 = a.transpose((1, 2, 0)) @ b
    c2 = np.einsum("sij,sk", a, b)

def with_cp():
    p, q, r = 2466, 2498, 9
    a = cp.random.randint(255, size=(p, q, r))
    b = cp.random.randint(255, size=(p, p))
    c1 = a.transpose((1, 2, 0)) @ b
    c2 = cp.einsum("sij,sk", a, b)

>>> timeit(with_np, number=1)
513.066

>>> timeit(with_cp, number=1)
0.197

That's a speedup of roughly 2600x, including memory allocation, initialization, and CPU/GPU copy times! (A more realistic benchmark would give an even larger speedup.)
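
A CPU-side point worth checking before buying hardware: the benchmark arrays come from randint, so they are integer-typed, and NumPy hands matrix products to BLAS only for floating-point and complex dtypes. The sketch below is an assumption about what may contribute to the CPU timing, not something measured in this answer. It uses small stand-in sizes and casts to float64, so the timings you see will depend on your machine and BLAS build.

```python
import numpy as np
from timeit import timeit

rng = np.random.default_rng(3)
p, q, r = 200, 50, 9                     # small stand-ins for 2466, 2498, 9
a = rng.integers(0, 255, size=(p, q, r))
b = rng.integers(0, 255, size=(p, p))

# Float copies of the same values; sums here stay far below 2**53,
# so float64 represents them exactly.
a_f = a.astype(np.float64)
b_f = b.astype(np.float64)

c_int = np.tensordot(a, b, axes=(0, 0))
c_float = np.tensordot(a_f, b_f, axes=(0, 0))
agree = np.array_equal(c_int, c_float)

t_int = timeit(lambda: np.tensordot(a, b, axes=(0, 0)), number=3)
t_float = timeit(lambda: np.tensordot(a_f, b_f, axes=(0, 0)), number=3)
print(agree, f"int: {t_int:.4f}s", f"float: {t_float:.4f}s")
```

If your real data is already floating point (returns usually are), this cast is unnecessary and the comparison only explains the integer benchmark.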

edited Jun 25, 2020 at 20:19

answered Jun 25, 2020 at 18:36

Mateen Ulhaq

3 Comments

cjm2671 Over a year ago

Wow, that's an incredible speedup! What kind of GPU do you have? That might be worth investment ;)

Mateen Ulhaq Over a year ago

@cjm2671 It's a bit "outdated" -- NVIDIA GTX 1060 6GB ($300 when purchased, 5 years ago). You can probably get a newer graphics card model for cheaper than this card is being sold for nowadays, though. If you want CUDA computations, I recommend sticking with NVIDIA.

cjm2671 Over a year ago

Well, looks like they're around $30 now on eBay, seems like a worthy investment! Thank you! :)

Feodoran · Accepted Answer · 2020-06-25 18:55:24Z

There are different ways to do this product:

# as you already suggested:
data.transpose([1, 2, 0]) @ correlation_matrix

# using einsum
np.einsum('ijk,il', data, correlation_matrix)

# using tensordot to explicitly specify the axes to sum over
np.tensordot(data, correlation_matrix, axes=(0,0))

All of them should give the same result. For some small matrices, the timings were more or less the same for me, so your problem is the large amount of data, not an inefficient implementation.

from timeit import timeit
import numpy as np

A = np.arange(100 * 120 * 9).reshape((100, 120, 9))
B = np.arange(100**2).reshape((100, 100))

timeit('A.transpose([1,2,0])@B', globals=globals(), number=100)
# 0.747475513999234
timeit("np.einsum('ijk,il', A, B)", globals=globals(), number=100)
# 0.4993825999990804
timeit('np.tensordot(A, B, axes=(0,0))', globals=globals(), number=100)
# 0.5872082839996438
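
The claim that all three forms agree can be checked directly on the same small test arrays. This is a quick sanity check rather than part of the original timing run:

```python
import numpy as np

A = np.arange(100 * 120 * 9).reshape((100, 120, 9))
B = np.arange(100**2).reshape((100, 100))

via_matmul = A.transpose([1, 2, 0]) @ B
via_einsum = np.einsum('ijk,il', A, B)           # implicit output order: j, k, l
via_tensordot = np.tensordot(A, B, axes=(0, 0))  # sum over axis 0 of both

print(via_matmul.shape)
print(np.array_equal(via_matmul, via_einsum),
      np.array_equal(via_matmul, via_tensordot))
```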
