I am using scipy.sparse in my application and want to run some performance tests. To do that, I need to create a large sparse matrix (which I will then use in my application). As long as the matrix is small, I can create it with
import scipy.sparse as sp
a = sp.rand(1000,1000,0.01)
which results in a 1000 by 1000 matrix with 10,000 nonzero entries (a reasonable density, meaning approximately 10 nonzero entries per row).
The problem is when I try to create a larger matrix, for example a 100,000 by 100,000 matrix (I have dealt with far larger matrices before). I run
import scipy.sparse as sp
N = 100000
d = 0.0001
a = sp.rand(N, N, d)
which should result in a 100,000 by 100,000 matrix with one million nonzero entries (well within the realm of the possible). Instead, I get an error message:
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
sp.rand(100000,100000,0.0000001)
File "C:\Python27\lib\site-packages\scipy\sparse\construct.py", line 723, in rand
j = random_state.randint(mn)
File "mtrand.pyx", line 935, in mtrand.RandomState.randint (numpy\random\mtrand\mtrand.c:10327)
OverflowError: Python int too large to convert to C long
which is an internal scipy error I do not know how to work around.
I understand that I can create a 10n by 10n matrix by creating one hundred n by n matrices and stacking them together. However, I think scipy.sparse should be able to handle the creation of large sparse matrices directly (again, 100,000 by 100,000 is by no means large, and scipy comfortably handles matrices with several million rows). Am I missing something?
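For reference, the stacking workaround described above can be sketched with sp.bmat, which stitches a grid of blocks into one sparse matrix. The sizes here are illustrative, not the 100,000 by 100,000 case; each block stays small enough that its flat index N*N fits comfortably in a 32-bit int:

```python
import scipy.sparse as sp

N = 1000    # block size (illustrative; each block stays well under 2**31 cells)
d = 0.0001  # target density

# Generate a 2x2 grid of independent N x N random sparse blocks...
blocks = [[sp.rand(N, N, d) for _ in range(2)] for _ in range(2)]

# ...and stitch them into one 2N x 2N sparse matrix.
a = sp.bmat(blocks)

print(a.shape)  # (2000, 2000)
```

Each block contributes round(d * N * N) = 100 nonzero entries, so the assembled matrix has 400, matching the density you would get from a single sp.rand call of the full size.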
The call random_state.randint(mn) in your traceback draws a 32-bit int between 0 and N*M, and the max 32-bit (signed) int is 2^31 - 1 (100,000 * 100,000 = 10,000,000,000 > 2,147,483,647 = 2^31 - 1). Building the matrix in blocks using bmat is probably the easiest workaround. Try making N*M = 2^31 - 2 and then 2^31 and see if that causes the problem to pop up. The error "Python int too large to convert to C long" reflects the limits in the climits header; check stdint.h on your system to see what your limits are.