I want to create a scipy array from a really huge list, but I ran into a problem.
I have a list xs of strings, each of length 1.
>>> type(xs)
<type 'list'>
>>> len(xs)
4001844816
If I convert only the first 10 elements, everything still works as expected.
>>> s = xs[0:10]
>>> x = scipy.array(s)
>>> x
array(['A', 'B', 'C', 'D', 'E', 'F', 'O', 'O'],
      dtype='|S1')
>>> len(x)
10
For the whole list I get this result:
>>> ary = scipy.array(xs)
>>> ary.size
1
>>> ary.shape
()
>>> ary[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: 0-d arrays can't be indexed
>>> ary[()]
...The long list
A workaround would be:
test = scipy.zeros(len(xs), dtype=(str, 1))
for i in xrange(len(xs)):
    test[i] = xs[i]
It is not a problem of insufficient memory. For now I will use the workaround (which takes 15 minutes), but I would like to understand the problem.
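A chunked variant of the workaround might be faster than the element-by-element loop. The following is only a minimal sketch, not something I have run on the full list: it assumes that converting a slice of fewer than 2**31 elements works (as the 10-element test above suggests), it uses numpy directly (scipy.array is the same function as numpy.array), and the name list_to_array_chunked and the chunk_size default are made up for illustration.

```python
import numpy as np


def list_to_array_chunked(xs, chunk_size=2**24, dtype="S1"):
    """Copy a list of 1-character strings into a preallocated array.

    Hypothetical helper: each slice assignment converts at most
    chunk_size elements, so no single conversion sees the whole list.
    """
    out = np.empty(len(xs), dtype=dtype)
    for start in range(0, len(xs), chunk_size):
        stop = start + chunk_size
        # Slices past the end are clipped on both sides, so the last
        # (shorter) chunk needs no special handling.
        out[start:stop] = xs[start:stop]
    return out


# Small usage example with a deliberately tiny chunk size.
arr = list_to_array_chunked(list("ABCDEFG"), chunk_size=3)
print(arr.shape)
```

Each slice of xs creates a temporary list of chunk_size references, so a smaller chunk_size trades a little speed for less temporary memory.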
Thank you
--
Edit:
Remark on the workaround: test[:] = xs does not work either. (It also fails with the 0-d IndexError.)
On my MacBook, 2147483648 was the smallest size causing the problem. I determined it with this small script:
#!/usr/bin/python
import scipy as sp
startlen = 2147844816
xs = ["A"] * startlen
ary = sp.array(xs)
while ary.shape == ():
    print "bad", len(xs)
    xs.pop()
    ary = sp.array(xs)
print "good", len(xs)
print ary.shape, ary[0:10]
print "DONE."
This was the output:
...
bad 2147483649
bad 2147483648
good 2147483647
(2147483647,) ['A' 'A' 'A' 'A' 'A' 'A' 'A' 'A' 'A' 'A']
DONE.
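The boundary in this output coincides with the largest value of a signed 32-bit integer. That is only an observation about the numbers; it suggests, but does not prove, that a length is stored in a 32-bit int somewhere along the way. The arithmetic can be checked like this (the variable names are just for illustration):

```python
INT32_MAX = 2**31 - 1        # largest signed 32-bit integer

largest_good = 2147483647    # last "good" length from the output above
smallest_bad = 2147483648    # last "bad" length from the output above
full_length = 4001844816     # len(xs) of the original list

print(largest_good == INT32_MAX)      # True
print(smallest_bad == INT32_MAX + 1)  # True
print(full_length > INT32_MAX)        # True
```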
The python version is
>>> sys.version
'2.7.5 (default, Aug 25 2013, 00:04:04) \n[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]'
>>> scipy.version.version
'0.11.0'