NumPy arrays have a fixed dtype.
So if you use data = np.loadtxt(..., dtype=np.float32), then data will have a float32 dtype, and you will not be able to assign string values to it. The same holds for any numeric array, for example:
In [14]: data = np.random.random((2,3))
In [15]: data[0,1] = 'G'
ValueError: could not convert string to float: G
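Having a fixed dtype means NumPy coerces whatever you assign into the array's dtype instead of changing the dtype. Below is a minimal sketch of that behavior; the array and values are illustrative only. A string that parses as a number is converted to float, while 'G' raises the same ValueError as above.

```python
import numpy as np

data = np.zeros((2, 3), dtype=np.float32)
print(data.dtype)  # float32

# A numeric string is parsed and stored as a float32 value
data[0, 1] = '2.5'
print(data[0, 1])  # 2.5

# A non-numeric string cannot be converted, so the assignment fails
try:
    data[0, 1] = 'G'
except ValueError as exc:
    print("ValueError:", exc)

# The dtype is unchanged by either assignment
print(data.dtype)  # float32
```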
Here are some options:
1. Use np.genfromtxt and specify a converter function:
data = np.genfromtxt("submission_zca_.csv", dtype=None, delimiter=',',
                     skip_header=1, converters={1: decode_to_string})
This will build a structured array. For example, if the CSV contains
A,B,C
1,2,3
4,5,6
then
def decode_to_string(x):
    return 'foo'*int(x)
data = np.genfromtxt("submission_zca_.csv", dtype=None, delimiter=',',
                     skip_header=1, converters={1: decode_to_string})
yields
array([(1, 'foofoo', 3), (4, 'foofoofoofoofoo', 6)],
dtype=[('f0', '<i8'), ('f1', 'S15'), ('f2', '<i8')])
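To try option 1 without the CSV file on disk, you can pass a file-like object to np.genfromtxt. This is a minimal sketch that assumes the three-line CSV shown above. Here io.StringIO stands in for submission_zca_.csv, and encoding='utf-8' is passed so the converter receives str on Python 3. Whether the string column comes back as bytes (S) or str (U) can depend on the NumPy version and the encoding argument.

```python
import io
import numpy as np

def decode_to_string(x):
    # Converter for column 1: repeat 'foo' int(x) times
    return 'foo' * int(x)

# Stand-in for the contents of submission_zca_.csv
csv_text = "A,B,C\n1,2,3\n4,5,6\n"

data = np.genfromtxt(io.StringIO(csv_text), dtype=None, delimiter=',',
                     skip_header=1, converters={1: decode_to_string},
                     encoding='utf-8')

print(data.dtype.names)  # ('f0', 'f1', 'f2')
print(data['f1'])
```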
2. Simply use a list of lists:
import csv

data = []
with open("submission_zca_.csv", newline='') as f:
    reader = csv.reader(f, delimiter=',')
    next(reader)  # skip the header row (A,B,C)
    for row in reader:
        row = list(map(float, row))
        row[1] = decode_to_string(row[1])
        data.append(row)
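Here is a self-contained sketch of option 2, again with io.StringIO standing in for the CSV file and assuming the same A,B,C header and two data rows shown earlier. The header has to be skipped before the float conversion, otherwise float('A') raises a ValueError.

```python
import csv
import io

def decode_to_string(x):
    # Same converter as in option 1: repeat 'foo' int(x) times
    return 'foo' * int(x)

# Stand-in for the contents of submission_zca_.csv
csv_text = "A,B,C\n1,2,3\n4,5,6\n"

data = []
with io.StringIO(csv_text) as f:
    reader = csv.reader(f, delimiter=',')
    next(reader)  # skip the header row
    for row in reader:
        row = [float(value) for value in row]
        row[1] = decode_to_string(row[1])
        data.append(row)

print(data)  # [[1.0, 'foofoo', 3.0], [4.0, 'foofoofoofoofoo', 6.0]]
```

Because a Python list can hold mixed types, no dtype has to be chosen up front. The trade-off is that you lose NumPy's vectorized operations on the numeric columns.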
3. Alternatively, you could create a second array, either structured or of object dtype, after calling np.loadtxt, copy the values from data into it, and then write the string values into the new array. But I don't see any advantage to doing it this way over options 1 or 2.
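Option 3 could look something like this with an object-dtype copy, assuming the same CSV contents (io.StringIO again replaces the file). np.loadtxt reads the numbers as float32, astype(object) makes the copy, and the decoded strings are written into column 1 of that copy. The float32 array itself is left unchanged.

```python
import io
import numpy as np

def decode_to_string(x):
    # Same converter as in option 1: repeat 'foo' int(x) times
    return 'foo' * int(x)

# Stand-in for the contents of submission_zca_.csv
csv_text = "A,B,C\n1,2,3\n4,5,6\n"

# Read the numeric data with a fixed float32 dtype
data = np.loadtxt(io.StringIO(csv_text), dtype=np.float32, delimiter=',',
                  skiprows=1)

# Copy into an object array, whose elements can be arbitrary Python objects
mixed = data.astype(object)
mixed[:, 1] = [decode_to_string(x) for x in data[:, 1]]

print(mixed)
print(data.dtype)  # float32
```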
4. It is possible to view the array with a different dtype. In fact, you could change an array of dtype float32 into a structured array with one (or more) columns of dtype S4 (4-byte strings). But each string column would be limited to 4 bytes, since a float32 occupies 4 bytes and the size of the array cannot be changed without resizing (and therefore copying) it; longer strings are silently truncated. Thus, this option would be of limited value unless you can guarantee that decode_to_string returns strings of at most 4 bytes.
In [35]: data = np.genfromtxt("submission_zca_.csv", dtype=np.float32, delimiter=',', skip_header=1)
In [36]: data
Out[36]:
array([[ 1., 2., 3.],
[ 4., 5., 6.]], dtype=float32)
In [37]: data.dtype
Out[37]: dtype('float32')
In [51]: data = data.ravel().view([('f0', 'float32'), ('f1', 'S4'), ('f2', 'float32')])
In [52]: data
Out[52]:
array([(1.0, '\x00\x00\x00@', 3.0), (4.0, '\x00\x00\xa0@', 6.0)],
dtype=[('f0', '<f4'), ('f1', 'S4'), ('f2', '<f4')])
You could then assign different 4-byte values to the second column this way:
In [53]: data['f1'] = ['fooo', 'barr']
In [55]: data
Out[55]:
array([(1.0, 'fooo', 3.0), (4.0, 'barr', 6.0)],
dtype=[('f0', '<f4'), ('f1', 'S4'), ('f2', '<f4')])
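Option 4 can be reproduced without the file by building the same float32 array directly, using the values from the session above. This sketch also prints the size arithmetic the trick depends on (three float32 values per row is 12 bytes, the same as the structured dtype) and illustrates the length caveat: a value longer than 4 bytes is truncated. Since the view shares memory with the original array, writing strings into 'f1' overwrites the bytes of the middle float32 column.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)

row_dtype = np.dtype([('f0', 'float32'), ('f1', 'S4'), ('f2', 'float32')])
# The view only works because the sizes line up: 3 * 4 bytes per row
print(row_dtype.itemsize, 3 * data.itemsize)  # 12 12

rows = data.ravel().view(row_dtype)
print(rows.shape)  # (2,)

# Strings of up to 4 bytes fit in the S4 field
rows['f1'] = ['fooo', 'barr']
print(rows['f1'])

# Longer strings are truncated to 4 bytes on assignment
rows['f1'] = ['foobar', 'x']
print(rows['f1'])

# rows is a view: data's middle column now holds the string bytes
print(np.shares_memory(rows, data))  # True
```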