Replacing 1 column of CSV from float to char with Python/NumPy

Question

Consider the following code:

data = np.loadtxt("submission_zca_.csv", dtype=np.float32, delimiter=',', skiprows=1)
print "Encoded Data :", data
print "shape", data.shape


for x in xrange(0, data.shape[0]):
    #decode_to_string returns "a","b","c",....
    value = decode_to_string(data[x, 1])
    print "Value", value
    data[x,1]=value

This produces the following error:

ValueError: could not convert string to float: G

How do I replace the second column of this CSV by char/string using NumPy?

unutbu · Accepted Answer · 2015-11-12 18:26:30Z

NumPy arrays have a fixed dtype. So if you use data = np.loadtxt(..., dtype=np.float32), then data will have a float32 dtype and you will not be able to assign string values to it. For example,

In [14]: data = np.random.random((2,3))

In [15]: data[0,1] = 'G'
ValueError: could not convert string to float: G

Here are some options:

Use np.genfromtxt and specify a converter function:

data = np.genfromtxt("submission_zca_.csv", dtype=None, delimiter=',',
                     skip_header=1, converters={1: decode_to_string})

This will build a structured array. For example, if the CSV contains

A,B,C
1,2,3
4,5,6

then

def decode_to_string(x):
    return 'foo'*int(x)
data = np.genfromtxt("submission_zca_.csv", dtype=None, delimiter=',',
                     skip_header=1, converters={1: decode_to_string})

yields

array([(1, 'foofoo', 3), (4, 'foofoofoofoofoo', 6)], 
      dtype=[('f0', '<i8'), ('f1', 'S15'), ('f2', '<i8')])
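
On Python 3 with a recent NumPy, the same approach might look like the sketch below. It is only an illustration under a few assumptions: decode_to_string here is a hypothetical stand-in that maps 1 to 'a', 2 to 'b', and so on (the real function is not shown in the question), the sample CSV is read from an in-memory string rather than from submission_zca_.csv, and encoding='utf-8' is passed so that the converter receives str rather than bytes.

```python
import io
import numpy as np

def decode_to_string(x):
    # Hypothetical stand-in for the asker's function: 1 -> 'a', 2 -> 'b', ...
    return chr(ord('a') + int(float(x)) - 1)

# In-memory copy of the sample CSV, used here instead of a file on disk.
csv_text = "A,B,C\n1,2,3\n4,5,6\n"

data = np.genfromtxt(io.StringIO(csv_text), dtype=None, delimiter=',',
                     skip_header=1, encoding='utf-8',
                     converters={1: decode_to_string})

# The result is a structured array; fields are named f0, f1, f2 by default.
print(data['f1'])
```

The second field holds the decoded strings, while the other fields keep the numeric type that genfromtxt inferred.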

Simply use a list of lists:

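import csv
data = []
# 'rb' is correct for Python 2; on Python 3 use open(..., newline='') instead
with open("submission_zca_.csv", 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    next(reader)  # skip the header row, which cannot be converted to float
    for row in reader:
        row = list(map(float, row))
        row[1] = decode_to_string(row[1])
        data.append(row)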
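
For reference, a Python 3 variant of the list-of-lists approach could look roughly like this. It is a sketch, not the asker's actual setup: decode_to_string is a hypothetical stand-in (1 -> 'a', 2 -> 'b', ...), and the CSV is read from an in-memory string so the example is self-contained.

```python
import csv
import io

def decode_to_string(x):
    # Hypothetical stand-in for the asker's function: 1 -> 'a', 2 -> 'b', ...
    return chr(ord('a') + int(float(x)) - 1)

# In-memory copy of the sample CSV, used here instead of a file on disk.
csv_text = "A,B,C\n1,2,3\n4,5,6\n"

data = []
# With a real file on Python 3: open("submission_zca_.csv", newline='')
with io.StringIO(csv_text) as f:
    reader = csv.reader(f, delimiter=',')
    header = next(reader)  # keep the header row out of the float conversion
    for row in reader:
        row = [float(v) for v in row]
        row[1] = decode_to_string(row[1])  # replace the second column
        data.append(row)

print(data)
```

Because plain Python lists can hold mixed types, no dtype conflict arises here.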

Or, you could create a second structured array (or an array of object dtype) after using np.loadtxt, copy the values from data into it, and then write the string values into the new array. But I don't see any advantage to doing it this way over using np.genfromtxt with a converter or a list of lists.

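
The copy-into-a-new-array idea can be sketched as follows, shown both with an object array and with a structured array. The decode_to_string function is again a hypothetical stand-in (1 -> 'a', 2 -> 'b', ...), the field names f0, f1, f2 and the 'U1' string width are assumptions chosen for this sample, and the CSV is read from an in-memory string.

```python
import io
import numpy as np

def decode_to_string(x):
    # Hypothetical stand-in for the asker's function: 1 -> 'a', 2 -> 'b', ...
    return chr(ord('a') + int(float(x)) - 1)

# In-memory copy of the sample CSV, used here instead of a file on disk.
csv_text = "A,B,C\n1,2,3\n4,5,6\n"
data = np.loadtxt(io.StringIO(csv_text), dtype=np.float32,
                  delimiter=',', skiprows=1)

decoded = [decode_to_string(v) for v in data[:, 1]]

# Variant (a): an object array can hold floats and strings side by side.
mixed = data.astype(object)
mixed[:, 1] = decoded

# Variant (b): a structured array with one string field.
# 'U1' assumes single-character strings; widen it for longer values.
out = np.empty(data.shape[0],
               dtype=[('f0', 'f4'), ('f1', 'U1'), ('f2', 'f4')])
out['f0'] = data[:, 0]
out['f1'] = decoded
out['f2'] = data[:, 2]

print(mixed)
print(out)
```

Both variants copy the data, so the original float32 array is left unchanged.
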
Finally, it is possible to view the array with a different dtype. In fact, you could change an array of dtype float32 into a structured array with one (or more) columns of dtype S4 (4-byte strings). But the string column(s) would have to be exactly 4 bytes in size, since float32 occupies 4 bytes and the size of the array cannot be changed without resizing (and therefore copying) the array. Thus, this option is of limited value unless you can guarantee that decode_to_string returns 4-byte strings.
```
In [35]: data = np.genfromtxt("submission_zca_.csv", dtype=np.float32, delimiter=',', skip_header=1)

In [36]: data
Out[36]: 
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]], dtype=float32)

In [37]: data.dtype
Out[37]: dtype('float32')

In [51]: data = data.ravel().view([('f0', 'float32'), ('f1', 'S4'), ('f2', 'float32')])

In [52]: data
Out[52]: 
array([(1.0, '\x00\x00\x00@', 3.0), (4.0, '\x00\x00\xa0@', 6.0)], 
      dtype=[('f0', '<f4'), ('f1', 'S4'), ('f2', '<f4')])
```

You could then assign different 4-byte values to the second column this way:

    In [53]: data['f1'] = ['fooo', 'barr']

    In [55]: data
    Out[55]: 
    array([(1.0, 'fooo', 3.0), (4.0, 'barr', 6.0)], 
          dtype=[('f0', '<f4'), ('f1', 'S4'), ('f2', '<f4')])
