Consider the following code:

data = np.loadtxt("submission_zca_.csv", dtype=np.float32, delimiter=',', skiprows=1)
print "Encoded Data :", data
print "shape", data.shape


for x in xrange(0, data.shape[0]):
    #decode_to_string returns "a","b","c",....
    value = decode_to_string(data[x, 1])
    print "Value", value
    data[x,1]=value

This produces the following error:

ValueError: could not convert string to float: G

How do I replace the second column of this CSV by char/string using NumPy?

1 Answer

NumPy arrays have a fixed dtype. If you load the file with data = np.loadtxt(..., dtype=np.float32), then data has dtype float32 and you cannot assign string values to it. For example,

In [14]: data = np.random.random((2,3))

In [15]: data[0,1] = 'G'
ValueError: could not convert string to float: G

Here are some options:

  1. Use np.genfromtxt and specify a converter function:

    data = np.genfromtxt("submission_zca_.csv", dtype=None, delimiter=',',
                         skip_header=1, converters={1: decode_to_string})
    

    This will build a structured array. For example, if the CSV contains

    A,B,C
    1,2,3
    4,5,6
    

    then

    def decode_to_string(x):
        # toy converter: repeat 'foo' as many times as the field's integer value
        return 'foo'*int(x)
    data = np.genfromtxt("submission_zca_.csv", dtype=None, delimiter=',',
                         skip_header=1, converters={1: decode_to_string})
    

    yields

    array([(1, 'foofoo', 3), (4, 'foofoofoofoofoo', 6)], 
          dtype=[('f0', '<i8'), ('f1', 'S15'), ('f2', '<i8')])
    
  2. Simply use a list of lists:

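    import csv
    data = []
    # 'rb' is the Python 2 idiom; on Python 3 use open(..., newline='') instead
    with open("submission_zca_.csv", 'rb') as f:
        reader = csv.reader(f, delimiter=',')
        next(reader)  # skip the header row, as skiprows=1 does in np.loadtxt
        for row in reader:
            row = list(map(float, row))
            row[1] = decode_to_string(row[1])
            data.append(row)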
    
  3. Create a second array (structured, or of object dtype) after calling np.loadtxt, copy the values from data into it, and then write the string values into the new array. I don't see any advantage to doing it this way over using option 1 or 2.

  4. View the array with a different dtype. You can reinterpret an array of dtype float32 as a structured array in which one (or more) columns have dtype S4 (4-byte strings). The string column(s) must be exactly 4 bytes wide, because a float32 occupies 4 bytes and the size of the array cannot be changed without resizing (and therefore copying) it. This option is therefore of limited value unless you can guarantee that decode_to_string returns 4-byte strings.

    In [35]: data = np.genfromtxt("submission_zca_.csv", dtype=np.float32, delimiter=',', skip_header=1)
    
    In [36]: data
    Out[36]: 
    array([[ 1.,  2.,  3.],
           [ 4.,  5.,  6.]], dtype=float32)
    
    In [37]: data.dtype
    Out[37]: dtype('float32')
    
    In [51]: data = data.ravel().view([('f0', 'float32'), ('f1', 'S4'), ('f2', 'float32')])
    
    In [52]: data
    Out[52]: 
    array([(1.0, '\x00\x00\x00@', 3.0), (4.0, '\x00\x00\xa0@', 6.0)], 
          dtype=[('f0', '<f4'), ('f1', 'S4'), ('f2', '<f4')])
    

You could then assign different 4-byte values to the second column this way:

    In [53]: data['f1'] = ['fooo', 'barr']

    In [55]: data
    Out[55]: 
    array([(1.0, 'fooo', 3.0), (4.0, 'barr', 6.0)], 
          dtype=[('f0', '<f4'), ('f1', 'S4'), ('f2', '<f4')])
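
Option 3 above comes without code, so here is a minimal sketch of it, written for Python 3 and a recent NumPy. The in-memory CSV and this particular decode_to_string are stand-ins made up for illustration (the question shows neither the real file nor the real decoder), and the 'U8' width of the string field is an arbitrary assumption. It shows both variants: an object-dtype copy and a separately built structured array.

```python
import io

import numpy as np

# Stand-in for submission_zca_.csv: one header row, then numeric data.
csv_text = "A,B,C\n1,2,3\n4,5,6\n"


def decode_to_string(x):
    # Hypothetical decoder (not the asker's): 1 -> 'a', 2 -> 'b', ...
    return chr(ord('a') + int(x) - 1)


# Load as float32, exactly as in the question.
data = np.loadtxt(io.StringIO(csv_text), dtype=np.float32,
                  delimiter=',', skiprows=1)

# Variant A: copy into an object array, which can hold floats and strings.
mixed = data.astype(object)
for i in range(data.shape[0]):
    mixed[i, 1] = decode_to_string(data[i, 1])

# Variant B: build a structured array with a string field for column 1.
# 'U8' (up to 8 characters) is an assumed width; longer strings are truncated.
structured = np.empty(data.shape[0],
                      dtype=[('f0', 'f4'), ('f1', 'U8'), ('f2', 'f4')])
structured['f0'] = data[:, 0]
structured['f1'] = [decode_to_string(v) for v in data[:, 1]]
structured['f2'] = data[:, 2]

print(mixed)
print(structured)
```

Both variants copy the data, which is why they offer little over options 1 and 2. The object array also gives up fast vectorized arithmetic on the numeric columns, so the structured array is probably the better of the two if you still need to compute with them.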