getting indices in numpy

Question

Can someone find out what is wrong with the code below?

import numpy as np
data = np.recfromcsv("data.txt", delimiter=" ", names=['name', 'types', 'value'])
indices = np.where((data.name == 'david') * data.types.startswith('height'))
mean_value = np.mean(data.value[indices])

I want to calculate the mean weight and the mean height for david and for mark, as follows:

david>> mean(weight_2005 and weight_2012), mean (height_2005 and height_2012)
mark>> mean(weight_2005 and weight_2012), mean (height_2005 and height_2012)

From the text (data.txt) file:

david weight_2005 50
david weight_2012 60
david height_2005 150
david height_2012 160
mark weight_2005 90
mark weight_2012 85
mark height_2005 160
mark height_2012 170

I am using Python 3.2 and NumPy 1.8.

The code above raises the following TypeError:

TypeError: startswith first arg must be bytes or a tuple of bytes, not numpy.str_

The code at the top works for me: mean_value is 155.0 with Python 2 and NumPy 1.7.
– askewchan, Commented Nov 13, 2013 at 3:05
I can reproduce the error message in Python 3.3 and NumPy 1.9.0.dev-8a2728c. Does data.types.astype(str).startswith("height") work? (If so, we should probably figure out what the appropriate idiom to decode is.)
– DSM, Commented Nov 13, 2013 at 3:07
@DSM No. It gives "RuntimeWarning: invalid value encountered in double_scalars" and the result is nan.
– 2964502, Commented Nov 13, 2013 at 3:11
Well, that makes it clear what the problem is. But that's not the best solution, because we should explicitly decode the bytes into strings and use free functions instead. Maybe there's an option to pass to recfromcsv to do the decoding at that point. Otherwise we should probably call decode manually. In any case, we should probably be using np.char.startswith.
– DSM, Commented Nov 13, 2013 at 3:20
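
The idea in the last comment (decode the bytes explicitly, then use the free functions in np.char) might look roughly like the sketch below. The small arrays are hypothetical stand-ins for the columns that the loader returns as bytes, and the "ascii" encoding is an assumption about the file's contents.

```python
import numpy as np

# Hypothetical stand-ins for the bytes columns produced by the loader.
names = np.array([b"david", b"david", b"mark"])
types = np.array([b"weight_2005", b"height_2005", b"height_2005"])
values = np.array([50, 150, 160])

# Decode explicitly (assuming ASCII data) instead of relying on astype(str).
names_u = np.char.decode(names, "ascii")
types_u = np.char.decode(types, "ascii")

# np.char.startswith is the free-function form of the startswith method.
mask = (names_u == "david") & np.char.startswith(types_u, "height")
selected = values[mask]
```

Here selected holds only the value from david's height row.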

hpaulj · Accepted Answer · 2013-11-24 08:22:23Z

With Python 3.2 and NumPy 1.7, this line works:

indices = np.where((data.name == b'david') * data.types.startswith(b'height'))

data displays as:

rec.array([(b'david', b'weight_2005', 50),...], 
      dtype=[('name', 'S5'), ('types', 'S11'), ('value', '<i4')])

type(data.name[0]) is <class 'bytes'>, so the values it is compared against must be bytes as well.

b'height' also works in Python 2.7.
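
As a self-contained illustration of the bytes comparison, here is a minimal sketch that builds david's four rows by hand (an assumption made so the example runs without data.txt) and uses a boolean mask with np.char.startswith in place of np.where and the startswith method.

```python
import numpy as np

# David's rows from data.txt, typed in by hand with the same dtype as above.
data = np.array(
    [
        (b"david", b"weight_2005", 50),
        (b"david", b"weight_2012", 60),
        (b"david", b"height_2005", 150),
        (b"david", b"height_2012", 160),
    ],
    dtype=[("name", "S5"), ("types", "S11"), ("value", "<i4")],
)

# Both sides of each comparison are bytes, so no TypeError is raised.
mask = (data["name"] == b"david") & np.char.startswith(data["types"], b"height")
mean_value = data["value"][mask].mean()
```

mean_value comes out as 155.0, which matches the figure reported in the comments on the question.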

Another option is to convert all the data to Unicode (Python 3 strings):

dtype=[('name','U5'), ('types', 'U11'), ('value', '<i4')]
dataU=data.astype(dtype=dtype)
indices = np.where((dataU.name == 'david') * dataU.types.startswith('height'))

Or pass that dtype when loading the file:

data = np.recfromtxt('data.txt', delimiter=" ", 
    names=['name', 'types', 'value'], dtype=dtype)

It looks like recfromcsv does not take a dtype, but recfromtxt does.
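
Putting the Unicode approach together, here is a hedged sketch that computes all four means from the question. It reads the sample rows from an in-memory string rather than data.txt, and it calls np.genfromtxt (the function that recfromtxt wraps) with an explicit encoding argument. That argument is assumed to be available, which requires NumPy 1.14 or later. The means dictionary and its keys are illustrative names, not part of the original code.

```python
import io
import numpy as np

# The contents of data.txt, inlined so the example is self-contained.
raw = (
    "david weight_2005 50\n"
    "david weight_2012 60\n"
    "david height_2005 150\n"
    "david height_2012 160\n"
    "mark weight_2005 90\n"
    "mark weight_2012 85\n"
    "mark height_2005 160\n"
    "mark height_2012 170\n"
)

# Unicode string fields, so plain str literals can be used for comparison.
dtype = [("name", "U5"), ("types", "U11"), ("value", "<i4")]
data = np.genfromtxt(io.StringIO(raw), delimiter=" ", dtype=dtype, encoding="utf-8")

# One mean per (person, measurement) pair.
means = {}
for name in np.unique(data["name"]):
    for kind in ("weight", "height"):
        mask = (data["name"] == name) & np.char.startswith(data["types"], kind)
        means[(str(name), kind)] = float(data["value"][mask].mean())
```

With the sample data this gives 55.0 and 155.0 for david's weight and height, and 87.5 and 165.0 for mark's.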

edited Nov 24, 2013 at 8:22

answered Nov 24, 2013 at 7:23

1 Comment

hpaulj Over a year ago

There is a bug report and patch for the fact that recfromcsv does not take dtype: github.com/numpy/numpy/issues/311
