
Can someone tell me what is wrong with the code below?

import numpy as np
data = np.recfromcsv("data.txt", delimiter=" ", names=['name', 'types', 'value'])
indices = np.where((data.name == 'david') * data.types.startswith('height'))
mean_value = np.mean(data.value[indices])

I want to calculate the mean weight and the mean height for david and for mark, as follows:

david>> mean(weight_2005 and weight_2012), mean(height_2005 and height_2012)
mark>> mean(weight_2005 and weight_2012), mean(height_2005 and height_2012)

The data come from this text file (data.txt):

david weight_2005 50
david weight_2012 60
david height_2005 150
david height_2012 160
mark weight_2005 90
mark weight_2012 85
mark height_2005 160
mark height_2012 170

I am using Python 3.2 and NumPy 1.8.

The code above raises the following TypeError:

TypeError: startswith first arg must be bytes or a tuple of bytes, not numpy.str_
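The error can be reproduced without NumPy. Assuming, as the answer shows, that each text field comes back as bytes under Python 3, the two tests in the question fail in different ways: == between bytes and str quietly returns False, while bytes.startswith rejects a str prefix with a TypeError (NumPy passes a numpy.str_, which is a str subclass, hence the wording of the message). The variable names below are illustrative only.

```python
# Under Python 3, bytes and str are distinct types that never mix.
name = b'david'           # what an element of data.name holds
types = b'height_2005'    # what an element of data.types holds

print(name == 'david')     # False: bytes never compares equal to str
print(name == b'david')    # True: compare against a bytes literal

try:
    types.startswith('height')   # str prefix on a bytes object
    error = None
except TypeError as exc:
    error = exc
    print('TypeError:', exc)

print(types.startswith(b'height'))  # True: a bytes prefix works
```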
  • The code at the top works for me: mean_value is 155.0 with Python 2 and NumPy 1.7. Commented Nov 13, 2013 at 3:05
  • @askewchan which version of Python and NumPy are you using? Commented Nov 13, 2013 at 3:06
  • I can reproduce the error message in Python 3.3 and NumPy 1.9.0.dev-8a2728c. Does data.types.astype(str).startswith("height") work? (If so, we should probably figure out the appropriate idiom for decoding.) Commented Nov 13, 2013 at 3:07
  • @DSM nope: that gives RuntimeWarning: invalid value encountered in double_scalars, and the result is nan. Commented Nov 13, 2013 at 3:11
  • Well, that makes the problem clear. But it's not the best solution: we should explicitly decode the bytes into strings and use the free functions instead. Maybe there's an option to pass to recfromcsv to do the decoding at load time; otherwise we should call decode manually. In any case, we should probably be using np.char.startswith. Commented Nov 13, 2013 at 3:20

1 Answer


With Python 3.2 and NumPy 1.7, this line works:

indices = np.where((data.name == b'david') * data.types.startswith(b'height'))

The data array displays as:

rec.array([(b'david', b'weight_2005', 50),...], 
      dtype=[('name', 'S5'), ('types', 'S11'), ('value', '<i4')])

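type(data.name[0]) is <class 'bytes'>, so under Python 3 both the == comparison and startswith need bytes literals (b'...') rather than str.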

The b'...' literals work in Python 2.7 as well.
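Extending the bytes-literal fix to the full goal in the question (mean weight and mean height for each person), a sketch could loop over the names and prefixes. It uses a hand-built stand-in for the recfromcsv result, and calls np.char.startswith rather than data.types.startswith, because later NumPy releases return string fields of a record array as plain ndarrays, which have no startswith method.

```python
import numpy as np

# Hand-built stand-in for the recfromcsv result (bytes text columns).
data = np.rec.array(
    [(b'david', b'weight_2005', 50), (b'david', b'weight_2012', 60),
     (b'david', b'height_2005', 150), (b'david', b'height_2012', 160),
     (b'mark', b'weight_2005', 90), (b'mark', b'weight_2012', 85),
     (b'mark', b'height_2005', 160), (b'mark', b'height_2012', 170)],
    dtype=[('name', 'S5'), ('types', 'S11'), ('value', '<i4')])

means = {}
for person in (b'david', b'mark'):
    for measure in (b'weight', b'height'):
        # Both the comparison value and the prefix are bytes literals,
        # matching the 'S' dtype of the fields.
        mask = (data.name == person) & np.char.startswith(data.types, measure)
        means[(person.decode(), measure.decode())] = data.value[mask].mean()

for key, value in sorted(means.items()):
    print(key, value)
```

For the sample data this gives 55.0 (weight) and 155.0 (height) for david, and 87.5 and 165.0 for mark.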


Another option is to convert all the data to unicode (Python 3 strings):

dtype=[('name','U5'), ('types', 'U11'), ('value', '<i4')]
dataU=data.astype(dtype=dtype)
indices = np.where((dataU.name == 'david') * dataU.types.startswith('height'))

or read the file with that dtype in the first place:

data = np.recfromtxt('data.txt', delimiter=" ", 
    names=['name', 'types', 'value'], dtype=dtype)

It looks like recfromcsv does not accept a dtype argument, but recfromtxt does.
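A sketch of the unicode route on a hand-built stand-in for the recfromcsv output (an assumption, so the example needs no file): astype casts each bytes field to the matching 'U' field, after which ordinary str literals work. The free function np.char.startswith is used in place of the method call, since the method is only available when string fields come back as chararrays, which is not the case in later NumPy releases.

```python
import numpy as np

# Hand-built stand-in for the recfromcsv result (bytes text columns).
raw = np.rec.array(
    [(b'david', b'weight_2005', 50), (b'david', b'weight_2012', 60),
     (b'david', b'height_2005', 150), (b'david', b'height_2012', 160),
     (b'mark', b'weight_2005', 90), (b'mark', b'weight_2012', 85),
     (b'mark', b'height_2005', 160), (b'mark', b'height_2012', 170)],
    dtype=[('name', 'S5'), ('types', 'S11'), ('value', '<i4')])

# Same field names, but the text fields as unicode ('U') instead of bytes ('S').
dtype = [('name', 'U5'), ('types', 'U11'), ('value', '<i4')]
dataU = raw.astype(dtype)

mask = (dataU.name == 'david') & np.char.startswith(dataU.types, 'height')
mean_value = dataU.value[mask].mean()
print(mean_value)  # 155.0
```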


1 Comment

There is a bug report, with a patch, for recfromcsv not accepting dtype: github.com/numpy/numpy/issues/311
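For readers on current NumPy: as far as I know, recfromcsv and recfromtxt were removed in NumPy 2.0, and np.genfromtxt (which both wrapped) accepts a dtype directly. A sketch, assuming NumPy 1.14 or later for the encoding argument; io.StringIO stands in for data.txt so the example is self-contained.

```python
import io

import numpy as np

# Stand-in for data.txt (same content as in the question).
text = io.StringIO(
    "david weight_2005 50\n"
    "david weight_2012 60\n"
    "david height_2005 150\n"
    "david height_2012 160\n"
    "mark weight_2005 90\n"
    "mark weight_2012 85\n"
    "mark height_2005 160\n"
    "mark height_2012 170\n"
)

# An explicit dtype with 'U' fields yields str columns directly,
# so no bytes literals or decoding are needed afterwards.
dtype = [('name', 'U5'), ('types', 'U11'), ('value', '<i4')]
data = np.genfromtxt(text, delimiter=" ", dtype=dtype, encoding="utf-8")

mask = (data['name'] == 'mark') & np.char.startswith(data['types'], 'weight')
mean_value = data['value'][mask].mean()
print(mean_value)  # 87.5
```

The result is a plain structured array, so fields are accessed as data['name'] rather than data.name.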
