I have a sample numpy array read from a file like this:

import numpy as np
data = np.array([(b'M', 119.,  76.), (b'M', 114.,  73.), (b'F', 124.,  79.)],
dtype=[('gender', 'S1'), ('bps', '<f8'), ('bpd', '<f8')])

How can I get the first column, called gender?

I tried:

gender = data[:,0]  # fails with IndexError
gender = [i[0] for i in data]  # works

Is there an easier way to get the first column without a list comprehension?

Can we unpack the 3 columns with something like g,b,p=*data?

3 Answers

You made a 1d structured array (with a compound dtype):

In [717]: data = np.array([(b'M', 119., 76.), (b'M', 114., 73.), (b'F', 124., 79.)],
     ...:                 dtype=[('gender', 'S1'), ('bps', '<f8'), ('bpd', '<f8')])
In [718]: data.shape
Out[718]: (3,)
In [719]: data.dtype.fields
Out[719]: 
mappingproxy({'gender': (dtype('S1'), 0),
              'bps': (dtype('float64'), 1),
              'bpd': (dtype('float64'), 9)})

Fields are accessed by name, not by column number:

In [720]: data['gender']
Out[720]: array([b'M', b'M', b'F'], dtype='|S1')
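
As a small sketch of what by-name access allows (the boolean mask and the multi-field selection are illustrative additions, not part of the original session), a field can be used to filter the records, and several fields can be selected at once with a list of names:

```python
import numpy as np

data = np.array([(b'M', 119., 76.), (b'M', 114., 73.), (b'F', 124., 79.)],
                dtype=[('gender', 'S1'), ('bps', '<f8'), ('bpd', '<f8')])

# A single field name returns a plain 1d array.
gender = data['gender']

# The field is an 'S1' array, so compare against bytes, not str.
is_male = gender == b'M'
male_bps = data['bps'][is_male]

# A list of names returns a structured array with only those fields.
pressures = data[['bps', 'bpd']]
print(male_bps, pressures.dtype.names)
```

Note that the comparison uses b'M' because the field holds byte strings.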

An element of this array is:

In [721]: data[0]
Out[721]: (b'M', 119., 76.)
In [722]: type(_)
Out[722]: numpy.void

We call that a record, but it does display (and index) as a tuple. Note that your creation expression uses a list of tuples.
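
A minimal sketch of working with a single record, assuming the same data array as above (the variable names rec, g, bps and bpd are only for illustration): a record can be indexed by field name or by position, and .item() turns it into an ordinary Python tuple that unpacks directly.

```python
import numpy as np

data = np.array([(b'M', 119., 76.), (b'M', 114., 73.), (b'F', 124., 79.)],
                dtype=[('gender', 'S1'), ('bps', '<f8'), ('bpd', '<f8')])

rec = data[0]                 # a numpy.void record

by_name = rec['gender']       # index the record by field name
by_position = rec[0]          # or by position, like a tuple

# .item() converts the record to a plain Python tuple.
g, bps, bpd = rec.item()
print(by_name, by_position, (g, bps, bpd))
```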

One way of unpacking the array is to iterate over the field names:

In [734]: data.dtype.names
Out[734]: ('gender', 'bps', 'bpd')
In [735]: a,b,c = [data[name] for name in data.dtype.names]
In [736]: a
Out[736]: array([b'M', b'M', b'F'], dtype='|S1')
In [737]: b
Out[737]: array([119., 114., 124.])
In [738]: c
Out[738]: array([76., 73., 79.])
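
If the goal is an ordinary 2d array of the numeric fields, one option (an addition here, not something used in the session above) is numpy.lib.recfunctions.structured_to_unstructured, available in NumPy 1.16 and later. A sketch, assuming the same data array:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

data = np.array([(b'M', 119., 76.), (b'M', 114., 73.), (b'F', 124., 79.)],
                dtype=[('gender', 'S1'), ('bps', '<f8'), ('bpd', '<f8')])

# Unpack one array per field, in dtype order.
g, b, p = (data[name] for name in data.dtype.names)

# Convert only the float fields to a regular (3, 2) float array,
# which then supports column slicing such as numeric[:, 0].
numeric = rfn.structured_to_unstructured(data[['bps', 'bpd']])
print(numeric.shape, numeric[:, 0])
```

The gender field is left out of the conversion because it is not numeric and would not share a dtype with the two float fields.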

2 Comments

Thanks for the input. Honestly, I did not know about data.dtype.fields.
This is the complete answer, covering both the IndexError and the unpacking. Sorry, but Chris was the first to answer and crossed the 1k milestone, so I accepted his answer.

First of all, your data doesn't have columns. It is just a 1d array.

data.shape
(3,)

So the slice data[:,0] raises IndexError, because you passed 2d indices to a 1d array.

Try:

list(zip(*data))[0]
# (b'M', b'M', b'F')
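
A short sketch of both points, assuming the same data array as in the question: the 2d index fails on the 1d array, while zip transposes the records into one tuple per field. Going through data.tolist() first is a small variation on the line above, so that zip works on plain Python tuples.

```python
import numpy as np

data = np.array([(b'M', 119., 76.), (b'M', 114., 73.), (b'F', 124., 79.)],
                dtype=[('gender', 'S1'), ('bps', '<f8'), ('bpd', '<f8')])

# Two indices on a 1d array raise IndexError.
error_message = None
try:
    data[:, 0]
except IndexError as exc:
    error_message = str(exc)

# zip(*rows) transposes a list of row tuples into column tuples.
columns = list(zip(*data.tolist()))
print(error_message)
print(columns[0])
```

The results are Python tuples rather than NumPy arrays, so wrap them in np.array if array operations are needed afterwards.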

1 Comment

Congrats @Chris for 1k!

This is not a 2d ndarray; it is a 1d structured array whose elements are tuple-like records:

data.shape
(3,)

Convert it to a 2d ndarray; then you can slice by column index:

np.array(data.tolist())[:,0]
array([b'M', b'M', b'F'], dtype='|S5')

More info:

np.array(data.tolist()).shape
(3, 3)
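
One caveat: the result has a single byte-string dtype (the |S5 shown above), so the float values are stored as byte strings too. A hedged variation, not part of the original answer, is to pass dtype=object so each cell keeps its Python type:

```python
import numpy as np

data = np.array([(b'M', 119., 76.), (b'M', 114., 73.), (b'F', 124., 79.)],
                dtype=[('gender', 'S1'), ('bps', '<f8'), ('bpd', '<f8')])

# dtype=object keeps bytes as bytes and floats as floats.
table = np.array(data.tolist(), dtype=object)

gender = table[:, 0]
bps = table[:, 1].astype(float)   # back to a float64 array
print(table.shape, gender, bps)
```

Object arrays give up most of NumPy's speed, so this suits small tables; for larger data, accessing fields by name on the structured array avoids the conversion entirely.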

3 Comments

It has np.array at the beginning, so I thought that whatever is inside, we get a NumPy ndarray.
@astro123 See the shape: after the conversion it becomes (3, 3).
Got it, will use data[:,None] which might be simpler than tolist().
