I am trying to construct a structured array in Python that can be accessed by the names of the columns and rows. Is this possible with the structured array method of numpy?

Example: My array should have roughly this form:

My_array =        A B C 
                E 1 2 3 
                F 4 5 6 
                G 7 8 9 

And I want to be able to do the following:

My_array["A"]["E"] = 1
My_array["C"]["F"] = 6

Is it possible to do this in Python using structured arrays, or is there another type of structure that is more suitable for such a task?

  • You can use pandas. Commented Jul 10, 2015 at 9:42
  • @yangjie Thanks, pandas looks promising. However, I have to pass these data through MPI interfaces, so numpy would be a good solution, since its arrays are faster and easier to pass through the interfaces. Commented Jul 10, 2015 at 10:06

2 Answers

A basic structured array gives you something that can be indexed with one name:

In [276]: dt=np.dtype([('A',int),('B',int),('C',int)])
In [277]: x=np.arange(9).reshape(3,3).view(dtype=dt)
In [278]: x
Out[278]: 
array([[(0, 1, 2)],
       [(3, 4, 5)],
       [(6, 7, 8)]], 
      dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])

In [279]: x['B']   # index by field name
Out[279]: 
array([[1],
       [4],
       [7]])

In [280]: x[1]    # index by row (array element)
Out[280]: 
array([(3, 4, 5)], 
      dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])

In [281]: x['B'][1]
Out[281]: array([4])

In [282]: x.shape    # could be reshaped to (3,)
Out[282]: (3, 1)

The view approach produced a 2d array, but with just one column. The usual columns are replaced by dtype fields. It's 2d but with a twist. Because view is used, the data buffer is unchanged; the dtype just provides a different way of accessing those 'columns'. dtype fields are, technically, not a dimension. They don't register in either the .shape or .ndim of the array. Also, you can't mix a row index and a field name in one index as in x[0,'A']; index in two steps instead, e.g. x[0]['A'].
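A minimal sketch of the points above. It assumes explicit 4-byte integer fields ('i4') so the shapes come out the same on every platform; the names base and flat are illustrative and not part of the original session.

```python
import numpy as np

# Three 4-byte integer fields: one record is 12 bytes, the same as one row of `base`.
dt = np.dtype([('A', 'i4'), ('B', 'i4'), ('C', 'i4')])
base = np.arange(9, dtype='i4').reshape(3, 3)

x = base.view(dt)          # the three columns collapse into one record per row
print(x.shape, x.ndim)     # (3, 1) 2  -> fields are not counted as a dimension

flat = x.reshape(3)        # drop the length-1 axis
print(flat['B'][1])        # 4  (field first, then row)
print(flat[1]['B'])        # 4  (row first, then field)

# The view shares its buffer with `base`, so a write through one shows in the other.
base[1, 1] = 40
print(flat['B'][1])        # 40
```

Reshaping to (3,) is optional; it only removes the extra pair of brackets seen in the (3, 1) output.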

recarray does the same thing, but adds the option of accessing fields as attributes, e.g. x.B is the same as x['B'].

Rows still have to be accessed by index number.
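If row names are needed, one possible workaround is to keep a plain dict next to the array that maps each row label to its index. This is only a sketch: row_index is a hypothetical helper, not a NumPy feature, and the labels E, F, G are taken from the question.

```python
import numpy as np

dt = np.dtype([('A', 'i8'), ('B', 'i8'), ('C', 'i8')])
my_array = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)], dtype=dt)

# Hypothetical helper: row label -> row number (not provided by NumPy itself).
row_index = {'E': 0, 'F': 1, 'G': 2}

print(my_array['A'][row_index['E']])   # 1
print(my_array['C'][row_index['F']])   # 6

# Assignment works too, because my_array['C'] is a view of the same buffer.
my_array['C'][row_index['F']] = 60
print(my_array[1])                     # (4, 5, 60)
```

The array itself stays a plain structured array, so only the small dict has to travel alongside it.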

Another way of constructing a structured array is to define the values as a list of tuples.

In [283]: x1 = np.arange(9).reshape(3,3)
In [284]: x2=np.array([tuple(i) for i in x1],dtype=dt)
In [285]: x2
Out[285]: 
array([(0, 1, 2), (3, 4, 5), (6, 7, 8)], 
      dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])
In [286]: x2.shape
Out[286]: (3,)

ones, zeros, and empty also construct basic structured arrays:

In [287]: np.ones((3,),dtype=dt)
Out[287]: 
array([(1, 1, 1), (1, 1, 1), (1, 1, 1)], 
      dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])

I can construct an array that is indexed with 2 field names, by nesting dtypes:

In [294]: dt1=np.dtype([('D',int),('E',int),('F',int)])

In [295]: dt2=np.dtype([('A',dt1),('B',dt1),('C',dt1)])

In [296]: y=np.ones((),dtype=dt2)

In [297]: y
Out[297]: 
array(((1, 1, 1), (1, 1, 1), (1, 1, 1)), 
      dtype=[('A', [('D', '<i4'), ('E', '<i4'), ('F', '<i4')]), ('B', [('D', '<i4'), ('E', '<i4'), ('F', '<i4')]), ('C', [('D', '<i4'), ('E', '<i4'), ('F', '<i4')])])

In [298]: y['A']['F']
Out[298]: array(1)

But frankly this is rather convoluted. I haven't even figured out how to set the elements to arange(9) (without iterating over field names).
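One possible way to fill such a nested dtype without looping over field names is to view a flat buffer through it. This sketch assumes all nine leaf fields share one base type and that the dtype is packed (the default), so the record is exactly nine integers long; explicit 'i8' is used to avoid platform differences.

```python
import numpy as np

dt1 = np.dtype([('D', 'i8'), ('E', 'i8'), ('F', 'i8')])
dt2 = np.dtype([('A', dt1), ('B', dt1), ('C', dt1)])

flat = np.arange(9, dtype='i8')     # 72 bytes, the same as one dt2 record
y = flat.view(dt2).reshape(())      # one record, reshaped to a 0-d array

print(y['A']['F'])   # 2  -> 'A' holds (0, 1, 2)
print(y['C']['D'])   # 6  -> 'C' holds (6, 7, 8)
```

Note that the outer fields take consecutive triples from the buffer, so the values may need transposing first if the outer names are meant to be columns.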

Structured arrays are most commonly produced by reading csv files with np.genfromtxt (or loadtxt). The result is a named field for each labeled column, and a numbered 'row' for each line in the file.
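A short sketch of that use case. The CSV text is made up for illustration and is read from an in-memory io.StringIO instead of a file on disk.

```python
import io
import numpy as np

# Made-up CSV content; the header line supplies the field names.
csv_text = "A,B,C\n1,2,3\n4,5,6\n7,8,9\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', names=True, dtype=int)

print(data.dtype.names)   # ('A', 'B', 'C')
print(data.shape)         # (3,)  one record per line of the file
print(data['B'])          # [2 5 8]
print(data[1])            # (4, 5, 6)
```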

3 Comments

I can't format this properly in a comment, so bear with me. To carry on from your line 298: In [299]: z = y.view(type=np.recarray), then In [300]: z.A['F'] or In [3XX]: z.A.F yields array(1). Recarrays just add the ability to write arry.field instead of arry['field'], so z.A.F 'looks' better than the indexing equivalent.
Yes, my dt2 dtype can be cast as a recarray and accessed with x.A.F. The more complex the dtype the better recarray looks.
@hpaulj Thanks for the discussion of the different possibilities. I was not aware that dtypes could be nested. Maybe I will try out your last suggestion with the nested dtypes later.

With a recarray, you can access columns either with dot notation or by indexing with the column name. Rows are accessed by row number; I haven't seen them accessed via a row name. For example:

>>> import numpy as np
>>> a = np.arange(1,10,1).reshape(3,3)
>>> dt = np.dtype([('A','int'),('B','int'),('C','int')])
>>> a.dtype = dt
>>> r = a.view(type=np.recarray)
>>> r
rec.array([[(1, 2, 3)],
       [(4, 5, 6)],
       [(7, 8, 9)]], 
      dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])
>>> r.A
array([[1],
       [4],
       [7]])
>>> r['A']
array([[1],
       [4],
       [7]])
>>> r.A[0]
array([1])
>>> a['A'][0]
array([1])
>>> # now for the row
>>> r[0]
rec.array([(1, 2, 3)], 
      dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])

You can specify the dtype and the type at the same time:

>>> a = np.ones((3,3))
>>> b = a.view(dtype= [('A','<f8'), ('B','<f8'),('C', '<f8')], type = np.recarray)
>>> b
rec.array([[(1.0, 1.0, 1.0)],
       [(1.0, 1.0, 1.0)],
       [(1.0, 1.0, 1.0)]], 
      dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
>>> b.A
array([[ 1.],
       [ 1.],
       [ 1.]])
>>> b.A[0]
array([ 1.])

3 Comments

Thanks for the suggestion. I think I can work around the indexing of the columns. However, if I use np.zeros((3,3)) instead of arange and reshape, I get an array that has two triples in each row: [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]. But I just want one triple.
dtype=[('A', float), ('B', float), ('C', float)] solves the problem. Can anyone tell me why?
@swot I am adding the row example to my original post.
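On the np.zeros question raised in the comments above, a likely explanation is item size: np.zeros((3,3)) holds 8-byte floats, so each row is 24 bytes, and viewing it through three 4-byte integer fields (12 bytes per record) splits every row into two records. The sketch below assumes the '<i4' fields shown in the answer's output; the variable names are illustrative.

```python
import numpy as np

a = np.zeros((3, 3))    # float64: 8 bytes per element, 24 bytes per row

dt_int = np.dtype([('A', '<i4'), ('B', '<i4'), ('C', '<i4')])   # 12-byte record
dt_flt = np.dtype([('A', '<f8'), ('B', '<f8'), ('C', '<f8')])   # 24-byte record

two_per_row = a.view(dt_int)   # 24 / 12 -> two records in each row
one_per_row = a.view(dt_flt)   # 24 / 24 -> one record in each row

print(dt_int.itemsize, two_per_row.shape)   # 12 (3, 2)
print(dt_flt.itemsize, one_per_row.shape)   # 24 (3, 1)
```

Matching the field types to the element type of the original array keeps one record per row, which is why the float dtype solves the problem.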
