A basic structured array gives you something that can be indexed with one name:
In [276]: dt=np.dtype([('A',int),('B',int),('C',int)])
In [277]: x=np.arange(9).reshape(3,3).view(dtype=dt)
In [278]: x
Out[278]:
array([[(0, 1, 2)],
[(3, 4, 5)],
[(6, 7, 8)]],
dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])
In [279]: x['B'] # index by field name
Out[279]:
array([[1],
[4],
[7]])
In [280]: x[1] # index by row (array element)
Out[280]:
array([(3, 4, 5)],
dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])
In [281]: x['B'][1]
Out[281]: array([4])
In [282]: x.shape # could be reshaped to (3,)
Out[282]: (3, 1)
The view approach produced a 2d array, but with just one column. The usual columns are replaced by dtype fields. It's 2d but with a twist. By using view the data buffer is unchanged; the dtype just provides a different way of accessing those 'columns'. dtype fields are, technically, not a dimension. They don't register in either the .shape or .ndim of the array. Also you can't use x[0,'A'].
recarray does the same thing, but adds the option of accessing fields as attributes, e.g. x.B is the same as x['B'].
rows still have to be accessed by index number.
Another way of constructing a structured array is to defined values as a list of tuples.
In [283]: x1 = np.arange(9).reshape(3,3)
In [284]: x2=np.array([tuple(i) for i in x1],dtype=dt)
In [285]: x2
Out[285]:
array([(0, 1, 2), (3, 4, 5), (6, 7, 8)],
dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])
In [286]: x2.shape
Out[286]: (3,)
ones, zeros, empty also construct basic structured arrays
In [287]: np.ones((3,),dtype=dt)
Out[287]:
array([(1, 1, 1), (1, 1, 1), (1, 1, 1)],
dtype=[('A', '<i4'), ('B', '<i4'), ('C', '<i4')])
I can construct an array that is indexed with 2 field names, by nesting dtypes:
In [294]: dt1=np.dtype([('D',int),('E',int),('F',int)])
In [295]: dt2=np.dtype([('A',dt1),('B',dt1),('C',dt1)])
In [296]: y=np.ones((),dtype=dt2)
In [297]: y
Out[297]:
array(((1, 1, 1), (1, 1, 1), (1, 1, 1)),
dtype=[('A', [('D', '<i4'), ('E', '<i4'), ('F', '<i4')]), ('B', [('D', '<i4'), ('E', '<i4'), ('F', '<i4')]), ('C', [('D', '<i4'), ('E', '<i4'), ('F', '<i4')])])
In [298]: y['A']['F']
Out[298]: array(1)
But frankly this is rather convoluted. I haven't even figured out how to set the elements to arange(9) (without iterating over field names).
Structured arrays are most commonly produced by reading csv files with np.genfromtxt (or loadtxt). The result is a named field for each labeled column, and a numbered 'row' for each line in the file.
pandaslook promissing. However, I have to pass these data through MPI interfacesnumpywould be a good solution, since they are faster and easier to pass through the interfaces.