
This is my very first question on Stack Overflow. Until now, every question I had was already answered here, but even after much research I couldn't find an answer to this one. So here goes:

I would like to perform mathematical operations on NumPy arrays to which I have assigned a structured dtype. This would be trivial in R but is complicated in Python.

import numpy as np
from StringIO import StringIO
test = "a,1,2\nb,3,4"
data = np.genfromtxt(StringIO(test), delimiter=",", dtype=None)

This gives me:

print data
#array([('a', 1, 2), ('b', 3, 4)],
#      dtype=[('f0', '|S1'), ('f1', '<i8'), ('f2', '<i8')])

But if I then try to perform any mathematical operation on the numeric subset of data, I get error messages:

subData = data[['f1','f2']]
print subData
# [(1, 2) (3, 4)]
subData+1
#TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'int'

or even:

subData + subData
#TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'numpy.ndarray'

The only solution I came up with is neither elegant nor practical, because I lose the column names and types as well as the original shape:

subData.view(int) + 1

Thanks a lot in advance.

  • For what it's worth, numpy's structured arrays aren't really meant for this sort of thing. They're arrays of C-like structs, not "spreadsheet-like" data. The typical way to handle this is to hold each column in a separate array, though pandas is a much better choice: it's meant for "spreadsheet-like" data. Commented Feb 9, 2014 at 16:35
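To make the "arrays of C-like structs" point concrete, here is a minimal Python 3 sketch of the kind of job structured arrays suit well: reinterpreting packed binary records in place. The record layout (a one-byte tag followed by two little-endian 32-bit ints, with no padding) is a hypothetical example chosen to mirror the question's columns, not something from the original post.

```python
import struct

import numpy as np

# Hypothetical binary layout, equivalent to a packed C struct:
#   struct { char tag; int32_t x; int32_t y; }
record = np.dtype([("tag", "S1"), ("x", "<i4"), ("y", "<i4")])

# Pack two records the way a C program writing packed structs might.
raw = struct.pack("<cii", b"a", 1, 2) + struct.pack("<cii", b"b", 3, 4)

# frombuffer reinterprets the bytes directly; no parsing loop is needed.
records = np.frombuffer(raw, dtype=record)

print(records.shape)       # -> (2,)  one element per struct
print(records["x"].sum())  # -> 4     a single field is an ordinary int32 array
```

Each element of `records` is one whole struct, which is why the dtype maps naturally onto binary file formats but not onto column-wise arithmetic.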

1 Answer

Just to elaborate on my comment, structured arrays aren't exactly meant for this. They're arrays of C-like structs. They can be used to hold columns of different types, but that quickly becomes cumbersome. They're very useful for certain things, but "spreadsheet-like" data is not one of them. Typically, when columns have different types, you'd just store each column as its own array. (This is essentially what pandas does.)

This is because structured arrays aren't arrays whose columns have different types; they're arrays whose individual items are records, each made up of fields of different types.
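A minimal Python 3 sketch of what that means in practice, rebuilding the question's `data` with `io.StringIO` (Python 3's replacement for the `StringIO` module). The `encoding="utf-8"` argument is an addition, assuming NumPy 1.14 or later, so the text column comes back as `str` rather than bytes. Because each field is a plain homogeneous array, you can do arithmetic field by field on a copy and keep the field names, dtype and shape; the dict of per-column arrays at the end illustrates the "one array per column" approach.

```python
import io

import numpy as np

test = "a,1,2\nb,3,4"
# encoding="utf-8" gives a str field under Python 3 instead of bytes.
data = np.genfromtxt(io.StringIO(test), delimiter=",", dtype=None, encoding="utf-8")

# The array is 1-D: one element per row, and each element is a whole record.
print(data.shape)  # -> (2,)
print(data[0])     # a single record holding 'a', 1 and 2

# Arithmetic ufuncs have no loop for a structured dtype, but every field
# is an ordinary homogeneous array, so work field by field instead.
result = data.copy()
for name in ("f1", "f2"):
    result[name] += 1  # in place: field names, dtype and shape are all kept

# Alternatively, keep each column as its own array, keyed by field name.
columns = {name: data[name] for name in data.dtype.names}
doubled = columns["f1"] * 2

print(result.dtype.names)  # -> ('f0', 'f1', 'f2')
```

The loop over field names is the price of the record layout: NumPy cannot apply `+ 1` to a whole record, only to each field separately.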

If you did want to convert all but the first column into a "normal" 2D array, you'd do something like this:

numeric_data = np.column_stack([data[col] for col in data.dtype.names[1:]])
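Here is a runnable Python 3 sketch of that conversion, again using `io.StringIO` and `encoding="utf-8"` in place of the original Python 2 setup. `np.column_stack` joins the 1-D field arrays side by side as columns. As an alternative, `numpy.lib.recfunctions.structured_to_unstructured` does the same in one call; it is only available in NumPy 1.16 or later, so treat its use as an assumption about your installed version. Both give a plain integer array on which ordinary arithmetic works, at the cost of the field names.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

test = "a,1,2\nb,3,4"
data = np.genfromtxt(io.StringIO(test), delimiter=",", dtype=None, encoding="utf-8")

numeric_names = list(data.dtype.names[1:])  # every column except the text one

# Option 1: stack the 1-D field arrays as the columns of a 2-D array.
stacked = np.column_stack([data[name] for name in numeric_names])

# Option 2 (NumPy >= 1.16): convert the numeric fields directly.
unstructured = rfn.structured_to_unstructured(data[numeric_names])

print(stacked + 1)  # element-wise arithmetic now works as usual
```

Either result has shape `(2, 2)`, so `stacked + 1` and `stacked + stacked` behave exactly as they would on any numeric array.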

However, for data where each column has a different type, it's far better to use pandas. It's meant for spreadsheet-like data.

from StringIO import StringIO
import pandas as pd

test = "a,1,2\nb,3,4"
data = pd.read_csv(StringIO(test), header=None)

print data[[1,2]] + 5

1 Comment

Thanks Joe. I was actually trying to avoid resorting to pandas, so I prefer the first option :) I still find it weird that I can't do something this simple easily. What are structured arrays meant for, if not this?
