Python-numpy test for ndarray using ndim

Question

I'm working on a project in Python requiring a lot of numerical array calculations. Unfortunately (or fortunately, depending on your POV), I'm very new to Python, but have been doing MATLAB and Octave programming (APL before that) for years. I'm very used to having every variable automatically typed to a matrix float, and still getting used to checking input types.

In many of my functions, I require the input S to be a numpy.ndarray of size (n,p), so I have to both test that type(S) is numpy.ndarray and get the values (n,p) = numpy.shape(S). One potential problem is that the input could be a list, tuple, int, etc.; another is that the input could be an array of shape (), i.e. S.ndim = 0. It occurred to me that I could simultaneously test the variable type, fix the S.ndim = 0 problem, then get my dimensions like this:

# first simultaneously test for ndarray and get proper dimensions
try:
    if (S.ndim == 0):
        S = S.copy(); S.shape = (1,1);
    # define dimensions p, and p2
    (p,p2) = numpy.shape(S);
except AttributeError:  # got here because input is not something array-like
    raise AttributeError("blah blah blah");

Though it works, I'm wondering if this is a valid thing to do? The docstring for ndim says

If it is not already an ndarray, a conversion is attempted.

and we surely know that numpy can easily convert an int/tuple/list to an array, so I'm confused why an AttributeError is being raised for these types of inputs, when numpy should be doing this

numpy.array(S).ndim;

which should work.

Fred Foo · Accepted Answer · 2012-09-20 14:40:18Z

4

When doing input validation for NumPy code, I always use np.asarray:

>>> np.asarray(np.array([1,2,3]))
array([1, 2, 3])
>>> np.asarray([1,2,3])
array([1, 2, 3])
>>> np.asarray((1,2,3))
array([1, 2, 3])
>>> np.asarray(1)
array(1)
>>> np.asarray(1).shape
()

This function has the nice feature that it only copies data when necessary; if the input is already an ndarray, the data is left in-place (only the type may be changed, because it also gets rid of that pesky np.matrix).
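To make the "only copies when necessary" point concrete, here is a small sketch. The variable names are made up for illustration; the calls themselves (np.asarray, np.shares_memory) are standard NumPy.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])

# Already a plain ndarray and no dtype requested: the same object comes back.
same = np.asarray(a)
print(same is a)                      # True

# A list has to be converted, so a new array is allocated.
from_list = np.asarray([1.0, 2.0, 3.0])
print(type(from_list).__name__)       # ndarray

# Asking for a different dtype forces a copy, even for ndarray input.
as_int = np.asarray(a, dtype=np.int64)
print(np.shares_memory(a, as_int))    # False
```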

The docstring for ndim says

That's the docstring for the function np.ndim, not the ndim attribute, which non-NumPy objects don't have. You could use that function, but the effect would be that the data might be copied twice, so instead do:

S = np.asarray(S)
(p, p2) = S.shape

This will raise a ValueError if S.ndim != 2.
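As a rough illustration of that behaviour, the sketch below wraps the two lines in a helper. The name two_dims is hypothetical, not something from NumPy; the ValueError comes from ordinary tuple unpacking of S.shape.

```python
import numpy as np

def two_dims(S):
    """Hypothetical helper: return (p, p2) for a 2-D array_like input."""
    S = np.asarray(S)
    p, p2 = S.shape   # unpacking fails unless S.shape has exactly two entries
    return p, p2

print(two_dims([[1, 2, 3], [4, 5, 6]]))   # (2, 3)

# A scalar has shape () and a flat list has shape (3,), so both are rejected.
for bad in (7, [1, 2, 3]):
    try:
        two_dims(bad)
    except ValueError as exc:
        print("rejected:", exc)
```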

[Final note: you don't need ; in Python if you just follow the indentation rules. In fact, Python programmers eschew the semicolon.]

edited Sep 20, 2012 at 14:40

answered Sep 20, 2012 at 14:34

Fred Foo


6 Comments

Hooked Over a year ago

"Python programmers eschew the semicolon", is not always correct. In fact there is an example in PEP 8 of good style with a semicolon: if x == 4: print x, y; x, y = y, x. Personally though, even in this case I would use two lines. In the OP's case, I completely agree, putting a ; at the end of a normal line of code is unnecessary.

Dr. Andrew Over a year ago

Ok, I understand then the cause of the AttributeError. However, I want to raise an error on a non-array input. So it seems then I only need to add in a 'except ValueError: # got here because S is a vector raise ValueError("blah blah blah");' after the first Except. This way I can test for an array, test for a vector, and force a scalar array to (1,1). Oh, and I know I need to drop the semicolons. Habits are hard ...

Fred Foo Over a year ago

@Hooked: the example in PEP8 is to show proper use of whitespace, not semicolons.

Fred Foo Over a year ago

@user1686236: you can still force an array with shape () to shape (1,1) with .reshape(1,1). np.asarray will already raise a ValueError for you when given the wrong input; consider giving it a dtype argument to ensure your input is numeric. E.g. dtype=float (the old alias np.float was removed in NumPy 1.24).

Hooked Over a year ago

@larsmans If there is an example using semicolons in the official style guide to Python, couldn't one infer that the use of a semicolon isn't explicitly discouraged?


Pierre GM · Accepted Answer · 2012-09-20 20:54:51Z

3

Given the comments on @larsmans's answer, you could try:

if not isinstance(S, np.ndarray):
    raise TypeError("Input not a ndarray")
if S.ndim == 0:
    S = np.reshape(S, (1,1))
(p, p2) = S.shape

First, you check explicitly whether S is an ndarray (or a subclass of it). Then you use np.reshape to reshape your data if needed (NumPy returns a view where it can rather than copying). Finally, you get the dimensions.
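Wrapped up as a function, the check might look like the sketch below. The name require_matrix and the extra ndim != 2 test are my own additions for illustration, not part of the snippet above.

```python
import numpy as np

def require_matrix(S):
    """Hypothetical validator: accept only ndarrays, promote 0-d to (1, 1)."""
    if not isinstance(S, np.ndarray):
        raise TypeError("Input is not an ndarray")
    if S.ndim == 0:
        S = S.reshape(1, 1)
    # Assumption: anything that is still not 2-D should be rejected explicitly.
    if S.ndim != 2:
        raise ValueError(f"expected a 2-D array, got {S.ndim} dimension(s)")
    return S

print(require_matrix(np.array(5.0)).shape)     # (1, 1)
print(require_matrix(np.ones((3, 2))).shape)   # (3, 2)
```

Lists and tuples are rejected with a TypeError here, which is the stricter behaviour asked for in the comments.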

Note that in most cases, the np functions will first try to access the corresponding method of an ndarray, then attempt to convert the input to an ndarray (sometimes keeping it a subclass, as in np.asanyarray, sometimes not, as in np.asarray). In other terms, it's always more efficient to use the method rather than the function: that's why we're using S.shape and not np.shape(S).
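A quick way to see the difference between the functions and the attributes (the variable names are only for illustration):

```python
import numpy as np

data = [[1, 2], [3, 4]]

# The functions convert their argument, so a nested list is fine.
print(np.shape(data))          # (2, 2)
print(np.ndim(data))           # 2

# The attributes only exist on arrays; a list has neither.
print(hasattr(data, "shape"))  # False
print(hasattr(data, "ndim"))   # False

# Convert once, then use the attributes directly.
arr = np.asarray(data)
print(arr.shape, arr.ndim)     # (2, 2) 2
```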

Another point: the np.asarray, np.asanyarray, np.atleast_1d... are all particular cases of the more generic function np.array. For example, asarray sets the optional copy argument of array to False, asanyarray does the same and sets subok=True, atleast_1d sets ndmin=1, atleast_2d sets ndmin=2... In other terms, it's always easier to use np.array with the appropriate arguments. But as mentioned in some comments, it's a matter of style. Shortcuts can often improve readability, which is always an objective to keep in mind.
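For the ndmin part of this, a small comparison shows the shortcut and the explicit np.array call giving the same shapes. This is only a sketch of the resulting shapes; it says nothing about how the shortcuts are implemented internally.

```python
import numpy as np

row = [1, 2, 3]

# 1-d input promoted to 2-d: a leading axis of length 1 is added.
print(np.atleast_2d(row).shape)       # (1, 3)
print(np.array(row, ndmin=2).shape)   # (1, 3)

# Scalar input promoted to 1-d.
print(np.atleast_1d(5).shape)         # (1,)
print(np.array(5, ndmin=1).shape)     # (1,)
```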

In any case, when you use np.array(..., copy=True), you're explicitly asking for a copy of your initial data, a bit like doing a list([....]). Even if nothing else changed, your data will be copied. That has the advantages of its drawbacks (as we say in French): you could, for example, change the order from row-first C to column-first F. But anyway, you get the copy you wanted.

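With np.array(input, copy=False), no copy is made when input is already an ndarray that meets the other requirements (dtype, order): the result uses the same block of memory as input, so no memory is wasted. If input wasn't an ndarray, a new array is created "from scratch". The interesting case is of course when input was an ndarray. (Since NumPy 2.0, copy=False raises a ValueError when a copy cannot be avoided; copy=None, or np.asarray, gives the copy-only-if-needed behaviour described here.)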

Using this new array in a function may or may not change the original input, depending on the function. You have to check the documentation of the function you want to use to see whether it returns a copy or not. The NumPy developers try hard to limit unnecessary copies (following the Python example), but sometimes it can't be avoided. The documentation should state explicitly what happens; if it doesn't, or it's unclear, please report it.
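Here is a small sketch of how that plays out inside a function. Both function names are invented for the example; the point is only that an in-place operation on a non-copied array reaches back to the caller's data.

```python
import numpy as np

def double_in_place(x):
    arr = np.asarray(x)   # no copy when x is already an ndarray
    arr *= 2              # in-place: writes into the caller's buffer
    return arr

def double_copy(x):
    arr = np.asarray(x)
    return arr * 2        # arr * 2 allocates a new array

original = np.array([1.0, 2.0, 3.0])

double_copy(original)
print(original)           # [1. 2. 3.]

double_in_place(original)
print(original)           # [2. 4. 6.]
```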

np.array(...) may raise some exceptions if something goes awry. For example, trying to use a dtype=float with an input like ["STRING", 1] will raise a ValueError. However, I must admit I can't remember which exceptions are raised in every case; please edit this post accordingly.
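The one case mentioned above can be checked directly. This sketch covers only that case and is not a list of everything np.array can raise; the exact error message may differ between NumPy versions, so it is printed rather than assumed.

```python
import numpy as np

try:
    np.array(["STRING", 1], dtype=float)
except ValueError as exc:
    # The string cannot be parsed as a float, so the conversion is refused.
    print("ValueError:", exc)
```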

edited Sep 20, 2012 at 20:54

answered Sep 20, 2012 at 15:13

Pierre GM


2 Comments

Dr. Andrew Over a year ago

Thanks, that's extremely useful. I think I'm going to stick with the numpy.array() function with the specific options for duck typing. Three questions if I may: 1) If I set copy=True and no change to the input is required, does Python still copy it (and hence waste space)? What if the only thing changed is the shape? 2) How does copy=False work when the input variable S is passed into a function? 3) What exception(s) will array() raise?

Bi Rico Over a year ago

"In other terms, it's always easier to use np.array with the appropriate arguments." I think it comes down to style; I prefer array = np.atleast_1d(array) to array = np.array(array, copy=False, ndmin=1), though you're right that np.atleast_1d calls numpy.array under the hood.

Bi Rico · Accepted Answer · 2012-09-20 15:34:17Z

2

Welcome to Stack Overflow. This comes down to almost a style choice, but the most common way I've seen to deal with this kind of situation is to convert the input to an array. NumPy provides some useful tools for this. numpy.asarray has already been mentioned, but here are a few more:

numpy.atleast_1d is similar to asarray, but reshapes () arrays to be (1,).

numpy.atleast_2d is the same as above but reshapes 0-d and 1-d arrays to be 2-d, i.e. (3,) to (1, 3).

The reason we convert "array_like" inputs to arrays is partly just because we're lazy: for example, sometimes it can be easier to write foo([1, 2, 3]) than foo(numpy.array([1, 2, 3])). But this is also the design choice made within NumPy itself. Notice that the following works:

>>> numpy.mean([1., 2., 3.])
2.0

In the docs for numpy.mean we can see that a should be "array_like".

Parameters
----------
a : array_like
    Array containing numbers whose mean is desired. If `a` is not an
    array, a conversion is attempted.
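Your own functions can follow the same convention. Below is a minimal sketch; column_means is a made-up name, and treating a 1-d input as a single row is just one possible choice (it is what atleast_2d does).

```python
import numpy as np

def column_means(a):
    """Hypothetical example: mean of each column of an array_like input."""
    a = np.atleast_2d(np.asarray(a, dtype=float))
    return a.mean(axis=0)

print(column_means([[1, 2], [3, 4]]))   # [2. 3.]
print(column_means([1, 2, 3]))          # [1. 2. 3.]  (treated as one row)
print(column_means(5))                  # [5.]
```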

That being said, there are situations when you want to only accept arrays as arguments and not all "array_like" types.

edited Sep 20, 2012 at 15:34

answered Sep 20, 2012 at 15:24

Bi Rico


1 Comment

Dr. Andrew Over a year ago

Thanks everyone. I've some more related questions about the duck typing with numpy.array, which I'll post on a different thread.
