Shapes of the np.arrays, unexpected additional dimension

Question

I'm working with arrays in Python, and a few things are confusing me.

1) I build a list of lists by reading 4 columns from each of N files, so I store 4 elements N times. I then convert this list into a NumPy array:

s = np.array(s)

and I ask for the shape of this array. The answer is correct:

print(s.shape)
#(N,4)

I then produce the mean of this Nx4 array:

s_m = sum(s)/len(s)
print(s_m.shape)
#(4,)

I guess this means that the array is 1D. Is this correct?

2) If I subtract the mean vector s_m from the rows of the array s, I can proceed in two ways:

residuals_s = s - s_m

or:

residuals_s = []

for i in range(len(s)):
    residuals_s.append([])
    tmp = s[i] - s_m
    residuals_s.append(tmp)

If I now ask for the shape of residuals_s in the two cases, I obtain two different answers. In the first case I obtain:

(N,4)

in the second:

(N,1,4)

Can someone explain why there is an additional dimension?

s_m = sum(s)/len(s) should give a float in the normal case; your output looks strange to me...
– zhangxaochen, Commented Apr 19, 2014 at 2:35
I think tmp has shape (1,4). Therefore an array of N versions of tmp of course has shape (N,1,4). So I think the solution would be to use something like residuals_s.append(tmp.reshape(-1)).
– CliffordVienna, Commented Apr 19, 2014 at 3:38

hpaulj · Accepted Answer · 2014-04-19 19:27:31Z

You can get the mean using the numpy method (producing the same (4,) shape):

s_m = s.mean(axis=0)

s - s_m works because s_m is broadcast across the rows of s: its (4,) shape is matched against the trailing dimension of the (N,4) array.
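As a small sketch of that broadcasting (the random `s` below is only a stand-in for the data read from the files, which the question does not show):

```python
import numpy as np

rng = np.random.RandomState(0)
s = rng.randn(10, 4)             # stand-in for the (N, 4) data, here N = 10

s_m = s.mean(axis=0)             # column means, shape (4,)
s_m_builtin = sum(s) / len(s)    # built-in sum adds the rows one by one

residuals_s = s - s_m            # (10, 4) minus (4,): s_m is applied to every row

print(s_m.shape)                 # (4,)
print(residuals_s.shape)         # (10, 4)
```

Both ways of computing the mean give the same (4,) vector; the NumPy method is just the more direct one.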

If I run your second residuals_s I get a list containing empty lists and arrays:

[[],
 array([ 1.02649662,  0.43613824,  0.66276758,  2.0082684 ]),
 [],
 array([ 1.13000227, -0.94129685,  0.63411801, -0.383982  ]),
 ...
]

That does not convert to a (N,1,4) array, but rather a (M,) array with dtype=object. Did you copy and paste correctly?
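To see why, here is the loop from the question run on a small made-up array (the values of `s` are hypothetical; only the structure of the resulting list matters):

```python
import numpy as np

s = np.arange(12.0).reshape(3, 4)   # hypothetical (N, 4) data, here N = 3
s_m = s.mean(axis=0)

residuals_s = []
for i in range(len(s)):
    residuals_s.append([])          # puts an empty list into the outer list
    tmp = s[i] - s_m
    residuals_s.append(tmp)         # puts the array into the outer list as well

print(len(residuals_s))             # 6, i.e. 2 * N entries
print([type(item).__name__ for item in residuals_s])
```

Both append calls target the outer list, so it ends up alternating between empty lists and (4,) arrays instead of holding N rows.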

A corrected iteration is:

residuals_s = []
for i in range(len(s)):
    residuals_s.append(s[i]-s_m)

which produces a simpler list of arrays:

[array([ 1.02649662,  0.43613824,  0.66276758,  2.0082684 ]),
 array([ 1.13000227, -0.94129685,  0.63411801, -0.383982  ]),
...]

which converts to a (N,4) array.
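A quick check of that conversion, again on a made-up `s` (the name `residuals_list` is introduced here only for illustration):

```python
import numpy as np

s = np.arange(12.0).reshape(3, 4)   # hypothetical (N, 4) data, here N = 3
s_m = s.mean(axis=0)

residuals_list = []
for i in range(len(s)):
    residuals_list.append(s[i] - s_m)   # one (4,) array per row

residuals_s = np.array(residuals_list)  # list of N (4,) arrays -> (N, 4)

print(residuals_s.shape)                    # (3, 4)
print(np.allclose(residuals_s, s - s_m))    # True
```

The result matches the broadcast expression s - s_m, so the loop is only worth keeping if each row needs extra per-row work.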

Iteration like this is usually not needed. If it is, appending to a list like this is one way to go. Another is to preallocate an array and assign rows:

residuals_s = np.zeros_like(s)
for i in range(s.shape[0]):
    residuals_s[i,:] = s[i]-s_m

I get your (N,1,4) with:

In [39]: residuals_s=[]
In [40]: for i in range(len(s)):
   ....:     residuals_s.append([])
   ....:     tmp = s[i] - s_m
   ....:     residuals_s[-1].append(tmp)
In [41]: residuals_s
Out[41]: 
[[array([ 1.02649662,  0.43613824,  0.66276758,  2.0082684 ])],
 [array([ 1.13000227, -0.94129685,  0.63411801, -0.383982  ])],
...]
In [43]: np.array(residuals_s).shape
Out[43]: (10, 1, 4)

Here the s[i]-s_m array is appended to an empty list, which has been appended to the main list. So it's an array within a list within a list. It's this intermediate list that produces the middle 1 dimension.
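The effect of that extra level of nesting can be shown in isolation. This is a minimal sketch with a made-up `row`; `squeeze` is one possible way to drop the length-1 axis after the fact, not something the question used:

```python
import numpy as np

row = np.array([1.0, 2.0, 3.0, 4.0])

flat = [row, row]           # list of arrays
nested = [[row], [row]]     # each array wrapped in its own one-element list

print(np.array(flat).shape)     # (2, 4)
print(np.array(nested).shape)   # (2, 1, 4)

# Remove the length-1 middle axis if the nested list already exists
fixed = np.array(nested).squeeze(axis=1)
print(fixed.shape)              # (2, 4)
```

Each level of list nesting becomes one array dimension, so the cleaner fix is to avoid the intermediate list in the first place.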

HYRY · Accepted Answer · 2014-04-19 07:20:46Z


You are using a NumPy ndarray without using NumPy's own functions. sum() is a Python built-in function; you should use numpy.sum() (with axis=0 to sum over the rows) instead.

I suggest changing your code to:

import numpy as np
np.random.seed(0)
s = np.random.randn(10, 4)
s_m = np.mean(s, axis=0, keepdims=True)
residuals_s = s - s_m

print(s.shape, s_m.shape, residuals_s.shape)

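Using the mean() function with the axis and keepdims arguments will give you the correct result. With keepdims=True, s_m has shape (1,4) rather than (4,); both broadcast against s in the same way.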
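For comparison, a short sketch of the mean with and without keepdims (the names `m_flat`, `m_keep` and `m_sum` are made up for this example):

```python
import numpy as np

np.random.seed(0)
s = np.random.randn(10, 4)

m_flat = np.mean(s, axis=0)                  # shape (4,)
m_keep = np.mean(s, axis=0, keepdims=True)   # shape (1, 4)
m_sum = np.sum(s, axis=0) / len(s)           # same values via numpy.sum

print(m_flat.shape)                          # (4,)
print(m_keep.shape)                          # (1, 4)
print(np.allclose(s - m_flat, s - m_keep))   # True
```

Either form gives (10, 4) residuals here; keepdims mainly matters when averaging over an axis other than the first, where a bare 1D result would no longer line up for broadcasting.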
