I'm working with arrays in Python, and this has raised a few questions.

1) I build a list of lists by reading 4 columns from each of N files, so that I store 4 elements N times. I then convert this list into a NumPy array:

s = np.array(s)

and I ask for the shape of this array. The answer is correct:

print s.shape
#(N,4)

I then compute the mean of this Nx4 array:

s_m = sum(s)/len(s)
print s_m.shape
#(4,)

which I take to mean that this is a 1D array. Is that correct?

2) To subtract the mean vector s_m from each row of the array s, I can proceed in two ways:

residuals_s = s - s_m

or:

residuals_s = []

for i in range(len(s)):
    residuals_s.append([])
    tmp = s[i] - s_m
    residuals_s.append(tmp)

If I now ask for the shape of residuals_s in the two cases, I get two different answers. In the first case I get:

(N,4)

in the second:

(N,1,4)

Can someone explain why there is an additional dimension?

  • s_m = sum(s)/len(s) should give a floating-point number in the normal case; your output looks strange to me... Commented Apr 19, 2014 at 2:35
  • I think tmp has shape (1,4). Therefore an array of N versions of tmp of course has shape (N,1,4). So I think the solution would be to use something like residuals_s.append(tmp.reshape(-1)). Commented Apr 19, 2014 at 3:38

2 Answers

1

You can compute the mean with NumPy's own mean method (which produces the same (4,) shape):

s_m = s.mean(axis=0)

s - s_m works because s_m is broadcast to the dimensions of s.

If I run your second version of residuals_s, I get a list that alternates empty lists and arrays:
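As a minimal sketch of that broadcasting step (the data here is made up for illustration; only the shapes matter), the (4,) mean is stretched across every row of the (N,4) array. It also shows that the builtin sum adds the rows of a 2D array together, which is why sum(s)/len(s) has shape (4,):

```python
import numpy as np

# Hypothetical stand-in for the asker's data: N = 3 rows, 4 columns.
s = np.arange(12, dtype=float).reshape(3, 4)

# The builtin sum iterates over the first axis and adds the rows,
# so sum(s) / len(s) matches the column-wise mean.
s_m_builtin = sum(s) / len(s)
s_m = s.mean(axis=0)

# Broadcasting: (3, 4) - (4,) subtracts s_m from each row.
residuals_s = s - s_m

print(s_m.shape, s_m.ndim)   # (4,) 1
print(residuals_s.shape)     # (3, 4)
```

So a shape of (4,) does mean a 1D array with 4 elements.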

[[],
 array([ 1.02649662,  0.43613824,  0.66276758,  2.0082684 ]),
 [],
 array([ 1.13000227, -0.94129685,  0.63411801, -0.383982  ]),
 ...
]

That does not convert to a (N,1,4) array, but rather a (M,) array with dtype=object. Did you copy and paste correctly?

A corrected iteration is:

residuals_s = []
for i in range(len(s)):
    residuals_s.append(s[i]-s_m)

This produces a simpler list of arrays:

[array([ 1.02649662,  0.43613824,  0.66276758,  2.0082684 ]),
 array([ 1.13000227, -0.94129685,  0.63411801, -0.383982  ]),
...]

which converts to a (N,4) array.

Iteration like this is usually not needed. If it is, appending to a list like this is one way to go. Another is to preallocate an array and assign its rows:

residuals_s = np.zeros_like(s)
for i in range(s.shape[0]):
    residuals_s[i,:] = s[i]-s_m
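
A quick way to convince yourself that the preallocated loop and the broadcast subtraction agree is sketched below. The random data and the seed are arbitrary choices for illustration, not the asker's values:

```python
import numpy as np

rng = np.random.default_rng(0)     # arbitrary seed, for reproducibility only
s = rng.standard_normal((10, 4))   # illustrative (N, 4) data with N = 10
s_m = s.mean(axis=0)

# Preallocate the result and fill it one row at a time.
residuals_loop = np.zeros_like(s)
for i in range(s.shape[0]):
    residuals_loop[i, :] = s[i] - s_m

# The same computation as a single broadcast expression.
residuals_vec = s - s_m

print(residuals_loop.shape)                         # (10, 4)
print(np.allclose(residuals_loop, residuals_vec))   # True
```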

I get your (N,1,4) with:

In [39]: residuals_s=[]
In [40]: for i in range(len(s)):
   ....:     residuals_s.append([])
   ....:     tmp = s[i] - s_m
   ....:     residuals_s[-1].append(tmp)
In [41]: residuals_s
Out[41]: 
[[array([ 1.02649662,  0.43613824,  0.66276758,  2.0082684 ])],
 [array([ 1.13000227, -0.94129685,  0.63411801, -0.383982  ])],
...]
In [43]: np.array(residuals_s).shape
Out[43]: (10, 1, 4)

Here the s[i]-s_m array is appended to an empty list, which has been appended to the main list. So it's an array within a list within a list. It's this intermediate list that produces the middle 1 dimension.
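
The effect of that intermediate list can be reproduced in a few lines. This is a sketch with made-up data; the names flat, nested and fixed are only for illustration:

```python
import numpy as np

s = np.arange(12, dtype=float).reshape(3, 4)   # illustrative (N, 4) data, N = 3
s_m = s.mean(axis=0)

# A list of arrays: one level of nesting around each (4,) row.
flat = [s[i] - s_m for i in range(len(s))]

# Each array wrapped in its own list: an extra level of nesting.
nested = [[s[i] - s_m] for i in range(len(s))]

print(np.array(flat).shape)     # (3, 4)
print(np.array(nested).shape)   # (3, 1, 4)

# If you already have the nested result, drop the length-1 axis.
fixed = np.array(nested).squeeze(axis=1)
print(fixed.shape)              # (3, 4)
```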


0

You are using a NumPy ndarray without using NumPy's own functions. sum() is a Python builtin function; you should use numpy.sum() (with axis=0) instead.

I suggest changing your code to:

import numpy as np
np.random.seed(0)
s = np.random.randn(10, 4)
# mean over the rows; keepdims=True keeps the result 2D, with shape (1, 4)
s_m = np.mean(s, axis=0, keepdims=True)
residuals_s = s - s_m

print(s.shape, s_m.shape, residuals_s.shape)

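Using the mean() function with the axis and keepdims arguments gives you the correct result. With keepdims=True, s_m has shape (1, 4) rather than (4,), and residuals_s still has shape (10, 4).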
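
To see what keepdims changes, here is a small sketch with made-up data. For a mean over axis=0 both forms broadcast against s the same way; keepdims becomes necessary when the mean is taken over axis=1, because a (N,) result would not line up with the (N,4) array:

```python
import numpy as np

s = np.arange(12, dtype=float).reshape(3, 4)   # illustrative (N, 4) data, N = 3

col_mean = s.mean(axis=0)                      # shape (4,)
col_mean_kd = s.mean(axis=0, keepdims=True)    # shape (1, 4)

# Both forms give the same (3, 4) residuals.
print((s - col_mean).shape, (s - col_mean_kd).shape)   # (3, 4) (3, 4)

# Per-row means: keepdims=True gives (3, 1), which broadcasts across columns.
row_mean_kd = s.mean(axis=1, keepdims=True)
row_residuals = s - row_mean_kd
print(row_mean_kd.shape, row_residuals.shape)          # (3, 1) (3, 4)
```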
