Calculating some statistics for each column of a numpy ndarray

Question

I have a 4D numpy array of input data where each column represents a quantity (say speed, acceleration, etc.), and I would like to calculate some statistical information for each quantity: mean, standard deviation, median, and the 75th, 85th and 95th percentiles.

So for example:

input_shape = (1,200,4)
n_sample = 100

X = np.random.uniform(0,1, (n_sample,) + input_shape)
X.shape
(100, 1, 200, 4)

X[0]
array([[[0.50410922, 0.82829892, 0.72460878, 0.0562701 ],
        [0.49223423, 0.14152948, 0.32285973, 0.49056405],
        ...
        [0.8299407 , 0.78446729, 0.40959698, 0.893117  ],
        [0.25150705, 0.56759064, 0.28280459, 0.0599566 ]]])

Each column of X represents some physical quantity for 200 data points. The statistics of each quantity are what I'm interested in.

EDIT

I would expect something like:

[[[col1_mean, col2_mean, col3_mean, col4_mean ],
   [col1_std, col2_std, col3_std, col4_std],
   [col1_med, col2_med, col3_med, col4_med],
   [col1_p75, col2_p75, col3_p75, col4_p75 ],
   [col1_p85, col2_p85, col3_p85, col4_p85 ],
   [col1_p95, col2_p95, col3_p95, col4_p95 ]]]

So the result is shaped (100, 1, 6, 4)

Your X is a 4D array. It's rather unclear what you mean by std or 75th percentile of 3D data.
– Quang Hoang, Commented Sep 14, 2020 at 16:18
What is the "column" along which you want to compute your statistics? Most stats like this can be computed by passing an axis argument. For example, np.mean(X, axis=-2) would return an array of shape (100, 1, 4) with the mean across what you call the "200 data points".
– bnaecker, Commented Sep 14, 2020 at 17:22
@bnaecker Yes, the computation is along the third axis (axis=-2), as in your comment.
– arilwan, Commented Sep 14, 2020 at 17:24

bnaecker · Accepted Answer · 2020-09-14 18:26:03Z

The easiest thing would be to compute the statistics of interest by supplying an axis argument. This is used by many NumPy functions to run their computation along that axis. For your data, it seems you'd like to compute across the "data points" dimension, which is axis=2. For example:

>>> import numpy as np
>>> input_shape = (1,200,4)
>>> n_sample = 100
>>> X = np.random.uniform(0,1, (n_sample,) + input_shape)
>>> X.shape
(100, 1, 200, 4)
>>> X.mean(axis=2).shape  # Compute mean along 3rd axis
(100, 1, 4)
>>> stat_functions = (np.mean, np.std, np.median)
>>> stats = [func(X, axis=2) for func in stat_functions]
>>> list(map(np.shape, stats))
[(100, 1, 4), (100, 1, 4), (100, 1, 4)]

You'll have to do a bit more work to create functions to compute the percentiles you're interested in:

>>> import functools
>>> percentiles = tuple(functools.partial(np.percentile, q=q) for q in (75, 85, 95))
>>> stat_functions = (np.mean, np.std, np.median) + percentiles

If you want to join these into a single array, you can use the keepdims kwarg of each to avoid removing the axis along which the function is applied, and then concatenate the results:

>>> stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)
>>> stats.shape
(100, 1, 6, 4)
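
As a variation on the same idea, np.percentile also accepts a sequence of percentiles, so the median (the 50th percentile) and the three upper percentiles can come from a single call. The sketch below assumes an X shaped like the one in the question; the helper name column_stats is hypothetical and not part of NumPy. Note that np.percentile puts the percentile axis first, which is why the result is rearranged with np.moveaxis.

```python
import numpy as np

def column_stats(X, axis=2, q=(50, 75, 85, 95)):
    """Hypothetical helper: mean, std, then the percentiles in q, stacked along `axis`."""
    mean = X.mean(axis=axis)
    std = X.std(axis=axis)                 # population std (ddof=0), NumPy's default
    pcts = np.percentile(X, q, axis=axis)  # shape: (len(q),) + reduced shape
    # Put all statistics on a new leading axis, then move it to where `axis` was.
    stacked = np.concatenate([mean[np.newaxis], std[np.newaxis], pcts], axis=0)
    return np.moveaxis(stacked, 0, axis)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (100, 1, 200, 4))
stats = column_stats(X)
print(stats.shape)  # (100, 1, 6, 4)
```

The row order along the new axis is mean, std, median, p75, p85, p95, matching the layout asked for in the question.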

Carlos Espinoza Garcia · Accepted Answer · 2020-09-14 17:11:49Z

You can do it with a loop over the indices. For example, if you try this:

print(X[0][0][:,0])

it prints the first column, so you can iterate over the columns, append each to a list, and then calculate the median and standard deviation.
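
A minimal sketch of that loop, assuming X has the shape from the question, (100, 1, 200, 4), and handling only the first sample; the variable names are illustrative. Passing an axis argument to the NumPy functions gives the same numbers without the Python loop.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (100, 1, 200, 4))

sample = X[0][0]                      # one sample, shape (200, 4)
col_stats = []
for j in range(sample.shape[1]):      # iterate over the 4 columns
    col = sample[:, j]                # 200 data points of one quantity
    col_stats.append((np.median(col), np.std(col)))

col_stats = np.array(col_stats)       # shape (4, 2): one (median, std) row per column
print(col_stats.shape)
```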
