I have a 4D NumPy array of input data where each column represents a quantity (say speed, acceleration, etc.), and I would like to calculate some statistics for each quantity: mean, standard deviation, median, and the 75th, 85th and 95th percentiles.

So for example:

import numpy as np

input_shape = (1,200,4)
n_sample = 100

X = np.random.uniform(0,1, (n_sample,) + input_shape)
X.shape
(100, 1, 200, 4)

X[0]
array([[[0.50410922, 0.82829892, 0.72460878, 0.0562701 ],
        [0.49223423, 0.14152948, 0.32285973, 0.49056405],
        ...
        [0.8299407 , 0.78446729, 0.40959698, 0.893117  ],
        [0.25150705, 0.56759064, 0.28280459, 0.0599566 ]]])

Each column of X represents some physical quantity for 200 data points. The statistics of each quantity are what I'm interested in.

EDIT

I would expect something like:

[[[col1_mean, col2_mean, col3_mean, col4_mean ],
   [col1_std, col2_std, col3_std, col4_std],
   [col1_med, col2_med, col3_med, col4_med],
   [col1_p75, col2_p75, col3_p75, col4_p75 ],
   [col1_p85, col2_p85, col3_p85, col4_p85 ],
   [col1_p95, col2_p95, col3_p95, col4_p95 ]]]

So the result has shape (100, 1, 6, 4).

4
  • Your X is a 4D array. It's rather unclear what you mean by std or 75th percentile of 3D data. Commented Sep 14, 2020 at 16:18
  • @Ah yeah, I should have said 4D instead. Edited. Commented Sep 14, 2020 at 16:20
  • What is the "column" along which you want to compute your statistics? Most stats like this can be computed by passing an axis argument. For example np.mean(X, axis=-2) would return an array of shape (100, 1, 4) with the mean across what you call the "200 data points". Commented Sep 14, 2020 at 17:22
  • @bnaecker Yes, the computation is along the third axis (axis=-2), as in your comment. Commented Sep 14, 2020 at 17:24

2 Answers 2

2

The easiest thing would be to compute the statistics of interest by supplying an axis argument. This is used by many NumPy functions to run their computation along that axis. For your data, it seems you'd like to compute across the "data points" dimension, which is axis=2. For example:

>>> import numpy as np
>>> input_shape = (1,200,4)
>>> n_sample = 100
>>> X = np.random.uniform(0,1, (n_sample,) + input_shape)
>>> X.shape
(100, 1, 200, 4)
>>> X.mean(axis=2).shape  # Compute mean along 3rd axis
(100, 1, 4)
>>> stat_functions = (np.mean, np.std, np.median)
>>> stats = [func(X, axis=2) for func in stat_functions]
>>> list(map(np.shape, stats))
[(100, 1, 4), (100, 1, 4), (100, 1, 4)]

You'll have to do a bit more work to create functions to compute the percentiles you're interested in:

>>> import functools
>>> percentiles = tuple(functools.partial(np.percentile, q=q) for q in (75, 85, 95))
>>> stat_functions = (np.mean, np.std, np.median) + percentiles

If you want to join these into a single array, you can use the keepdims kwarg of each to avoid removing the axis along which the function is applied, and then concatenate the results:

>>> stats = np.concatenate([func(X, axis=2, keepdims=True) for func in stat_functions], axis=2)
>>> stats.shape
(100, 1, 6, 4)
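
As a variation on the above, np.percentile also accepts a sequence of percentiles, so the three percentile calls can be collapsed into one. Below is a minimal sketch, assuming the same X layout as in the question. The helper name column_stats and its parameters are made up for illustration and are not part of NumPy.

```python
import numpy as np


def column_stats(X, axis=2, percentiles=(75, 85, 95)):
    """Hypothetical helper: stack mean, std, median and percentiles along `axis`."""
    # Each reduction drops `axis`; np.stack puts the three results on a new leading axis.
    basic = np.stack([
        np.mean(X, axis=axis),
        np.std(X, axis=axis),
        np.median(X, axis=axis),
    ])
    # With a sequence of percentiles, np.percentile returns them on the leading axis.
    pct = np.percentile(X, percentiles, axis=axis)
    stats = np.concatenate([basic, pct], axis=0)
    # Move the leading "statistic" axis to where the reduced axis used to be.
    return np.moveaxis(stats, 0, axis)


rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (100, 1, 200, 4))
stats = column_stats(X)
print(stats.shape)  # (100, 1, 6, 4)
```

The rows along the third axis come out in the order mean, std, median, p75, p85, p95, matching the layout asked for in the question.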

0

You can do it with a loop over the indexes. For example, this:

print(X[0][0][:,0])

prints the first column of the first sample, so you can iterate over the samples and columns, append each column to a list, and then calculate the median and standard deviation.
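
One way the loop could look is sketched below. It assumes the array layout from the question, uses a smaller sample count to keep the loop quick, and writes into preallocated arrays rather than appending to a list. The variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (10, 1, 200, 4))  # fewer samples than in the question

n_sample, n_group, n_points, n_col = X.shape
medians = np.empty((n_sample, n_group, n_col))
stds = np.empty((n_sample, n_group, n_col))

# Visit every column of every sample and reduce its 200 data points.
for i in range(n_sample):
    for j in range(n_group):
        for k in range(n_col):
            column = X[i, j, :, k]  # 1D array of length 200
            medians[i, j, k] = np.median(column)
            stds[i, j, k] = np.std(column)

# The loop should agree with the vectorized axis-based reduction.
print(np.allclose(medians, np.median(X, axis=2)))
print(np.allclose(stds, np.std(X, axis=2)))
```

The explicit loop is easy to follow, but passing an axis argument gives the same numbers without Python-level iteration, which is usually faster on large arrays.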
