2

I'm trying to compute a moving average but with a set step size between each average. For example, if I was computing the average of a 4 element window every 2 elements:

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

This should produce the average of [1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8], [7, 8, 9, 10].

window_avg = [2.5, 4.5, 6.5, 8.5]

My data is such that the ending will be truncated before processing so there is no problem with the length with respect to window size.

I've read a bit about how to do moving averages in Python and there seems to be a lot of usage of itertools; however, the iterators go one element at a time and I can't figure out how to have a step size between each calculation of the average. (How to calculate moving average in Python 3?)

I have also been able to do this before in MATLAB by creating a matrix of indices which are overlapping and then indexing the data vector and performing a column wise mean (Create matrix by repeatedly overlapping a vector). However, since this vector is rather large (~70 000 elements, window of 450 samples, average every 30 samples), the computation would probably require too much memory.
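For a rough sense of scale, the size of such an overlapping-window matrix can be estimated directly. The sketch below assumes 8-byte values (for example float64) and counts only full windows; the variable names are illustrative.

```python
# Back-of-the-envelope size of an overlapping-window matrix.
# Assumption: each stored value takes 8 bytes (e.g. a float64).
n_samples = 70000
window = 450
step = 30
bytes_per_value = 8

# Number of full windows that fit when advancing `step` samples at a time.
n_windows = (n_samples - window) // step + 1

# One row (or column) per window, `window` values each.
n_values = n_windows * window
megabytes = n_values * bytes_per_value / 1e6

print(n_windows, n_values, round(megabytes, 1))
```

Under these assumptions the matrix holds 2319 windows and about a million values (roughly 8 MB). Whether that is acceptable depends on the environment.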

Any help would be greatly appreciated. I am using Python 2.7.

1
  • I would try something along the lines of n=4; s=2; [sum(data[s*i:s*i+n])/n for i, datum in enumerate(data[::s])], but perhaps that's not what you're looking for (datum here is unnecessary, but range(len(data)) just looks so unPythonic). Commented Jan 13, 2014 at 17:15

3 Answers

3

One way to compute the average of a sliding window across a list in Python is to use a list comprehension. You can use

>>> range(0, len(data), 2)
[0, 2, 4, 6, 8]

to get the starting indices of each window, and then numpy's mean function to take the average of each window. See the demo below:

>>> import numpy as np
>>> window_size = 4
>>> stride = 2
>>> window_avg = [ np.mean(data[i:i+window_size]) for i in range(0, len(data), stride)
                   if i+window_size <= len(data) ]
>>> window_avg
[2.5, 4.5, 6.5, 8.5]

Note that the list comprehension does have a condition to ensure that it only computes the average of "full windows", or sublists with exactly window_size elements.
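The same full-window rule can also be expressed by stopping the range early instead of filtering. A minimal sketch in plain Python, without numpy; the function name stepped_means is made up for illustration.

```python
def stepped_means(data, window_size, stride):
    """Mean of every full window of `window_size` items, advancing by `stride`."""
    # The last valid start index is len(data) - window_size, so partial
    # windows at the end are never generated.
    last_start = len(data) - window_size
    # float() keeps the division exact on Python 2 as well as Python 3.
    return [sum(data[i:i + window_size]) / float(window_size)
            for i in range(0, last_start + 1, stride)]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(stepped_means(data, 4, 2))  # [2.5, 4.5, 6.5, 8.5]
```

If the data is shorter than one window, the range is empty and the function returns an empty list.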

When run on a dataset of the size discussed in the OP, this method takes a little over 200 ms on my MBA:

In [5]: window_size = 450
In [6]: data = range(70000)
In [7]: stride = 30
In [8]: timeit [ np.mean(data[i:i+window_size]) for i in range(0, len(data), stride)
                 if i+window_size <= len(data) ]
1 loops, best of 3: 220 ms per loop

On my machine it is about twice as fast as the itertools approach presented by @Abhijit:

In [9]: timeit map(np.mean, izip(*(islice(it, i, None, stride) for i, it in enumerate(tee(data, window_size)))))
1 loops, best of 3: 436 ms per loop
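If the per-window np.mean calls dominate the run time, one alternative is to precompute cumulative sums so that each window mean costs two lookups. This is not taken from either answer and has not been benchmarked here. A sketch in plain Python; prefix_sum_means is a hypothetical helper name.

```python
def prefix_sum_means(data, window_size, stride):
    """Stepped window means via cumulative sums (one pass over `data`)."""
    # prefix[k] holds sum(data[:k]), so sum(data[i:j]) == prefix[j] - prefix[i].
    prefix = [0]
    for value in data:
        prefix.append(prefix[-1] + value)

    last_start = len(data) - window_size
    return [(prefix[i + window_size] - prefix[i]) / float(window_size)
            for i in range(0, last_start + 1, stride)]

print(prefix_sum_means(list(range(1, 11)), 4, 2))  # [2.5, 4.5, 6.5, 8.5]
```

With floating-point data, differences of large cumulative sums can lose some precision, so results may differ slightly from a direct mean.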

5 Comments

what's the efficiency for mass data?
@zhangxaochen: I added a speed benchmark for the dataset mentioned in the OP.
seems Abhijit's solution is much faster: 6.19 ms per loop
@zhangxaochen: interesting, I tried benchmarking his approach, too, and found the opposite. Can you verify what you got looking at my updated answer?
@zhangxaochen: I believe there might be something wrong with your benchmarking; mdml's approach is considerably faster per my benchmarking.
1

The following approach makes full use of itertools to compute a moving average over a window of size 4. As the entire expression is a generator that is evaluated only when the averages are calculated, it has a complexity of O(n) for a fixed window size.

>>> import numpy as np
>>> from itertools import tee, izip, islice
>>> map(np.mean, izip(*(islice(it,i,None,2)
                      for i, it in enumerate(tee(data, 4)))))
[2.5, 4.5, 6.5, 8.5]

It's interesting to note how the individual itertools functions work together:

  1. itertools.tee n-plicates an iterator, in this case 4 times.
  2. enumerate creates an enumerator object which yields tuples of index and element (the element being one of the iterators).
  3. islice slices each iterator with stride 2, starting from its index position.
  4. izip regroups the sliced iterators so that each tuple is one window, and map applies np.mean to each tuple.
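To make the intermediate values visible, the pipeline can be unrolled as in the sketch below. It uses the built-in zip (the Python 3 counterpart of izip) and a plain sum instead of np.mean, and it materializes each slice as a list purely for printing, so it is not lazy like the one-liner.

```python
from itertools import islice, tee

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
window_size, stride = 4, 2

# tee gives `window_size` independent iterators over the same data;
# islice starts the i-th one at offset i and then takes every `stride`-th item.
# Each slice is turned into a list here only so it can be printed.
columns = [list(islice(it, i, None, stride))
           for i, it in enumerate(tee(data, window_size))]
print(columns)
# [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10], [3, 5, 7, 9], [4, 6, 8, 10]]

# zip pairs up the k-th item of every column, which is exactly the k-th window;
# it stops at the shortest column, so partial windows are dropped.
windows = list(zip(*columns))
print(windows)
# [(1, 2, 3, 4), (3, 4, 5, 6), (5, 6, 7, 8), (7, 8, 9, 10)]

means = [sum(w) / float(window_size) for w in windows]
print(means)  # [2.5, 4.5, 6.5, 8.5]
```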

1 Comment

Thanks for your help! This is more along the lines of what I expected with an itertools implementation and is very informative. I'm surprised that this is actually slower than the other answer since my understanding was that itertools is faster than using a list comprehension in many cases. That being said, I am selecting mdml's answer since it's faster and more readable.
0

You can use the rolling function of a pandas DataFrame:

import pandas as pd

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
df = pd.DataFrame(data)
>>> 
    0
0   1
1   2
2   3
3   4
4   5
5   6
6   7
7   8
8   9
9  10

Take the rolling mean, drop the incomplete windows, and keep every second row:

df.rolling(4).mean().dropna()[::2]
>>> 
     0
3  2.5
5  4.5
7  6.5
9  8.5

Here 4 is the window size and the 2 in [::2] is the step size. df.rolling(4).mean().dropna() actually shifts the window one element at a time; indexing with [::2] then keeps every second result.
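The same compute-everything-then-subsample idea can be written in plain Python, which makes the extra work visible. This is only an analogue of the pandas expression, not how pandas implements it.

```python
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
window_size, step = 4, 2

# Every full window, shifted one element at a time
# (the rows that remain after dropna() in the pandas version).
all_means = [sum(data[i:i + window_size]) / float(window_size)
             for i in range(len(data) - window_size + 1)]
print(all_means)          # [2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]

# Keeping every `step`-th result gives the stepped averages.
print(all_means[::step])  # [2.5, 4.5, 6.5, 8.5]
```

Note that this computes roughly `step` times more means than are kept, which matters more for a step of 30 than for a step of 2.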

Alternatively, if you have pandas 1.5 or later, you can pass the step size directly. Note that the center argument must be True in this example; otherwise the stepped windows do not line up with the desired ones. The solution:

df.rolling(4, step=2, center=True).mean().dropna()

>>> df
     0
2  2.5
4  4.5
6  6.5
8  8.5
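Why center matters here can be illustrated with a small plain-Python model of the row alignment. It assumes that a centered window of even length spans label - window//2 through label + window//2 - 1, which is consistent with the output shown above, and that the default window ends at its label. The helper stepped_labels is hypothetical and is not part of pandas.

```python
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
window_size, step = 4, 2

def stepped_labels(center):
    """Map each evaluated row label to its window mean (full windows only)."""
    result = {}
    for label in range(0, len(data), step):
        if center:
            start = label - window_size // 2   # window spans label-2 .. label+1
        else:
            start = label - window_size + 1    # window ends at the label
        stop = start + window_size
        if start >= 0 and stop <= len(data):
            result[label] = sum(data[start:stop]) / float(window_size)
    return result

print(stepped_labels(center=True))   # {2: 2.5, 4: 4.5, 6: 6.5, 8: 8.5}
print(stepped_labels(center=False))  # {4: 3.5, 6: 5.5, 8: 7.5}
```

In this model, the right-aligned windows evaluated at even labels cover [2, 3, 4, 5], [4, 5, 6, 7] and [6, 7, 8, 9], which are not the windows asked for in the question.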
