0

Part of a program I'm writing has code that calculates the following:

data = np.array(..........)
param = np.array(range(100)+1)
result = np.array([data[-x:].mean() for x in param])

This code runs inside a giant loop, so performance is crucial. The 3rd line (result = ...) takes the most time of all. Is there a better way to do this operation?

Any suggestions are appreciated!

3
  • np.array(range(100)+1) isn't valid. Do you mean np.arange(100)+1? Also, are you really collecting the averages of the array backwards (last sample, last 2 samples, last 3 samples, etc.)? Commented Dec 29, 2020 at 3:36
  • Removing data-science tag as it's not relevant here. Commented Dec 29, 2020 at 4:37
  • The first part was indeed a typo, but yes, I will be collecting backward averages. Thanks! Commented Dec 29, 2020 at 11:35

2 Answers

3

If you prepend a 0 to the array and then take its cumulative sum with np.cumsum, the average of the elements from index i up to (but not including) index j is just (my_cumsum[j] - my_cumsum[i]) / (j - i).

This should let you vastly simplify your code.
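
As a minimal sketch of the idea above, assuming data is a one-dimensional array with at least as many elements as the largest window. The function name trailing_means and the max_window parameter are hypothetical, chosen only for illustration:

```python
import numpy as np

def trailing_means(data, max_window=100):
    """Means of the last 1, 2, ..., max_window elements of data.

    Hypothetical helper; assumes data is 1-D and len(data) >= max_window.
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    if max_window > n:
        raise ValueError("max_window must not exceed len(data)")
    # Prepend 0 so that csum[k] equals data[:k].sum().
    csum = np.concatenate(([0.0], np.cumsum(data)))
    windows = np.arange(1, max_window + 1)
    # The sum of data[n - x:] is csum[n] - csum[n - x]; divide by x for the mean.
    return (csum[n] - csum[n - windows]) / windows
```

With this, result = trailing_means(data, 100) would stand in for the list comprehension. One caveat: for very long arrays, subtracting two large cumulative sums can lose some floating-point precision compared with summing only the tail.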

2

I think you are looking for this:

data[::-1].cumsum()[:100]/np.arange(1,101)
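
A quick way to check this one-liner against the list comprehension from the question, plus a variant that slices off the last 100 elements before taking the cumulative sum. This is only a sketch: it assumes len(data) >= 100, and the random test data and the names slow, fast and faster are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)      # illustrative test data only
data = rng.normal(size=10_000)
param = np.arange(100) + 1

# Approach from the question: one slice and one mean per window size.
slow = np.array([data[-x:].mean() for x in param])

# One-liner from this answer: reverse, cumulative sum, keep the first 100 sums.
fast = data[::-1].cumsum()[:100] / np.arange(1, 101)

# Variant: slice the last 100 elements first, so the cumulative sum
# touches 100 values rather than the whole array.
faster = data[-100:][::-1].cumsum() / param

print(np.allclose(slow, fast), np.allclose(slow, faster))
```

If data is much longer than 100 elements, the variant avoids summing the entire array on every pass through the outer loop.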
