1

This is a question for all the really clever users of advanced Python out there. My input is a series of numbers in a Python numpy array. They are floating point values arranged in a linear sequence. I need to create a new geometric sequence where:

  • the new first value = the original first value

  • the new second value = the sum of original values two and three

  • the new third value = the sum of original values four to seven

  • the new fourth value = the sum of values eight to fifteen

I can of course do this by looping through the data but I need this processing to be done as quickly as possible - the arrays are huge. What is the fastest way to do this?

example:

12.0, 3.4, 9.2, 7.7, 4.9, 3.8, 6.9

should become:

12.0, 12.6, 23.3
2
  • Whoops, yes. I am not very numerate, sorry. Commented Nov 9, 2019 at 20:29
  • Changed the original text so that they third became the second and so on. Didn't know one could do that. Commented Nov 10, 2019 at 5:48

3 Answers 3

2

You can use np.add.reduceat and int.bit_length.

Example:

# make example sequence
a = np.arange(100.0)

# form sums
np.add.reduceat(a,(1<<np.arange(a.size.bit_length()))-1)
# array([   0.,    3.,   18.,   84.,  360., 1488., 2997.])

This adds items 0; 1+2; 3+4+5+6; etc. which your example suggests is what you really want.

Sign up to request clarification or add additional context in comments.

2 Comments

numpy really has everything
Wow, looks extremely concise. Yes, these are the sums I need. As I said above, my dyslexia affects my use of numbers as much as text, so sorry for that.
1
from math import log, floor
a = [12.0, 3.4, 9.2, 7.7, 4.9, 3.8, 6.9]
print([sum(a[2**i-1:2**(i+1)-1]) for i in range(floor(log(len(a), 2)) + 1)])

This outputs:

[12.0, 12.6, 23.3]

The indices used for the sums are:

[(0, 1), (1, 3), (3, 7), (7, 15), (15, 31), ...]

The last sum can be shortened when the length of the list isn't exact one less than a power of 2.

1 Comment

Thank you. Looks pretty concise.
1
seq = [12.0, 3.4, 9.2, 7.7, 4.9, 3.8, 6.9]

r = 0       # this is the binary power (2^r)
g_seq = []  # list to contain your final answer
i = 0       # index of start position for summing at each iteration  

while i < len(seq):   # once i reaches the last element, stop looping
    n = 2**r          # number of elements to sum in each iteration
    j = i + n         # index of end position for summing
    r+= 1             # increment power for next iteration
    subseq = seq[i:j] # slice of your input sequence to be summed
    g_seq.append(sum(subseq))   # sum the slice and append to answer list
    i = j             # j becomes the i for the next iteration

print(g_seq)
--------------------
[12.0, 12.6, 23.300000000000004]

Edit: added comments

4 Comments

Will have to time all three now and see which is fastest.
Paul's numpy solution will almost certainly be the fastest and most memory efficient too, since numpy is optimised for all of that.
Ah, right. As I said, that will be important in this case. Thanks again.
Testing with a = np.arange(2.0**24-1.0) clearly shows the numpy solution is fastest (the last of the sums should be 105553103683584)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.