
I'm managing a large set of positions at different times, stored as a sparse matrix: an array of positions (the column indices) and an array of times of the same size. E.g.

import numpy

matrix = numpy.random.randint(2, size=100).astype(float).reshape(10, 10)
times = numpy.nonzero(matrix)[0] + 1      # row index (+1) of each non-zero entry
positions = numpy.nonzero(matrix)[1]      # column index of each non-zero entry

Now I have to correct the positions using the speed associated with each time. The problem is that, since this is a sparse matrix, I have to expand the speed associated with a time to every position at that time (i.e. to every non-zero element in a given row). I know the indexes of the first position at a given time and the number of times nTimes. How can I vectorize this code (i.e. remove the loop)?

indexes = numpy.where(numpy.diff(times) > 0)[0] + 1   # start of each new time group
indexes = numpy.concatenate(([0], indexes, [times.size]))
nTimes = numpy.size(indexes) - 1
speeds = numpy.random.rand(nTimes)                    # one speed per time

starts = indexes[:-1]
ends = indexes[1:]
expandedSpeeds = numpy.zeros(positions.size)
for i in numpy.arange(0, nTimes):
    expandedSpeeds[starts[i]:ends[i]] = speeds[i]

Edited in order to give a runnable example.

3 Comments
  • I find multiple things confusing in your example. What is indexes? If expandedSpeeds has a first dimension of size 3, that indexing in the last line is a bit weird. On the right hand side of the last line you basically have speeds[i] (except it's inside an array). Could you put a runnable MCVE into your question, please? Commented Oct 2, 2017 at 18:11
  • Yes, you're right, I'm so sorry; I made a mess copying the code and trying to simplify the problem. The indexes array is numpy.where(numpy.diff(times)>0). I'm going to write a runnable example. Commented Oct 2, 2017 at 18:29
  • @AndrasDeak now you can try the code, it should work properly, thank you and sorry again. Commented Oct 3, 2017 at 11:03

1 Answer


I tried turning the original approach into a function, and here's what I got -

import numpy as np

def slices2arr_org(ar, starts, ends, N):
    # loopy reference: write ar[i] into the slice starts[i]:ends[i]
    out0 = np.zeros(N, dtype=ar.dtype)
    for i in range(len(starts)):
        out0[starts[i]:ends[i]] = ar[i]
    return out0

Now to vectorize it, we could make use of cumulative summation and some masking, like so -

def slices2arr_vect(ar, starts, ends, N):
    # mark the start of every slice (except the first); the cumulative sum c
    # then holds, for each output slot, the index of the slice it falls into
    id_arr = np.zeros(N, dtype=int)
    id_arr[starts[1:]] = 1
    c = id_arr.cumsum()

    # subtract 1 at the slice ends; a second cumulative sum is now non-zero
    # only inside a slice, giving the mask for where to place ar[c]
    np.add.at(id_arr, ends[1:], -1)
    out = np.where(id_arr.cumsum() == 0, 0, ar[c])
    out[starts[0]:ends[0]] = ar[0]    # first slice is handled separately
    return out

Here's a sample run to make things clearer -

In [677]: # Setup inputs
     ...: np.random.seed(0)    
     ...: n_grp = 5
     ...: N = 15
     ...: idx = np.sort(np.random.choice(N, n_grp*2, replace=0))
     ...: starts, ends = idx[::2], idx[1::2]                      
     ...: ar = np.random.randint(11,99,(N))
     ...: 

In [678]: ar
Out[678]: array([76, 50, 98, 57, 92, 48, 36, 88, 83, 20, 31, 91, 80, 90, 58])

In [679]: starts
Out[679]: array([ 1,  4,  7,  9, 13])

In [680]: ends
Out[680]: array([ 2,  6,  8, 10, 14])

In [681]: slices2arr_org(ar, starts, ends, N)
Out[681]: array([ 0, 76,  0,  0, 50, 50,  0, 98,  0, 57,  0,  0,  0, 92,  0])

In [682]: slices2arr_vect(ar, starts, ends, N)
Out[682]: array([ 0, 76,  0,  0, 50, 50,  0, 98,  0, 57,  0,  0,  0, 92,  0])
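
To make the cumulative-sum trick more transparent, here is a small sketch (reusing the starts/ends from the run above) that prints the two intermediate arrays: c holds the slice index of every output slot, and the second cumulative sum is non-zero exactly where a slot lies inside a slice -

import numpy as np

starts = np.array([1, 4, 7, 9, 13])
ends = np.array([2, 6, 8, 10, 14])
N = 15

id_arr = np.zeros(N, dtype=int)
id_arr[starts[1:]] = 1
c = id_arr.cumsum()              # slice index for each output slot
print(c)                         # [0 0 0 0 1 1 1 2 2 3 3 3 3 4 4]

np.add.at(id_arr, ends[1:], -1)
mask = id_arr.cumsum()           # non-zero exactly inside a slice (first slice excluded)
print(mask)                      # [0 0 0 0 1 1 0 1 0 1 0 0 0 1 0]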

1 Comment

Thank you so much, I could never have figured out using cumsum to solve the problem. If you want a little hint, np.ufunc.at is unfortunately even slower than a Python loop (!). Your function also works with a simple gather via np.take or fancy indexing (as the numpy docs call it xD): id_arr[ends[1:]] = id_arr[ends[1:]] - 1. That way the code runs twice as fast ;)
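
For reference, here is a rough sketch of the variant described in that comment, with np.add.at swapped for plain fancy indexing (the name slices2arr_vect2 is just for illustration); this only works because the indices in ends[1:] are unique, so no decrements are lost -

import numpy as np

def slices2arr_vect2(ar, starts, ends, N):
    # same idea as slices2arr_vect, but the decrement at the slice ends is
    # done with fancy indexing instead of np.add.at (safe: ends[1:] has no repeats)
    id_arr = np.zeros(N, dtype=int)
    id_arr[starts[1:]] = 1
    c = id_arr.cumsum()

    id_arr[ends[1:]] = id_arr[ends[1:]] - 1
    out = np.where(id_arr.cumsum() == 0, 0, ar[c])
    out[starts[0]:ends[0]] = ar[0]
    return out

On the sample inputs above it should produce the same output as slices2arr_vect.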
