Numpy split array based on condition without for loop

Question

So let's say I have a NumPy array that holds points in 2D space, like the following:

np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]])

I also have a NumPy array that assigns a label number to each point. It is a 1D array whose length equals the number of points in the points array.

np.array([0, 1, 1, 0, 2, 1])

Now I want to take the mean of the points that share each label. So for all points that have label 0, take the mean of those points, and likewise for every other label. My current way of solving this is the following:

return np.array([points[labels == k].mean(axis=0) for k in range(n_labels)])

where n_labels is the number of ways to label the points, i.e. the largest number in the labels array plus one.

I would like a way to do this without using a for loop. Maybe there is some NumPy functionality I haven't discovered yet?

Divakar · Accepted Answer · 2019-02-26 16:24:07Z

Approach #1 : We can leverage matrix-multiplication with some help from broadcasting -

mask = labels == np.arange(labels.max()+1)[:,None]
out = mask.dot(points)/np.bincount(labels).astype(float)[:,None]

Sample run -

In [36]: points = np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]]) 
    ...: labels = np.array([0, 1, 1, 0, 2, 1])

# Original soln
In [37]: L = labels.max()+1

In [38]: np.array([points[labels==k].mean(axis=0) for k in range(L)])
Out[38]: 
array([[3.5       , 2.        ],
       [6.        , 4.33333333],
       [4.        , 6.        ]])

# Proposed soln
In [39]: mask = labels == np.arange(labels.max()+1)[:,None]
    ...: out = mask.dot(points)/np.bincount(labels).astype(float)[:,None]

In [40]: out
Out[40]: 
array([[3.5       , 2.        ],
       [6.        , 4.33333333],
       [4.        , 6.        ]])
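
To make the intermediate steps of Approach #1 easier to follow, here is a minimal sketch that names each piece separately. The variable names sums, counts and means are only illustrative and are not part of the snippet above.

```python
import numpy as np

points = np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]])
labels = np.array([0, 1, 1, 0, 2, 1])

# Broadcasting a (6,) array against a (3, 1) column gives a (3, 6) boolean mask:
# mask[k, i] is True when point i carries label k.
mask = labels == np.arange(labels.max() + 1)[:, None]

# Matrix-multiplying the mask with the points adds up the rows of each group.
sums = mask.dot(points)        # shape (3, 2), per-label coordinate sums

# Number of points per label, used as the divisor.
counts = np.bincount(labels)   # shape (3,)

# Divide each row of sums by its group size.
means = sums / counts[:, None]
print(means)
```

Note that the mask holds one entry per (label, point) pair, so its memory use grows with the number of labels times the number of points.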

Approach #2 : With np.add.at -

sums = np.zeros((labels.max()+1,points.shape[1]),dtype=float)
np.add.at(sums,labels,points)
out = sums/np.bincount(labels).astype(float)[:,None]
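
As a rough, self-contained sketch of Approach #2 on the sample data, the snippet can be wrapped in a function. The name group_means_add_at is hypothetical and chosen only for this illustration. np.add.at performs unbuffered in-place addition, so rows that share a label all accumulate into the same row of sums, which a plain sums[labels] += points would not do.

```python
import numpy as np

def group_means_add_at(points, labels):
    """Per-label mean of points (hypothetical helper name, for illustration)."""
    n_labels = labels.max() + 1
    # One accumulator row per label, one column per coordinate.
    sums = np.zeros((n_labels, points.shape[1]), dtype=float)
    # Unbuffered add: every point is added to the row given by its label.
    np.add.at(sums, labels, points)
    # Group sizes; a label that never occurs has count 0 and yields nan.
    counts = np.bincount(labels, minlength=n_labels)
    return sums / counts[:, None]

points = np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]])
labels = np.array([0, 1, 1, 0, 2, 1])
print(group_means_add_at(points, labels))
```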

Approach #3 : If every integer from 0 to the maximum label is present in labels, we can also use np.add.reduceat -

sidx = labels.argsort()
sorted_points = points[sidx]
sums = np.add.reduceat(sorted_points,np.r_[0,np.bincount(labels)[:-1].cumsum()])
out = sums/np.bincount(labels).astype(float)[:,None]
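
A possible way to package Approach #3 so that the precondition is checked explicitly is sketched below. The function name group_means_reduceat and the ValueError guard are assumptions added for illustration, not part of the original snippet. The guard matters because np.add.reduceat does not return an empty sum when two consecutive start offsets are equal, so a missing label would silently produce a wrong row.

```python
import numpy as np

def group_means_reduceat(points, labels):
    """Per-label mean via sort + reduceat (hypothetical helper name)."""
    counts = np.bincount(labels)
    # Assumed guard: the approach requires every label 0..max to occur.
    if (counts == 0).any():
        raise ValueError("every label from 0 to labels.max() must occur at least once")
    # Sort the points so that rows with the same label are contiguous.
    sidx = labels.argsort(kind="stable")
    sorted_points = points[sidx]
    # Start offset of each label's block in the sorted array.
    starts = np.r_[0, counts[:-1].cumsum()]
    # Sum each contiguous block along axis 0.
    sums = np.add.reduceat(sorted_points, starts)
    return sums / counts[:, None]

points = np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]])
labels = np.array([0, 1, 1, 0, 2, 1])
print(group_means_reduceat(points, labels))
```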
