So let's say I have a NumPy array that holds points in 2D space, like the following:

np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]]) 

I also have a NumPy array that assigns a numeric label to each point. It is a 1D array whose length equals the number of points in the points array.

np.array([0, 1, 1, 0, 2, 1])

Now I want to take the mean of the points that share each label. For example, for all points that have label 0, take the mean of those points. My current solution is the following:

return np.array([points[labels==i].mean(axis=0) for i in range(k)])

where k is the number of distinct labels, i.e. the largest number in the labels array plus one.
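For reference, here is a self-contained sketch of that loop-based baseline using the sample arrays above. The function name `label_means_loop` is only an illustrative choice, and the sketch assumes the labels are non-negative integers with every value from 0 to the maximum present.

```python
import numpy as np

def label_means_loop(points, labels):
    """Mean of the points carrying each label, one Python-level pass per label."""
    n_labels = labels.max() + 1  # assumes labels run from 0 to labels.max()
    return np.array([points[labels == k].mean(axis=0) for k in range(n_labels)])

points = np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]])
labels = np.array([0, 1, 1, 0, 2, 1])

baseline = label_means_loop(points, labels)
print(baseline)
```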

I would like a way to do this without using a for loop. Maybe there is some NumPy functionality I haven't discovered yet?

1 Answer

Approach #1 : We can leverage matrix multiplication with some help from broadcasting -

mask = labels == np.arange(labels.max()+1)[:,None]
out = mask.dot(points)/np.bincount(labels).astype(float)[:,None]

Sample run -

In [36]: points = np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]]) 
    ...: labels = np.array([0, 1, 1, 0, 2, 1])

# Original soln
In [37]: L = labels.max()+1

In [38]: np.array([points[labels==k].mean(axis=0) for k in range(L)])
Out[38]: 
array([[3.5       , 2.        ],
       [6.        , 4.33333333],
       [4.        , 6.        ]])

# Proposed soln
In [39]: mask = labels == np.arange(labels.max()+1)[:,None]
    ...: out = mask.dot(points)/np.bincount(labels).astype(float)[:,None]

In [40]: out
Out[40]: 
array([[3.5       , 2.        ],
       [6.        , 4.33333333],
       [4.        , 6.        ]])
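To see why the matrix product yields per-label sums, it may help to look at the intermediate arrays. The following is a minimal sketch on the same sample data. The names `sums`, `counts` and `means` are illustrative, and the mask is cast to int only to make the multiplication explicit.

```python
import numpy as np

points = np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]])
labels = np.array([0, 1, 1, 0, 2, 1])

# Broadcasting a (n_labels, 1) column against the (n_points,) labels gives
# a (n_labels, n_points) boolean mask: row k is True where labels == k.
mask = labels == np.arange(labels.max() + 1)[:, None]

# Multiplying by the (n_points, 2) points array adds up, for each row k,
# exactly the points whose label is k.
sums = mask.astype(int).dot(points)

# Row sums of the mask are the per-label counts (same as np.bincount(labels)).
counts = mask.sum(axis=1)

means = sums / counts[:, None]
print(means)
```

Note that the mask holds n_labels * n_points entries, so this approach is likely best suited to cases where the number of labels is modest.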

Approach #2 : With np.add.at -

sums = np.zeros((labels.max()+1,points.shape[1]),dtype=float)
np.add.at(sums,labels,points)
out = sums/np.bincount(labels).astype(float)[:,None]
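As a runnable sketch of this approach, the snippet can be wrapped in a small helper. The function name `label_means_add_at` is hypothetical, and the sketch assumes non-negative integer labels. `np.add.at` performs unbuffered in-place addition, so rows that share a label are all accumulated.

```python
import numpy as np

def label_means_add_at(points, labels):
    """Per-label means via unbuffered scatter-add (hypothetical helper)."""
    n_labels = labels.max() + 1
    sums = np.zeros((n_labels, points.shape[1]), dtype=float)
    # Adds points[i] into sums[labels[i]] for every i, including repeated labels.
    np.add.at(sums, labels, points)
    counts = np.bincount(labels, minlength=n_labels)
    return sums / counts[:, None]

points = np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]])
labels = np.array([0, 1, 1, 0, 2, 1])

means = label_means_add_at(points, labels)
print(means)
```

If some label between 0 and the maximum never occurs, its count is zero and the corresponding row would come out as NaN, so it may be worth handling that case separately.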

Approach #3 : If every integer from 0 to the maximum label is present in labels, we can also use np.add.reduceat -

sidx = labels.argsort()
sorted_points = points[sidx]
sums = np.add.reduceat(sorted_points,np.r_[0,np.bincount(labels)[:-1].cumsum()])
out = sums/np.bincount(labels).astype(float)[:,None]
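
The same steps can be sketched on the sample data with the intermediate offsets spelled out. The names `counts`, `starts` and `means` are illustrative, and a stable sort is used here only to make the ordering deterministic; it does not change the sums.

```python
import numpy as np

points = np.array([[3, 2], [4, 4], [5, 4], [4, 2], [4, 6], [9, 5]])
labels = np.array([0, 1, 1, 0, 2, 1])

counts = np.bincount(labels)

# Sort the points so that rows with the same label are contiguous.
sidx = labels.argsort(kind="stable")
sorted_points = points[sidx]

# Start offset of each label's block in the sorted array.
starts = np.r_[0, counts[:-1].cumsum()]

# Sum each block: rows starts[k] up to (but not including) starts[k+1].
sums = np.add.reduceat(sorted_points, starts)

means = sums / counts[:, None]
print(means)
```

The requirement that every label be present matters because a missing label produces a repeated start offset, and `np.add.reduceat` then returns the single row at that offset instead of an empty sum.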