I have a matrix like this in NumPy:

array([[0, 0, 1, 1],
       [1, 1, 0, 2],
       [0, 0, 1, 0],
       [0, 2, 1, 1],
       [1, 1, 1, 0],
       [1, 0, 2, 2]])

I'd like to get the most common value per row. In other words, I'd like to get a vector like this:

array([0, 1, 0, 1, 1, 2])

I managed to solve this problem using SciPy's mode function, in the following way:

scipy.stats.mode(data, axis=1)[0].flatten()

However, I'm looking for a solution that uses NumPy only. Moreover, the solution needs to work with negative integer values as well.

3 Answers

1

Supposing m is the name of your matrix:

most_f = np.array([np.bincount(row).argmax() for row in m])

I hope this answers your question.

2 Comments

Thanks, that worked :) Is there a vectorized way, though?
However, it doesn't support negative numbers.
1

If your labels are from 0 to n_labels - 1, you can use

labels_onehot = m[..., None] == np.arange(n_labels)[None, None, :]  # (n_rows, n_cols, n_labels), one-hot encoded
labels_count = np.count_nonzero(labels_onehot, axis=1)              # (n_rows, n_labels), number of occurrences of each label in a row
most_frequent = np.argmax(labels_count, axis=-1)                    # (n_rows,), the most frequent label in each row

This is fully vectorized (no list comprehension, no apply_along_axis), so it is typically faster than the loop-based solutions above, and arguably simpler too. The trade-off is memory: it builds an intermediate boolean array of shape (n_rows, n_cols, n_labels).

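If your labels are not from 0 to n_labels - 1, you can replace np.arange(n_labels) above with an array of your actual label values. In that case the argmax gives a position in that array rather than the label itself, so index the label array with the result to get the labels back.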
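The generalization to arbitrary labels can be sketched as follows. This is only a minimal sketch: it assumes the label set is taken from the data itself with np.unique, and the function name row_mode_vectorized is hypothetical, not part of NumPy. Because np.unique returns sorted values and argmax returns the first maximum, ties go to the smallest value in the row.

```python
import numpy as np

def row_mode_vectorized(m):
    """Most frequent value in each row of a 2-D integer array (hypothetical helper)."""
    labels = np.unique(m)                      # sorted distinct values, negatives included
    onehot = m[..., None] == labels            # (n_rows, n_cols, n_labels) boolean array
    counts = np.count_nonzero(onehot, axis=1)  # (n_rows, n_labels) per-row counts
    return labels[np.argmax(counts, axis=-1)]  # map each winning index back to its label

A = np.array([[0, 0, 1, 1],
              [1, 1, 0, 2],
              [0, 0, 1, 0],
              [0, 2, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 2, 2]])

print(row_mode_vectorized(A))      # [0 1 0 1 1 2]
print(row_mode_vectorized(A - 1))  # [-1  0 -1  0  0  1]
```

Memory grows with the number of distinct values, so this is likely only practical when the label set is small.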

0

I've adapted Def_Os's answer from the following post:

Most efficient way to find mode in numpy array

The following function uses numpy only, and works with negatives.

import numpy as np
def mode_row(ar):
    _min = np.min(ar)
    adjusted = False
    if _min < 0:
        ar = ar - _min
        adjusted = True
    ans = np.apply_along_axis(lambda x: np.bincount(x).argmax(), axis=1, arr=ar)
    if adjusted:
        ans = ans + _min
    return ans

A = np.array([[0, 0, 1, 1],
              [1, 1, 0, 2],
              [0, 0, 1, 0],
              [0, 2, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 2, 2]])

B = A - 1

mode_row(A)
mode_row(B)

array([0, 1, 0, 1, 1, 2], dtype=int64)

array([-1, 0, -1, 0, 0, 1], dtype=int64)
