Numpy - find most common item per row

Question

I have a matrix like this in NumPy:

array([[0, 0, 1, 1],
       [1, 1, 0, 2],
       [0, 0, 1, 0],
       [0, 2, 1, 1],
       [1, 1, 1, 0],
       [1, 0, 2, 2]])

I'd like to get the most common value per row. In other words, I'd like to get a vector like this:

array([0, 1, 0, 1, 1, 2])

I managed to solve this problem using SciPy's mode function, in the following way:

scipy.stats.mode(data, axis=1)[0].flatten()

However, I'm looking for a solution that uses NumPy only. Moreover, the solution needs to work with negative integer values as well.

Borja_042 · Accepted Answer · 2020-11-25 13:26:53Z


Supposing m is the name of your matrix:

most_f = np.array([np.bincount(row).argmax() for row in m])

I hope this solves your question


2 Comments

David Lasry Over a year ago

Thanks, that worked :) Is there a vectorized way, though?

dspr Over a year ago

However it doesn't support negative numbers

tbrugere · Accepted Answer · 2022-04-10 00:03:33Z

If your labels are from 0 to n_labels - 1, you can use

labels_onehot = m[..., None] == np.arange(n_labels)[None, None, :]  # (n_rows, n_cols, n_labels), one-hot encoded
labels_count = np.count_nonzero(labels_onehot, axis=1)              # (n_rows, n_labels), number of occurrences of each label in a row
most_frequent = np.argmax(labels_count, axis=-1)                    # (n_rows,), the most frequent label in each row

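This is fully vectorized (no list comprehension, no apply_along_axis), so it avoids the Python-level loop over rows used in the other answers and is typically faster (and kind of simpler too). The trade-off is memory: it builds an intermediate boolean array of shape (n_rows, n_cols, n_labels).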

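If your labels are not from 0 to n_labels - 1, you can replace np.arange(n_labels) above with an array of your label values. np.argmax then returns positions in that array rather than the labels themselves, so index the label array with the result to get the most frequent values.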
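As a minimal sketch of the arbitrary-label case, assuming the label set is simply the distinct values present in the matrix (obtained here with np.unique, which is my choice and not part of the answer above), the result of np.argmax can be mapped back to actual values like this. The example matrix is made up for illustration and includes negative values.

```python
import numpy as np

# Hypothetical example data with negative and non-contiguous labels.
m = np.array([[-3, -3, 7, 7],
              [7, 7, -3, 10],
              [-3, -3, 7, -3]])

# Sorted distinct values in m; these act as the label set.
labels = np.unique(m)

# Compare every entry against every label: shape (n_rows, n_cols, n_labels).
labels_onehot = m[..., None] == labels[None, None, :]

# Count matches along the column axis: shape (n_rows, n_labels).
labels_count = np.count_nonzero(labels_onehot, axis=1)

# argmax returns a position in `labels`, so index `labels` to get the value.
most_frequent = labels[np.argmax(labels_count, axis=-1)]
```

Because np.unique returns sorted values and np.argmax returns the first maximum, ties within a row resolve to the smallest tied value.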

Self Dot · Accepted Answer · 2022-02-26 13:34:40Z

I've adapted Def_Os's answer from the following post:

Most efficient way to find mode in numpy array

The following function uses numpy only, and works with negatives.

import numpy as np
def mode_row(ar):
    _min = np.min(ar)
    adjusted = False
    if _min < 0:
        ar = ar - _min
        adjusted = True
    ans = np.apply_along_axis(lambda x: np.bincount(x).argmax(), axis=1, arr=ar)
    if adjusted:
        ans = ans + _min
    return ans

A = np.array([[0, 0, 1, 1],
              [1, 1, 0, 2],
              [0, 0, 1, 0],
              [0, 2, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 2, 2]])

B = A - 1

mode_row(A)
mode_row(B)

array([0, 1, 0, 1, 1, 2], dtype=int64)

array([-1, 0, -1, 0, 0, 1], dtype=int64)
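The same shift-by-minimum idea can also be written without apply_along_axis. The following is only a sketch, assuming a non-empty 2-D integer array. The function name mode_row_vectorized is hypothetical. It offsets each row into its own block of bins so that a single np.bincount call counts all rows at once.

```python
import numpy as np

def mode_row_vectorized(ar):
    """Row-wise mode of a non-empty 2-D integer array using one bincount call."""
    ar = np.asarray(ar)
    n_rows = ar.shape[0]
    lo = ar.min()
    shifted = ar - lo                       # values now lie in 0 .. n_bins - 1
    n_bins = int(shifted.max()) + 1
    # Give each row its own block of n_bins bins so counts do not mix.
    offsets = np.arange(n_rows)[:, None] * n_bins
    counts = np.bincount((shifted + offsets).ravel(),
                         minlength=n_rows * n_bins)
    counts = counts.reshape(n_rows, n_bins)  # (n_rows, n_bins)
    # argmax picks the first maximum, so ties go to the smallest value.
    return counts.argmax(axis=1) + lo

A = np.array([[0, 0, 1, 1],
              [1, 1, 0, 2],
              [0, 0, 1, 0],
              [0, 2, 1, 1],
              [1, 1, 1, 0],
              [1, 0, 2, 2]])

print(mode_row_vectorized(A))      # [0 1 0 1 1 2]
print(mode_row_vectorized(A - 1))  # [-1  0 -1  0  0  1]
```

The counts array has n_rows * (max - min + 1) entries, so this variant suits data whose values span a small range. For widely spread values the memory cost can be large.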
