Numpy unique: count for values also not in array?

Question

I have an array as so:

myarray = [['a', 'b', 'c'],
           ['b', 'c', 'd'],
           ['c', 'd', 'e']]

For this, np.unique(myarray, return_counts=True) works well and gives me the desired output. However, I would now like to apply it row by row and have it tell me that in the first row (['a', 'b', 'c']) the counts for d and e are 0.

For the moment I've been trying to append the missing values to each row inside a for loop and then subtract 1 from each count, but even that has me confused. I've tried these two approaches:

for i in range(mylen):
    unique, counts = np.unique(np.array([list(myarray[i]), 'a', 'b', 'c', 'd', 'e']), return_counts=True) # attempt 1
    unique, counts = np.unique(np.vstack((myarray[i], 'a', 'b', 'c', 'd', 'e')), return_counts=True) # attempt 2

But neither works. Does anyone have an elegant solution? This will be used for thousands, perhaps millions, of values, so computation time is somewhat relevant to the discussion.

Mad Physicist · Accepted Answer · 2021-07-21 16:55:54Z

2

You can use np.unique with return_inverse=True to get what you want:

myarray = np.asarray(myarray)  # the question's myarray is a nested list, which has no .shape
letters, inv = np.unique(myarray, return_inverse=True)
inv = inv.reshape(myarray.shape)

inv is now

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]], dtype=int64)

You can get counts of all the unique elements in one line:

>>> (inv == np.arange(len(letters)).reshape(-1, 1, 1)).sum(-1)
array([[1, 0, 0],
       [1, 1, 0],
       [1, 1, 1],
       [0, 1, 1],
       [0, 0, 1]])

The first dimension corresponds to the letter in letters, the second to the row number, since sum(-1) sums across the columns. You can get counts for the columns using sum(1) instead. In your symmetrical example, the result will be identical.
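As a small sketch of the row/column distinction described above, the snippet below computes both sums on the question's example and transposes the row result so that each line corresponds to one row of myarray. The names matches, per_row, per_col and row_table are illustrative only and do not appear in the original answer.

```python
import numpy as np

myarray = np.array([['a', 'b', 'c'],
                    ['b', 'c', 'd'],
                    ['c', 'd', 'e']])

letters, inv = np.unique(myarray, return_inverse=True)
inv = inv.reshape(myarray.shape)

# Boolean array of shape (n_letters, n_rows, n_cols)
matches = inv == np.arange(len(letters)).reshape(-1, 1, 1)

per_row = matches.sum(-1)  # sum over columns -> shape (n_letters, n_rows)
per_col = matches.sum(1)   # sum over rows    -> shape (n_letters, n_cols)

# Transpose to get one line per row of myarray and one column per letter
row_table = per_row.T
print(letters)
print(row_table)
```

With this layout, row_table[0] holds the counts of a, b, c, d, e in the first row, which matches the shape of the output the question asks for.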

No looping, no np.apply_along_axis (which is a glorified loop), all vectorized. Here is a quick timing test:

import string
import numpy as np
np.random.seed(42)
myarray = np.random.choice(list(string.ascii_lowercase), size=(100, 100))

def Epsi95(arr):
    uniques = np.unique(arr)
    def fun(x):
        base_dict = dict(zip(uniques, [0]*uniques.shape[0]))
        base_dict.update(dict(zip(*np.unique(x, return_counts=True))))
        return [i[-1] for i in sorted(base_dict.items())]
    return np.apply_along_axis(fun, 1, arr)

def MadPhysicist(myarray):
    letters, inv = np.unique(myarray, return_inverse=True)
    inv = inv.reshape(myarray.shape)
    return (inv == np.arange(len(letters)).reshape(-1, 1, 1)).sum(-1)

%timeit Epsi95(myarray)
6.37 ms ± 26.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit MadPhysicist(myarray)
1.28 ms ± 6.85 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
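The broadcast comparison above builds an intermediate boolean array of shape (letters, rows, columns), which could become large for millions of values. A possible alternative, not part of the original answer and not timed here, is to shift each row's inverse indices into its own block of bins and call np.bincount once. The function name row_counts_bincount is hypothetical, and the result is laid out as (rows, letters) rather than (letters, rows).

```python
import numpy as np

def row_counts_bincount(arr):
    """Count every unique value in every row using a single np.bincount call."""
    arr = np.asarray(arr)
    letters, inv = np.unique(arr, return_inverse=True)
    inv = inv.reshape(arr.shape)
    n_rows, n_letters = arr.shape[0], len(letters)
    # Shift the codes of row i by i * n_letters so each row gets its own bins
    offsets = np.arange(n_rows).reshape(-1, 1) * n_letters
    flat = (inv + offsets).ravel()
    counts = np.bincount(flat, minlength=n_rows * n_letters)
    return letters, counts.reshape(n_rows, n_letters)

letters, counts = row_counts_bincount([['a', 'b', 'c'],
                                       ['b', 'c', 'd'],
                                       ['c', 'd', 'e']])
print(letters)
print(counts)
```

Whether this is faster than the broadcast version depends on the array shape and the number of unique values, so it is worth timing on your own data.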

edited Jul 21, 2021 at 16:55

answered Jul 21, 2021 at 16:46

Mad Physicist


3 Comments

Epsi95 Over a year ago

Yes, this solution is far superior in terms of efficiency.

Whitehot Over a year ago

This looks fantastic, thanks. Just so that I'm clear on a couple of points: 1) the array you show after (inv == np.arange(len(letters)).reshape(-1, 1, 1)).sum(-1) gives, for each possibility in my universe, the number of times it appears in each row, right? So [1,0,0] means it appears once in the first row, then not at all in rows 2 and 3. 2) If I wanted to apply this to columns instead of rows, how would that work? Would it be easiest to just transpose myarray and apply the same function?

Mad Physicist Over a year ago

You can play with the dimensions, yes. And yes to the interpretation of the data.

Epsi95 · Accepted Answer · 2021-07-21 16:53:34Z

1

import numpy as np

myarray = [['a', 'b', 'c'],
           ['b', 'c', 'd'],
           ['c', 'd', 'e']]

arr = np.array(myarray)

uniques = np.unique(arr)

def fun(x):
    base_dict = dict(zip(uniques, [0]*uniques.shape[0]))
    base_dict.update(dict(zip(*np.unique(x, return_counts=True))))
    return [i[-1] for i in sorted(base_dict.items())]

np.apply_along_axis(fun, 1, arr)

# array([[1, 1, 1, 0, 0], # a=1 b=1 c=1 d=0 e=0
#        [0, 1, 1, 1, 0],
#        [0, 0, 1, 1, 1]], dtype=int64)

edited Jul 21, 2021 at 16:53

answered Jul 21, 2021 at 16:09

Epsi95


3 Comments

Mad Physicist Over a year ago

apply_along_axis is just a glorified for loop, no matter what the docs may tell you.

Mad Physicist Over a year ago

You never need to sort the output of unique

Epsi95 Over a year ago

You are correct in both cases. In the second case I was actually doing return list(base_dict.values()), since dicts keep insertion order in Python 3.6+, but I later decided to generalize and forgot to remove the first sort.
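A minimal sketch of what the simplified version described in this comment might look like, assuming Python 3.7+ (where dict insertion order is guaranteed) and relying on np.unique returning its values already sorted. This is a reconstruction from the comment, not code posted by the answer's author.

```python
import numpy as np

arr = np.array([['a', 'b', 'c'],
                ['b', 'c', 'd'],
                ['c', 'd', 'e']])

uniques = np.unique(arr)  # already sorted, so no extra sort is needed

def fun(x):
    # Start every known value at 0; keys are inserted in sorted order
    base_dict = dict.fromkeys(uniques, 0)
    vals, counts = np.unique(x, return_counts=True)
    # Updating existing keys does not change their position in the dict
    base_dict.update(zip(vals, counts))
    return list(base_dict.values())

result = np.apply_along_axis(fun, 1, arr)
print(result)
```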

Abstract · Accepted Answer · 2021-07-21 16:13:39Z

You can iterate over the rows of the list and, for each row, over the unique values of the entire array. The example below prints the counts, but the same loop can be used to store them in a dictionary or any other structure of your choosing.

Example:

import numpy as np

myarray = [['a', 'b', 'c'],
           ['b', 'c', 'd'],
           ['c', 'd', 'e']]

uniq = np.unique(np.array(myarray))

for idx, row in enumerate(myarray):
    for x in uniq:
        print(f"Row {idx} Element ({x}) Count: {row.count(x)}")

Output:

Row 0 Element (a) Count: 1
Row 0 Element (b) Count: 1
Row 0 Element (c) Count: 1
Row 0 Element (d) Count: 0
Row 0 Element (e) Count: 0
Row 1 Element (a) Count: 0
Row 1 Element (b) Count: 1
Row 1 Element (c) Count: 1
Row 1 Element (d) Count: 1
Row 1 Element (e) Count: 0
Row 2 Element (a) Count: 0
Row 2 Element (b) Count: 0
Row 2 Element (c) Count: 1
Row 2 Element (d) Count: 1
Row 2 Element (e) Count: 1

To use a list of dictionaries for each row:

import numpy as np

myarray = [['a', 'b', 'c'],
           ['b', 'c', 'd'],
           ['c', 'd', 'e']]

uniq = np.unique(np.array(myarray))
row_vals = []

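for row in myarray:
    counts = {}  # avoid naming this "dict", which would shadow the built-in
    for x in uniq:
        counts[str(x)] = row.count(x)  # str(x) keeps plain string keys in the printed output
    row_vals.append(counts)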

for r in row_vals:
    print(r)

Output:

{'a': 1, 'b': 1, 'c': 1, 'd': 0, 'e': 0}
{'a': 0, 'b': 1, 'c': 1, 'd': 1, 'e': 0}
{'a': 0, 'b': 0, 'c': 1, 'd': 1, 'e': 1}
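The loops above call row.count(x) once per unique value, so each row is scanned repeatedly. A possible variation, swapped in here for illustration and not part of the original answer, uses collections.Counter from the standard library to scan each row once; the name universe is hypothetical.

```python
from collections import Counter

myarray = [['a', 'b', 'c'],
           ['b', 'c', 'd'],
           ['c', 'd', 'e']]

# Every distinct value that appears anywhere in the array, in sorted order
universe = sorted({x for row in myarray for x in row})

row_vals = []
for row in myarray:
    seen = Counter(row)  # single pass over the row
    # Looking up a missing key in a Counter returns 0 instead of raising
    row_vals.append({x: seen[x] for x in universe})

for r in row_vals:
    print(r)
```

This is still a pure-Python loop, so for large arrays the vectorized NumPy approach is likely to be the better choice.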
