Vectorize simple for loop in numpy

Question

I'm pretty new to numpy and I'm trying to vectorize a simple for loop for performance reasons, but I can't come up with a solution. I have a numpy array of unique words, and for each of these words I need the number of times it occurs in another numpy array, called array_to_compare. The count is stored in a third numpy array, which has the same shape as the unique words array. Here is the code containing the for loop:

import numpy as np

unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])
vector_array = np.zeros(len(unique_words))

for word in np.nditer(unique_words):
    counter = np.count_nonzero(array_to_compare == word)
    vector_array[np.where(unique_words == word)] = counter

print(vector_array)  # [2. 1. 0. 1.] (the desired output)

I tried it with np.where and np.isin, but did not get the desired result. I am thankful for any help!

While the proposed duplicate does suggest using Counter or unique, it doesn't return the result in the desired array form. The answers provided here do that, so I'm reopening it. stackoverflow.com/questions/49630204/…
– hpaulj, Commented Jun 28, 2021 at 16:15

Daniel Lenz · Accepted Answer · 2021-06-29 06:38:51Z

2

I'd probably use a Counter and a list comprehension to solve this:

In [1]: import numpy as np
   ...:
   ...: unique_words = np.array(['a', 'b', 'c', 'd'])
   ...: array_to_compare = np.array(['a', 'b', 'a', 'd'])

In [2]: from collections import Counter

In [3]: counter = Counter(array_to_compare)

In [4]: counter
Out[4]: Counter({'a': 2, 'b': 1, 'd': 1})

In [5]: vector_array = np.array([counter[key] for key in unique_words])

In [6]: vector_array
Out[6]: array([2, 1, 0, 1])

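Building the Counter takes time linear in the length of array_to_compare, and the list comprehension is linear in the length of unique_words, because each Counter lookup is an average constant-time dictionary access. A word that never occurs, such as 'c', simply yields 0.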
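
The two linear passes described above could be wrapped in a small helper. This is only a sketch: the function name count_occurrences is hypothetical, and converting the arrays with tolist() first is an assumption carried over from the observation elsewhere in this thread that iterating over lists is faster than iterating over arrays.

```python
from collections import Counter

import numpy as np


def count_occurrences(words, values):
    """Return, for each entry of `words`, how often it appears in `values`."""
    # One pass over `values` builds the tally (linear in len(values)).
    tally = Counter(values.tolist())
    # One pass over `words` looks each word up; a missing key counts as 0.
    return np.array([tally[word] for word in words.tolist()])


unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])
vector_array = count_occurrences(unique_words, array_to_compare)
print(vector_array)
```

With the arrays from the question this should reproduce the counts shown in Out[6].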

edited Jun 29, 2021 at 6:38

answered Jun 28, 2021 at 14:53


hpaulj · Accepted Answer · 2021-06-28 16:18:24Z

A numpy comparison of array values using broadcasting:

In [76]: unique_words[:,None]==array_to_compare
Out[76]: 
array([[ True, False,  True, False],
       [False,  True, False, False],
       [False, False, False, False],
       [False, False, False,  True]])
In [77]: (unique_words[:,None]==array_to_compare).sum(1)
Out[77]: array([2, 1, 0, 1])

In [78]: timeit (unique_words[:,None]==array_to_compare).sum(1)
9.5 µs ± 2.79 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
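
To make the shapes in the comparison above easier to follow, here is a minimal sketch that uses a hypothetical array_to_compare with one extra 'a', so the two axes have different lengths. Note that the intermediate boolean matrix holds len(unique_words) * len(array_to_compare) entries, so this approach trades memory for speed on large inputs.

```python
import numpy as np

unique_words = np.array(['a', 'b', 'c', 'd'])             # shape (4,)
# Hypothetical input: the question's array with one extra 'a' appended.
array_to_compare = np.array(['a', 'b', 'a', 'd', 'a'])    # shape (5,)

# Adding an axis turns unique_words into a (4, 1) column, so the comparison
# broadcasts against the (5,) row and yields a (4, 5) boolean matrix.
matches = unique_words[:, None] == array_to_compare
print(matches.shape)

# Summing along axis 1 counts the True entries in each row, i.e. per word.
counts = matches.sum(axis=1)
print(counts)
```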

But Counter is also a good choice:

In [72]: %%timeit
    ...: c=Counter(array_to_compare)
    ...: [c[key] for key in unique_words]
12.7 µs ± 30.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Your use of count_nonzero can be improved with:

In [73]: %%timeit
    ...: words=unique_words.tolist()
    ...: vector_array = np.zeros(len(words))
    ...: for i,word in enumerate(words):
    ...:     counter = np.count_nonzero(array_to_compare == word)
    ...:     vector_array[i] = counter
    ...: 
23.4 µs ± 505 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Iterating over a list is faster than iterating over an array (nditer doesn't add much), and enumerate lets us skip the where lookup.

Kraigolas · Accepted Answer · 2021-06-28 14:54:50Z

1

Similar to @DanielLenz's answer, but using np.unique to create a dict:

import numpy as np
unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])
counts = dict(zip(*np.unique(array_to_compare, return_counts=True)))
result = np.array([counts.get(word, 0) for word in unique_words])
print(result)  # [2 1 0 1]

answered Jun 28, 2021 at 14:54


1 Comment

trypython Over a year ago

Thank you, your solution is a lot faster than my for-loop and slightly faster than the solution from Daniel Lenz
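
For completeness, the dictionary lookup in the np.unique approach can itself be replaced by array operations. The following is a sketch of a different technique from the answers above, combining np.unique with np.searchsorted; it assumes array_to_compare is non-empty, and the names found, found_counts and idx are illustrative only.

```python
import numpy as np

unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])

# Sorted distinct values present in array_to_compare, and their counts.
found, found_counts = np.unique(array_to_compare, return_counts=True)

# Index at which each word of unique_words would be inserted into `found`.
idx = np.searchsorted(found, unique_words)
# Clip so that words sorting after every found value still index safely.
idx = np.clip(idx, 0, len(found) - 1)

# Keep the count only where the value at that index really is the word;
# words absent from array_to_compare get 0.
result = np.where(found[idx] == unique_words, found_counts[idx], 0)
print(result)
```

Whether this is faster than the Counter or dict versions depends on the array sizes, so it is worth timing on realistic data.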
