2

I'm fairly new to NumPy and I'm trying to vectorize a simple for loop for performance reasons, but I can't come up with a solution. I have a NumPy array of unique words, and for each of these words I need the number of times it occurs in another NumPy array, called array_to_compare. Each count is stored in a third NumPy array, which has the same shape as the unique-words array. Here is the code containing the for loop:

import numpy as np

unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])
vector_array = np.zeros(len(unique_words))

for word in np.nditer(unique_words):
    counter = np.count_nonzero(array_to_compare == word)
    vector_array[np.where(unique_words == word)] = counter

print(vector_array)    # [2. 1. 0. 1.]  (the desired output)

I tried it with np.where and np.isin, but did not get the desired result. I am thankful for any help!

1
  • While the proposed duplicate does suggest using Counter or unique, it doesn't return the result in the desired array form. The answers provided here do that. I'm reopening it. stackoverflow.com/questions/49630204/… Commented Jun 28, 2021 at 16:15

3 Answers

2

I'd probably use a Counter and a list comprehension to solve this:

In [1]: import numpy as np
   ...:
   ...: unique_words = np.array(['a', 'b', 'c', 'd'])
   ...: array_to_compare = np.array(['a', 'b', 'a', 'd'])

In [2]: from collections import Counter

In [3]: counter = Counter(array_to_compare)

In [4]: counter
Out[4]: Counter({'a': 2, 'b': 1, 'd': 1})

In [5]: vector_array = np.array([counter[key] for key in unique_words])

In [6]: vector_array
Out[6]: array([2, 1, 0, 1])

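Building the Counter takes time linear in len(array_to_compare), and the list comprehension is linear in len(unique_words), because each Counter lookup is a constant-time dictionary access on average. A word that never occurs, such as 'c', simply returns 0.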
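As a rough sketch, the same idea can be wrapped in a small function. The name count_words is hypothetical (it is not part of the answer above), and converting both arrays with .tolist() first, so that the Counter works on plain Python strings, is my own assumption rather than something the answer does:

```python
from collections import Counter

import numpy as np


def count_words(unique_words, array_to_compare):
    """Count how often each entry of unique_words occurs in array_to_compare.

    Hypothetical helper; assumes both inputs are 1-D NumPy arrays of strings.
    """
    # One pass over array_to_compare builds the word -> count mapping.
    counter = Counter(array_to_compare.tolist())
    # Counter returns 0 for words it has never seen, so no membership test is needed.
    return np.array([counter[word] for word in unique_words.tolist()])


unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])
vector_array = count_words(unique_words, array_to_compare)
print(vector_array)
```

The result has an integer dtype; if the float array from the question is needed, .astype(float) can be appended.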


1

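A NumPy comparison of the array values using broadcasting. unique_words[:,None] is a column of shape (4, 1), so comparing it with array_to_compare gives one row of booleans per unique word, and summing each row gives the counts: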

In [76]: unique_words[:,None]==array_to_compare
Out[76]: 
array([[ True, False,  True, False],
       [False,  True, False, False],
       [False, False, False, False],
       [False, False, False,  True]])
In [77]: (unique_words[:,None]==array_to_compare).sum(1)
Out[77]: array([2, 1, 0, 1])

In [78]: timeit (unique_words[:,None]==array_to_compare).sum(1)
9.5 µs ± 2.79 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

But Counter is also a good choice:

In [72]: %%timeit
    ...: c=Counter(array_to_compare)
    ...: [c[key] for key in unique_words]
12.7 µs ± 30.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Your count_nonzero loop can be improved by iterating over a list and using enumerate:

In [73]: %%timeit
    ...: words=unique_words.tolist()
    ...: vector_array = np.zeros(len(words))
    ...: for i,word in enumerate(words):
    ...:     counter = np.count_nonzero(array_to_compare == word)
    ...:     vector_array[i] = counter
    ...: 
23.4 µs ± 505 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Iterating over a list is faster than iterating over an array (nditer doesn't add much), and enumerate supplies the index directly, so the np.where lookup is no longer needed.
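For reference, a minimal standalone sketch of the broadcasting approach as a script rather than an IPython session. Using np.count_nonzero with axis=1 in place of .sum(1) is my substitution; both count the True values per row. Note that the intermediate boolean matrix has len(unique_words) × len(array_to_compare) entries, so memory use grows with the product of the two lengths:

```python
import numpy as np

unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])

# A (4, 1) column compared with a (4,) row broadcasts to a (4, 4) boolean matrix:
# matches[i, j] is True when unique_words[i] == array_to_compare[j].
matches = unique_words[:, None] == array_to_compare
print(matches.shape)

# Counting the True values along each row gives one count per unique word.
vector_array = np.count_nonzero(matches, axis=1)
print(vector_array)
```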

1

Similar to @DanielLenz's answer, but using np.unique to create a dict:

import numpy as np
unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])
counts = dict(zip(*np.unique(array_to_compare, return_counts=True)))
result = np.array([counts.get(word, 0) for word in unique_words])
print(result)  # [2 1 0 1]
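If the Python-level loop over unique_words should go away as well, one possible variation (not part of this answer) is to look the words up with np.searchsorted, since np.unique returns its values sorted. This is only a sketch, and it assumes array_to_compare is non-empty:

```python
import numpy as np

unique_words = np.array(['a', 'b', 'c', 'd'])
array_to_compare = np.array(['a', 'b', 'a', 'd'])

# np.unique returns the distinct values in sorted order together with their counts.
values, counts = np.unique(array_to_compare, return_counts=True)

# For each word, find the position where it would sit in the sorted values.
idx = np.searchsorted(values, unique_words)
# Words larger than every value get idx == len(values); clip to keep indexing valid.
# Assumes values is non-empty.
idx = np.clip(idx, 0, len(values) - 1)

# A word is present only if the value at its position is the word itself.
found = values[idx] == unique_words
result = np.where(found, counts[idx], 0)
print(result)
```

Whether this beats the dict lookup depends on the array sizes, so it is worth timing on real data.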

1 Comment

Thank you, your solution is a lot faster than my for loop and slightly faster than the solution from Daniel Lenz.
