I have a numpy.ndarray of shape (61000L, 2L) whose elements are all strings.
I split each string into its words, so that every element of the array becomes a list of words, with the following code:
words_data = np.char.split(string_data)
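For illustration, here is a minimal sketch of what that call returns on a small array. The 2x2 `sample` array and its strings are made-up placeholders, not the real data; the point is only that the shape is preserved and each element becomes a plain Python list.

```python
import numpy as np

# Hypothetical 2x2 sample standing in for the real (61000, 2) array.
sample = np.array([["hello my name is nick", "hello my name is Java"],
                   ["hello my name is Carly", "hello my name is Ruby"]])

# np.char.split applies str.split to every element (whitespace by default).
split_sample = np.char.split(sample)

print(split_sample.shape)   # same shape as the input
print(split_sample.dtype)   # object, because each element is now a list
print(split_sample[0, 0])   # the list of words from the first string
```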
I then tried to write a nested for-loop that counts the unique words found in those lists.
from collections import Counter
counts = Counter()
for i in range(words_data.shape[0]):
    for j in range(words_data[1]):
        counts.update(words_data[i])
counts
The code above raises the following error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-39-680a0105eebd> in <module>()
      1 counts = Counter()
      2 for i in range(words_data.shape[0]):
----> 3     for j in range(words_data[1]):
      4         counts.update(words_data[i])
      5

TypeError: only size-1 arrays can be converted to Python scalars
Here are the first 8 rows of my data:
x = np.array([["hello my name is nick", "hello my name is Nick", "hello my name is Carly", "hello my name is Ashley", "hello my name is Java", "hello my name is C++", "hello my name is Ruby", "hello my name is Python"], ["hello my name is Java", "hello my name is C++", "hello my name is Ruby", "hello my name is Python", "hello my name is nick", "hello my name is Nick", "hello my name is Carly", "hello my name is Ashley"]])
x = x.transpose()
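For reference, the counting I am after could be sketched roughly as follows on the sample above. This is only one possible approach, not necessarily the best one. It assumes the goal is a single Counter over every word in every cell, and it flattens the split array with `ravel()` rather than using two index loops. Note that `range()` expects an integer, whereas `words_data[1]` in my loop is a whole row of the array.

```python
from collections import Counter

import numpy as np

# The 8-row sample from above, transposed to shape (8, 2).
x = np.array([
    ["hello my name is nick", "hello my name is Nick",
     "hello my name is Carly", "hello my name is Ashley",
     "hello my name is Java", "hello my name is C++",
     "hello my name is Ruby", "hello my name is Python"],
    ["hello my name is Java", "hello my name is C++",
     "hello my name is Ruby", "hello my name is Python",
     "hello my name is nick", "hello my name is Nick",
     "hello my name is Carly", "hello my name is Ashley"],
]).transpose()

# Each cell becomes a list of words (object array, same shape as x).
words = np.char.split(x)

counts = Counter()
# ravel() yields every cell of the 2-D array as a flat sequence,
# so a single loop visits all 16 word lists.
for word_list in words.ravel():
    counts.update(word_list)

print(counts["hello"])  # number of cells containing "hello"
print(counts["nick"])   # counting is case-sensitive: "nick" and "Nick" differ
```

If the counts should instead be kept per column or per row, the same loop could be run over `words[:, j]` or `words[i]` with a separate Counter for each.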