TypeError: only size-1 arrays can be converted to Python scalars

Question

I have a numpy.ndarray of shape (61000L, 2L) whose items are all strings.

I split each string into a list of words, so that every cell of the numpy.ndarray holds a list, with the following code:

words_data = np.char.split(string_data)

I tried to make a double for-loop that counts the unique words found within each list.

from collections import Counter
counts = Counter()
for i in range(words_data.shape[0]):
    for j in range(words_data[1]):
        counts.update(words_data[i])

counts

The output error for the code above is the following:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-39-680a0105eebd> in <module>()
      1 counts = Counter()
      2 for i in range(words_data.shape[0]):
----> 3     for j in range(words_data[1]):
      4         counts.update(words_data[i])
      5 

TypeError: only size-1 arrays can be converted to Python scalars

Here are the first 8 rows of my data:

x = np.array([["hello my name is nick", "hello my name is Nick", "hello my name is Carly",
               "hello my name is Ashley", "hello my name is Java", "hello my name is C++",
               "hello my name is Ruby", "hello my name is Python"],
              ["hello my name is Java", "hello my name is C++", "hello my name is Ruby",
               "hello my name is Python", "hello my name is nick", "hello my name is Nick",
               "hello my name is Carly", "hello my name is Ashley"]])

x = x.transpose()

jpp · Accepted Answer · 2018-02-12 16:00:31Z


No loops required here. Here is one solution:

from collections import Counter
from itertools import chain
import numpy as np

string_data = np.array([["hello my name is nick", "hello my name is Nick", "hello my name is Carly",
                         "hello my name is Ashley", "hello my name is Java", "hello my name is C++",
                         "hello my name is Ruby", "hello my name is Python"],
                         ["hello my name is Java", "hello my name is C++", "hello my name is Ruby",
                          "hello my name is Python", "hello my name is nick", "hello my name is Nick",
                          "hello my name is Carly", "hello my name is Ashley"]])

word_count = Counter(' '.join(chain.from_iterable(string_data)).split())

# Counter({'Ashley': 2,
#          'C++': 2,
#          'Carly': 2,
#          'Java': 2,
#          'Nick': 2,
#          'Python': 2,
#          'Ruby': 2,
#          'hello': 16,
#          'is': 16,
#          'my': 16,
#          'name': 16,
#          'nick': 2})
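
If you would rather keep the double loop from the question, here is a minimal sketch of one way to repair it. It assumes a reduced 2x2 sample rather than the real data, and the names n_rows and n_cols are only illustrative. The original loop passes words_data[1] (a whole row of the array) to range(), which needs a single integer, and it never uses j. The exact wording of the TypeError depends on the Python and NumPy versions, so the sketch only prints whatever message is raised.

```python
from collections import Counter
import numpy as np

# Reduced sample (an assumption, not the asker's full data).
string_data = np.array([["hello my name is nick", "hello my name is Java"],
                        ["hello my name is Carly", "hello my name is Ruby"]])

# np.char.split returns an object array whose cells are lists of words.
words_data = np.char.split(string_data)

# range() needs one integer; words_data[1] is an array row, so this fails.
try:
    range(words_data[1])
except TypeError as exc:
    print("TypeError:", exc)

# Loop over both dimensions of the shape and index a single cell each time.
counts = Counter()
n_rows, n_cols = words_data.shape
for i in range(n_rows):
    for j in range(n_cols):
        counts.update(words_data[i, j])  # one list of words per cell

print(counts)
```

The loop-free version above should still be preferable on 61000 rows, but this form keeps the structure of the original attempt.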

edited Feb 12, 2018 at 16:00

answered Feb 12, 2018 at 14:50


10 Comments

Jayganesh Kalla Over a year ago

When I run the code above, it gives me a TypeError: unhashable type: 'list' @jp_data_analysis

jpp Over a year ago

@JayganeshKalla, works for me on python 3.6 / numpy 1.11. Are you testing exactly what I've posted?

Jayganesh Kalla Over a year ago

I tried your code on a separate jupyter notebook and it worked, but when I tried to use my data it outputs the error I mentioned. I am using Python 2.7.14/ numpy 1.14.0, @jp_data_analysis

jpp Over a year ago

In that case, you have to show a sample of your data so we have a reproducible example.

Jayganesh Kalla Over a year ago

I have added the first 8 rows.
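
For readers who hit the same "unhashable type: 'list'" message: one way it can arise, assuming the array has already been passed through np.char.split as in the question, is that the cells are lists and end up being counted directly. This is a guess at the cause rather than a confirmed diagnosis, and the 2x2 sample below is made up for illustration. The sketch reproduces the error and then counts words by flattening the array first.

```python
from collections import Counter
from itertools import chain
import numpy as np

# Reduced sample (an assumption, not the asker's full data).
string_data = np.array([["hello my name is nick", "hello my name is Java"],
                        ["hello my name is Carly", "hello my name is Ruby"]])
words_data = np.char.split(string_data)  # object array; each cell is a list

# Chaining the rows yields the cells themselves, which are lists.
# Counter cannot use a list as a key, so this raises TypeError.
try:
    Counter(chain.from_iterable(words_data))
except TypeError as exc:
    print("TypeError:", exc)

# Flatten to a 1-D sequence of lists, then chain the lists into one word stream.
word_count = Counter(chain.from_iterable(words_data.ravel()))
print(word_count)
```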
