Getting an error with Numpy where condition

Question

I am trying to create a new column using np.where conditions on other columns of the DataFrame.

My code

  df5['RiskSubType']=np.where(new_df['Snow_Risk']==1,(( ' Heavy Snow forecasted at  ' +df5.LOCATION.mask(new_df.LOCATION=='',df5.LOCATION_CITY))),
np.where(df5['Wind_Risk']==1,( ' Heavy Wind forecasted at  ' +df5.LOCATION.mask(df5.LOCATION=='',df5.LOCATION_CITY)),
np.where(df5['Precip_Risk']==1,( ' Heavy Rain forecasted at  ' +df5.LOCATION.mask(df5.LOCATION=='',df5.LOCATION_CITY)),"No Risk Identified")))

Error

ValueError: operands could not be broadcast together with shapes

How can I fix this, or is there an alternative way to do this?

So if I break it down into simpler syntax, your code is: df5['RST'] = np.where(new_df['SR'] == 1, 'a', np.where(df5['WR']==1, 'b', np.where(df5['PR']==1, 'c', 'd'))), right? Just to make it a bit more readable.
– LeoE, Commented Nov 28, 2019 at 3:14
What have you tried? Have you done any research? What do you understand from that error message? Also, is the error message cut off? It looks like it should give the shapes involved. I will say nothing of the seemingly awful design/code style.
– AMC, Commented Nov 28, 2019 at 5:20

LeoE · Accepted Answer · 2019-11-28 11:30:39Z

So first of all, your design/code style is really hard to read; you should think about simplifying it. Your problem occurs because you are trying to combine strings and arrays of different shapes in the np.where function. The documentation says:

numpy.where(condition[, x, y])

Return elements chosen from x or y depending on condition.

Parameters:

condition : array_like, bool

Where True, yield x, otherwise yield y.
x, y : array_like
Values from which to choose. x, y and condition need to be broadcastable to some shape.

Returns:

out : ndarray

An array with elements from x where condition is True, and elements from y elsewhere.
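To illustrate the signature quoted above, here is a minimal sketch (the array values are invented for illustration). A scalar string for x is fine, because a scalar broadcasts to any shape; a y whose length differs from the condition is not:

```python
import numpy as np

cond = np.array([True, False, True])
x = "snow"                      # scalar: broadcasts against any shape
y = np.array(["a", "b", "c"])   # same length as cond

# Picks x where cond is True and the matching element of y elsewhere.
result = np.where(cond, x, y)

# A y of a different length cannot be broadcast against cond.
try:
    np.where(cond, x, np.array(["a", "b"]))
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```

Here result holds 'snow', 'b', 'snow', and mismatch_error holds the broadcast error message.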

As you can see, x, y and condition need to be broadcastable to a common shape. The documentation on broadcasting says:

6.4. Broadcasting

Another powerful feature of Numpy is broadcasting. Broadcasting takes place when you perform operations between arrays of different shapes. For instance
>>> a = np.array([
    [0, 1],
    [2, 3],
    [4, 5],
    ])
>>> b = np.array([10, 100])
>>> a * b
array([[  0, 100],
       [ 20, 300],
       [ 40, 500]])
The shapes of a and b don’t match. In order to proceed, Numpy will stretch b into a second dimension, as if it were stacked three times upon itself. The operation then takes place element-wise.

One of the rules of broadcasting is that only dimensions of size 1 can be stretched (if an array only has one dimension, all other dimensions are considered for broadcasting purposes to have size 1). In the example above b is 1D, and has shape (2,). For broadcasting with a, which has two dimensions, Numpy adds another dimension of size 1 to b. b now has shape (1, 2). This new dimension can now be stretched three times so that b’s shape matches a’s shape of (3, 2).

The other rule is that dimensions are compared from the last to the first. Any dimensions that do not match must be stretched to become equally sized. However, according to the previous rule, only dimensions of size 1 can stretch. This means that some shapes cannot broadcast and Numpy will give you an error:
>>> c = np.array([
    [0, 1, 2],
    [3, 4, 5],
    ])
>>> b = np.array([10, 100])
>>> c * b
ValueError: operands could not be broadcast together with shapes (2,3) (2,)
What happens here is that Numpy, again, adds a dimension to b, making it of shape (1, 2). The sizes of the last dimensions of b and c (2 and 3, respectively) are then compared and found to differ. Since none of these dimensions is of size 1 (therefore, unstretchable) Numpy gives up and produces an error.

The solution to multiplying c and b above is to specifically tell Numpy that it must add that extra dimension as the second dimension of b. This is done by using None to index that second dimension. The shape of b then becomes (2, 1), which is compatible for broadcasting with c:
>>> c = np.array([
    [0, 1, 2],
    [3, 4, 5],
    ])
>>> b = np.array([10, 100])
>>> c * b[:, None]
array([[  0,  10,  20],
       [300, 400, 500]])
A good visual description of these rules, together with some advanced broadcasting applications can be found in this tutorial of Numpy broadcasting rules.
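If you want to check whether some arrays are compatible before combining them, one possible sketch is to let np.broadcast try it for you. The helper name can_broadcast is made up for this example; the arrays mirror a, b and c from the quoted text:

```python
import numpy as np

def can_broadcast(*arrays):
    """Return True if the given arrays broadcast to one common shape."""
    try:
        np.broadcast(*arrays)   # raises ValueError on a shape mismatch
        return True
    except ValueError:
        return False

a = np.arange(6).reshape(3, 2)   # shape (3, 2)
c = np.arange(6).reshape(2, 3)   # shape (2, 3)
b = np.array([10, 100])          # shape (2,)

ok_ab = can_broadcast(a, b)                 # (3, 2) with (2,)
ok_cb = can_broadcast(c, b)                 # (2, 3) with (2,)
ok_cb_fixed = can_broadcast(c, b[:, None])  # (2, 3) with (2, 1)
```

Only the second pair fails, which matches the error shown in the quoted example.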

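So the problem is that np.where has to broadcast the condition and both branches to one common shape. The string literals are scalars and broadcast to anything, but the 1-D arrays do not: the first condition comes from new_df and has shape (n,), while the values built from df5 and the results of the inner np.where calls have shape (m,) or (k,). Since n != m != k can and will be the case, and none of these dimensions has size 1, nothing can be stretched and the broadcast fails.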
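One way around it, as a sketch rather than a drop-in fix: build every condition and every value from the same DataFrame, and replace the nested np.where calls with np.select, which takes a list of conditions and a list of choices. This assumes new_df and df5 have different lengths, which the question does not confirm. The column names follow the question; the sample values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column names from the question, values invented.
df5 = pd.DataFrame({
    "LOCATION": ["Oslo", "", "Bergen", "Lima"],
    "LOCATION_CITY": ["Oslo City", "Tromso", "Bergen City", "Lima City"],
    "Snow_Risk": [1, 0, 0, 0],
    "Wind_Risk": [0, 1, 0, 0],
    "Precip_Risk": [1, 0, 1, 0],
})

# Use LOCATION, falling back to LOCATION_CITY where LOCATION is empty.
place = df5["LOCATION"].mask(df5["LOCATION"] == "", df5["LOCATION_CITY"])

# All conditions and choices come from df5, so they share one length.
conditions = [
    df5["Snow_Risk"] == 1,
    df5["Wind_Risk"] == 1,
    df5["Precip_Risk"] == 1,
]
choices = [
    "Heavy Snow forecasted at " + place,
    "Heavy Wind forecasted at " + place,
    "Heavy Rain forecasted at " + place,
]

# The first matching condition wins, like the nested np.where calls.
df5["RiskSubType"] = np.select(conditions, choices, default="No Risk Identified")
```

The priority order (snow, then wind, then rain) is the same as in the nested version, because np.select picks the first condition that is True for each row.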

Joe · Accepted Answer · 2019-11-28 05:25:20Z

Please provide something like this:

import pandas as pd

# Minimal sample data so that others can reproduce the problem
d = {'LOCATION': ['?', '?'],
     'LOCATION_CITY': ['?', '?'],
     'Wind_Risk': [1, 0],
     'Precip_Risk': [1, 0],
     'Snow_Risk': [1, 0]}

df = pd.DataFrame(data=d)
