Normalizing a numpy array

Question

Given an array, I want to normalize it such that each row sums to 1.

I currently have the following code:

import numpy
w = numpy.array([[0, 1, 0, 1, 0, 0], 
                 [1, 0, 0, 0, 0, 1], 
                 [0, 0, 0, 0, 0, 1], 
                 [1, 0, 0, 0, 1, 0], 
                 [0, 0, 0, 1, 0, 1], 
                 [0, 1, 1, 0, 1, 0]], dtype = float)


def rownormalize(array):
    i = 0
    for row in array:
        array[i,:] = array[i,:]/sum(row)
        i += 1

I have two questions:

1) The code works, but I'm wondering if there's a more elegant way.

2) How can I convert the data type into a float if it's int? I tried

if array.dtype == int:
    array.dtype = float

But it doesn't work.

Julien · Accepted Answer · 2016-11-13 21:14:06Z

7

You can do 1) like this:

array /= array.sum(axis=1, keepdims=True)

and 2) like this:

array = array.astype(float)
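
To make the two one-liners above concrete, here is a minimal sketch that runs them together on a small integer array. The array values and the names `w_int`, `w` and `row_sums` are made up for illustration. Note that assigning to `.dtype` (as tried in the question) reinterprets the existing bytes instead of converting the values, whereas `astype` returns a converted copy, which is why the result has to be assigned to a name.

```python
import numpy as np

# Integer input, as in part 2) of the question (illustrative values).
w_int = np.array([[0, 1, 0, 1],
                  [1, 0, 0, 1],
                  [2, 1, 1, 0]])

# astype returns a new array; w_int itself is left unchanged.
w = w_int.astype(float)

# keepdims=True keeps the row sums as shape (3, 1), so they broadcast
# across each row of the (3, 4) array.
row_sums = w.sum(axis=1, keepdims=True)
w /= row_sums  # in-place division; this needs a float array

print(w.sum(axis=1))  # [1. 1. 1.]
```

Without `keepdims=True` the sums would have shape `(3,)`, which would not line up with the rows when dividing a non-square array.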

edited Nov 13, 2016 at 21:14

answered Nov 13, 2016 at 21:05

Julien


4 Comments

wwl Over a year ago

If I add " if array.dtype == int: array.astype(float)" to the start of my function, it gives me a matrix of zeros (except one element which is 1)

wwl Over a year ago

That works if I add return array and change my code to w = rownormalize(w) instead of rownormalize(w). Is there any way I can do it without making the above changes? If you're not sure, it's fine.

Julien Over a year ago

If you had only 1) it would work, but with the type conversion I don't think you can. A halfway solution would be to do 1) in place, and do 2) before calling 1).

Divakar Over a year ago

@wwl Julien Bernu's post+comments answered both of your queries, so I think it would be fair to accept his post, if it answered both of those sufficiently.
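
The trade-off discussed in these comments can be sketched as two variants. The function names `rownormalize_inplace` and `rownormalized` are hypothetical, chosen here for illustration. The likely reason the conversion cannot happen "in place" inside the function is that `array = array.astype(float)` only rebinds the local name to a new array, so the caller's array is never touched, and a bare `array.astype(float)` discards its result entirely.

```python
import numpy as np

def rownormalize_inplace(array):
    """Normalize rows in place; the caller must pass a float array."""
    array /= array.sum(axis=1, keepdims=True)

def rownormalized(array):
    """Return a new float array whose rows sum to 1; input is untouched."""
    result = array.astype(float)  # a copy by default, even if already float
    result /= result.sum(axis=1, keepdims=True)
    return result

# Variant 1: float input, modified in place, nothing returned.
w_float = np.array([[1.0, 1.0], [1.0, 3.0]])
rownormalize_inplace(w_float)

# Variant 2: integer input, so the caller has to use the return value.
w_int = np.array([[1, 1], [1, 3]])
w_norm = rownormalized(w_int)
```

The first variant matches "do 2) before calling 1)": the caller converts once with `astype(float)` and can then normalize in place as often as needed.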

Divakar · Accepted Answer · 2016-11-13 21:19:16Z

4

Divisions, even when broadcast across all elements, can be expensive. A performance-focused alternative is to precompute the reciprocals of the row sums and use them in a broadcasted multiplication instead, like so -

w *= 1.0 / w.sum(axis=1, keepdims=True)

Runtime test -

In [588]: w = np.random.rand(3000,3000)

In [589]: out1 = w/w.sum(axis=1, keepdims=True) #@Julien Bernu's soln

In [590]: out2 = w*(1.0/w.sum(1,keepdims=1))

In [591]: np.allclose(out1,out2)
Out[591]: True

In [592]: %timeit w/w.sum(axis=1, keepdims=True) #@Julien Bernu's soln
10 loops, best of 3: 66.7 ms per loop

In [593]: %timeit w*(1.0/w.sum(1,keepdims=1))
10 loops, best of 3: 40 ms per loop
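
Outside IPython, the comparison above can be reproduced roughly with the standard-library `timeit` module. This is a sketch under assumptions: the helper names `by_division` and `by_reciprocal` are made up, the array is smaller than the 3000x3000 one in the session, and the timings will vary with hardware and NumPy version, so no numbers are claimed here.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
w = rng.random((500, 500))  # smaller than the 3000x3000 array used above

def by_division(a):
    # One division per element.
    return a / a.sum(axis=1, keepdims=True)

def by_reciprocal(a):
    # One division per row, then a broadcasted multiplication.
    return a * (1.0 / a.sum(axis=1, keepdims=True))

out1 = by_division(w)
out2 = by_reciprocal(w)
same = np.allclose(out1, out2)

# Total wall-clock time for 20 calls of each variant.
t_div = timeit.timeit(lambda: by_division(w), number=20)
t_mul = timeit.timeit(lambda: by_reciprocal(w), number=20)
print(f"division: {t_div:.4f}s  reciprocal: {t_mul:.4f}s  equal: {same}")
```

The two results agree to within floating-point tolerance (hence `np.allclose`) but are not guaranteed to be bit-for-bit identical, since `y * (1/x)` rounds twice where `y / x` rounds once.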

answered Nov 13, 2016 at 21:19

Divakar


3 Comments

Julien Over a year ago

The difference is not huge but since it is so easily done I will try to remember this trick!

Divakar Over a year ago

@JulienBernu Yup! Especially with broadcasting, this comes in handy!

Julien Over a year ago

BTW, why is it any faster? There is a division in both cases and yours also does a multiplication... I assume 1/x must be much faster than y/x but why is that, and if that's the case then why isn't y/x automatically computed as y*(1/x) all the time?
