
Given a 2D NumPy array, I want to normalize it so that each row sums to 1.

I currently have the following code:

import numpy
w = numpy.array([[0, 1, 0, 1, 0, 0], 
                 [1, 0, 0, 0, 0, 1], 
                 [0, 0, 0, 0, 0, 1], 
                 [1, 0, 0, 0, 1, 0], 
                 [0, 0, 0, 1, 0, 1], 
                 [0, 1, 1, 0, 1, 0]], dtype = float)


def rownormalize(array):
    i = 0
    for row in array:
        array[i,:] = array[i,:]/sum(row)
        i += 1

I have two questions:

1) The code works, but I'm wondering if there's a more elegant way.

2) How can I convert the data type to float if it's int? I tried

if array.dtype == int:
    array.dtype = float

But it doesn't work.

2 Answers


You can do 1) like this:

array /= array.sum(axis=1, keepdims=True)
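A self-contained check of that one-liner on the question's matrix, assuming NumPy is imported as `np`. The `keepdims=True` argument is what makes it work: it keeps the row sums as a column of shape (6, 1), so they broadcast across each row. Because `/=` modifies the array in place, the function needs no return value as long as the input is already float.

```python
import numpy as np

# The matrix from the question, already of float dtype.
w = np.array([[0, 1, 0, 1, 0, 0],
              [1, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0, 1],
              [1, 0, 0, 0, 1, 0],
              [0, 0, 0, 1, 0, 1],
              [0, 1, 1, 0, 1, 0]], dtype=float)

# keepdims=True keeps the sums as a column so they broadcast across each row;
# a plain (6,) result would instead be broadcast across the columns.
row_sums = w.sum(axis=1, keepdims=True)
print(row_sums.shape)  # (6, 1)

w /= row_sums          # in place: w itself is modified
print(w.sum(axis=1))   # every row now sums to 1 (up to rounding)
```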

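For 2), use astype:

array = array.astype(float)

Note that astype returns a new array rather than converting in place, so you have to assign the result. Assigning to array.dtype does not convert the values at all: it reinterprets the existing bytes as floats, which is why your attempt didn't work.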
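A small sketch of the difference between converting and reinterpreting, assuming a 64-bit integer array. Per the NumPy documentation, assigning to `.dtype` replaces the type without modifying the memory, which is the same reinterpretation `.view()` performs; `.view()` is used here because setting `.dtype` directly is discouraged.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]], dtype=np.int64)

# astype converts the values and returns a NEW array; `a` is left unchanged.
converted = a.astype(float)

# view (like assigning a.dtype = float) keeps the same bytes and just reads
# them as float64, so the integer 1 becomes a tiny subnormal float.
reinterpreted = a.view(np.float64)

print(a.dtype, converted.dtype)  # int64 float64
print(converted[0, 0])           # 1.0
print(reinterpreted[0, 0])       # a tiny subnormal number, not 1.0
```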

4 Comments

If I add "if array.dtype == int: array.astype(float)" to the start of my function, it gives me a matrix of zeros (except for one element, which is 1).
That works if I add "return array" and call w = rownormalize(w) instead of rownormalize(w). Is there any way to do it without those changes? If you're not sure, it's fine.
If you only needed 1), it would work in place, but with the type conversion I don't think you can, since astype returns a new array. A halfway solution is to do 1) in place and just do 2) before calling the function.
@wwl Julien Bernu's answer and comments addressed both of your questions, so I think it would be fair to accept his answer if it answered them sufficiently.
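A minimal sketch of the halfway solution suggested in the comments: keep the question's `rownormalize` in place for float input and do the conversion before the call. The `np.issubdtype(..., np.integer)` check is a substitution for the question's `array.dtype == int`, which compares against NumPy's default integer type only and can therefore miss other widths such as `int32`.

```python
import numpy as np

def rownormalize(array):
    """Normalize each row of a float array in place so that it sums to 1."""
    array /= array.sum(axis=1, keepdims=True)

w = np.array([[0, 1, 1],
              [2, 0, 2]])  # integer input

# Convert first: astype returns a new float array, so rebind the name.
if np.issubdtype(w.dtype, np.integer):
    w = w.astype(float)

rownormalize(w)  # no return value needed; w is modified in place
print(w)
```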

Division, even when broadcast across all elements, can be expensive. A performance-focused alternative is to precompute the reciprocals of the row sums and use broadcasted multiplication instead. For an n×m array this needs only n divisions (one per row) plus n×m multiplications, and multiplication is typically cheaper than division on modern CPUs:

w *= 1.0 / w.sum(axis=1, keepdims=True)
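A self-contained sketch of the reciprocal approach wrapped in a function, with a check against plain division. The function name `rownormalize_reciprocal` is hypothetical, and the random matrix is only for illustration.

```python
import numpy as np

def rownormalize_reciprocal(w):
    """Scale each row of float array w in place by the reciprocal of its sum."""
    # One division per row (n of them) instead of one per element (n*m).
    inv = 1.0 / w.sum(axis=1, keepdims=True)  # shape (n, 1)
    w *= inv                                  # broadcast multiply across rows
    return w                                  # returned only for convenience

rng = np.random.default_rng(0)
a = rng.random((300, 300))
expected = a / a.sum(axis=1, keepdims=True)  # plain division, for comparison

rownormalize_reciprocal(a)
print(np.allclose(a, expected))  # True
```

The results agree to within floating-point tolerance but are not guaranteed to be bit-identical to plain division, which is why `np.allclose` is used for the comparison.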

Runtime test -

In [588]: w = np.random.rand(3000,3000)

In [589]: out1 = w/w.sum(axis=1, keepdims=True) #@Julien Bernu's soln

In [590]: out2 = w*(1.0/w.sum(1,keepdims=1))

In [591]: np.allclose(out1,out2)
Out[591]: True

In [592]: %timeit w/w.sum(axis=1, keepdims=True) #@Julien Bernu's soln
10 loops, best of 3: 66.7 ms per loop

In [593]: %timeit w*(1.0/w.sum(1,keepdims=1))
10 loops, best of 3: 40 ms per loop
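For readers not using IPython, a sketch of the same comparison with the standard-library `timeit` module (best of 3 runs of 10 calls, reported per call, mirroring `%timeit`). The function names are hypothetical, and absolute timings will depend on your machine and NumPy version.

```python
import timeit
import numpy as np

w = np.random.rand(3000, 3000)

def divide():
    # Julien Bernu's version: n*m divisions
    return w / w.sum(axis=1, keepdims=True)

def multiply_reciprocal():
    # n divisions, then n*m multiplications
    return w * (1.0 / w.sum(axis=1, keepdims=True))

assert np.allclose(divide(), multiply_reciprocal())

for fn in (divide, multiply_reciprocal):
    # best of 3 repeats of 10 calls each, converted to time per call
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.1f} ms per loop")
```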

3 Comments

The difference is not huge but since it is so easily done I will try to remember this trick!
@JulienBernu Yup! Especially with broadcasting, this comes in handy!
BTW, why is it any faster? There is a division in both cases and yours also does a multiplication... I assume 1/x must be much faster than y/x but why is that, and if that's the case then why isn't y/x automatically computed as y*(1/x) all the time?
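On the last part of that question: y/x is rounded once, while y*(1/x) is rounded twice (once for 1/x, once for the product), so the two need not agree in the last bit. That is presumably why the substitution isn't made automatically, and why the benchmark compares results with `np.allclose` rather than exact equality. A small illustration with plain Python floats (IEEE 754 double precision):

```python
y = 49.0
x = 49.0

direct = y / x                   # one rounding: exactly 1.0
via_reciprocal = y * (1.0 / x)   # two roundings

print(direct)                    # 1.0
print(via_reciprocal)            # 0.9999999999999999
print(direct == via_reciprocal)  # False
```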
