I have a numpy array with only a few non-zero entries, which can be either positive or negative, e.g. something like this:

myArray = np.array([[ 0.        ,  0.        ,  0.        ],
       [ 0.32, -6.79,  0.        ],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  1.5        ,  0.        ],
       [ 0.        ,  0.        , -1.71]])

In the end, I would like to get a list in which each entry corresponds to a row of myArray and is the product of function outputs that depend on the entries of that row and on another list (called l in the example below). Each term depends on the sign of the myArray entry: when it is positive, I apply "funPos"; when it is negative, I apply "funNeg"; and if the entry is 0, the term is 1. So for the example array above it would be:

output = [1*1*1 , 
         funPos(0.32, l[0])*funNeg(-6.79,l[1])*1, 
         1*1*1, 
         1*funPos(1.5, l[1])*1, 
         1*1*funNeg(-1.71, l[2])]

I implemented this as shown below and it gives me the desired output (note: this is just a highly simplified toy example; the actual matrices are far bigger and the functions more complicated). I go through each row of the array. If the sum of the row is 0, I don't have to do any calculations and the output is just 1. If it is not equal to 0, I go through the row, check the sign of each value and apply the appropriate function.

import numpy as np
def doCalcOnArray(Array1, myList):

    output = np.ones(Array1.shape[0]) #initialize output

    for indRow,row in enumerate(Array1):

        if sum(row) != 0: #only then calculations are needed
            tempProd = 1. #initialize the product that corresponds to the row
            for indCol, valCol in enumerate(row):

                if valCol > 0:
                    tempVal = funPos(valCol, myList[indCol])

                elif valCol < 0:
                    tempVal = funNeg(valCol, myList[indCol])

                elif valCol == 0:
                    tempVal = 1

                tempProd = tempProd*tempVal

            output[indRow] = tempProd

    return output

def funPos(val1,val2):
    return val1*val2

def funNeg(val1,val2):
    return val1*(val2+1)

myArray = np.array([[ 0.        ,  0.        ,  0.        ],
       [ 0.32, -6.79,  0.        ],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  1.5        ,  0.        ],
       [ 0.        ,  0.        , -1.71]])     

l = [1.1, 2., 3.4]

op = doCalcOnArray(myArray,l)
print op

The output is

[ 1.      -7.17024  1.       3.      -7.524  ]

which is the desired one.
My question is whether there is a more efficient way of doing this, since the nested Python loops are quite "expensive" for large arrays.

EDIT: I accepted gabhijit's answer because the pure numpy solution he came up with seems to be the fastest one for the arrays I am dealing with. Please note that there is also a nice working solution from RaJa that requires pandas, and the solution from dave also works fine and can serve as a nice example of how to use generators and numpy's "apply_along_axis".

4 Answers

Here's what I tried, using reduce. I am not sure how fast this is, but is this what you are trying to do?

Edit 4: Simplest and most readable: make l a numpy array, which greatly simplifies the np.where calls.

import numpy as np
import time

l = np.array([1.0, 2.0, 3.0])

def posFunc(x,y):
    return x*y

def negFunc(x,y):
    return x*(y+1)

def myFunc(x, y):
    if x > 0:
        return posFunc(x, y)
    if x < 0:
        return negFunc(x, y)
    else:
        return 1.0

myArray = np.array([
        [ 0.,0.,0.],
        [ 0.32, -6.79,  0.],
        [ 0.,0.,0.],
        [ 0.,1.5,0.],
        [ 0.,0., -1.71]])

t1 = time.time()
a = np.array([reduce(lambda x, (y,z): x*myFunc(z,l[y]), enumerate(x), 1) for x in myArray])
t2 = time.time()
print (t2-t1)*1000000
print a

Basically, just look at the last line: it says to cumulatively multiply the things in enumerate(x), starting with 1 (the last parameter to reduce). myFunc simply takes an element of a row of myArray and the element of l at the same column index, and multiplies them as needed.

My output is not the same as yours, so I am not sure whether this is exactly what you want, but maybe you can follow the logic.

Also, I am not sure how fast this will be for huge arrays.
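For anyone on Python 3, here is a minimal sketch of the same reduction. It assumes the functions defined above and uses the question's l = [1.1, 2., 3.4]; the helper name row_product is made up for illustration. The only real changes are importing reduce from functools and indexing the (index, value) pair instead of unpacking it in the lambda signature.

```python
from functools import reduce  # reduce is no longer a builtin on Python 3

import numpy as np

l = np.array([1.1, 2.0, 3.4])  # the list from the question

def posFunc(x, y):
    return x * y

def negFunc(x, y):
    return x * (y + 1)

def myFunc(x, y):
    if x > 0:
        return posFunc(x, y)
    if x < 0:
        return negFunc(x, y)
    return 1.0

myArray = np.array([
    [0.0, 0.0, 0.0],
    [0.32, -6.79, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 1.5, 0.0],
    [0.0, 0.0, -1.71]])

def row_product(row):
    # iv is an (index, value) pair from enumerate; it is indexed here
    # instead of being unpacked in the lambda signature
    return reduce(lambda acc, iv: acc * myFunc(iv[1], l[iv[0]]),
                  enumerate(row), 1.0)

a = np.array([row_product(row) for row in myArray])
print(a)
```

This is still a Python-level loop over every element, so I would not expect it to be any faster than the Python 2 version.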

Edit: The following is a 'pure numpy' way to do this.

my = myArray # just for brevity

t1 = time.time() 
# First set the positive and negative values
# complicated - [my.itemset((x,y), posFunc(my.item(x,y), l[y])) for (x,y) in zip(*np.where(my > 0))]
# changed to 
my = np.where(my > 0, my*l, my)
# complicated - [my.itemset((x,y), negFunc(my.item(x,y), l[y])) for (x,y) in zip(*np.where(my < 0))]
# changed to 
my = np.where(my < 0, my*(l+1), my)
# print my - commented out to time it.

# Now set the zeroes to 1.0s
my = np.where(my == 0.0, 1.0, my)
# print my  - commented out to time it

a = np.prod(my, axis=1)
t2 = time.time()
print (t2-t1)*1000000

print a

Let me try to explain the zip(*np.where(my != 0)) part (used in the earlier version that is now commented out) as best as I can. np.where returns two numpy arrays: the first holds the row indices and the second the column indices of the elements that match the condition (my != 0 in this case). We zip those into (row, column) tuples and then use array.itemset and array.item. Conveniently, the column index comes for free, so we can just take the element at that index in the list l. This should be faster than the previous version (and far more readable!). I need to time it to find out whether it indeed is.

Edit 2: There is no need to call np.where separately for positive and negative values; it can be done with one call, np.where(my != 0).
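One caveat with chaining np.where on an array that is overwritten in place: each test then looks at already transformed values. If an entry of l were negative, a positive input would become negative and be transformed a second time, and if a function returned exactly 0, the final step would turn it into 1. Below is a hedged sketch that takes both masks from the original array instead. The name row_products is made up for illustration, and it assumes both functions are cheap elementwise expressions, because np.where evaluates both branches on the whole array.

```python
import numpy as np

def posFunc(x, y):
    return x * y

def negFunc(x, y):
    return x * (y + 1)

def row_products(arr, weights):
    """Row-wise product of sign-dependent terms; zeros contribute 1."""
    arr = np.asarray(arr, dtype=float)
    weights = np.asarray(weights, dtype=float)  # broadcasts along the columns
    # Both masks are computed from the untouched input array.
    terms = np.where(arr > 0, posFunc(arr, weights),
                     np.where(arr < 0, negFunc(arr, weights), 1.0))
    return terms.prod(axis=1)

myArray = np.array([
    [0.0, 0.0, 0.0],
    [0.32, -6.79, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 1.5, 0.0],
    [0.0, 0.0, -1.71]])

result = row_products(myArray, [1.1, 2.0, 3.4])
print(result)
```

For the example data this gives the same numbers as the code above; the difference only shows up in the edge cases mentioned.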

10 Comments

Seems to work great, thanks! Your output differs from mine because you chose a different l; in my example it is l = [1.1, 2., 3.4]. Replacing your list by this, gives the desired output. I'll upvote your solution as well. I'll try to get your last line and get back to you if I have questions :)
I tried to time both versions. Interestingly, the 'pure numpy' version isn't much faster than the reduce one, and I am not able to explain why. Python version 2.7.3 on Debian x86_64, numpy version 1.6.2. Editing both versions to add that.
In fact, the 'reduce' version is on average the fastest of all that I have tried so far.
The reduce version won't work on Python 3.x, as tuple unpacking in lambda parameters has been removed. I noticed that when running your solution against mine. Your pure-numpy version is faster as long as the input array shape is smaller than (100, 100). After that, your if-statements on every element consume more time. But I admit, your solution works great.
@RaJa - I think what you wrote using pandas can be done in pure numpy. Instead of np.where(my != 0) I'd do separate calculation for np.where(my > 0) and np.where(my < 0) and don't call the myFunc at all. Directly call the posFunc and negFunc respectively. Let me edit the code to add that and see how it goes. No need for the 'if at all' then. Thanks for pointing out the issue with reduce.

So, let's see if I understand your question.

  1. You want to map elements of your matrix to a new matrix such that:
    • 0 maps to 1
    • x>0 maps to funPos(x)
    • x<0 maps to funNeg(x)
  2. You want to calculate the product of all elements in the rows of this new matrix.

So, here's how I would go about doing it:

1:

def myFun(a):
    if a==0:
        return 1
    if a>0:
        return funPos(a)
    if a<0:
        return funNeg(a)

newFun = np.vectorize(myFun)
newArray = newFun(myArray)

And for 2:

np.prod(newArray, axis = 1)

Edit: To pass the index to funPos, funNeg, you can probably do something like this:

# Python 2.7
r,c = myArray.shape
ctr = -1       # I don't understand why this should be -1 instead of 0
def myFun(a):
    global ctr
    global c
    ind = ctr % c
    ctr += 1
    if a==0:
        return 1
    if a>0:
        return funPos(a,l[ind])
    if a<0:
        return funNeg(a,l[ind])
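A possible alternative that avoids the global counter, sketched under the assumption that funPos and funNeg take the matrix value and the matching entry of l, as in the question. np.vectorize broadcasts its arguments, so l can be passed as a second argument and is lined up with the columns automatically. Passing otypes=[float] is my addition: as far as I know, without it np.vectorize makes one extra call on the first element to infer the output type, which may also be why the counter above has to start at -1.

```python
import numpy as np

l = np.array([1.1, 2.0, 3.4])

def funPos(val, weight):
    return val * weight

def funNeg(val, weight):
    return val * (weight + 1)

def myFun(a, weight):
    # weight is the entry of l that belongs to the column of a
    if a > 0:
        return funPos(a, weight)
    if a < 0:
        return funNeg(a, weight)
    return 1.0

myArray = np.array([
    [0.0, 0.0, 0.0],
    [0.32, -6.79, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 1.5, 0.0],
    [0.0, 0.0, -1.71]])

# otypes fixes the output dtype, so it is not inferred from the first
# element (which is 0 here and would map to a plain 1)
newFun = np.vectorize(myFun, otypes=[float])
newArray = newFun(myArray, l)  # l (shape (3,)) broadcasts over the 5 rows
result = np.prod(newArray, axis=1)
print(result)
```

Note that np.vectorize is still essentially a Python loop, so this is about convenience rather than speed.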

7 Comments

That already looks like what I am looking for. But I also have to incorporate the list l (see above) which somehow needs to be passed to "myFun". Do you see an easy way how to incorporate that in your example? I edited my example from above to make clear that funPos,funNeg take not only one argument but actually two: The value of the matrix and the value of l.
I can't think of a trivial way to do this. I'm adding a complex way involving a global list.
Cool, look forward to seeing that :).
Done. flat gives an iterator, and we've basically created an array with the col indices for every number. We pass that to the function
I don't get this yet. How to pass the list l in this case? What do you run afterwards? If I add "newFun = np.vectorize(myFun) newArray = newFun(myArray)" below the code in the Edit, I receive error messages when newArray is created.

I think this numpy function would be helpful to you

numpy.apply_along_axis

Here is one implementation. I would also warn against checking whether the sum of the row is 0. Comparing floats to 0 can give unexpected behavior due to floating-point precision limits. Also, if a row contains -5 and 5, the sum is zero, and I'm not sure that's what you want. I used numpy's any() function to see if anything was nonzero. For simplicity, I also pulled your list (my_list) into global scope.

import numpy as np


my_list = 1.1, 2., 3.4

def func_pos(val1, val2):
    return val1 * val2

def func_neg(val1, val2):
    return val1 *(val2 + 1)


def my_generator(row):
    for i, a in enumerate(row):
        if a > 0:
            yield func_pos(a, my_list[i])
        elif a < 0:
            yield func_neg(a, my_list[i])
        else:
            yield 1


def reduce_row(row):
    if not row.any():
        return 1.0
    else:
        return np.prod(np.fromiter(my_generator(row), dtype=float))


def main():
    myArray = np.array([
            [ 0.        ,  0.        ,  0.        ],
            [ 0.32, -6.79,  0.        ],
            [ 0.        ,  0.        ,  0.        ],
            [ 0.        ,  1.5        ,  0.        ],
            [ 0.        ,  0.        , -1.71]])
    return np.apply_along_axis(reduce_row, axis=1, arr=myArray)

There are probably faster implementations; I think apply_along_axis is really just a loop under the covers.

I didn't test, but I bet this is faster than what you started with, and should be more memory efficient.
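To make the warning about the row sum concrete, here is a tiny illustration with a made-up row whose entries cancel. The sum test would skip the row, while any() correctly reports that there is something to compute.

```python
import numpy as np

row = np.array([-5.0, 5.0, 0.0])

sum_says_empty = (row.sum() == 0)  # True: the entries cancel out
any_says_empty = not row.any()     # False: the row has non-zero entries

print(sum_says_empty, any_says_empty)
```

With the sum test, this row would wrongly get an output of 1.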

3 Comments

That works perfectly, it seems and thanks for the warning regarding the sum! I upvote it for now and wait with the acceptance for a few days in case some other solution shows up (e.g. an efficient way of vectorization).
I could be wrong but I don't think numpy's vectorize function will help speed up much here due to the logic required for each value while also mapping to a list index. I'll be impressed and pleased to see if I am proven wrong!
Indeed, I could not come up with such a solution either. Maybe shashwat finds one (see his answer).

I've tried your example with the masking function of numpy arrays. However, I couldn't find a solution to replace the values in your array by funPos or funNeg.

So my suggestion would be to try this using pandas instead as it conserves indices while masking.

See my example:

import numpy as np
import pandas as pd

def funPos(a, b):
    return a * b
def funNeg(a, b):
    return a * (b + 1)

myPosFunc = np.vectorize(funPos) #vectorized form of funPos
myNegFunc = np.vectorize(funNeg) #vectorized form of funNeg

#Input
I = [1.0, 2.0, 3.0]    
x = pd.DataFrame([
    [ 0.,0.,0.],
    [ 0.32, -6.79,  0.],
    [ 0.,0.,0.],
    [ 0.,1.5,0.],
    [ 0.,0., -1.71]])

b = pd.DataFrame(myPosFunc(x[x>0], I)) #calculate all positive values
c = pd.DataFrame(myNegFunc(x[x<0], I)) #calculate all negative values
b = b.combineMult(c) #put values of c in b (on newer pandas, b.combine_first(c) should be equivalent here)
b = b.fillna(1) #replace all missing values that were '0' in the raw array
y = b.product(axis=1) #multiply all elements in one row (the default axis=0 would multiply down each column)

#Output
print ('final result')
print (y)
print (y.tolist())

6 Comments

As in shashwat's answer, your answer does not take the list l into account. Do you see any way to incorporate that? Thanks!
I've edited my example to implement your list. You just need to recalculate the interim solution b by multiplying the columns with your list.
I might be missing something (please correct me if I am), but I still think that this solution does not suit the problem. The calculation of b, c and d depends not only on x but also on another list, which I named l.
No, you are right. I think that I missed that point. Have to think about that.
I've edited my answer to consider your list 'I'. I see that the answer from gabhijit is more elegant, but I assume that using Pandas trumps 'if' cases when dealing with very big arrays.