
I am trying to optimize my Python code. One of the bottlenecks comes when I apply a function to a numpy array depending on each element's value. For instance, I have an array with thousands of elements, and I apply one function to the values greater than a tolerance and another function (a Taylor series) to the rest. I use masking, but it is still slow; at the very least, I'm calling the following functions 64 million times.

import numpy as np

EPSILONZETA = 1.0e-6
ZETA1_12 = 1.0/12.0
ZETA1_720 = 1.0/720.0

def masked_condition_zero(array, tolerance):
    """ Return the indices where values are lesser (and greater) than tolerance
    """
    # search indices where array values < tolerance
    indzeros_ = np.where(np.abs(array) < tolerance)[0]

    # create mask
    mask_ = np.ones(np.shape(array), dtype=bool)

    mask_[indzeros_] = False

    return (~mask_, mask_) 

def bernoulli_function1(zeta):
    """ Returns the Bernoulli function of zeta, vector version
    """
    # get the indices according to condition
    zeros_, others_ = masked_condition_zero(zeta, EPSILONZETA)

    # create an array filled with zeros
    fb_ = np.zeros(np.shape(zeta))

    # Apply the original function to the values greater than EPSILONZETA
    fb_[others_] = zeta[others_]/(np.exp(zeta[others_])-1.0)  

    # computes series for zeta < eps
    zeta0_ = zeta[zeros_]
    zeta2_ = zeta0_ *  zeta0_
    zeta4_ =  zeta2_ * zeta2_
    fb_[zeros_] = 1.0 - 0.5*zeta0_ + ZETA1_12 * zeta2_ - ZETA1_720 * zeta4_
    return fb_

Now suppose you have an array zeta of negative and positive floats that changes on every iteration of a loop running 2^26 times, and you want to compute bernoulli_function1(zeta) each time.

Is there a better solution?

  • Is it okay to use nonstandard modules like Numba or Numexpr? Commented May 29, 2015 at 15:14
  • If that does the job I could try it, but first I want to be sure whether what I did is the best that can be done. Moreover, on the machines where I run the code, these modules are not installed. Commented May 29, 2015 at 15:53
  • @moarningsun By the way, nice suggestion. Commented May 29, 2015 at 15:59
  • You might be able to remove the need for the Taylor series if you use np.expm1 (docs.scipy.org/doc/numpy/reference/generated/numpy.expm1.html) to compute the function. You'll still have to handle 0 separately, since 0/expm1(0) gives nan, but the value (in the limit) should be 1. There is currently an open pull request at scipy for adding the function exprel(x) = (exp(x)-1)/x: github.com/scipy/scipy/pull/4839; feel free to comment there if you like. (A sketch of this approach follows these comments.) Commented May 29, 2015 at 16:40
  • Any advanced indexing, whether with numbers or booleans, makes a copy, and is expensive compared to slices (which produce views). Commented May 29, 2015 at 21:34
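
A minimal sketch of the np.expm1 approach suggested above (the function name bernoulli_expm1 and the use of np.ones_like are my own choices; elements below the tolerance are simply set to the limit value 1.0):

import numpy as np

EPSILONZETA = 1.0e-6                         # tolerance from the question

def bernoulli_expm1(zeta):
    fb = np.ones_like(zeta)                  # limit value 1.0 for |zeta| < tolerance
    idx = np.abs(zeta) > EPSILONZETA         # only these need the actual ratio
    fb[idx] = zeta[idx] / np.expm1(zeta[idx])
    return fb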

3 Answers


The basic structure of the problem is:

def foo(zeta):
    result = np.empty_like(zeta)
    I = condition(zeta)
    nI = ~I
    result[I] = func1(zeta[I])
    result[nI] = func2(zeta[nI])
    return result

It looks like your polynomial expression could be evaluated at all zeta, but it is the 'exception', the fall-back calculation for when zeta is too close to 0.

If both functions can be evaluated for all zeta, you could use where:

np.where(condition(zeta), func1(zeta), func2(zeta))

This is a streamlined version of:

def foo(zeta):
    result = np.empty_like(zeta)
    I = condition(zeta)
    nI = ~I
    v1 = func1(zeta)
    v2 = func2(zeta)
    result[I] = v1[I]
    result[nI] = v2[nI]
    return result

Another option is to apply one function to all values, and the other only to the 'exceptions'.

def foo(zeta):
    result = func2(zeta)
    I = condition(zeta)
    result[I] = func1(zeta[I])
    return result

and of course the reverse: result = func1(zeta); result[nI] = func2(zeta[nI]).
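
Applied to the question's own functions, the 'evaluate one branch everywhere, patch the exceptions' pattern might look like this (a sketch on my part, taking the series as the branch that is safe to evaluate at every element):

import numpy as np

EPSILONZETA = 1.0e-6
ZETA1_12 = 1.0/12.0
ZETA1_720 = 1.0/720.0

def bernoulli_series_first(zeta):
    # evaluate the cheap, always-finite series at every element ...
    zeta2 = zeta * zeta
    result = 1.0 - 0.5*zeta + ZETA1_12*zeta2 - ZETA1_720*zeta2*zeta2
    # ... then overwrite the non-small values with the exact ratio
    I = np.abs(zeta) >= EPSILONZETA
    result[I] = zeta[I] / (np.exp(zeta[I]) - 1.0)
    return result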

In my brief timing tests, func1 and func2 take about the same time.

masked_condition_zero also takes about that long, but the simpler np.abs(array) < tolerance (and its negation ~J) cuts that in half.

Let's compare the allocation strategies:

def foo(zeta, J, nJ):
    result = np.empty_like(zeta)
    result[J] = fun1(zeta[J])
    result[nJ] = fun2(zeta[nJ])
    return result
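
The fun1, fun2 and test arrays used for the timings below aren't shown; one plausible setup (the names, sizes and values are my own guesses, chosen so that zeta[J] is about 10% of the full zeta) would be:

import numpy as np

EPSILONZETA = 1.0e-6
ZETA1_12 = 1.0/12.0
ZETA1_720 = 1.0/720.0

def fun1(z):                                  # series branch (cheap, finite everywhere)
    z2 = z * z
    return 1.0 - 0.5*z + ZETA1_12*z2 - ZETA1_720*z2*z2

def fun2(z):                                  # exact ratio branch
    return z / (np.exp(z) - 1.0)

zeta = np.random.uniform(-1.0, 1.0, size=10000)
small = np.random.choice(len(zeta), size=1000, replace=False)
zeta[small] = EPSILONZETA / 2                 # plant ~10% 'small' values
J = np.abs(zeta) < EPSILONZETA                # boolean mask of the small values
nJ = ~J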

For a sample where zeta[J] is 10% of the full zeta, some sample times are:

In [127]: timeit foo(zeta, J, nJ)
10000 loops, best of 3: 55.7 µs per loop

In [128]: timeit result=fun2(zeta); result[J]=fun1(zeta[J])
10000 loops, best of 3: 49.2 µs per loop

In [129]: timeit np.where(J, fun1(zeta),fun2(zeta))
10000 loops, best of 3: 73.4 µs per loop

In [130]: timeit result=fun1(zeta); result[nJ]=fun2(zeta[nJ])
10000 loops, best of 3: 60.7 µs per loop

The second case is fastest because running fun1 on fewer values compensates for the added cost of indexing zeta[J]. There's a trade-off between the cost of indexing and the cost of function evaluation. Boolean indexing like this is more expensive than slicing. With other mixes of values, the timings could go the other way.
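
The boolean-indexing-versus-slicing point is easy to verify directly; a rough sketch (the array size and threshold are arbitrary choices of mine):

import numpy as np
from timeit import timeit

zeta = np.random.uniform(-1.0, 1.0, size=100000)
J = np.abs(zeta) > 0.5                           # boolean mask, selects roughly half
n = int(J.sum())

t_bool = timeit(lambda: zeta[J], number=1000)    # advanced (boolean) indexing: copies
t_slice = timeit(lambda: zeta[:n], number=1000)  # basic slicing: returns a view
print(t_bool, t_slice)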

It looks like a problem where you can whittle away at the times, but I don't see any breakthrough strategy that would cut them by an order of magnitude.


1 Comment

Yes, in my case the second is better because zeta = 0.0 is less likely. I also think that is the better way (together with using expm1 to avoid evaluating the series), so +1.

The where command is quite slow compared to indexing into an array. This might be faster.

fb_ = np.zeros_like(zeta)
nonZero = np.abs(zeta) > ZETA_TOLERANCE
zero = ~nonZero
fb_[zero] = function1(zeta[zero])
fb_[nonZero] = function2(zeta[nonZero])

Edit: I realised my original version was making two copies of the same array. This new version should be a bit faster.

5 Comments

masked_condition_zero returns 2 boolean arrays like yours. So he's doing the same indexing. The use of where in that function might be superfluous, but I doubt if it is a time killer. But that is easy to test.
Using this suggestion and np.expm1 (@WarrenWeckesser), it runs 25% faster: fb_ = np.ones_like(zeta); idx = np.abs(zeta) > EPSILONZETA; fb_[idx] = zeta[idx]/np.expm1(zeta[idx])
I'm curious how much of that speed up is from my suggestion and how much from @WarrenWeckesser's
@Fergal Using exp - 1.0 is slightly better (14%) in terms of speed, but assuming that np.expm1 gives better results around zero (as claimed by the documentation) I prefer the latter. In fact, for z = 2.0E-16, fb(z) = 0.9007.. and 0.999... respectively. In the end, a tolerance of EPSILONZETA = 1.0e-250 works well.
@Fergal I forgot to mention that, as pointed out by @WarrenWeckesser, with expm1 the only thing left is to catch the "zeros" (adjusted by the tolerance). I am using your approach to do this, but the complementary mask zero = ~nonZero becomes unnecessary if you define fb_ = np.ones_like(zeta): then you are guaranteed to get 1 for very small numbers (negative or positive) with idx = np.abs(zeta) > EPSILONZETA.

You can use numba [1] (it should already be installed if you use Anaconda or a similar Python distribution), which is a JIT compiler aimed at numpy code.

from numba import jit
@jit
def bernoulli_function_fill(zeta, fb_):
    for i in xrange(len(zeta)):
        if np.abs(zeta[i])>EPSILONZETA:
            fb_[i] = zeta[i]/(np.exp(zeta[i])-1.0)
        else:
            zeta0_ = zeta[i]
            zeta2_ = zeta0_ *  zeta0_
            zeta4_ =  zeta2_ * zeta2_
            fb_[i] = 1.0 - 0.5*zeta0_ + ZETA1_12 * zeta2_ - ZETA1_720 * zeta4_

def bernoulli_function_fast(zeta):
    fb_ = np.zeros_like(zeta)
    bernoulli_function_fill(zeta, fb_)
    return fb_

Note: you can combine the two into a single function if you use a newer version of numba.
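
A combined version might look roughly like this (an untested sketch on my part, reusing the constants from the question):

from numba import jit
import numpy as np

EPSILONZETA = 1.0e-6
ZETA1_12 = 1.0/12.0
ZETA1_720 = 1.0/720.0

@jit(nopython=True)
def bernoulli_function_numba(zeta):
    fb_ = np.empty_like(zeta)
    for i in range(len(zeta)):
        z = zeta[i]
        if np.abs(z) > EPSILONZETA:
            fb_[i] = z / (np.exp(z) - 1.0)
        else:
            z2 = z * z
            fb_[i] = 1.0 - 0.5*z + ZETA1_12*z2 - ZETA1_720*z2*z2
    return fb_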

On my machine:

#create some test data
zeta = np.random.uniform(-1, 1, size=2**24)
zeta[np.random.choice(len(zeta), size=2**23, replace=False)] = EPSILONZETA/2
>>> np.all(bernoulli_function_fast(zeta) == bernoulli_function1(zeta))
True
>>> %timeit bernoulli_function1(zeta) # your function
1 loops, best of 3: 1.49 s per loop
>>> %timeit bernoulli_function_fast(zeta) #numba function
1 loops, best of 3: 347 ms per loop

which is ~4 times faster and easier to read.

