
I am trying to optimize my Python code. One of the bottlenecks comes when I apply a function to a numpy array depending on each element's value. For instance, I have an array with thousands of elements, and I apply one function to the values greater than a tolerance and another function (a Taylor series) to the rest. I use masking, but it is still slow; at the very least, I'm calling the following functions 64 million times.

import numpy as np

EPSILONZETA = 1.0e-6
ZETA1_12 = 1.0/12.0
ZETA1_720 = 1.0/720.0

def masked_condition_zero(array, tolerance):
    """ Return the indices where values are lesser (and greater) than tolerance
    """
    # search indices where array values < tolerance
    indzeros_ = np.where(np.abs(array) < tolerance)[0]

    # create mask
    mask_ = np.ones(np.shape(array), dtype=bool)

    mask_[indzeros_] = False

    return (~mask_, mask_) 

def bernoulli_function1(zeta):
    """ Returns the Bernoulli function of zeta, vector version
    """
    # get the indices according to condition
    zeros_, others_ = masked_condition_zero(zeta, EPSILONZETA)

    # create an array filled with zeros
    fb_ = np.zeros(np.shape(zeta))

    # Apply the original function to the values greater than EPSILONZETA
    fb_[others_] = zeta[others_]/(np.exp(zeta[others_])-1.0)  

    # computes series for zeta < eps
    zeta0_ = zeta[zeros_]
    zeta2_ = zeta0_ *  zeta0_
    zeta4_ =  zeta2_ * zeta2_
    fb_[zeros_] = 1.0 - 0.5*zeta0_ + ZETA1_12 * zeta2_ - ZETA1_720 * zeta4_
    return fb_

Now suppose you have an array zeta of negative and positive floats that changes on every iteration of a loop running 2^26 times, and you want to compute bernoulli_function1(zeta) each time.

Is there a better solution?

  • Is it okay to use nonstandard modules like Numba or Numexpr? Commented May 29, 2015 at 15:14
  • If that does the job I could try it, but first I want to be sure whether what I did is the best that can be done. Moreover, on the machines where I run the code, these modules are not installed. Commented May 29, 2015 at 15:53
  • @moarningsun By the way, nice suggestion. Commented May 29, 2015 at 15:59
  • You might be able to remove the need for the Taylor series if you use np.expm1 (docs.scipy.org/doc/numpy/reference/generated/numpy.expm1.html) to compute the function. You'll still have to handle 0 separately, since 0/expm1(0) gives nan, but the value (in the limit) should be 1. There is currently an open pull request at scipy for adding the function exprel(x) = (exp(x)-1)/x: github.com/scipy/scipy/pull/4839; feel free to comment there if you like. (A sketch of this approach follows these comments.) Commented May 29, 2015 at 16:40
  • Any advanced indexing, whether with numbers or booleans, makes a copy, and is expensive compared to slices (which produce views). Commented May 29, 2015 at 21:34
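
A minimal sketch of the np.expm1 approach suggested above (the function name bernoulli_expm1 and the use of np.ones_like are my own choices; elements below the tolerance are simply set to the limit value 1.0):

import numpy as np

EPSILONZETA = 1.0e-6                         # tolerance from the question

def bernoulli_expm1(zeta):
    fb = np.ones_like(zeta)                  # limit value 1.0 for |zeta| < tolerance
    idx = np.abs(zeta) > EPSILONZETA         # only these need the actual ratio
    fb[idx] = zeta[idx] / np.expm1(zeta[idx])
    return fb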

3 Answers


The basic structure of the problem is:

def foo(zeta):
    result = np.empty_like(zeta)
    I = condition(zeta)
    nI = ~I
    result[I] = func1(zeta[I])
    result[nI] = func2(zeta[nI])
    return result

It looks like your polynomial expression could be evaluated at all zeta, but it is the 'exception', the fall-back calculation for when zeta is too close to 0.

If both functions can be evaluated for all zeta, you could use where:

np.where(condition(zeta), func1(zeta), func2(zeta))

This is a streamlined version of:

def foo(zeta):
    result = np.empty_like(zeta)
    I = condition(zeta)
    nI = ~I
    v1 = func1(zeta)
    v2 = func2(zeta)
    result[I] = v1[I]
    result[nI] = v2[nI]
    return result

Another option is to apply one function to all values, and the other only to the 'exceptions'.

def foo(zeta):
    result = func2(zeta)
    I = condition(zeta)
    result[I] = func1(zeta[I])
    return result

and of course the reverse: result = func1(zeta); result[nI] = func2(zeta[nI]).
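
Applied to the question's own functions, the 'evaluate one branch everywhere, patch the exceptions' pattern might look like this (a sketch on my part, taking the series as the branch that is safe to evaluate at every element):

import numpy as np

EPSILONZETA = 1.0e-6
ZETA1_12 = 1.0/12.0
ZETA1_720 = 1.0/720.0

def bernoulli_series_first(zeta):
    # evaluate the cheap, always-finite series at every element ...
    zeta2 = zeta * zeta
    result = 1.0 - 0.5*zeta + ZETA1_12*zeta2 - ZETA1_720*zeta2*zeta2
    # ... then overwrite the non-small values with the exact ratio
    I = np.abs(zeta) >= EPSILONZETA
    result[I] = zeta[I] / (np.exp(zeta[I]) - 1.0)
    return result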

In my brief timing tests, func1 and func2 take about the same time.

masked_condition_zero also takes about that long, but the simpler np.abs(array) < tolerance (and its negation ~J) cuts that in half.

Let's compare the allocation strategies:

def foo(zeta, J, nJ):
    result = np.empty_like(zeta)
    result[J] = fun1(zeta[J])
    result[nJ] = fun2(zeta[nJ])
    return result
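
The fun1, fun2 and test arrays used for the timings below aren't shown; one plausible setup (the names, sizes and values are my own guesses, chosen so that zeta[J] is about 10% of the full zeta) would be:

import numpy as np

EPSILONZETA = 1.0e-6
ZETA1_12 = 1.0/12.0
ZETA1_720 = 1.0/720.0

def fun1(z):                                  # series branch (cheap, finite everywhere)
    z2 = z * z
    return 1.0 - 0.5*z + ZETA1_12*z2 - ZETA1_720*z2*z2

def fun2(z):                                  # exact ratio branch
    return z / (np.exp(z) - 1.0)

zeta = np.random.uniform(-1.0, 1.0, size=10000)
small = np.random.choice(len(zeta), size=1000, replace=False)
zeta[small] = EPSILONZETA / 2                 # plant ~10% 'small' values
J = np.abs(zeta) < EPSILONZETA                # boolean mask of the small values
nJ = ~J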

For a sample where zeta[J] is 10% of the full zeta, some sample times are:

In [127]: timeit foo(zeta, J, nJ)
10000 loops, best of 3: 55.7 µs per loop

In [128]: timeit result=fun2(zeta); result[J]=fun1(zeta[J])
10000 loops, best of 3: 49.2 µs per loop

In [129]: timeit np.where(J, fun1(zeta),fun2(zeta))
10000 loops, best of 3: 73.4 µs per loop

In [130]: timeit result=fun1(zeta); result[nJ]=fun2(zeta[nJ])
10000 loops, best of 3: 60.7 µs per loop

The second case is fastest because running fun1 on fewer values compensates for the added cost of indexing zeta[J]. There's a trade-off between the cost of indexing and the cost of function evaluation. Boolean indexing like this is more expensive than slicing. With other mixes of values, the timings could go the other way.
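
The boolean-indexing-versus-slicing point is easy to verify directly; a rough sketch (the array size and threshold are arbitrary choices of mine):

import numpy as np
from timeit import timeit

zeta = np.random.uniform(-1.0, 1.0, size=100000)
J = np.abs(zeta) > 0.5                           # boolean mask, selects roughly half
n = int(J.sum())

t_bool = timeit(lambda: zeta[J], number=1000)    # advanced (boolean) indexing: copies
t_slice = timeit(lambda: zeta[:n], number=1000)  # basic slicing: returns a view
print(t_bool, t_slice)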

It looks like a problem where you can whittle away at the times, but I don't see any breakthrough strategy that would cut them by an order of magnitude.


1 Comment

Yes, in my case the second is better because zeta = 0.0 is less likely. I also think that is the better way (together with using expm1 to avoid evaluating the series), so +1.

The where command is quite slow compared to indexing into an array. This might be faster.

fb_ = np.zeros_like(zeta)
nonZero = np.abs(zeta) > ZETA_TOLERANCE
zero = ~nonZero
fb_[zero] = function1(zeta[zero])
fb_[nonZero] = function2(zeta[nonZero])

Edit: I realised my original version was making two copies of the same array. This new version should be a bit faster.

5 Comments

masked_condition_zero returns 2 boolean arrays like yours. So he's doing the same indexing. The use of where in that function might be superfluous, but I doubt if it is a time killer. But that is easy to test.
Using this suggestion and np.expm1 (@WarrenWeckesser), it runs 25% faster: fb_ = np.ones_like(zeta); idx = np.abs(zeta) > EPSILONZETA; fb_[idx] = zeta[idx]/np.expm1(zeta[idx])
I'm curious how much of that speed up is from my suggestion and how much from @WarrenWeckesser's
@Fergal Using exp - 1.0 is slightly better (14%) in terms of speed, but assuming that np.expm1 gives better results around zero (as claimed by the documentation) I prefer the latter. In fact, for z = 2.0E-16, fb(z) = 0.9007.. and 0.999... respectively. In the end, a tolerance of EPSILONZETA = 1.0e-250 works well.
@Fergal I forgot to mention that, as pointed out by @WarrenWeckesser, with expm1 the only thing left is to catch the "zeros" (adjusted by the tolerance). I am using your approach to do this, but the complementary mask zero = ~nonZero becomes unnecessary if you define fb_ = np.ones_like(zeta): then you are guaranteed to get 1 for very small numbers (negative or positive) with idx = np.abs(zeta) > EPSILONZETA.

You can use numba [1] (it should already be installed if you use Anaconda or a similar Python distribution), which is a JIT compiler aimed at numpy code.

from numba import jit
@jit
def bernoulli_function_fill(zeta, fb_):
    for i in xrange(len(zeta)):
        if np.abs(zeta[i])>EPSILONZETA:
            fb_[i] = zeta[i]/(np.exp(zeta[i])-1.0)
        else:
            zeta0_ = zeta[i]
            zeta2_ = zeta0_ *  zeta0_
            zeta4_ =  zeta2_ * zeta2_
            fb_[i] = 1.0 - 0.5*zeta0_ + ZETA1_12 * zeta2_ - ZETA1_720 * zeta4_

def bernoulli_function_fast(zeta):
    fb_ = np.zeros_like(zeta)
    bernoulli_function_fill(zeta, fb_)
    return fb_

Note: you can combine the two into a single function if you use a newer version of numba.
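
A combined version might look roughly like this (an untested sketch on my part, reusing the constants from the question):

from numba import jit
import numpy as np

EPSILONZETA = 1.0e-6
ZETA1_12 = 1.0/12.0
ZETA1_720 = 1.0/720.0

@jit(nopython=True)
def bernoulli_function_numba(zeta):
    fb_ = np.empty_like(zeta)
    for i in range(len(zeta)):
        z = zeta[i]
        if np.abs(z) > EPSILONZETA:
            fb_[i] = z / (np.exp(z) - 1.0)
        else:
            z2 = z * z
            fb_[i] = 1.0 - 0.5*z + ZETA1_12*z2 - ZETA1_720*z2*z2
    return fb_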

On my machine:

#create some test data
zeta = np.random.uniform(-1, 1, size=2**24)
zeta[np.random.choice(len(zeta), size=2**23, replace=False)] = EPSILONZETA/2
>>> np.all(bernoulli_function_fast(zeta) == bernoulli_function1(zeta))
True
>>> %timeit bernoulli_function1(zeta) # your function
1 loops, best of 3: 1.49 s per loop
>>> %timeit bernoulli_function_fast(zeta) #numba function
1 loops, best of 3: 347 ms per loop

which is ~4 times faster and easier to read.

