
I am learning to use Numba to accelerate Python code. With this code:

from numba import cuda, vectorize
import numpy as np

@cuda.jit(device = True)
def pixel_count(img1,img2):
    count1 = 0
    count2 = 0
    for i in range(img1.shape[0]):
        for j in range(img1.shape[1]):
            if img1[i][j] > 200:
                count1 = count1 + 1
    i = 0; j = 0;
    for i in range(img2.shape[0]):
        for j in range(img2.shape[1]):
            if img2[i][j] > 200:
                count2 = count2 + 1
                         
    return count1, count2


@vectorize(['float32(float32,float32)'], target = 'cuda')
def cint(img1, img2):
    c1, c2 = pixel_count(img1, img2)
    res = c1-c2
    return res

A = np.random.rand(480, 640).astype(np.float32)*255
B = np.random.rand(480, 640).astype(np.float32)*255


res = cint(A,B)

I received the following error:

TypingError: No implementation of function Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x00000175A5BE8C70>) found for signature:
 pixel_count (float32, float32)
There are 2 candidate implementations:
 - Of which 2 did not match due to:
 Overload in function 'pixel_count ': File: ..........\OneDrive\Desktop<ipython-input-33-9169f440975d>: Line 4.
   With argument(s): '(float32, float32)':
  Rejected as the implementation raised a specific error:
    TypingError: Failed in nopython mode pipeline (step: nopython frontend)
  Unknown attribute 'shape' of type float32

 File "<ipython-input-33-9169f440975d>", line 8:
 def pixel_count(img1,img2):
     <source elided>
     count2 = 0
     for i in range(img1.shape[0]):
     ^
 
 During: typing of get attribute at <ipython-input-33-9169f440975d> (8)
 
 File "<ipython-input-33-9169f440975d>", line 8:
 def pixel_count(img1,img2):
     <source elided>
     count2 = 0
     for i in range(img1.shape[0]):
     ^

raised from C:\Users\giuli\anaconda3\envs\GPUcomp\lib\site-packages\numba\core\typeinfer.py:1071

During: resolving callee type: Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x00000175A5BE8C70>)
During: typing of call at (3)

EDIT

I changed the code like this using guvectorize:

@guvectorize(['(float32[:],float32[:], float32)'], '(), () -> ()',target = 'cuda')
def cint(img1, img2, res):
    c1, c2 = pixel_count(img1, img2)
    res = c1-c2


A = np.random.rand(480, 640).astype(np.float32)*255
B = np.random.rand(480, 640).astype(np.float32)*255


res = cint(A, B)

This raises the following error:

TypingError: No implementation of function Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x000001C99C5D42E0>) found for signature:
 pixel_count (array(float32, 1d, A), array(float32, 1d, A))
There are 2 candidate implementations:
 - Of which 1 did not match due to:
 Overload in function 'pixel_count ': File: ..........\OneDrive\Desktop<ipython-input-33-5b0a51c1200a>: Line 1.
   With argument(s): '(array(float32, 1d, A), array(float32, 1d, A))':
  Rejected as the implementation raised a specific error:
    TypingError: Failed in nopython mode pipeline (step: nopython frontend)
  Internal error at <numba.core.typeinfer.StaticGetItemConstraint object at 0x000001C99DD239D0>.
  tuple index out of range
  During: typing of static-get-item at (9)
  Enable logging at debug level for details.

  File "", line 9:
  def pixel_count(img1,img2):
      for i in range(img1.shape[0]):
          for j in range(img1.shape[1]):
          ^

  raised from C:\Users\giuli\anaconda3\envs\GPUcomp\lib\site-packages\numba\core\typeinfer.py:1071

 - Of which 1 did not match due to:
 Overload in function 'pixel_count ': File: ..........\OneDrive\Desktop<ipython-input-33-5b0a51c1200a>: Line 1.
   With argument(s): '(array(float32, 1d, A), array(float32, 1d, A))':
  Rejected as the implementation raised a specific error:
    TypingError: Failed in nopython mode pipeline (step: nopython frontend)
  Internal error at <numba.core.typeinfer.StaticGetItemConstraint object at 0x000001C99DD52370>.
  tuple index out of range
  During: typing of static-get-item at (9)
  Enable logging at debug level for details.

  File "", line 9:
  def pixel_count(img1,img2):
      for i in range(img1.shape[0]):
          for j in range(img1.shape[1]):
          ^

  raised from C:\Users\giuli\anaconda3\envs\GPUcomp\lib\site-packages\numba\core\typeinfer.py:1071

During: resolving callee type: Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x000001C99C5D42E0>)
During: typing of call at (23)

How can I use cuda.jit together with vectorize/guvectorize functions?

EDIT 2

Thank you all for the responses. The goal was to figure out how to solve this task on the GPU using Numba. The code is probably faster on the CPU, since the matrices are small; thank you for the tips on parallel computing, they were very helpful. Do you have any other suggestions on how to port this code to the GPU? Thank you very much.

I have modified the code as follows, but it always returns 0:

from numba import cuda, vectorize, guvectorize
import numpy as np


@cuda.jit(device = True)
def pixel_count(img1,img2):
    count1 = 0
    count2 = 0
    for i in range(img1.shape[0]):
        for j in range(img1.shape[1]):
            if img1[i][j] > 200:
                count1 = count1 + 1
    i = 0; j = 0;
    for i in range(img2.shape[0]):
        for j in range(img2.shape[1]):
            if img2[i][j] > 200:
                count2 = count2 + 1
                         
    return count1, count2

@guvectorize(['(float32[:,:],float32[:,:], int16)'],
             '(n,m), (n,m)-> ()', target = 'cuda')
def cint(img1, img2, res):
    count1, count2 = pixel_count(img1, img2)
    res = count1 - count2

A = np.random.rand(480, 640).astype(np.float32)*255
B = np.random.rand(480, 640).astype(np.float32)*255
res1 = cint(A, B)
  • Obviously calling a function which expects an array as an input is not going to work for vectorize, where the functions work on scalars. Commented Jun 3, 2021 at 13:29
  • Again, your new version uses two-dimensional indexing in the device function, but the guvectorize signature specifies one-dimensional arrays. In both cases the compiler is literally telling you what the problem is, if you take the time to study the error messages. Commented Jun 3, 2021 at 15:03
  • Remember, there is no duck typing here. Everything must be explicitly typed and dimensioned and match. The compiler is statically typing everything. Commented Jun 3, 2021 at 15:10
  • Do you really want to go through this hassle for something that can be solved with (A > 200).sum() - (B > 200).sum()? That expression is already vectorized. If you wish to rely on CUDA, you may wish to decorate that expression with Numba or to explore CuPy. Commented Jun 3, 2021 at 16:37

1 Answer

Not using CUDA, but this may give you some ideas:

Pure Numpy (already vectorized):

import numpy as np
import numba as nb

A = np.random.rand(480, 640).astype(np.float32) * 255
B = np.random.rand(480, 640).astype(np.float32) * 255

%timeit (A > 200).sum() - (B > 200).sum()
478 µs ± 4.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Simply wrapping the numpy operations in a JITted function:

@nb.njit
def pixel_count_jit(img):
    return (img > 200).sum()

%timeit pixel_count_jit(A) - pixel_count_jit(B)
165 µs ± 13.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Parallelizing with Numba by rows:

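@nb.njit(parallel=True)
def pixel_count_parallel(img):
    # One partial count per row (shape[0]); a signed int64 dtype so that the
    # difference of two results cannot wrap around like unsigned integers.
    counts = np.empty(img.shape[0], dtype=np.int64)
    for i in nb.prange(img.shape[0]):
        counts[i] = (img[i] > 200).sum()
    return counts.sum()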

%timeit pixel_count_parallel(A) - pixel_count_parallel(B)
28.5 µs ± 571 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)