I am learning to use Numba to accelerate Python code. With this code:
from numba import cuda, vectorize
import numpy as np

@cuda.jit(device = True)
def pixel_count(img1,img2):
    count1 = 0
    count2 = 0
    for i in range(img1.shape[0]):
        for j in range(img1.shape[1]):
            if img1[i][j] > 200:
                count1 = count1 + 1
    i = 0; j = 0;
    for i in range(img2.shape[0]):
        for j in range(img2.shape[1]):
            if img2[i][j] > 200:
                count2 = count2 + 1
    return count1, count2

@vectorize(['float32(float32,float32)'], target = 'cuda')
def cint(img1, img2):
    c1, c2 = pixel_count(img1, img2)
    res = c1-c2
    return res

A = np.random.rand(480, 640).astype(np.float32)*255
B = np.random.rand(480, 640).astype(np.float32)*255
res = cint(A,B)
I received the following error:
TypingError: No implementation of function Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x00000175A5BE8C70>) found for signature:
pixel_count (float32, float32)
There are 2 candidate implementations:
- Of which 2 did not match due to:
  Overload in function 'pixel_count ': File: ..........\OneDrive\Desktop<ipython-input-33-9169f440975d>: Line 4.
  With argument(s): '(float32, float32)':
  Rejected as the implementation raised a specific error:
  TypingError: Failed in nopython mode pipeline (step: nopython frontend)
  Unknown attribute 'shape' of type float32

File "<ipython-input-33-9169f440975d>", line 8:
def pixel_count(img1,img2):
    <source elided>
    count2 = 0
    for i in range(img1.shape[0]):
    ^

During: typing of get attribute at <ipython-input-33-9169f440975d> (8)

File "<ipython-input-33-9169f440975d>", line 8:
def pixel_count(img1,img2):
    <source elided>
    count2 = 0
    for i in range(img1.shape[0]):
    ^

raised from C:\Users\giuli\anaconda3\envs\GPUcomp\lib\site-packages\numba\core\typeinfer.py:1071
During: resolving callee type: Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x00000175A5BE8C70>)
During: typing of call at (3)
EDIT
I changed the code like this using guvectorize:
@guvectorize(['(float32[:],float32[:], float32)'], '(), () -> ()',target = 'cuda')
def cint(img1, img2, res):
    c1, c2 = pixel_count(img1, img2)
    res = c1-c2

A = np.random.rand(480, 640).astype(np.float32)*255
B = np.random.rand(480, 640).astype(np.float32)*255
res = cint(A, B)
With this error:
TypingError: No implementation of function Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x000001C99C5D42E0>) found for signature:
pixel_count (array(float32, 1d, A), array(float32, 1d, A))
There are 2 candidate implementations:
- Of which 1 did not match due to:
  Overload in function 'pixel_count ': File: ..........\OneDrive\Desktop<ipython-input-33-5b0a51c1200a>: Line
  - With argument(s): '(array(float32, 1d, A), array(float32, 1d, A))':
    Rejected as the implementation raised a specific error:
    TypingError: Failed in nopython mode pipeline (step: nopython frontend)
    Internal error at <numba.core.typeinfer.StaticGetItemConstraint object at 0x000001C99DD239D0>.
    tuple index out of range
    During: typing of static-get-item at (9)
    Enable logging at debug level for details.

    File "", line 9:
    def pixel_count(img1,img2):
        for i in range(img1.shape[0]):
            for j in range(img1.shape[1]):
            ^

  raised from C:\Users\giuli\anaconda3\envs\GPUcomp\lib\site-packages\numba\core\typeinfer.py:1071
- Of which 1 did not match due to:
  Overload in function 'pixel_count ': File: ..........\OneDrive\Desktop<ipython-input-33-5b0a51c1200a>: Line
  - With argument(s): '(array(float32, 1d, A), array(float32, 1d, A))':
    Rejected as the implementation raised a specific error:
    TypingError: Failed in nopython mode pipeline (step: nopython frontend)
    Internal error at <numba.core.typeinfer.StaticGetItemConstraint object at 0x000001C99DD52370>.
    tuple index out of range
    During: typing of static-get-item at (9)
    Enable logging at debug level for details.

    File "", line 9:
    def pixel_count(img1,img2):
        for i in range(img1.shape[0]):
            for j in range(img1.shape[1]):
            ^

  raised from C:\Users\giuli\anaconda3\envs\GPUcomp\lib\site-packages\numba\core\typeinfer.py:1071
During: resolving callee type: Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x000001C99C5D42E0>)
During: typing of call at (23)
How can I use a cuda.jit device function together with vectorize/guvectorize?
EDIT 2
Thank you all for the responses. The goal was to figure out how to solve this task on the GPU using Numba. The code is probably faster on the CPU because the matrices are small; thank you for the tips on parallel computing, they were very helpful. Do you have any other suggestions on how to port this code to the GPU? Thank you very much.
I have modified the code in this way but it always returns the value 0:
from numba import cuda, vectorize, guvectorize
import numpy as np

@cuda.jit(device = True)
def pixel_count(img1,img2):
    count1 = 0
    count2 = 0
    for i in range(img1.shape[0]):
        for j in range(img1.shape[1]):
            if img1[i][j] > 200:
                count1 = count1 + 1
    i = 0; j = 0;
    for i in range(img2.shape[0]):
        for j in range(img2.shape[1]):
            if img2[i][j] > 200:
                count2 = count2 + 1
    return count1, count2

@guvectorize(['(float32[:,:],float32[:,:], int16)'],
             '(n,m), (n,m)-> ()', target = 'cuda')
def cint(img1, img2, res):
    count1, count2 = pixel_count(img1, img2)
    res = count1 - count2

A = np.random.rand(480, 640).astype(np.float32)*255
B = np.random.rand(480, 640).astype(np.float32)*255
res1 = cint(A, B)
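With vectorize, the function receives scalar float32 values, which have no shape attribute. Your guvectorize signature specifies one-dimensional arrays, so img1.shape[1] does not exist. In both cases the compiler is literally telling you what the problem is, if you take the time to study the error messages.

The whole computation reduces to (A > 200).sum() - (B > 200).sum(). That expression is already vectorized. If you wish to rely on CUDA, you could wrap that expression in a function and decorate it with Numba, or explore CuPy.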
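For reference, here is the vectorized expression as a small runnable sketch on the CPU with plain NumPy. The helper name `count_diff` and the `threshold` parameter are made up for illustration. Since CuPy largely mirrors the NumPy array API, the same expression should work on CuPy arrays, but that is an assumption and is not shown here.

```python
import numpy as np

def count_diff(a, b, threshold=200):
    """Number of elements above threshold in a, minus the same count for b."""
    # (a > threshold) is a boolean array; count_nonzero counts its True entries.
    return int(np.count_nonzero(a > threshold)) - int(np.count_nonzero(b > threshold))

# Tiny hand-checkable example: A has two values strictly above 200, B has one.
A = np.array([[0, 201], [255, 200]], dtype=np.float32)
B = np.array([[250, 10], [10, 10]], dtype=np.float32)

print(count_diff(A, B))  # 1
```

The comparison is strict, so the value 200 itself is not counted, matching the `> 200` test in the original loops.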