I want to perform the following operation on a string and then apply it to an array of strings. I want to speed it up with Numba because it will often run on more than 1 million strings. I have tried adding the @vectorize decorator to the function, but I keep running into errors. Is there any way to speed this up, or is regular Python my only option?
def char_to_code(char):
    asc = ord(char)
    if asc >= 123:
        if asc <= 126:
            asc = asc - 122 + 28
    elif asc >= 97:
        asc = asc - 96 + 300
    elif asc >= 91:
        asc = asc - 90 + 22
    elif asc >= 65:
        asc = asc - 64 + 200
    elif asc >= 58:
        asc = asc - 57 + 15
    elif asc >= 48:
        asc = asc - 47 + 100
    elif asc >= 32:
        asc -= 32
    return asc
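For concreteness, here is a minimal sketch of the pure-Python baseline described above: the per-character function applied to each character of each string. The helper names `string_to_codes` and `strings_to_codes` are hypothetical and only illustrate the intended input and output shapes.

```python
def char_to_code(char):
    # Same mapping as in the question, with indentation restored.
    asc = ord(char)
    if asc >= 123:
        if asc <= 126:
            asc = asc - 122 + 28
    elif asc >= 97:
        asc = asc - 96 + 300
    elif asc >= 91:
        asc = asc - 90 + 22
    elif asc >= 65:
        asc = asc - 64 + 200
    elif asc >= 58:
        asc = asc - 57 + 15
    elif asc >= 48:
        asc = asc - 47 + 100
    elif asc >= 32:
        asc -= 32
    return asc


def string_to_codes(s):
    # Hypothetical helper: encode one string, character by character.
    return [char_to_code(c) for c in s]


def strings_to_codes(strings):
    # Hypothetical helper: encode every string in a list.
    return [string_to_codes(s) for s in strings]


print(strings_to_codes(["bob", "A1 "]))
# prints [[302, 315, 302], [201, 102, 0]]
```

This is the slow path that the rest of the question tries to speed up: one Python-level function call per character.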
EDIT: I wrote the program like this:
from numba import vectorize
import numpy as np

@vectorize
def char_to_code(char):
    asc = ord(char)
    if asc >= 123:
        if asc <= 126:
            asc = asc - 122 + 28
    elif asc >= 97:
        asc = asc - 96 + 300
    elif asc >= 91:
        asc = asc - 90 + 22
    elif asc >= 65:
        asc = asc - 64 + 200
    elif asc >= 58:
        asc = asc - 57 + 15
    elif asc >= 48:
        asc = asc - 47 + 100
    elif asc >= 32:
        asc -= 32
    return asc

char_to_code(np.array(list('bobobobob')))
Running it produced the following warning and error:
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.
For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit
File "Desktop\ODC 2.0\functions.py", line 771:
@vectorize
def char_to_code(char):
^
warnings.warn(errors.NumbaDeprecationWarning(msg,
Traceback (most recent call last):
  File "c:/Users/samebhar/Desktop/ODC 2.0/functions.py", line 790, in <module>
    char_to_code('bobobobob')
  File "C:\Users\samebhar\Anaconda3\lib\site-packages\numba\np\ufunc\dufunc.py", line 201, in _compile_for_args
    return self._compile_for_argtys(tuple(argtys))
  File "C:\Users\samebhar\Anaconda3\lib\site-packages\numba\np\ufunc\dufunc.py", line 220, in _compile_for_argtys
    actual_sig = ufuncbuilder._finalize_ufunc_signature(
  File "C:\Users\samebhar\Anaconda3\lib\site-packages\numba\np\ufunc\ufuncbuilder.py", line 157, in _finalize_ufunc_signature
    raise TypeError("return type must be specified for object mode")
TypeError: return type must be specified for object mode
PS C:\Users\samebhar>
I would like the function to work as a vectorized operation on a NumPy array of characters, or directly on a string if possible, so that it is fast. For the input 'bobobobob' it should return
[302, 315, 302, 315, 302, 315, 302, 315, 302]
after the vectorized operation is complete on each element of the array.
(Any other way of speeding this up is welcome.)
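One possible direction that avoids Numba entirely is a precomputed lookup table indexed by code point. The sketch below assumes every input string is pure ASCII (code points 0 to 127); that is an assumption, not something stated above. The names `TABLE`, `string_to_codes`, and `strings_to_codes` are hypothetical. It has not been benchmarked here, so any speedup should be measured on real data.

```python
import numpy as np


def char_to_code(char):
    # Same mapping as in the question; used only once, to build the table.
    asc = ord(char)
    if asc >= 123:
        if asc <= 126:
            asc = asc - 122 + 28
    elif asc >= 97:
        asc = asc - 96 + 300
    elif asc >= 91:
        asc = asc - 90 + 22
    elif asc >= 65:
        asc = asc - 64 + 200
    elif asc >= 58:
        asc = asc - 57 + 15
    elif asc >= 48:
        asc = asc - 47 + 100
    elif asc >= 32:
        asc -= 32
    return asc


# Hypothetical lookup table: TABLE[i] is the code for the character chr(i).
# Assumption: inputs are ASCII, so 128 entries are enough.
TABLE = np.array([char_to_code(chr(i)) for i in range(128)], dtype=np.int64)


def string_to_codes(s):
    # View the ASCII bytes as an integer array, then index the table with it.
    raw = np.frombuffer(s.encode("ascii"), dtype=np.uint8)
    return TABLE[raw]


def strings_to_codes(strings):
    # Encode all strings in one table lookup, then split back per string.
    lengths = [len(s) for s in strings]
    codes = string_to_codes("".join(strings))
    return np.split(codes, np.cumsum(lengths)[:-1])


print(string_to_codes("bobobobob").tolist())
# prints [302, 315, 302, 315, 302, 315, 302, 315, 302]
```

The branching logic runs only 128 times while the table is built; after that, each string costs one `encode` call and one array-indexing operation. A non-ASCII character would raise `UnicodeEncodeError` in `encode`, so inputs outside the assumed range need separate handling.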