I want to perform the following operation on a string and then apply it to an array of strings. I want to speed it up with Numba because it will often run on more than 1 million strings. I tried adding the @vectorize decorator to the function, but I keep running into errors. Is there any way to speed this up, or is regular Python my only option?

def char_to_code(char):
    asc = ord(char)
    if asc >= 123:
        if asc <= 126:
            asc = asc - 122 + 28
    elif asc >= 97:
        asc = asc - 96 + 300
    elif asc >= 91:
        asc = asc - 90 + 22
    elif asc >= 65:
        asc = asc - 64 + 200
    elif asc >= 58:
        asc = asc - 57 + 15
    elif asc >= 48:
        asc = asc - 47 + 100
    elif asc >= 32:
        asc -= 32
    return asc

EDIT: I wrote the program like this:

from numba import vectorize
import numpy as np

@vectorize
def char_to_code(char):
    asc = ord(char)
    if asc >= 123:
        if asc <= 126:
            asc = asc - 122 + 28
    elif asc >= 97:
        asc = asc - 96 + 300
    elif asc >= 91:
        asc = asc - 90 + 22
    elif asc >= 65:
        asc = asc - 64 + 200
    elif asc >= 58:
        asc = asc - 57 + 15
    elif asc >= 48:
        asc = asc - 47 + 100
    elif asc >= 32:
        asc -= 32
    return asc

char_to_code(np.array(list('bobobobob')))

Running it produced the following warning and error:

Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "Desktop\ODC 2.0\functions.py", line 771:
@vectorize
def char_to_code(char):
^

  warnings.warn(errors.NumbaDeprecationWarning(msg,
Traceback (most recent call last):
  File "c:/Users/samebhar/Desktop/ODC 2.0/functions.py", line 790, in <module>
    char_to_code('bobobobob')
  File "C:\Users\samebhar\Anaconda3\lib\site-packages\numba\np\ufunc\dufunc.py", line 201, in _compile_for_args
    return self._compile_for_argtys(tuple(argtys))
  File "C:\Users\samebhar\Anaconda3\lib\site-packages\numba\np\ufunc\dufunc.py", line 220, in _compile_for_argtys
    actual_sig = ufuncbuilder._finalize_ufunc_signature(
  File "C:\Users\samebhar\Anaconda3\lib\site-packages\numba\np\ufunc\ufuncbuilder.py", line 157, in _finalize_ufunc_signature
    raise TypeError("return type must be specified for object mode")
TypeError: return type must be specified for object mode
PS C:\Users\samebhar> 

I would like the function to work as a vectorized operation on a NumPy array of characters, or directly on a string if possible, to make it fast. For the input above it should return

[302, 315, 302, 315, 302, 315, 302, 315, 302]

after the vectorized operation is complete on each element of the array.

(Any other way of speeding this up is welcome.)

  • Please describe what error you are getting and what behavior you are expecting to occur. Commented Jul 7, 2021 at 20:44

1 Answer

You can build a look-up table once with str.maketrans(), using your original (undecorated) char_to_code():

MAX = 127
dictionary = {chr(code): char_to_code(chr(code)) for code in range(MAX)}
table = str.maketrans(dictionary)

and then use translate(), which returns a new string whose characters have the mapped codes as their code points:

>>> "Hello, world!".translate(table)
'ÐıĸĸĻ\x0c\x00ŃĻľĸİ\x01'
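
The question asks for a list of integers rather than a translated string. A minimal sketch of one way to get there, assuming the plain-Python char_to_code() from the question: apply ord() to each character of the translated string. The helper name string_to_codes is hypothetical and only used for illustration.

```python
def char_to_code(char):
    # Plain-Python function from the question (no Numba decorator).
    asc = ord(char)
    if asc >= 123:
        if asc <= 126:
            asc = asc - 122 + 28
    elif asc >= 97:
        asc = asc - 96 + 300
    elif asc >= 91:
        asc = asc - 90 + 22
    elif asc >= 65:
        asc = asc - 64 + 200
    elif asc >= 58:
        asc = asc - 57 + 15
    elif asc >= 48:
        asc = asc - 47 + 100
    elif asc >= 32:
        asc -= 32
    return asc


# Build the look-up table once; str.maketrans accepts integer ordinals as values.
MAX = 127
table = str.maketrans({chr(code): char_to_code(chr(code)) for code in range(MAX)})


def string_to_codes(text):
    # Hypothetical helper: translate the whole string in one call, then read
    # the integer code back from each translated character with ord().
    return [ord(ch) for ch in text.translate(table)]


print(string_to_codes("bobobobob"))
# [302, 315, 302, 315, 302, 315, 302, 315, 302]
```

Characters that are not in the table (code point 127 and above) pass through translate() unchanged, which matches what char_to_code() returns for them. For a list of strings, the same helper can be applied per string, for example [string_to_codes(s) for s in strings]; whether this is fast enough for more than a million strings would need to be measured on your data.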