Converting array of strings to 2D array of codes sped up by Numba

Question

I want to apply the following operation to a single string and then to an array of strings. Because it will often run on more than 1 million strings, I want to speed it up with Numba. I tried adding the @vectorize decorator to the function, but I keep running into errors. Is there any way to speed this up, or is regular Python my only option?

def char_to_code(char):
    asc = ord(char)
    if asc >= 123:
        if asc <= 126:
            asc = asc - 122 + 28
    elif asc >= 97:
        asc = asc - 96 + 300
    elif asc >= 91:
        asc = asc - 90 + 22
    elif asc >= 65:
        asc = asc - 64 + 200
    elif asc >= 58:
        asc = asc - 57 + 15
    elif asc >= 48:
        asc = asc - 47 + 100
    elif asc >= 32:
        asc -= 32
    return asc

EDIT: I wrote the program like this:

from numba import vectorize
import numpy as np

@vectorize
def char_to_code(char):
    asc = ord(char)
    if asc >= 123:
        if asc <= 126:
            asc = asc - 122 + 28
    elif asc >= 97:
        asc = asc - 96 + 300
    elif asc >= 91:
        asc = asc - 90 + 22
    elif asc >= 65:
        asc = asc - 64 + 200
    elif asc >= 58:
        asc = asc - 57 + 15
    elif asc >= 48:
        asc = asc - 47 + 100
    elif asc >= 32:
        asc -= 32
    return asc

char_to_code(np.array(list('bobobobob')))

Running it produced the following warning and error:

Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "Desktop\ODC 2.0\functions.py", line 771:
@vectorize
def char_to_code(char):
^

  warnings.warn(errors.NumbaDeprecationWarning(msg,
Traceback (most recent call last):
  File "c:/Users/samebhar/Desktop/ODC 2.0/functions.py", line 790, in <module>
    char_to_code('bobobobob')
  File "C:\Users\samebhar\Anaconda3\lib\site-packages\numba\np\ufunc\dufunc.py", line 201, in _compile_for_args
    return self._compile_for_argtys(tuple(argtys))
  File "C:\Users\samebhar\Anaconda3\lib\site-packages\numba\np\ufunc\dufunc.py", line 220, in _compile_for_argtys
    actual_sig = ufuncbuilder._finalize_ufunc_signature(
  File "C:\Users\samebhar\Anaconda3\lib\site-packages\numba\np\ufunc\ufuncbuilder.py", line 157, in _finalize_ufunc_signature
    raise TypeError("return type must be specified for object mode")
TypeError: return type must be specified for object mode
PS C:\Users\samebhar>

I would like the function to work as a vectorized operation on a NumPy array of characters, or directly on a string if that is possible, so that it is fast. For the input above it should return

[302, 315, 302, 315, 302, 315, 302, 315, 302]

after the vectorized operation is complete on each element of the array.

(Any other way of speeding this up is welcome.)

Please describe what error you are getting and what behavior you are expecting to occur.
– hyperneutrino, Commented Jul 7, 2021 at 20:44

aerobiomat · Accepted Answer · 2021-07-08 08:34:57Z


You can create a look-up table using maketrans():

MAX = 127
dictionary = {chr(code): char_to_code(chr(code)) for code in range(MAX)}
table = str.maketrans(dictionary)

and then use translate():

>>> "Hello, world!".translate(table)
'ÐıĸĸĻ\x0c\x00ŃĻľĸİ\x01'
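
The translated string holds the codes as characters, so the integers asked for in the question can be read back with ord(). The following is a minimal sketch of that step, not part of the original answer: string_to_codes and strings_to_codes are hypothetical helper names, and char_to_code is copied from the question so the example runs on its own.

```python
def char_to_code(char):
    """Mapping from the question, reproduced so this sketch is self-contained."""
    asc = ord(char)
    if asc >= 123:
        if asc <= 126:
            asc = asc - 122 + 28
    elif asc >= 97:
        asc = asc - 96 + 300
    elif asc >= 91:
        asc = asc - 90 + 22
    elif asc >= 65:
        asc = asc - 64 + 200
    elif asc >= 58:
        asc = asc - 57 + 15
    elif asc >= 48:
        asc = asc - 47 + 100
    elif asc >= 32:
        asc -= 32
    return asc


# Look-up table built once, as in the answer: one entry per ASCII code point.
MAX = 127
TABLE = str.maketrans({chr(code): char_to_code(chr(code)) for code in range(MAX)})


def string_to_codes(text):
    """Hypothetical helper: translate the whole string, then read the code points."""
    return [ord(ch) for ch in text.translate(TABLE)]


def strings_to_codes(strings):
    """Hypothetical helper: one row of codes per input string."""
    return [string_to_codes(s) for s in strings]


print(string_to_codes("bobobobob"))
print(strings_to_codes(["Hi", "a1"]))
```

Characters outside the table (code 127 and above) pass through translate() unchanged, which matches what char_to_code returns for them. If all strings have the same length, the nested list can be handed to np.array to get a rectangular 2D array; strings of different lengths would need padding first.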
