The numpy module is an excellent tool for memory-efficient storage of Python objects, strings among them. For ASCII strings stored in NumPy arrays, only 1 byte per character is used.
However, there is one inconvenience: the stored objects are no longer of type str but bytes, which in most cases means they have to be decoded before further use, which in turn leads to quite bulky code:
>>> import numpy
>>> my_array = numpy.array(['apple', 'pear'], dtype='S5')
>>> print("Mary has an {} and a {}".format(my_array[0], my_array[1]))
Mary has an b'apple' and a b'pear'
>>> print("Mary has an {} and a {}".format(my_array[0].decode('utf-8'),
... my_array[1].decode('utf-8')))
Mary has an apple and a pear
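The 1-byte-per-character claim is easy to verify on the array above (a quick check; the values shown are what NumPy reports for a two-element 'S5' array):
>>> my_array.itemsize   # 5 characters, 1 byte each
5
>>> my_array.nbytes     # 2 elements * 5 bytes
10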
This inconvenience can be eliminated by using another data type, e.g.:
>>> my_array = numpy.array(['apple', 'pear'], dtype='U5')
>>> print("Mary has an {} and a {}".format(my_array[0], my_array[1]))
Mary has an apple and a pear
However, this is achieved only at the cost of a four-fold increase in memory usage:
>>> numpy.info(my_array)
class: ndarray
shape: (2,)
strides: (20,)
itemsize: 20
aligned: True
contiguous: True
fortran: True
data pointer: 0x1a5b020
byteorder: little
byteswap: False
type: <U5
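To make the four-fold difference explicit, the total footprint of the two dtypes can be compared directly (a minimal sketch; the numbers follow from 1 byte vs. 4 bytes per character for two 5-character items):
>>> numpy.array(['apple', 'pear'], dtype='S5').nbytes
10
>>> numpy.array(['apple', 'pear'], dtype='U5').nbytes
40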
Is there a solution that combines the advantages of both: memory-efficient allocation and convenient usage of ASCII strings?