I need to decode, with Python 3, a string that was encoded the following way:

>>> s = numpy.asarray(numpy.string_("hello\nworld"))
>>> s
array(b'hello\nworld', 
      dtype='|S11')

I tried:

>>> str(s)
"b'hello\\nworld'"

>>> s.decode()
AttributeError                            Traceback (most recent call last)
<ipython-input-31-7f8dd6e0676b> in <module>()
----> 1 s.decode()

AttributeError: 'numpy.ndarray' object has no attribute 'decode'

>>> s[0].decode()
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-34-fae1dad6938f> in <module>()
----> 1 s[0].decode()

IndexError: 0-d arrays can't be indexed

3 Answers

3

Another option is the np.char collection of string operations.

In [255]: np.char.decode(s)
Out[255]: 
array('hello\nworld', 
      dtype='<U11')

np.char.decode accepts an encoding keyword if you need one. If you don't, .astype is probably simpler.

This s is 0-d (shape ()), so it has to be indexed with an empty tuple, s[()]:

In [268]: s[()]
Out[268]: b'hello\nworld'
In [269]: s[()].decode()
Out[269]: 'hello\nworld'

s.item() also works.
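
Putting the three routes side by side, as a minimal sketch. It uses np.bytes_, which is the type the question spells numpy.string_ (that alias was removed in NumPy 2.0), and it assumes the bytes are UTF-8; the variable names are only for illustration.

```python
import numpy as np

# np.bytes_ is the type the question calls numpy.string_.
s = np.asarray(np.bytes_("hello\nworld"))

# 1. Empty-tuple indexing pulls the bytes scalar out of the 0-d array.
via_index = s[()].decode("utf-8")

# 2. .item() returns a plain Python bytes object.
via_item = s.item().decode("utf-8")

# 3. np.char.decode returns a 0-d unicode array; .item() unwraps it to str.
via_char = np.char.decode(s, "utf-8").item()

print(repr(via_index), repr(via_item), repr(via_char))
```

All three should give the same Python str; s.item().decode() is probably the shortest when you only have a single value.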

1

In Python 3, there are two types that represent sequences of characters: bytes (raw bytes) and str (Unicode characters). When you use string_ as your type, numpy returns bytes. If you want a regular str, use numpy's unicode_ type instead:

>>> s = numpy.asarray(numpy.unicode_("hello\nworld"))
>>> s
array('hello\nworld', 
      dtype='<U11')

>>> str(s)
'hello\nworld'

But note that if you don't specify a type for your string (string_ or unicode_), numpy uses the default str type, which in Python 3 holds Unicode characters:

>>> s = numpy.asarray("hello\nworld")
>>> str(s)
'hello\nworld'
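
If you don't control how the array was written, you can check which of the two you received by looking at the dtype kind. A rough sketch; to_text is a hypothetical helper name, not a numpy function, and the UTF-8 default is an assumption about the data.

```python
import numpy as np

as_bytes = np.asarray(np.bytes_("hello\nworld"))  # what numpy.string_ produced
as_text = np.asarray("hello\nworld")              # plain Python str input

print(as_bytes.dtype.kind)  # 'S': fixed-width bytes
print(as_text.dtype.kind)   # 'U': fixed-width Unicode


def to_text(arr, encoding="utf-8"):
    """Hypothetical helper: return a Python str from a 0-d string array."""
    value = arr.item()  # bytes for kind 'S', str for kind 'U'
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(repr(to_text(as_bytes)), repr(to_text(as_text)))
```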

3 Comments

The reason why I encode with numpy.string_ data is for compatibility. My data goes to a data format called HDF5, and can be potentially read back by other software than just python.
@PiRK If you want an approach that is compatible across Python versions, you should just use numpy.asarray(); otherwise it has nothing to do with Python.
Unfortunately I also need my output HDF5 files to be compatible with old Fortran libraries, various versions of the Octave software, Matlab... etc

1

If my understanding is correct, you can do this with astype, which returns an array with the contents converted to the requested type. Note that copy=False only avoids a copy when no conversion is needed, so a new array is still created here:

>>> s = numpy.asarray(numpy.string_("hello\nworld"))
>>> r = s.astype(str, copy=False)
>>> r 
array('hello\nworld', 
      dtype='<U11')
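
A small sketch of getting from there to a plain Python str. One caveat that I believe holds but have not checked on every numpy version: astype from bytes to str treats the data as ASCII, so for non-ASCII content it seems safer to decode explicitly with the encoding you expect (UTF-8 is assumed below, and the "café" value is made up for illustration).

```python
import numpy as np

s = np.asarray(np.bytes_("hello\nworld"))

r = s.astype(str)   # dtype goes from S11 to <U11, so this is a new array
text = r.item()     # unwrap the 0-d array into a Python str
print(repr(text))

# Non-ASCII payload: decode explicitly instead of relying on astype.
raw = np.asarray("café".encode("utf-8"))
decoded = np.char.decode(raw, "utf-8").item()
print(repr(decoded))
```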

3 Comments

Thanks! This helps a lot. Now I can recover my string this way: s = str(s.astype(str))
You don't need to convert the type when you can get the regular str directly with unicode_.
I don't control the encoding stage. In my real-world problem, I don't create s myself. I just happen to know that it was written to a file after this encoding stage.
