1

I have a small question about string conversion in Python 3.

s = '\x001\x002\x001\x000\x005\x005\x000\x004\x000\x000\x00'

print(s) gives the output:

1 2 1 0 5 5 0 4 0 0

However, when I try to convert the string using the following:

bytes(s, 'utf16').decode('utf16'), I get '\x001\x002\x001\x000\x005\x005\x000\x004\x000\x000\x00'.

What is the way to get the same output as print(s) programmatically?

3 Comments
  • BTW: the second variant is wrong. It is not UTF16 (but UTF16BE). Commented Jun 18, 2020 at 15:49
  • It seems utf-16 works as well. realpython.com/python-encodings-guide Commented Jun 18, 2020 at 16:25
  • Interesting, I wouldn't have expected '\x00' to print as a blank. Commented Jun 18, 2020 at 16:44

2 Answers

2

In the first example, you print the string s with print(s), and the console ignores the \x00 characters.

On your last line, you get the string from the Python prompt. If you print it with print(bytes(s,'utf-16').decode('utf-16')), you get what you want.

So the Python prompt shows you the variable with context (e.g. you also see the ' signs), but not the string as it is actually rendered (which is what print gives you).

ADDENDUM:

print prints the string passed as its argument, calling str() to convert the argument to a string if necessary. The Python prompt, however, prints the representation of the value, as given by repr(). So you can use print(repr(bytes(s,'utf-16').decode('utf-16'))) to get the same text you see in the interactive Python session, but as a string. Instead of printing, you can assign the result (r = repr(bytes(...).decode(...))), so that r[0] is ', r[1] is \, etc.
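The difference between the two views can be checked with a minimal sketch like the one below. The variable names (round_tripped, shown_by_prompt, spaced) are made up for illustration, and replacing each \x00 with a space is only an assumption about how the asker's console happened to render it; other terminals may show nothing at all.

```python
s = '\x001\x002\x001\x000\x005\x005\x000\x004\x000\x000\x00'

# Encoding to UTF-16 and decoding again is a round trip: the string is unchanged.
round_tripped = bytes(s, 'utf-16').decode('utf-16')
print(round_tripped == s)

# repr() gives the text the interactive prompt displays, quotes and escapes included.
shown_by_prompt = repr(s)
print(shown_by_prompt[0], shown_by_prompt[1])

# Assumption: the console drew each NUL as a blank. To get that text as a real
# string, replace the NULs explicitly instead of relying on the terminal.
spaced = s.replace('\x00', ' ').strip()
print(spaced)
```

With this approach the result is an ordinary string that no longer contains NUL characters, so later operations do not depend on how a particular console displays them.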


4 Comments

Thanks @Giacomo for the answer. I wanted to perform operations on the string that appears when printing it via print(str(s)). Is there a way to store the print output in a variable, or some better way?
str(s) is a no-op if s is already a string. What Python prints at the console if you don't use print is actually repr(s).
@MarkRansom Right. It is a leftover from a draft. I wanted to contrast str with repr, but it was also not so useful. I'll edit away the str.
@sk1pro99: I added how to get the string in the second case. Reproducing the exact output of the first case (and also of the second) is difficult, because it depends on the console (and font). \x00 has no meaning in Unicode, so it is not printed, but other control characters could be interpreted differently. Then there are all the combining characters (and code points). repr() avoids control characters and difficult characters by escaping them. And a warning: your code may work on Unix/macOS (now Unicode by default), but on Windows you may get different output.
1

You just need to decode these bytes and drop the null characters to get the answer:

x = b'\x001\x002\x001\x000\x005\x005\x000\x004\x000\x000\x00'
str1 = x.decode('utf-8')
print(" ".join([i for i in str1 if ord(i) != 0]))

Second Solution:

x = '1 2 1 0 5 5 0 4 0 0'
str_utf16 = x.encode('utf16')
print("Encoding :",str_utf16)
print("Decoding :",str_utf16.decode('utf16'))

Output:

Encoding : b'\xff\xfe1\x00 \x002\x00 \x001\x00 \x000\x00 \x005\x00 \x005\x00 \x000\x00 \x004\x00 \x000\x00 \x000\x00'
Decoding : 1 2 1 0 5 5 0 4 0 0
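Following the comment under the question that the data looks like UTF-16BE rather than plain UTF-16, here is a hedged sketch of decoding the original s that way. It assumes s was produced by reading big-endian UTF-16 bytes one byte per character and that the trailing \x00 is a stray odd byte; neither is confirmed by the question, and the names raw, even and digits are illustrative.

```python
s = '\x001\x002\x001\x000\x005\x005\x000\x004\x000\x000\x00'

# Assumption: every character of s stands for one original byte (code points 0-255).
raw = s.encode('latin-1')

# UTF-16 needs an even number of bytes, so drop a dangling final byte if present.
even = raw[:len(raw) - len(raw) % 2]

# Each pair b'\x00' + digit is one big-endian UTF-16 code unit.
digits = even.decode('utf-16-be')
print(digits)

# Join with spaces to mimic the console output shown in the question.
print(' '.join(digits))
```

If the source of the data is known, decoding the original bytes with the right codec in the first place would avoid this repair step.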

1 Comment

How do you suggest getting from s to x? With x = bytes(s, 'utf-8')? If I then do bytes(x, 'utf-8').decode('utf-8'), I get the same string '\x001\x002\x001\x000\x005\x005\x000\x004\x000\x000\x00'. Although if I use print, I get the converted string.
