What does the 'b' character do in front of a string literal?

Question

Apparently, the following is the valid syntax:

b'The string'

I would like to know:

What does this b character in front of the string mean?
What are the effects of using it?
What are appropriate situations to use it?

I found a related question right here on SO, but it is about PHP. It states that the b indicates that the string is binary, as opposed to Unicode, which was needed to keep code written for PHP < 6 compatible when migrating to PHP 6. I don't think this applies to Python.

I did find this documentation on the Python site about using a u character in the same syntax to specify a string as Unicode. Unfortunately, it doesn't mention the b character anywhere in that document.

Also, just out of curiosity, are there more symbols than the b and u that do other things?

For the curiosity part: since Python 3.6 there are f-strings, which are really useful. With v = "world", print(f"Hello {v}") prints "Hello world". Another example is f"{2 * 5}", which gives "10". It is the way forward when working with strings.
– thanos.a, Commented Mar 23, 2021 at 9:13
f-strings also have a handy debugging feature (since Python 3.8): add an equals sign (=) after the expression but before the closing brace. With v = 123, f'{v=}' produces "v=123", showing the name along with the value. This works for expressions too, so f'{2*5=}' produces "2*5=10".
– diamondsea, Commented Apr 13, 2022 at 17:22
For the curiosity part, from the documentation (String and Bytes literals):
stringprefix ::= "r" | "u" | "R" | "U" | "f" | "F" | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"
bytesprefix ::= "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
– AcK, Commented Apr 16, 2022 at 12:42

kev555 · Accepted Answer · 2025-04-24 15:46:43Z

1267

Python 3.x makes a clear distinction between the types:

str = '...' literals = a sequence of characters. A “character” is a basic unit of text: a letter, digit, punctuation mark, symbol, space, or “control character” (like tab or backspace). The Unicode standard assigns each character to an integer code point between 0 and 0x10FFFF. (Well, more or less. Unicode includes ligatures and combining characters, so a string might not have the same number of code points as user-perceived characters.) Internally, str uses a flexible string representation that can use either 1, 2, or 4 bytes per code point.
bytes = b'...' literals = a sequence of bytes. A “byte” is the smallest integer type addressable on a computer, which is nearly universally an octet, or 8-bit unit, thus allowing numbers between 0 and 255.
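
To make the two definitions above concrete, here is a minimal sketch assuming Python 3. The variable names are made up for illustration; the point is only that indexing a str gives a one-character str, while indexing a bytes object gives an integer.

```python
# Sketch (Python 3): a str is a sequence of code points,
# a bytes object is a sequence of integers in range(256).
text = 'h\u00e9llo'            # "hello" with an e-acute as the second character
data = text.encode('utf-8')    # the same text as UTF-8 encoded bytes

second_char = text[1]          # indexing a str yields a str of length 1
second_byte = data[1]          # indexing bytes yields an int

print(type(text).__name__, len(text))   # str 5
print(type(data).__name__, len(data))   # bytes 6 (the accented letter needs 2 bytes)
print(type(second_char).__name__, type(second_byte).__name__)   # str int
```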

If you're familiar with:

Java or C#, think of str as String and bytes as byte[];
SQL, think of str as NVARCHAR and bytes as BINARY or BLOB;
Windows registry, think of str as REG_SZ and bytes as REG_BINARY.

If you're familiar with C(++), then forget everything you've learned about char and strings, because a character is not a byte. That idea is long obsolete.

You use str when you want to represent text.

print('שלום עולם')

You use bytes when you want to represent low-level binary data like structs.

import struct
NaN = struct.unpack('>d', b'\xff\xf8\x00\x00\x00\x00\x00\x00')[0]

You can encode a str to a bytes object.

>>> '\uFEFF'.encode('UTF-8')
b'\xef\xbb\xbf'

And you can decode a bytes into a str.

>>> b'\xE2\x82\xAC'.decode('UTF-8')
'€'
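
A small sketch of why the codec name matters, assuming Python 3: the same character becomes different bytes under different encodings, and decoding with a codec that does not match the data can fail. The variable names are illustrative only.

```python
# Sketch: one character, two encodings, and a failed decode.
euro = '\u20ac'                       # the euro sign

utf8 = euro.encode('utf-8')           # 3 bytes
utf16 = euro.encode('utf-16-le')      # 2 bytes

# Round trips work as long as the same codec is used both ways.
assert utf8.decode('utf-8') == euro
assert utf16.decode('utf-16-le') == euro

# Decoding with a codec that cannot represent the data raises an error.
error = None
try:
    utf8.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc

print(utf8, utf16)            # b'\xe2\x82\xac' b'\xac '
print(type(error).__name__)   # UnicodeDecodeError
```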

But you can't freely mix the two types.

>>> b'\xEF\xBB\xBF' + 'Text with a UTF-8 BOM'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat str to bytes

The b'...' notation is somewhat confusing in that it allows the printable ASCII characters, i.e. character codes 32 to 126 (space " " up to tilde "~"), to be written directly as a shorthand for their hex values (0x20 up to 0x7E):

>>> b'A' == b'\x41'
True

But I must emphasize, a character is not a byte.

>>> 'A' == b'A'
False
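
The following sketch (Python 3, illustrative names) shows the shorthand from another angle: the repr of a bytes object prints printable ASCII bytes as characters and everything else as \x escapes, but the underlying values are plain integers either way.

```python
# Sketch: how a bytes object stores and displays its values.
data = b'A~\x00\xff'

values = list(data)                              # the integers behind the literal
shown = repr(data)                               # printable ASCII shown as characters
rebuilt = bytes([0x41, 0x7e, 0x00, 0xff])        # same object built from integers

print(values)            # [65, 126, 0, 255]
print(shown)             # b'A~\x00\xff'
print(rebuilt == data)   # True
```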

In Python 2.x

Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there was:

unicode = u'...' literals = sequence of Unicode characters = 3.x str
str = '...' literals = sequences of confounded bytes/characters
- Usually text, encoded in some unspecified encoding.
- But also used to represent binary data like struct.pack output.

In order to ease the 2.x-to-3.x transition, the b'...' literal syntax was backported to Python 2.6, in order to allow distinguishing binary strings (which should be bytes in 3.x) from text strings (which should be str in 3.x). The b prefix does nothing in 2.x, but tells the 2to3 script not to convert it to a Unicode string in 3.x.

So yes, b'...' literals in Python have the same purpose that they do in PHP.

Also, just out of curiosity, are there more symbols than the b and u that do other things?

The r prefix creates a raw string (e.g., r'\t' is a backslash + t instead of a tab), and triple quotes '''...''' or """...""" allow multi-line string literals.

The f prefix (introduced in Python 3.6) creates a “formatted string literal” which can reference Python variables. For example, f'My name is {name}.' is shorthand for 'My name is {0}.'.format(name).
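
A quick sketch of these prefixes side by side, assuming Python 3.6 or later. The name value is a placeholder chosen for the example.

```python
# Sketch: r, f and rb prefixes compared with plain literals.
name = 'Ada'                       # placeholder value for the example

cooked = '\t'                      # one character: a tab
raw = r'\t'                        # two characters: backslash and t
formatted = f'My name is {name}.'  # expression evaluated inside the braces
raw_bytes = rb'\x41'               # prefixes combine: 4 bytes, no escape processing

print(len(cooked), len(raw))       # 1 2
print(formatted)                   # My name is Ada.
print(raw_bytes, len(raw_bytes))   # b'\\x41' 4
```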

edited Apr 24 at 15:46

kev555

answered Jun 8, 2011 at 2:34

dan04

13 Comments

tommy.carstensen Over a year ago

Thanks! I understood it after reading these sentences: "In order to ease the 2.x-to-3.x transition, the b'...' literal syntax was backported to Python 2.6, in order to allow distinguishing binary strings (which should be bytes in 3.x) from text strings (which should be str in 3.x). The b prefix does nothing in 2.x, but tells the 2to3 script not to convert it to a Unicode string in 3.x."

Wildcard Over a year ago

The 'A' == b'A' --> False check really makes it clear. The rest of it is excellent, but up to that point I hadn't properly understood that a byte string is not really text.

Eli Over a year ago

'שלום עולם' == 'hello world'

Marvin Thobejane Over a year ago

b"some string".decode('UTF-8'), I believe that's the line many are looking for

Conchylicultor Over a year ago

In addition to u, b, and r, Python 3.6 introduced f-strings for string formatting. Example: f'The temperature is {tmp_value} Celsius'

anthony sottile · Accepted Answer · 2018-11-07 03:46:20Z

548

To quote the Python 2.x documentation:

A prefix of 'b' or 'B' is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A 'u' or 'b' prefix may be followed by an 'r' prefix.

The Python 3 documentation states:

Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.
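
The ASCII-only rule from the quoted documentation can be checked with a sketch like the one below. It uses the built-in compile() on small source strings so that the rejected literal does not stop the script itself; the helper name compiles is made up for this example.

```python
# Sketch: bytes literals may contain only ASCII characters; other values need escapes.
def compiles(source):
    """Return True if `source` is a valid Python expression."""
    try:
        compile(source, '<example>', 'eval')
        return True
    except SyntaxError:
        return False

ascii_ok = compiles("b'abc'")          # plain ASCII is accepted
non_ascii_ok = compiles("b'\u00e9'")   # source text holds a literal e-acute: rejected
escaped_ok = compiles(r"b'\xe9'")      # the same byte value written as an escape

print(ascii_ok, non_ascii_ok, escaped_ok)   # True False True
```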

edited Nov 7, 2018 at 3:46

anthony sottile

answered Jun 7, 2011 at 18:16

NPE

5 Comments

Jesse Webb Over a year ago

So it sounds like Python < v3 will just ignore this extra character. What would be a case in v3 where you would need to use a b string as opposed to just a regular string?

detly Over a year ago

@Gweebz - if you're actually typing out a string in a particular encoding instead of with unicode escapes (eg. b'\xff\xfe\xe12' instead of '\u32e1').

Romuald Brunet Over a year ago

Actually, if you've imported unicode_literals from __future__, this will "reverse" the behavior for this particular string (in Python 2.x)

Hack-R Over a year ago

A little more plain language narrative around the quoted documentation would make this a better answer IMHO

smci Over a year ago

"b is for bytes(/ASCII), as opposed to Unicode. In Python 3.x, strings are now Unicode by default." do we agree that suggested doc change is better? Also, that 3.x doc quote assumes you already know strings are now Unicode by default, without actually saying that. Also, 2.x is now ancient history, I'd move the 3.x quote above it (and mentions of 2to3 are pretty ancient too).

user774340 · Accepted Answer · 2011-06-07 18:34:03Z

45

The b denotes a byte string.

Bytes are the actual data. Strings are an abstraction.

If you had a multi-character string object and took a single character from it, the result would be a string, and it might occupy more than 1 byte depending on the encoding.

If you took 1 byte from a byte string, you'd get a single 8-bit value from 0-255, and it might not represent a complete character if the encoding uses more than 1 byte for that character.

TBH I'd use strings unless I had some specific low level reason to use bytes.
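
A sketch of the point about single characters versus single bytes, assuming Python 3 and UTF-8 (names are illustrative): taking one character from a str always gives a whole character, while taking one byte from the encoded form can cut a character in half.

```python
# Sketch: one character is not necessarily one byte.
text = '\u20acuro'                 # euro sign followed by "uro"
data = text.encode('utf-8')

one_char = text[0]                 # a complete character (str of length 1)
one_byte = data[0]                 # an int: only the first third of the euro sign
partial = data[:1]                 # bytes slice holding that single byte

try:
    partial.decode('utf-8')
    decodable = True
except UnicodeDecodeError:
    decodable = False              # an incomplete sequence cannot be decoded

whole = data[:3].decode('utf-8')   # all three bytes together decode fine

print(len(text), len(data))        # 4 6
print(one_byte, decodable)         # 226 False
```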

answered Jun 7, 2011 at 18:34

user774340

Comments

Eliahu Aaron · Accepted Answer · 2019-08-28 10:30:18Z

33

On the server side, any response we send goes out as a bytes object, so if the client prints what it receives without decoding it, it will appear as b'Response from server'.

To get rid of the b'...', encode on the server and decode on the client:

Server file:

stri = "Response from server"
c.send(stri.encode())

Client file:

print(s.recv(1024).decode())

then it will print Response from server
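
For anyone who wants to try this without writing two files, here is a self-contained sketch that uses socket.socketpair() to stand in for the server and client connections. That substitution is an assumption made for the example; the answer's own c and s objects are not shown in full.

```python
import socket

# Sketch: sockets carry bytes, so text is encoded before sending
# and decoded after receiving. socketpair() gives two connected ends.
server_end, client_end = socket.socketpair()
try:
    message = "Response from server"
    server_end.sendall(message.encode('utf-8'))   # str -> bytes

    raw = client_end.recv(1024)                   # what arrives is bytes
    received = raw.decode('utf-8')                # bytes -> str
finally:
    server_end.close()
    client_end.close()

print(raw)        # b'Response from server'
print(received)   # Response from server
```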

edited Aug 28, 2019 at 10:30

Eliahu Aaron

answered Aug 17, 2018 at 7:27

Mark 25

3 Comments

Chandra Kanth Over a year ago

It doesn't explain the question that Jesse Webb has asked!

Michael Erickson Over a year ago

Actually this is exactly the answer to the title of the question that was asked: Q: "What does b'x' do?" A: "It does 'x'.encode()" That is literally what it does. The rest of the question wanted to know much more than this, but the title is answered.

Karl Knechtel Over a year ago

@MichaelErickson no, b'x' does not "do 'x'.encode()". It simply creates a value of the same type. If you don't believe me, try evaluating b'\u1000' == '\u1000'.encode().

Marcello DeSales · Accepted Answer · 2021-02-14 01:07:10Z

29

The answer to the question is that it does:

data.encode()

To decode it (removing the b, because sometimes you don't need it), use:

data.decode()

edited Feb 14, 2021 at 1:07

Marcello DeSales

answered Nov 18, 2020 at 7:18

Billy

1 Comment

Karl Knechtel Over a year ago

This is incorrect. bytes literals are interpreted at compile time by a different mechanism; they are not syntactic sugar for a data.encode() call, a str is not created in the process, and the interpretation of text within the "" is not the same. In particular, e.g. b"\u1000" does not create a bytes object representing Unicode character 0x1000 in any meaningful encoding; it creates a bytes object storing numeric values [92, 117, 49, 48, 48, 48] - corresponding to a backslash, lowercase u, digit 1, and three digit 0s.
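
The difference described in this comment can be checked with a short sketch. Recent Python versions emit a warning for the unrecognized \u escape inside a bytes literal, so the literal is evaluated from a source string with warnings silenced; that wrapping is just a convenience for the example.

```python
import warnings

# Sketch: b'\u1000' is not the same thing as '\u1000'.encode().
with warnings.catch_warnings():
    warnings.simplefilter('ignore')      # silence the invalid-escape warning
    literal = eval(r"b'\u1000'")         # evaluate the bytes literal as written

encoded = '\u1000'.encode('utf-8')       # encode the actual character U+1000

print(list(literal))       # [92, 117, 49, 48, 48, 48]
print(list(encoded))       # [225, 128, 128]
print(literal == encoded)  # False
```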

Eliahu Aaron · Accepted Answer · 2019-08-28 10:31:56Z

14

Here's an example where the absence of b would throw a TypeError exception in Python 3.x

>>> f=open("new", "wb")
>>> f.write("Hello Python!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: a bytes-like object is required, not 'str'

Adding a b prefix would fix the problem.
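
A runnable sketch of the fix, assuming Python 3. It writes to a throwaway temporary directory rather than a file called "new" in the current directory, which is a choice made only to keep the example free of side effects.

```python
import os
import tempfile

# Sketch: a file opened in binary mode accepts bytes, not str.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'new')
    with open(path, 'wb') as f:
        f.write(b'Hello Python!')                 # bytes literal: accepted
        f.write(' More text.'.encode('utf-8'))    # encoded str: accepted
        try:
            f.write('plain str')                  # str: rejected
            rejected = False
        except TypeError:
            rejected = True

    with open(path, 'rb') as f:
        contents = f.read()

print(rejected)   # True
print(contents)   # b'Hello Python! More text.'
```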

edited Aug 28, 2019 at 10:31

Eliahu Aaron

answered Jun 23, 2014 at 7:02

user3053230

Comments

Ignacio Vazquez-Abrams · Accepted Answer · 2011-06-07 18:16:23Z

13

It turns it into a bytes literal (or str in 2.x), and is valid for 2.6+.

The r prefix causes backslashes to be "uninterpreted" (not ignored, and the difference does matter).

answered Jun 7, 2011 at 18:16

Ignacio Vazquez-Abrams

3 Comments

Jesse Webb Over a year ago

This sounds wrong according to the documentation quoted in aix's answer; the b will be ignored in Python version other than 3.

Ignacio Vazquez-Abrams Over a year ago

It will be a str in 2.x either way, so it could be said that it is ignored. The distinction matters when you import unicode_literals from the __future__ module.

Karl Knechtel Over a year ago

"the b will be ignored in Python version other than 3." It will have no effect in 2.x because in 2.x, str names the same type that bytes does.

xjcl · Accepted Answer · 2018-03-07 12:16:05Z

12

In addition to what others have said, note that a single character in unicode can consist of multiple bytes.

The way UTF-8, the most common Unicode encoding, works is that it kept the old ASCII format (a 7-bit code whose bytes look like 0xxx xxxx) and added multi-byte sequences in which every byte starts with 1 (1xxx xxxx) to represent characters beyond ASCII, so that UTF-8 is backwards-compatible with ASCII.

>>> len('Öl')  # German word for 'oil' with 2 characters
2
>>> 'Öl'.encode('UTF-8')  # convert str to bytes 
b'\xc3\x96l'
>>> len('Öl'.encode('UTF-8'))  # 3 bytes encode 2 characters !
3
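
To see the bit patterns directly, here is a sketch that encodes four sample characters with UTF-8 and prints the length and the first byte in binary. The sample characters are my own picks, written as escapes so the script itself stays ASCII.

```python
# Sketch: UTF-8 uses 1 to 4 bytes per character; ASCII keeps a leading 0 bit.
samples = ['A', '\u00d6', '\u20ac', '\U0001f600']   # A, O-umlaut, euro sign, an emoji

lengths = [len(ch.encode('utf-8')) for ch in samples]
lead_bytes = [format(ch.encode('utf-8')[0], '08b') for ch in samples]

for ch, n, lead in zip(samples, lengths, lead_bytes):
    print(f'U+{ord(ch):04X}: {n} byte(s), first byte {lead}')
# U+0041: 1 byte(s), first byte 01000001
# U+00D6: 2 byte(s), first byte 11000011
# U+20AC: 3 byte(s), first byte 11100010
# U+1F600: 4 byte(s), first byte 11110000
```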

answered Mar 7, 2018 at 12:16

xjcl

2 Comments

Karl Knechtel Over a year ago

This is useful supplementary information, but it does not address the question at all. It should be written as a comment to another answer instead.

ShadowRanger Over a year ago

A single character in Unicode does not consist of bytes in the first place. A Unicode character in a specific encoding (like UTF-8, UTF-16, UTF-32, or oddball ones like UTF-7) can consist of multiple bytes (for some of those, they're always multiple bytes), but Unicode characters are platonic ideals; they have no inherent byte representation.

Haterind · Accepted Answer · 2022-04-26 03:34:42Z

9

b"hello" is not a string (even though it looks like one), but a byte sequence. It is a sequence of 5 numbers which, if you mapped them to a character table, would look like h e l l o. However, the value itself is not a string; Python just has a convenient syntax for defining byte sequences using text characters rather than the numbers themselves. This saves you some typing, and byte sequences are often meant to be interpreted as characters anyway. This is not always the case, though: for example, reading a JPG file will produce a sequence of nonsense letters inside b"..." because JPGs have a non-text structure.

.encode() and .decode() convert between strings and bytes.
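
A sketch of both cases, assuming Python 3: five numbers that happen to spell a word, and four bytes of the kind typically found at the start of a JFIF-style JPG file, which are not text in any sense. The second value is typed in by hand for illustration, not read from a real file.

```python
# Sketch: bytes are numbers; some sequences look like text, others do not.
hello = bytes([104, 101, 108, 108, 111])     # the five numbers behind b"hello"
jpg_start = bytes.fromhex('ffd8ffe0')        # hand-typed sample of binary data

try:
    jpg_start.decode('utf-8')
    is_text = True
except UnicodeDecodeError:
    is_text = False                          # not valid UTF-8 text

print(hello)                  # b'hello'
print(hello.decode('ascii'))  # hello
print(jpg_start, is_text)     # b'\xff\xd8\xff\xe0' False
```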

answered Apr 26, 2022 at 3:34

Haterind

Comments

Karam Qusai · Accepted Answer · 2019-05-14 12:45:01Z

6

You can use the json module to convert it to a dictionary:

import json
data = b'{"key":"value"}'
print(json.loads(data))

{'key': 'value'}

FLASK:

This is an example from Flask. Run this from a Python prompt in a terminal:

import requests
requests.post(url='http://localhost(example)/',json={'key':'value'})

In flask/routes.py

import json
from flask import request

@app.route('/', methods=['POST'])
def api_script_add():
    print(request.data)  # --> b'{"key": "value"}'
    print(json.loads(request.data))
    return json.loads(request.data)

{'key': 'value'}
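
The JSON part can be tried without Flask. This sketch assumes Python 3.6 or later, where json.loads accepts bytes directly; the payload is a made-up sample standing in for request.data.

```python
import json

# Sketch: json.loads works on bytes as well as on str (Python 3.6+).
payload = b'{"key": "value"}'                     # sample body, as a server might receive it

from_bytes = json.loads(payload)                  # bytes in, dict out
from_text = json.loads(payload.decode('utf-8'))   # explicit decode gives the same result
back = json.dumps(from_bytes).encode('utf-8')     # dict -> str -> bytes

print(from_bytes)   # {'key': 'value'}
print(back)         # b'{"key": "value"}'
```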

answered May 14, 2019 at 12:45

Karam Qusai

3 Comments

Andrea Over a year ago

This works well (I do the same for JSON data), but will fail for other type of data. If you have a generic str data, might be an XML for example, you can assign the variable and decode it. Something like data = request.data and then data = data.decode()

Karl Knechtel Over a year ago

This does not answer the question. The question is about what the b means, not about what can be done with the object. Also, this can only be done with a very small subset of bytes literals, the ones that are formatted to the JSON specification.

Karam Qusai Over a year ago

dear @KarlKnechtel It doesn't answer this question directly that is true, but it is good for SEO for Stackoverflow if someone having this issue but isn't able to form the right question but only mentions like b' Flask/Django then this answer will be more relevant for the search engine to put it in front.

Severin Spörri · Accepted Answer · 2022-09-07 09:47:10Z

1

bytes(somestring.encode()) is the solution that worked for me in Python 3.

def compare_types():
    output = b'sometext'
    print(output)
    print(type(output))


    somestring = 'sometext'
    encoded_string = somestring.encode()
    output = bytes(encoded_string)
    print(output)
    print(type(output))


compare_types()
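
As a side note in sketch form (Python 3, same variable names as above where possible): str.encode() already returns a bytes object, so wrapping it in bytes() does not change the value, and bytes() can also take the string plus an encoding name directly.

```python
# Sketch: three ways to get the same bytes value from a str.
somestring = 'sometext'

encoded = somestring.encode()           # already of type bytes (UTF-8 by default)
wrapped = bytes(encoded)                # equal value, nothing is converted
direct = bytes(somestring, 'utf-8')     # bytes() with an explicit encoding

print(type(encoded).__name__)                        # bytes
print(encoded == wrapped == direct == b'sometext')   # True
```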

answered Sep 7, 2022 at 9:47

Severin Spörri

Comments

hepidad · Accepted Answer · 2023-07-26 11:25:28Z

Answering questions 1 and 2: the b prefix means the literal creates an object of the bytes type rather than the ordinary str type. For example:

>>> type(b'')
<class 'bytes'>
>>> type('')
<class 'str'>

Answering question 3: It can be used when we want to work with a byte stream (a sequence of bytes) from some file or object. For example, to compute the SHA-1 message digest of a file:

import hashlib

def hash_file(filename):
   """Return the SHA-1 hash of the file passed into it."""

   # make a hash object
   h = hashlib.sha1()

   # open file for reading in binary mode
   with open(filename,'rb') as file:

       # loop till the end of the file
       chunk = 0
       while chunk != b'':
           # read only 1024 bytes at a time
           chunk = file.read(1024)
           h.update(chunk)

   # return the hex representation of digest
   return h.hexdigest()

message = hash_file("somefile.pdf")
print(message)
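
The same idea in a form that runs without a file on disk: hashlib works on bytes, so a bytes literal can be hashed directly, while a str has to be encoded first. The input b'abc' is just a standard short test value.

```python
import hashlib

# Sketch: hash functions consume bytes, not str.
digest = hashlib.sha1(b'abc').hexdigest()                    # bytes literal
same = hashlib.sha1('abc'.encode('ascii')).hexdigest()       # encoded str

try:
    hashlib.sha1('abc')                                      # unencoded str
    accepted_str = True
except TypeError:
    accepted_str = False

print(digest)         # a9993e364706816aba3e25717850c26c9cd0d89d
print(accepted_str)   # False
```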
