How many bytes does a string have

Question

Is there some function which will tell me how many bytes a string occupies in memory?

I need to set the size of a socket buffer in order to transfer the whole string at once.

So you don't care about the size of the string in memory but rather how long it is in a specific encoding. The usual way would be to convert the string into a byte array (a bytes object in Python) in the encoding you need to transfer (UTF-8, for example) and get its length.
– Joey, Commented Oct 25, 2010 at 9:22
@Joey: I don't follow. Why do you think it's the length in an encoding? I am facing the same issue: I need to know the size of the (string) contents to be sent over the wire. What I really need is the size in bytes, so what would I do with the length of that string?
– 0xc0de, Commented Dec 6, 2015 at 18:53
@0xc0de: “pure” Unicode cannot be sent over the wire unless encoded to bytes. The most common general encodings are “utf-32”, “utf-16-be”/“utf-16-le” or “utf-8” (a very sensible choice, since it contains no null bytes unless the text itself contains U+0000).
– tzot, Commented Jul 6, 2016 at 13:52
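To make the comparison concrete, the sketch below encodes one sample string (an arbitrary choice) with each of the encodings mentioned and collects the byte counts. Note that Python's plain "utf-16" and "utf-32" codecs prepend a byte order mark (BOM), while the "-le"/"-be" variants do not, so the same text can differ in size even within one encoding family.

```python
# Byte counts of the same text under several encodings.
# Sample text: 7 ASCII characters followed by 2 CJK characters.
text = "hello, 你好"

encodings = ["utf-8", "utf-16-le", "utf-16", "utf-32-le", "utf-32"]
sizes = {enc: len(text.encode(enc)) for enc in encodings}

for enc, n in sizes.items():
    # "utf-16" and "utf-32" include a BOM (2 and 4 bytes respectively)
    print(f"{enc:>9}: {n} bytes")
```

The UTF-16/UTF-32 outputs also contain zero bytes for the ASCII characters, which is one practical reason UTF-8 is usually preferred on the wire.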

tzot · Accepted Answer · 2017-06-30 04:48:59Z

102

If it's a Python 2.x str, get its len. If it's a Python 3.x str (or a Python 2.x unicode), first encode to bytes (or a str, respectively) using your preferred encoding ('utf-8' is a good choice) and then get the len of the encoded bytes/str object.

For example, ASCII characters use 1 byte each:

>>> len("hello".encode("utf8"))
5

whereas common Chinese characters use 3 bytes each:

>>> len("你好".encode("utf8"))
6
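For the socket use case in the question, the number that matters is the length of the encoded bytes. A minimal Python 3 sketch, assuming a hypothetical framing in which every message is preceded by a 4-byte big-endian length; send_text, recv_text and _recv_exact are illustrative names, not a standard API. Instead of sizing the socket buffer to fit the whole string, it tells the receiver how many bytes to read.

```python
import socket
import struct


def send_text(sock: socket.socket, text: str, encoding: str = "utf-8") -> None:
    """Send `text` prefixed with its encoded byte length (hypothetical framing)."""
    data = text.encode(encoding)           # the bytes that actually go on the wire
    header = struct.pack("!I", len(data))  # 4-byte unsigned int, network byte order
    sock.sendall(header + data)            # sendall() loops until everything is sent


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; recv() may legitimately return fewer per call."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before the message was complete")
        buf.extend(chunk)
    return bytes(buf)


def recv_text(sock: socket.socket, encoding: str = "utf-8") -> str:
    """Receive one message written by send_text()."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return _recv_exact(sock, length).decode(encoding)


if __name__ == "__main__":
    a, b = socket.socketpair()
    with a, b:
        send_text(a, "你好, world")
        print(recv_text(b) == "你好, world")
```

Because sendall() and the receive loop handle partial transfers, the socket buffer does not need to be as large as the message.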

edited Jun 30, 2017 at 4:48

answered Oct 25, 2010 at 9:48

tzot



2 Comments

Tom Over a year ago

Indeed this is the right answer. sys.getsizeof() doesn't give you what you want. So, if you want the size of a string once it is encoded as UTF-8, instead of saying len(myString), just say len(myString.encode("utf8"))
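One way to see why sys.getsizeof() is the wrong measure is to print both numbers for the same strings. The exact getsizeof() values depend on the Python implementation, version and platform; on CPython 3.3+ the comparison can go either way. A short ASCII string reports more than its UTF-8 length because of the object header, while a long run of CJK characters reports less, because CPython stores those characters in 2 bytes each whereas UTF-8 needs 3.

```python
import sys

# CPython-specific: sys.getsizeof() reports the size of the str object
# (header included); the UTF-8 length is what you would actually send.
samples = ["hello", "你好" * 1000]

for text in samples:
    in_memory = sys.getsizeof(text)
    on_wire = len(text.encode("utf-8"))
    print(f"{len(text):5} chars: getsizeof={in_memory:5}, utf-8 bytes={on_wire:5}")
```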

Taywee Over a year ago

This should be the right answer. It will tell you exactly how many bytes you need for the string, unicode or not. There's a good chance the string will be encoded to bytes for transmission anyway, so I doubt there would even be a performance hit.

eumiro · Accepted Answer · 2010-10-25 09:23:02Z

68

import sys

s = "a"
sys.getsizeof(s)

# getsizeof(object, default) -> int
# Return the size of object in bytes.
# For a str this includes the object header, not just the characters.

But actually you need to know its represented length, so something like len(s) should be enough (for a Python 2.x str, which is already a byte string).

answered Oct 25, 2010 at 9:23

eumiro


8 Comments

Noufal Ibrahim Over a year ago

+1 for the function. Doesn't this also count all the extra baggage needed to represent the object, i.e. the rest of the fields in the PyObject?

eumiro Over a year ago

@Noufal - exactly. For a simple 'a' string it returns 41.
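To separate that per-object baggage from the character data, the sketch below measures how much sys.getsizeof() grows when one more character is appended. It relies on a CPython 3.3+ implementation detail (the PEP 393 flexible string representation, where a compact str stores all characters at one fixed width chosen by its widest character); per_char_width is a hypothetical helper, and other implementations may behave differently.

```python
import sys


def per_char_width(ch: str, n: int = 100) -> int:
    """Bytes CPython stores per character for a string made only of `ch`.

    Assumes CPython 3.3+ (PEP 393): every character of a compact str is
    stored at the same width (1, 2 or 4 bytes), so growing the string by
    one character grows sys.getsizeof() by exactly that width, while the
    fixed header stays the same.
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


for ch in ["a", "\u00e9", "\u4f60", "\U0001F600"]:  # a, é, 你, 😀
    print(f"U+{ord(ch):04X}: {per_char_width(ch)} byte(s) in memory, "
          f"{len(ch.encode('utf-8'))} byte(s) in UTF-8")
```

The in-memory width and the UTF-8 width disagree for "é" and "你", which is another reason the in-memory size cannot be used to size a transfer.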

tzot Over a year ago

My 'a' needs 25 bytes; so either you run 64-bit Python or the font I use has simpler strokes :)

John Machin Over a year ago

Ignoring for the moment that sys.getsizeof() is utterly irrelevant to the OP's problem: a size of 25 or 41 is nonsense. malloc() and friends usually allocate chunks of memory whose size is a multiple of 2 ** n, where n is certainly greater than 1, and part of each chunk is occupied by malloc overhead. sys.getsizeof() doesn't allow for any of this, because it doesn't know any details of the malloc implementation.

Brōtsyorfuzthrāx Over a year ago

len(s) won't be enough with Unicode, since many characters take up more than one byte once encoded. See tzot's answer (convert to bytes first when using Unicode).
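A minimal sketch of that advice as a reusable helper, assuming the data may arrive either as text or as already-encoded bytes; byte_length is a hypothetical name. Bytes are already encoded, so their len() is the answer, while a str holds code points and has to be encoded first.

```python
from typing import Union


def byte_length(data: Union[str, bytes, bytearray], encoding: str = "utf-8") -> int:
    """Number of bytes `data` occupies once encoded for transmission."""
    if isinstance(data, (bytes, bytearray)):
        return len(data)                 # already bytes: nothing to encode
    return len(data.encode(encoding))    # str: count the encoded bytes


s = "na\u00efve"  # "naïve": 5 characters, but "ï" needs 2 bytes in UTF-8
print(len(s), byte_length(s))            # characters vs. UTF-8 bytes
print(byte_length(s.encode("utf-8")))    # same count when given bytes
```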
