68

Is there a function that will tell me how many bytes a string occupies in memory?

I need to set the size of a socket buffer in order to transfer the whole string at once.

3
  • 7
    So you don't care about the size of the string in memory but rather how long it is in a specific encoding. The usual way would be to convert the string into a byte array (possibly byte string in Python) in the encoding you need to transfer (UTF-8 for example) and get the length. Commented Oct 25, 2010 at 9:22
  • @Joey: I don't follow. Why do you think it's the length in an encoding? I'm facing the same issue: I need to know the size of the (string) contents to be sent over the wire. What I really need is the size in bytes; what would I do with the length of that string? Commented Dec 6, 2015 at 18:53
  • @0xc0de: “pure” Unicode cannot be sent over the wire unless encoded to bytes. The most common general encodings are “utf-32”, “utf-16-be”/“utf-16-le” or “utf-8” (a very sensible choice since it won't contain null bytes). Commented Jul 6, 2016 at 13:52
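
To illustrate the point made in these comments, here is a small sketch showing that the byte count of the same string depends on the encoding chosen. It assumes Python 3; the helper name `encoded_sizes` and the sample string are made up for this example and do not come from the thread.

```python
def encoded_sizes(text, encodings=("utf-8", "utf-16-le", "utf-32-le")):
    """Return a dict mapping each encoding to the number of bytes text needs in it.

    Hypothetical helper, written only for illustration.
    """
    return {enc: len(text.encode(enc)) for enc in encodings}


# "héllo" has 5 characters, but its size in bytes varies by encoding.
sizes = encoded_sizes("héllo")
for enc, size in sizes.items():
    print(enc, size)
```

The explicit "-le" variants are used so that no byte order mark is prepended, which would otherwise add to the count.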

2 Answers

102

If it's a Python 2.x str, get its len. If it's a Python 3.x str (or a Python 2.x unicode), first encode to bytes (or a str, respectively) using your preferred encoding ('utf-8' is a good choice) and then get the len of the encoded bytes/str object.


For example, ASCII characters use 1 byte each:

>>> len("hello".encode("utf8"))
5

whereas Chinese ones use 3 bytes each:

>>> len("你好".encode("utf8"))
6
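
For the socket use case in the question, one possible approach (an assumption on my part, not something stated in this answer) is to send the encoded length ahead of the data, so the receiver knows how many bytes to read. The sketch below assumes Python 3 and UTF-8; the function names `send_text`, `recv_exact` and `recv_text` are hypothetical, and the 4-byte length prefix is just one common framing choice.

```python
import socket
import struct


def send_text(sock, text, encoding="utf-8"):
    """Encode text and send it preceded by a 4-byte big-endian length."""
    payload = text.encode(encoding)
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; recv() may return fewer, so loop until done."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before all bytes arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_text(sock, encoding="utf-8"):
    """Read the length prefix, then that many bytes, and decode them."""
    (size,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, size).decode(encoding)


# Demo over a connected pair of local sockets.
a, b = socket.socketpair()
send_text(a, "你好")
received = recv_text(b)
a.close()
b.close()
```

Note that the length sent is `len(payload)`, the length of the encoded bytes, not `len(text)`.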

2 Comments

Indeed this is the right answer. sys.getsizeof() doesn't give you what you want. So, if your string is going to be sent as UTF-8, instead of saying len(myString), just say len(myString.encode("utf8"))
This should be the right answer. It will tell you exactly how many bytes you need for the string, unicode or not. There's a good chance the string will be encoded to bytes for transmission anyway, so I doubt there would even be a performance hit.
68
import sys

s = "hello"       # example value; any string works
sys.getsizeof(s)  # size of the str object itself, in bytes

# getsizeof(object [, default]) -> int
# Return the size of object in bytes.

But actually you need to know its represented length, so something like len(s) should be enough.
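
As a rough comparison of the three numbers discussed here, the sketch below (assuming Python 3) prints the character count, the UTF-8 byte count, and the `sys.getsizeof()` result for the same string. The variable names are made up for the example, and the value returned by `sys.getsizeof()` depends on the interpreter and platform, so no specific figure is shown for it.

```python
import sys

s = "你好"
encoded = s.encode("utf-8")

char_count = len(s)             # number of characters (code points)
byte_count = len(encoded)       # bytes that would go over the wire as UTF-8
object_size = sys.getsizeof(s)  # in-memory size of the str object, implementation-specific

print(char_count, byte_count, object_size)
```

Only `byte_count` is the figure that matters for sizing a buffer for transmission; `object_size` includes the interpreter's per-object overhead.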

8 Comments

+1 for the function. Does this not return all the extra baggage to represent the object? The rest of the fields in the PyObject.
@Noufal - exactly. For a simple 'a' string it returns 41.
my 'a' needs 25 bytes; so either you run 64-bit Python or the font I use has simpler strokes :)
Ignoring for the moment that sys.getsizeof() is utterly irrelevant to the OP's problem: a size of 25 or 41 is nonsense. malloc() and friends usually allocate chunks of memory whose size is a multiple of 2 ** n, where n is certainly greater than 1, and some of the chunk is occupied by malloc overhead. sys.getsizeof() doesn't allow for any of this, because it doesn't know any details of the malloc implementation.
len(s) won't be enough with Unicode, since many characters take up more than one byte. See tzot's answer (convert to bytes first when using Unicode).