Given an encoding, is there a ready-made way to 'cap' a string to a given maximum size in bytes? Illustration:

>>> some_string = 'abc'
>>> size_limit = 2
>>> encoding = 'utf-8'
>>> capped_string = cap_to_size(some_string, size_limit, encoding)
>>> capped_string
'ab'

That is, the function cap_to_size (so to speak) cuts away the rightmost characters of the string until its encoded form fits within the given size. If the string already fits within the size limit, nothing happens and the original string is returned.

A multibyte character should be discarded in its entirety if any of its bytes would exceed the size limit.
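To make the multibyte rule concrete, here is a small sketch of how many bytes individual characters occupy under UTF-8. The sample string is only an illustration and is not taken from the actual data.

```python
# Byte cost of each character when encoded as UTF-8 (sample characters chosen for illustration)
sizes = [(ch, len(ch.encode("utf-8"))) for ch in "aé€"]
print(sizes)  # [('a', 1), ('é', 2), ('€', 3)]
```

With a limit of 2 bytes, 'aé' would therefore be capped to 'a', because 'é' needs bytes 2 and 3 and does not fit entirely.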

  • What should happen if the cap occurs in the middle of a multibyte character? Commented Jun 27, 2015 at 12:27
  • Good question, I didn't think of that since I am dealing with an ASCII-compatible character set at the moment. But generally, multibyte characters should be discarded in their entirety if one of their bytes exceeds the size limit. I'll update the question. Commented Jun 27, 2015 at 12:33

1 Answer

Off the top of my head (not well tested yet):

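def cap_to_size(some_string, size_limit, encoding):
    result = ""
    for char in some_string:
        # Subtract this character's encoded size from the remaining byte budget.
        size_limit -= len(char.encode(encoding))
        if size_limit < 0:
            # The character does not fit entirely, so stop before it.
            break
        result += char
    # Also reached when the whole string fits within the limit.
    return result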
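
A possible alternative, sketched here under assumptions rather than taken from the answer: encode once, slice the bytes, and decode with errors="ignore" so that an incomplete trailing character is dropped. The name cap_to_size_bytes is hypothetical, the size limit is assumed to be non-negative, and the behaviour has only been reasoned through for UTF-8; stateful encodings or ones that prepend a byte order mark may behave differently.

```python
def cap_to_size_bytes(some_string, size_limit, encoding="utf-8"):
    """Hypothetical variant: cut the encoded bytes, then drop any partial character."""
    if size_limit <= 0:
        return ""
    encoded = some_string.encode(encoding)
    if len(encoded) <= size_limit:
        # Already within the limit: return the original string unchanged.
        return some_string
    # errors="ignore" discards the incomplete byte sequence left at the cut point.
    return encoded[:size_limit].decode(encoding, errors="ignore")


print(cap_to_size_bytes("abc", 2))  # ab
print(cap_to_size_bytes("aé", 2))   # a
```

This avoids encoding each character separately, at the cost of relying on the decoder's error handling to clean up the cut.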