Split a string into pieces by length in Python

Question

Python
I want to split a string into parts of at most 5000 characters each. (We also need to avoid splitting in the middle of a word, so a split should happen only at a space.)
I iterate through the string character by character and cut off a part roughly every 4980 characters; whatever remains at the end, shorter than 4980 characters, I translate as well. I am new to Python, so I'm sure my method is a mess: it works, but it certainly isn't good code.
I haven't checked for spaces in the string, because Japanese and Chinese don't use spaces, but this would need to be checked too so that a word isn't split into two parts.

with open('lightnovel.txt', 'r', encoding="utf8") as f:
    file = f.read()

db = 0           # number of characters in the current part
partofbook = ''  # the part being built
for character in file:
    db += 1
    partofbook += character
    if db >= 4980:
        # The part is full: translate it and start a new one.
        # trans() is assumed to be defined elsewhere.
        trans(partofbook)
        db = 0
        partofbook = ''
# Translate whatever is left after the last full part.
if partofbook:
    trans(partofbook)

Why don't you start at index 5000, iterate backwards till you find whitespace at position A, let's say, then your first output is string[0,A-1]. Then jump ahead to index A+5000 and do the same thing, searching backwards for whitespace, found at index B, so your next output is string[A, B-1]. Repeat until done. Obviously check that you don't skip beyond len(string).
– jarmod, Commented Mar 1, 2021 at 18:52
Can you post this comment as an answer so I can check it as a solution?
– Dani Suba, Commented Mar 1, 2021 at 19:05
Yes, see How to get char from string by index? and [ ](stackoverflow.com/questions/663171/…)
– jarmod, Commented Mar 1, 2021 at 19:06

MSS98 · Accepted Answer · 2021-03-01 18:54:43Z

1

I'm also new to Python, so I apologise for not implementing this in your code.

There is a method called string.split() (where string is the sentence you want to split).

Called with no arguments, it splits only where there is whitespace.
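As a small illustration of how split() behaves (the sample sentence is made up for this sketch), here is the difference between calling it with no arguments and with an explicit separator:

```python
sentence = "the quick  brown fox"  # hypothetical sample text; note the double space

# With no arguments, split() treats any run of whitespace as one separator.
words = sentence.split()
print(words)  # ['the', 'quick', 'brown', 'fox']

# With an explicit separator, every single occurrence splits, so the
# double space produces an empty string in the result.
pieces = sentence.split(" ")
print(pieces)  # ['the', 'quick', '', 'brown', 'fox']
```

Either way the result is a list of words, not pieces of a given length.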

answered Mar 1, 2021 at 18:54


1 Comment

Dani Suba Over a year ago

The problem is that this doesn't split by length but by occurrences of a separator. As a w3schools example shows, apple#banana#orange would give apple, banana, and orange in a list if we choose to split by "#". I haven't found a way to use this function with a length parameter.
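One possible way to bridge that gap, sketched here as an assumption rather than something proposed in the thread, is to split into words first and then regroup the words into chunks that stay under a length limit. The helper name pack_words and its max_len parameter are hypothetical:

```python
def pack_words(text, max_len=5000):
    """Group whitespace-separated words into chunks of at most max_len characters.

    Hypothetical helper for illustration. Original spacing is normalised to
    single spaces, and a single word longer than max_len is kept whole.
    """
    chunks = []
    current = ""
    for word in text.split():
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= max_len:
            # The word still fits, including the joining space.
            current += " " + word
        else:
            # The word would overflow: close this chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


print(pack_words("apple banana orange kiwi", max_len=12))
# ['apple banana', 'orange kiwi']
```

This only helps for text that actually contains spaces; a long run without whitespace, as in Japanese or Chinese, ends up as one oversized chunk. The standard library's textwrap.wrap(text, width=5000) does a similar grouping and may be worth trying as well.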

jarmod · Accepted Answer · 2021-03-01 19:13:57Z

0

I would start at index 5000 and iterate backwards until you find whitespace at some position A. Your first output is then string[0, A-1] (in Python, you can use s[0:A] to get this substring).

Then jump ahead to index A+5000 and do the same thing, searching backwards for whitespace, found at index B, say, so your next output is string[A+1, B-1] (in Python, you can use s[A+1:B] to get this substring). Note: it's A+1 because you want to skip the whitespace found at index A.

Repeat until done, checking that you don't index beyond len(string).
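The steps above could be sketched roughly as follows. This is a minimal sketch under some assumptions: the function name split_at_whitespace is hypothetical, only the space character counts as whitespace, and str.rfind does the backward search instead of an explicit loop. When a window contains no space at all (for example in Japanese or Chinese text), it falls back to a hard cut at the length limit, which is a choice made for this sketch.

```python
def split_at_whitespace(s, max_len=5000):
    """Split s into chunks of at most max_len characters, cutting at spaces.

    Hypothetical helper for illustration; only " " is treated as whitespace.
    """
    chunks = []
    start = 0
    while len(s) - start > max_len:
        # Find the last space in the window s[start : start + max_len + 1].
        cut = s.rfind(" ", start, start + max_len + 1)
        if cut <= start:
            # No usable space in the window: hard cut at the length limit.
            cut = start + max_len
            chunks.append(s[start:cut])
            start = cut
        else:
            chunks.append(s[start:cut])
            start = cut + 1  # skip the space itself
    if start < len(s):
        # Whatever remains already fits in one chunk.
        chunks.append(s[start:])
    return chunks


print(split_at_whitespace("aaa bbb ccc ddd", max_len=7))
# ['aaa bbb', 'ccc ddd']
```

Each chunk could then be passed to the asker's trans() function in turn.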


edited Mar 1, 2021 at 19:13

answered Mar 1, 2021 at 19:07

