
I'd like to split a string at incrementing numbers with Python.

For example, I have the following string:

"1. aaa aaa aa. 2. bb bbbb bb. 3. cc cccc cc 4. ddd d dddd ... 99. z zzzz zzz"

I want to get the following list from it:

['aaa aaa aa', 'bb bbbb bb', 'cc cccc cc', 'ddd d dddd', ..., 'z zzzz zzz']

I tried the following code, but it didn't give me what I wanted:

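InputString = "1. aaa aaa aa. 2. bb bbbb bb. 3. cc cccc cc 4. ddd d dddd ... 99. z zzzz zzz"
# str.split() takes a literal separator, not a regex, so this returns [InputString] unchanged.
# (As a regex, [1-99] would also be a one-character class matching a digit 1-9, not 1 to 99.)
OutputList = InputString.split("[1-99]. ")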
  • You should think a bit about edge cases; your requirement is a bit incomplete. Are solutions where the numbers are not consecutive acceptable? Then a regex might be fine. If not, you'll need to iterate yourself: start with the whole string, split off the part to the left of the next number, and continue with the part to the right until you no longer find a match. If your example is accurate, you will also have to handle the trailing period after each part. (Commented Jul 25, 2019 at 5:11)
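The iterative approach described in that comment could look roughly like the sketch below. It is an illustration only, not taken from either answer: the function name split_consecutive is hypothetical, and it assumes each marker is a number immediately followed by a period, starting at 1. Only the next expected number counts as a separator, so out-of-sequence numbers stay inside the current item, and at most one trailing period is removed from each item.

```python
import re


def split_consecutive(text):
    """Hypothetical helper: split text at "1.", "2.", "3.", ... taken strictly in order.

    Numbers that are out of sequence stay inside the current item.
    """
    items = []
    n = 1
    # (?<!\d) stops the marker for 1 from matching the tail of "11." or "21.".
    current = re.compile(rf"(?<!\d){n}\.").search(text)
    while current:
        n += 1
        # Look for the next expected marker to the right of the current one.
        nxt = re.compile(rf"(?<!\d){n}\.").search(text, current.end())
        end = nxt.start() if nxt else len(text)
        item = text[current.end():end].strip()
        # Remove the single period that ends an item, if there is one.
        if item.endswith("."):
            item = item[:-1]
        items.append(item)
        current = nxt
    return items


print(split_consecutive("1. aaa aaa aa. 2. bb bbbb bb. 3. cc cccc cc 4. ddd d dddd"))
# ['aaa aaa aa', 'bb bbbb bb', 'cc cccc cc', 'ddd d dddd']
print(split_consecutive("1. version 3. notes 2. next"))
# ['version 3. notes', 'next']
```

In the second call, "3." is not the expected next number when it appears, so it is kept as part of the first item.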

2 Answers


You can use the re module to split your string by a regular expression:

re.split(r'[0-9]+\.', input)

[0-9]+ matches one or more digits, and \. matches a literal period (an unescaped . would match any character).
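Applied to a shortened version of the question's string (the elided "... 99." part is left out), this split keeps the spaces and periods around each item and, because the string starts with a separator, produces an empty first element:

```python
import re

text = "1. aaa aaa aa. 2. bb bbbb bb. 3. cc cccc cc 4. ddd d dddd"

# Split at every run of digits followed by a literal period.
parts = re.split(r"[0-9]+\.", text)
print(parts)
# ['', ' aaa aaa aa. ', ' bb bbbb bb. ', ' cc cccc cc ', ' ddd d dddd']
```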

EDIT:

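You can prefix the regex with an optional non-capturing group, (?:\.\s)?, so that the period and space ending the previous item are consumed as part of the separator. The group should be non-capturing: with a capturing group such as (\.\s)?, re.split() also inserts the captured text (or None when the group did not match) into the result list.

re.split(r'(?:\.\s)?[0-9]+\.', input)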
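Putting it together, here is a sketch that assumes the goal is the cleaned list from the question. It uses the non-capturing prefix (?:\.\s)? because re.split() would otherwise also return the captured text (or None) for each separator, then strips the remaining whitespace and drops the empty first element:

```python
import re

text = "1. aaa aaa aa. 2. bb bbbb bb. 3. cc cccc cc 4. ddd d dddd"

# (?:\.\s)? optionally consumes the ". " that ends the previous item;
# being non-capturing, it adds nothing extra to the split result.
parts = re.split(r"(?:\.\s)?[0-9]+\.", text)

# Strip leftover spaces and drop the empty string produced by the leading "1.".
items = [part.strip() for part in parts if part.strip()]
print(items)
# ['aaa aaa aa', 'bb bbbb bb', 'cc cccc cc', 'ddd d dddd']
```

Note that an item ending in "..." (as in the question's elided sample) would keep two of its three periods, since the prefix consumes only one.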

3 Comments

This is a good answer, although the example in the question suggests the OP also wants the period at the end of each item filtered out; that seems beside the main point, though.
@Grismar I was going to point that out too, but it's probably a typo in the question.
You're right, I must have missed the trailing periods!

This expression might also work:

Test

import re

regex = r"(?<=[0-9]\.)\s*(.*?)(?=[0-9]{1,}\.|$)"
test_str = "1. aaa aaa aa. 2. bb bbbb bb. 3. cc cccc cc 4. ddd d dddd ... 99. z zzzz zzz"

print(re.findall(regex, test_str))

Output

['aaa aaa aa. ', 'bb bbbb bb. ', 'cc cccc cc ', 'ddd d dddd ... ', 'z zzzz zzz']

The expression is explained in the top-right panel of regex101.com if you wish to explore, simplify, or modify it; in this link, you can watch how it matches against some sample inputs.
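If the trailing spaces and periods in that output are unwanted, one option (an addition, not part of the original answer) is to post-process each match. This sketch uses a shortened sample string without the elided "... 99." part:

```python
import re

regex = r"(?<=[0-9]\.)\s*(.*?)(?=[0-9]{1,}\.|$)"
text = "1. aaa aaa aa. 2. bb bbbb bb. 3. cc cccc cc 4. ddd d dddd"

# Strip surrounding whitespace, then remove at most one trailing period.
items = [re.sub(r"\.$", "", match.strip()) for match in re.findall(regex, text)]
print(items)
# ['aaa aaa aa', 'bb bbbb bb', 'cc cccc cc', 'ddd d dddd']
```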