12

How to split a string at positions before a character?

  • split a string before 'a'
  • input: "fffagggahhh"
  • output: ["fff", "aggg", "ahhh"]

The obvious way doesn't work:

>>> import re
>>> h = re.compile("(?=a)")
>>> h.split("fffagggahhh")
['fffagggahhh']
7
  • 2
    What do you expect when you split "aaa": ['', 'a', 'a', 'a'] or ['a', 'a', 'a']? Commented Nov 4, 2010 at 7:22
  • "aaa" -> "a", "a", "a" or "", "a", "a", "a" Commented Nov 4, 2010 at 7:55
  • 1
    -1: "aaa" -> ["a", "a", "a"] or ["", "a", "a", "a"]. That's the least helpful thing I've ever seen. Both are right? In that case, no pattern can ever work. Close this question. Commented Nov 4, 2010 at 10:30
  • Either one of them will do. If you have coded in Python before, you would know that a simple filter(bool, L) removes the empty element. Commented Nov 5, 2010 at 1:57
  • 1
    Did something change in Python? Now your "the obvious way" works flawlessly. Commented Mar 13, 2021 at 1:11
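To illustrate the last comment: as far as I know, re.split only started splitting on empty (zero-width) matches in Python 3.7, which would explain why the lookahead pattern from the question returns the whole string on older versions. A minimal sketch, assuming Python 3.7 or later; split_before_char is a hypothetical helper name, not something from the question:

```python
import re

def split_before_char(text, ch="a"):
    # Hypothetical helper: split at the empty position just before each ch.
    # Assumes Python 3.7+, where re.split honours zero-width matches.
    pattern = "(?=" + re.escape(ch) + ")"
    # Drop the empty leading piece produced when text starts with ch,
    # in the spirit of the filter(bool, L) suggestion above.
    return [piece for piece in re.split(pattern, text) if piece]

print(split_before_char("fffagggahhh"))
```

If the empty leading element is wanted for inputs such as "aaa", return the result of re.split directly instead of filtering it.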

7 Answers

21

OK, this is not exactly the solution you want, but I thought it would be a useful addition to the problem here.

Solution without re:

>>> x = "fffagggahhh"
>>> k = x.split('a')
>>> j = [k[0]] + ['a'+l for l in k[1:]]
>>> j
['fff', 'aggg', 'ahhh']

2 Comments

@knitti: Thanks. I understand that it is not the re-based solution; I wanted to write this one first, before writing a re solution. By the time I finished writing it, the re-based solution had already been posted.
yeah, why use a hammer on a single nail if you've got a nail shooter.
5
>>> rx = re.compile("(?:a|^)[^a]*")
>>> rx.findall("fffagggahhh")
['fff', 'aggg', 'ahhh']
>>> rx.findall("aaa")
['a', 'a', 'a']
>>> rx.findall("fgh")
['fgh']
>>> rx.findall("")
['']

1 Comment

-1 re.findall("(?:^|a)[^a]*", "aaa") produces ['', 'a', 'a']
4
>>> r=re.compile("(a?[^a]+)")
>>> r.findall("fffagggahhh")
['fff', 'aggg', 'ahhh']

EDIT:

This won't correctly handle a doubled a; the lone a is silently dropped, as in the string:

>>> r.findall("fffagggaahhh")
['fff', 'aggg', 'ahhh']

KennyTM's re seems better suited.

2 Comments

I wonder if the OP would want to keep the empty string from the split if it started with an 'a'.
-1 Uncool. Fails on repeated a's ... e.g. "aaa" -> empty list
3
import re

def split_before(pattern, text):
    # Yield the pieces of text, cutting just before each match of pattern.
    prev = 0
    for m in re.finditer(pattern, text):
        yield text[prev:m.start()]
        prev = m.start()  # the match itself begins the next piece
    yield text[prev:]  # whatever follows the last match


if __name__ == '__main__':
    print(list(split_before("a", "fffagggahhh")))

re.split treats the pattern as a delimiter, so this generator slices at the start of each match instead.

>>> print(list(split_before("a", "afffagggahhhaab")))
['', 'afff', 'aggg', 'ahhh', 'a', 'ab']
>>> print(list(split_before("a", "ffaabcaaa")))
['ff', 'a', 'abc', 'a', 'a', 'a']
>>> print(list(split_before("a", "aaaaa")))
['', 'a', 'a', 'a', 'a', 'a']
>>> print(list(split_before("a", "bbbb")))
['bbbb']
>>> print(list(split_before("a", "")))
['']

1

This one works on repeated a's:

  >>> re.findall("a[^a]*|^[^a]*", "aaaaa")
  ['a', 'a', 'a', 'a', 'a']
  >>> re.findall("a[^a]*|[^a]+", "ffaabcaaa")
  ['ff', 'a', 'abc', 'a', 'a', 'a']

Approach: the main chunks you are looking for are an a followed by zero or more non-a characters. That covers every possibility except a run of non-a characters with no leading a, which can happen only at the start of the input string.
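The same approach can be sketched for an arbitrary single character. This is only an illustration: chunks_before is a hypothetical name, and it assumes ch is exactly one character, because the pattern places it inside a negated character class.

```python
import re

def chunks_before(text, ch):
    # Hypothetical helper; assumes ch is a single character.
    c = re.escape(ch)
    # First alternative: ch followed by zero or more non-ch characters.
    # Second alternative: a run of non-ch characters, anchored at the start.
    pattern = "{0}[^{0}]*|^[^{0}]*".format(c)
    return re.findall(pattern, text)

print(chunks_before("ffaabcaaa", "a"))
```

Note that an empty input yields [''] rather than an empty list, because the anchored alternative can match the empty string.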

-1
>>> foo = "abbcaaaabbbbcaaab"
>>> bar = foo.split("c")
>>> baz = [bar[0]] + ["c"+x for x in bar[1:]]
>>> baz
['abb', 'caaaabbbb', 'caaab']

Due to how slicing works, this will work properly even if there are no occurrences of c in foo.
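To make the slicing remark concrete: when the separator is absent, str.split returns a one-element list, so bar[1:] is empty and the comprehension adds nothing. A small sketch wrapping the same idea in a function; split_keep_sep is a hypothetical name:

```python
def split_keep_sep(text, sep):
    # Hypothetical helper wrapping the split-and-reattach idea above.
    parts = text.split(sep)
    # parts[1:] is an empty list when sep never occurs, so only
    # parts[0] (the whole string) is returned in that case.
    return [parts[0]] + [sep + piece for piece in parts[1:]]

print(split_keep_sep("abbcaaaabbbbcaaab", "c"))
print(split_keep_sep("abbb", "c"))
```

When the text starts with the separator, the first element is an empty string, which can be filtered out if it is not wanted.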

-3

split() takes an argument for the character to split on:

>>> "fffagggahhh".split('a')
['fff', 'ggg', 'hhh']
