python re - split a string before a character

Question

How do I split a string at the positions just before a given character?

Split a string before 'a':
input: "fffagggahhh"
output: ["fff", "aggg", "ahhh"]

The obvious way doesn't work:

>>> import re
>>> h = re.compile("(?=a)")
>>> h.split("fffagggahhh")
['fffagggahhh']

What do you expect when you split "aaa": ['', 'a', 'a', 'a'] or ['a', 'a', 'a']?
– kennytm, Commented Nov 4, 2010 at 7:22
-1: "aaa" -> ["a", "a", "a"] or ["", "a", "a", "a"]. That's the least helpful thing I've ever seen. Both are right? In that case, no pattern can ever work. Close this question.
– S.Lott, Commented Nov 4, 2010 at 10:30
Either one of them will do. If you have coded in Python before, you would know that a simple filter(bool, L) will filter out the empty element.
– kakarukeys, Commented Nov 5, 2010 at 1:57
Did something change in Python? Now your "obvious way" works flawlessly.
– fireattack, Commented Mar 13, 2021 at 1:11
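
Regarding the last comment: since Python 3.7, re.split also splits on patterns that can match an empty string, so the lookahead from the question now does what was intended. A minimal sketch, assuming Python 3.7 or later (the name before_a is just for illustration):

```python
import re

# Zero-width lookahead: matches the empty position just before each 'a'.
before_a = re.compile("(?=a)")

print(before_a.split("fffagggahhh"))  # ['fff', 'aggg', 'ahhh']

# A leading 'a' produces an empty first piece ...
pieces = before_a.split("aaa")
print(pieces)  # ['', 'a', 'a', 'a']

# ... which filter(bool, ...) drops, as suggested in the comments.
print(list(filter(bool, pieces)))  # ['a', 'a', 'a']
```

On older interpreters the same call returns the whole string unsplit, which is the result shown in the question.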

pyfunc · Accepted Answer · 2010-11-04 06:39:06Z

21

OK, this is not exactly the solution you asked for, but I thought it would be a useful addition to the problem here.

Solution without re:

>>> x = "fffagggahhh"
>>> k = x.split('a')
>>> j = [k[0]] + ['a'+l for l in k[1:]]
>>> j
['fff', 'aggg', 'ahhh']
>>>

answered Nov 4, 2010 at 6:39

pyfunc

2 Comments

pyfunc Over a year ago

@knitti: Thanks. I understand this is not the re-based solution; I wanted to write it first, before writing the re solution. By the time I finished writing this, the re-based solution had already been posted.

knitti Over a year ago

Yeah, why use a hammer on a single nail if you've got a nail shooter?

kennytm · Accepted Answer · 2010-11-04 07:07:41Z

5

>>> import re
>>> rx = re.compile("(?:a|^)[^a]*")
>>> rx.findall("fffagggahhh")
['fff', 'aggg', 'ahhh']
>>> rx.findall("aaa")
['a', 'a', 'a']
>>> rx.findall("fgh")
['fgh']
>>> rx.findall("")
['']
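
The same pattern can be built for any single character. This is only a sketch: split_before_char is a hypothetical helper, not part of re, and it assumes char is exactly one character and Python 3.6+ for the f-string.

```python
import re

def split_before_char(text, char):
    """Split text before each occurrence of the single character char."""
    c = re.escape(char)  # so that characters such as '.' are taken literally
    # Same shape as the pattern above: char (or the start of the string),
    # followed by a run of characters that are not char.
    return re.findall(f"(?:{c}|^)[^{c}]*", text)

print(split_before_char("fffagggahhh", "a"))  # ['fff', 'aggg', 'ahhh']
print(split_before_char("x.y.z", "."))        # ['x', '.y', '.z']
```

The order of the alternatives matters: with the character tried first, a string that starts with it does not produce an empty leading piece.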

edited Nov 4, 2010 at 7:07

answered Nov 4, 2010 at 6:41

kennytm

1 Comment

John Machin Over a year ago

-1 re.findall("(?:^|a)[^a]*", "aaa") produces ['', 'a', 'a']

adamk · Accepted Answer · 2010-11-04 06:45:19Z

4

>>> import re
>>> r = re.compile("(a?[^a]+)")
>>> r.findall("fffagggahhh")
['fff', 'aggg', 'ahhh']

EDIT:

This won't correctly handle doubled a's, as in the string:

>>> r.findall("fffagggaahhh")
['fff', 'aggg', 'ahhh']

KennyTM's re seems better suited.
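
One quick way to spot this kind of failure is to check that the pieces join back into the original string. A rough sketch; loses_text is a made-up helper name for illustration:

```python
import re

def loses_text(pattern, text):
    """Return True if the pieces found by pattern do not rebuild text."""
    return "".join(re.findall(pattern, text)) != text

# [^a]+ needs at least one non-'a' character, so a lone 'a' is skipped.
print(loses_text("(a?[^a]+)", "fffagggaahhh"))     # True
# [^a]* also accepts an empty run, so every 'a' starts its own piece.
print(loses_text("(?:a|^)[^a]*", "fffagggaahhh"))  # False
```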

edited Nov 4, 2010 at 6:45

answered Nov 4, 2010 at 6:38

adamk

2 Comments

Jeff Mercado Over a year ago

I wonder if the OP would want to keep the empty string from the split if it started with an 'a'.

John Machin Over a year ago

-1 Uncool. Fails on repeated a's ... e.g. "aaa" -> empty list

Terrel Shumway · Accepted Answer · 2010-11-04 07:38:10Z

3

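import re

def split_before(pattern, text):
    """Yield the pieces of text, cutting just before each match of pattern."""
    prev = 0
    for m in re.finditer(pattern, text):
        # Emit everything from the previous cut up to the start of this match.
        yield text[prev:m.start()]
        prev = m.start()
    # Emit the remainder, which begins with the last match (if any).
    yield text[prev:]


if __name__ == '__main__':
    print(list(split_before("a", "fffagggahhh")))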

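re.split treats the pattern as a delimiter and drops it from the result, so this generator uses re.finditer and cuts just before each match instead.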

>>> print(list(split_before("a", "afffagggahhhaab")))
['', 'afff', 'aggg', 'ahhh', 'a', 'ab']
>>> print(list(split_before("a", "ffaabcaaa")))
['ff', 'a', 'abc', 'a', 'a', 'a']
>>> print(list(split_before("a", "aaaaa")))
['', 'a', 'a', 'a', 'a', 'a']
>>> print(list(split_before("a", "bbbb")))
['bbbb']
>>> print(list(split_before("a", "")))
['']
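
Because the cut points come from re.finditer, the pattern does not have to be a single character. A small self-contained sketch (the generator is repeated so the block runs on its own; the sample strings are made up for illustration):

```python
import re

def split_before(pattern, text):
    """Yield the pieces of text, cutting just before each match of pattern."""
    prev = 0
    for m in re.finditer(pattern, text):
        yield text[prev:m.start()]
        prev = m.start()
    yield text[prev:]

# Cut before every uppercase letter.
print(list(split_before("[A-Z]", "splitBeforeCaps")))  # ['split', 'Before', 'Caps']

# Cut before every run of digits.
print(list(split_before(r"\d+", "ab12cd345")))  # ['ab', '12cd', '345']
```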

edited Nov 4, 2010 at 7:38

answered Nov 4, 2010 at 6:50

Terrel Shumway

John Machin · Accepted Answer · 2010-11-04 07:28:27Z

1

This one works on repeated a's:

  >>> import re
  >>> re.findall("a[^a]*|^[^a]*", "aaaaa")
  ['a', 'a', 'a', 'a', 'a']
  >>> re.findall("a[^a]*|[^a]+", "ffaabcaaa")
  ['ff', 'a', 'abc', 'a', 'a', 'a']

Approach: the main chunks you are looking for are an a followed by zero or more non-a characters. That covers every possibility except a leading run of non-a characters, which can occur only at the start of the input string.
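
The two cases can be spelled out as a commented pattern. A minimal sketch using re.VERBOSE; the name CHUNKS is only for illustration:

```python
import re

CHUNKS = re.compile(r"""
    a[^a]*     # an 'a' followed by zero or more non-'a' characters
  | ^[^a]*     # or, only at the very start, a run of non-'a' characters
""", re.VERBOSE)

print(CHUNKS.findall("ffaabcaaa"))  # ['ff', 'a', 'abc', 'a', 'a', 'a']
print(CHUNKS.findall("aaaaa"))      # ['a', 'a', 'a', 'a', 'a']
```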

edited Nov 4, 2010 at 7:28

answered Nov 4, 2010 at 7:02

John Machin

Amber · Accepted Answer · 2010-11-04 06:41:41Z

-1

>>> foo = "abbcaaaabbbbcaaab"
>>> bar = foo.split("c")
>>> baz = [bar[0]] + ["c"+x for x in bar[1:]]
>>> baz
['abb', 'caaaabbbb', 'caaab']

Due to how slicing works, this will work properly even if there are no occurrences of c in foo.
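
To make the slicing point concrete, here is the same idea wrapped in a function. This is a sketch only: keep_sep_split is a hypothetical name, and sep is assumed to be a non-empty string, since str.split raises ValueError for an empty separator.

```python
def keep_sep_split(text, sep):
    """Split text before each occurrence of sep, keeping sep on the pieces."""
    parts = text.split(sep)
    # parts[1:] is an empty list when sep never occurs, so nothing is appended.
    return [parts[0]] + [sep + part for part in parts[1:]]

print(keep_sep_split("abbcaaaabbbbcaaab", "c"))  # ['abb', 'caaaabbbb', 'caaab']
print(keep_sep_split("abbaaab", "c"))            # ['abbaaab']
print(keep_sep_split("cab", "c"))                # ['', 'cab']
```

As the last call shows, a string that starts with the separator yields an empty first piece.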

answered Nov 4, 2010 at 6:41

Amber

Igor Serebryany · Accepted Answer · 2010-11-04 06:40:15Z

-3

split() takes an argument for the character to split on:

>>> "fffagggahhh".split('a')
['fff', 'ggg', 'hhh']

answered Nov 4, 2010 at 6:40

Igor Serebryany
