How to split a list-of-strings into sublists-of-strings by a specific string element

Question

I have a word list like the one below. I want to split the list at each '.' element. Is there a better or more idiomatic way to do this in Python 3?

a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
result = []
tmp = []
for elm in a:
    if elm != '.':
        tmp.append(elm)
    else:
        result.append(tmp)
        tmp = []
print(result)
# result: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]

Update

I added test cases, plus a function that handles them correctly.

a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
def split_list(list_data, split_word='.'):
    result = []
    sub_data = []
    for elm in list_data:
        if elm != split_word:
            sub_data.append(elm)
        else:
            if len(sub_data) != 0:
                result.append(sub_data)
            sub_data = []
    if len(sub_data) != 0:
        result.append(sub_data)
    return result

print(split_list(a)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
print(split_list(b)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
print(split_list(c)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
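
As a minimal sketch, the same single-pass idea can also be written as a generator, so the sublists are produced lazily. The name iter_split is hypothetical and not from the post; it compares elements with == (equality), since an identity check on strings depends on interning.

```python
def iter_split(items, sep='.'):
    """Yield non-empty sublists of `items`, split at elements equal to `sep`."""
    chunk = []
    for item in items:
        if item == sep:
            # Close the current chunk; skip it if it is empty
            # (leading or consecutive separators).
            if chunk:
                yield chunk
            chunk = []
        else:
            chunk.append(item)
    # Emit whatever follows the last separator.
    if chunk:
        yield chunk


c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
print(list(iter_split(c)))
# [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
```

Wrapping the call in list() gives the same result as split_list for the three test lists.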

I think there's a one-liner solution with no additional libraries that comes close to your speeds, using a list comprehension and string functions.
– Scott Boston, Commented Dec 2, 2017 at 5:28
@ScottBoston I thought there were some useful functions :). But I'm happy to see many interesting answers.
– jef, Commented Dec 2, 2017 at 13:41
It appears you already split a string once. Your problem would be much simpler if your first split were by sentences.
– jpmc26, Commented Dec 3, 2017 at 1:43

Jeyekomon · Accepted Answer · 2024-10-04 10:53:01Z

28

Using itertools.groupby:

from itertools import groupby
a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
result = [
    list(g)
    for k, g in groupby(a, lambda x: x == '.')
    if not k
]
print(result)
#[['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
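
A short sketch of how the groupby approach might be wrapped in a reusable function and checked against the three test lists from the question. The function name split_by and its sep parameter are assumptions for illustration. groupby puts consecutive separators into one group, so leading, trailing, or repeated '.' elements do not produce empty sublists.

```python
from itertools import groupby


def split_by(items, sep='.'):
    """Split `items` into sublists at every element equal to `sep`."""
    return [
        list(group)
        # is_sep is True for runs of separators, False for runs of other elements.
        for is_sep, group in groupby(items, lambda x: x == sep)
        if not is_sep
    ]


a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']

print(split_by(a))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
print(split_by(b))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
print(split_by(c))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
```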

edited Oct 4, 2024 at 10:53

Jeyekomon

answered Dec 2, 2017 at 4:04

Transhuman

Bahrom · Accepted Answer · 2017-12-02 20:59:42Z

14

You can do all of this in a "one-liner" using a list comprehension and the string methods join, split, and strip, with no additional libraries.

a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']



In [5]: [i.strip().split(' ') for i in ' '.join(a).split('.') if len(i) > 0 ]
Out[5]: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]

In [8]: [i.strip().split(' ') for i in ' '.join(b).split('.') if len(i) > 0 ]
Out[8]: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]

In [9]: [i.strip().split(' ') for i in ' '.join(c).split('.') if len(i) > 0 ]
Out[9]: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]

@Craig has a simpler update:

[s.split() for s in ' '.join(a).split('.') if s]

edited Dec 2, 2017 at 20:59

Bahrom

answered Dec 2, 2017 at 5:16

Scott Boston

5 Comments

Craig Over a year ago

Slightly simpler: [s.split() for s in ' '.join(a).split('.') if s]

Scott Boston Over a year ago

@Craig Thanks! I hate when I over complicate and overthink things.

jef Over a year ago

Oh, this is nice. But if there is a word which includes white space, join will break the original list. I mean like "New York". I may add such a test case. But this is really simple and nice. Thank you!

Scott Boston Over a year ago

@jef It is a challenge to break 'This is New York.' into 'This', 'is', 'New York'.

Eric Duminil Over a year ago

This method is broken if any element has a space or a dot.
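
To make the caveat in these comments concrete, here is a small sketch comparing the join/split one-liner with an element-wise split. The list d is a made-up test case (not from the post) containing an element with a space and an element with a dot.

```python
from itertools import groupby

# Hypothetical input: 'New York' contains a space, '3.6' contains a dot.
d = ['welcome', 'to', 'New York', '.', 'version', '3.6', '.']

# Join/split approach: element boundaries are lost once the list is joined.
joined = [s.split() for s in ' '.join(d).split('.') if s]
print(joined)
# [['welcome', 'to', 'New', 'York'], ['version', '3'], ['6']]

# Element-wise approach: only elements equal to '.' act as separators.
elementwise = [list(g) for k, g in groupby(d, lambda x: x == '.') if not k]
print(elementwise)
# [['welcome', 'to', 'New York'], ['version', '3.6']]
```

The join/split version is fine when every element is a single word with no dots; otherwise comparing whole elements is the safer choice.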

Óscar López · Accepted Answer · 2017-12-02 04:24:27Z

8

Here's another way using only standard list operations (with no dependencies on other libraries!). First we find the split points and then we create sublists around them; notice that the first element is treated as a special case:

a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
indexes = [-1] + [i for i, x in enumerate(a) if x == '.']

[a[indexes[i]+1:indexes[i+1]] for i in range(len(indexes)-1)]
=> [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
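
As written, the slicing above only covers elements up to the last '.', so a trailing 'yes' (test list b in the question) would be dropped, and a leading '.' (test list c) would yield an empty sublist. A possible variation, sketched under the assumption that empty sublists should be discarded, adds len(items) as a second sentinel and filters the result. The name split_at_indexes is hypothetical.

```python
def split_at_indexes(items, sep='.'):
    """Split `items` at each element equal to `sep`, using slice boundaries."""
    # Sentinels: -1 stands in for a separator before the first element,
    # len(items) for a separator after the last one.
    cuts = [-1] + [i for i, x in enumerate(items) if x == sep] + [len(items)]
    # Pair each cut with the next one and slice between them.
    chunks = (items[start + 1:stop] for start, stop in zip(cuts, cuts[1:]))
    # Drop empty slices caused by leading, trailing, or adjacent separators.
    return [chunk for chunk in chunks if chunk]


b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']

print(split_at_indexes(b))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
print(split_at_indexes(c))  # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
```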

edited Dec 2, 2017 at 4:24

answered Dec 2, 2017 at 4:11

Óscar López

Ajax1234 · Accepted Answer · 2017-12-02 04:05:16Z

4

You can reconstruct the string using ' '.join and use regex:

import re
a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
new_s = [b for b in [re.split(r'\s', i) for i in re.split(r'\s*\.\s*', ' '.join(a))] if all(b)]

Output:

[['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]

answered Dec 2, 2017 at 4:05

Ajax1234

1 Comment

Eric Duminil Over a year ago

Same comment as for @ScottBoston: This method is broken if any element has a space or a dot.

RoadRunner · Accepted Answer · 2017-12-02 05:01:06Z

I couldn't help myself, just wanted to have fun with this great question:

import itertools

a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']

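def split_dots(lst):
    # Start positions: index 0 plus the index right after every '.'
    dots = [0] + [i + 1 for i, e in enumerate(lst) if e == '.']
    # From each start position, take elements until the next '.'
    result = [list(itertools.takewhile(lambda x: x != '.', lst[dot:])) for dot in dots]
    # Drop empty sublists
    return list(filter(lambda x: x, result))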

print(split_dots(a)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
print(split_dots(b)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
print(split_dots(c)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]

MSeifert · Accepted Answer · 2017-12-02 12:14:12Z

This answer requires installing a 3rd party library: iteration_utilities¹. The included split function makes solving this task straightforward:

>>> from iteration_utilities import split
>>> a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
>>> list(filter(None, split(a, '.', eq=True)))
[['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]

Instead of using the eq parameter you can also define a custom function where to split:

>>> list(filter(None, split(a, lambda x: x=='.')))
[['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]

In case you want to keep the '.'s you could also use the keep_before argument:

>>> list(filter(None, split(a, '.', eq=True, keep_before=True)))
[['this', 'is', 'a', 'cat', '.'], ['hello', '.'], ['she', 'is', 'nice', '.']]

Note that the library just makes it easier - it's easily possible (see the other answers) to accomplish this task without installing an additional library.

The filter can be removed if you don't expect '.' to appear at the beginning or end of your to-be-split list.

¹ I'm the author of that library. It's available via pip or the conda-forge channel with conda.
