6

I have two lists: l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9] and l2 = [0.5, 1.0, 1.5, 2.0]. I want to split l1 into sublists defined by consecutive boundary values of l2, so that each sublist holds the elements of l1 that are less than the corresponding boundary and at least as large as the previous one. For this example the result would be [[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]].

Here is my solution:

l3 = []
b = 0
for i in l2:              # i: upper boundary of the current bucket
    temp = []
    for p in l1:          # rescans all of l1 for every boundary
        if b <= p < i:
            temp.append(p)
    l3.append(temp)
    b += 0.5              # lower boundary; assumes l2 steps by 0.5

This solution is a huge bottleneck in my code. Is there a faster way to do this?

2 Comments
  • So these are buckets. It's a histogram! Commented Sep 16, 2015 at 20:56
  • @PeterWood or a hash map! Or an interval tree! So many possibilities! Commented Sep 16, 2015 at 21:05

4 Answers

6

Your lists are sorted, so there is no need to do a double loop here.

The following generates the sublists based on the two lists as inputs:

def partition(values, indices):
    idx = 0
    for index in indices:            # upper boundary of the current bucket
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1                 # idx never resets, so each value is visited once
        if sublist:                  # empty buckets are skipped
            yield sublist

You can then iterate over partition(l1, l2) to get individual sublists, or call list() on it to produce the whole list of lists in one go:

>>> l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9] 
>>> l2 = [0.5, 1.0, 1.5, 2.0]
>>> list(partition(l1, l2))
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]
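
One behavioural detail worth noting (my reading of the generator, not something the answer states): empty buckets are skipped, and values at or above the last boundary are silently dropped:

>>> list(partition([0.1, 2.5], [0.5, 1.0, 1.5, 2.0]))
[[0.1]]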

4 Comments

Wouldn't this be O(n*m) like the OP's solution? Would there be a huge performance gain?
@taesu: the OP's is O(N*M).
Ahhh, you're right. Thanks for that, and also thanks for your participation at SO.
@taesu: index goes through the whole indices list, giving O(n). While iterating over indices, idx never decreases, and each iteration of the while loop increments idx by 1. This gives at most len(values) iterations, bumping the complexity to O(n + m). In short, both lists are traversed once.
3

For a fast approach on huge lists you can use NumPy:

>>> np.split(l1,np.searchsorted(l1,l2))
[array([ 0.   ,  0.002,  0.3  ]), array([ 0.5,  0.6,  0.9]), array([ 1.3]), array([ 1.9]), array([], dtype=float64)]

np.searchsorted finds the indices at which the items of l2 would have to be inserted to keep l1 sorted, and np.split then splits the list at those indices. The trailing empty array corresponds to the final boundary 2.0, whose insertion point is the end of the list.
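
If you want the same split points in pure Python, the standard bisect module computes what np.searchsorted returns; this is a sketch of mine, not part of the original answer:

from bisect import bisect_left

l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
l2 = [0.5, 1.0, 1.5, 2.0]

# insertion points for each boundary, same as np.searchsorted(l1, l2): [3, 6, 7, 8]
cuts = [bisect_left(l1, x) for x in l2]
print([l1[i:j] for i, j in zip([0] + cuts, cuts)])
# [[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]

Each bisect_left call is O(log n), so computing the cut points costs O(m log n) before the O(n) slicing.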

A benchmark against the accepted answer, on a list 1000 times bigger:

from timeit import timeit

s1="""

def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist

l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
l2 = [0.5, 1.0, 1.5, 2.0]
list(partition(l1, l2))

"""

s2="""
l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
l2 = [0.5, 1.0, 1.5, 2.0]
np.split(l1,np.searchsorted(l1,l2))
   """

print('1st:', timeit(stmt=s1, number=10000))
print('2nd:', timeit(stmt=s2, number=10000, setup="import numpy as np"))

Result:

1st: 17.5872459412
2nd: 10.3306460381

9 Comments

You should really move the creation of l1 and l2 out of the timeit tests (as well as the definition of the partition() function).
I get 1.43 vs 1.1 for 1000 repetitions. Not bad for a pure-python implementation.
Ah, there is a problem. You cannot just multiply l1 by 1000, because my solution requires l1 to be sorted. Not that it helps my case if you do sort the list properly, as more results are produced and that makes my solution slightly slower.
Answered my own question: my solution using pure Python is faster.
@PadraicCunningham 10000000000000000 times larger! :-D
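As the comments point out, a fairer timing keeps the data construction and the function definition out of the timed statement. A minimal sketch of such a setup (the partition() generator is pasted into the setup string; absolute numbers will vary by machine and interpreter):

from timeit import timeit

setup = """
import numpy as np

def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist

# sort once here so both solutions see valid input
l1 = sorted([0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9] * 1000)
l2 = [0.5, 1.0, 1.5, 2.0]
"""

print(timeit("list(partition(l1, l2))", setup=setup, number=1000))
print(timeit("np.split(l1, np.searchsorted(l1, l2))", setup=setup, number=1000))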
1
Advance an iterator over the boundaries while making a single pass over the values:

def split_l(a, b):
    it = iter(b)                      # boundary values, consumed lazily
    start, sub = next(it), []
    for ele in a:
        if ele >= start:              # crossed the current boundary
            yield sub                 # emit the finished bucket
            sub, start = [], next(it)
        sub.append(ele)
    yield sub                         # the last bucket

print(list(split_l(l1,l2)))
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]
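
One caveat not raised in the thread (my note): if any value in a reaches the last boundary in b, the inner next(it) raises StopIteration, which PEP 479 (Python 3.7+) converts into a RuntimeError inside a generator. A hypothetical guarded variant, split_l_safe, pads the boundaries with infinity so leftover values land in one final bucket:

from itertools import chain

def split_l_safe(a, b):
    # pad the boundary stream with +inf so next(it) never raises
    it = chain(b, [float("inf")])
    start, sub = next(it), []
    for ele in a:
        if ele >= start:
            yield sub
            sub, start = [], next(it)
        sub.append(ele)
    yield sub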

Using Kasra's input, this beats both the accepted answer and the numpy solution:

In [14]: l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000

In [15]: l1.sort()

In [16]: l2 = [0.5, 1.0, 1.5, 2.0]

In [17]: timeit list(partition(l1,l2))
1000 loops, best of 3: 1.53 ms per loop

In [18]: timeit list(split_l(l1,l2))
1000 loops, best of 3: 703 µs per loop

In [19]: timeit np.split(l1,np.searchsorted(l1,l2))
1000 loops, best of 3: 802 µs per loop

In [20]: list(split_l(l1,l2))  == list(partition(l1,l2))
Out[20]: True

Creating a local reference to append knocks even more off:

def split_l(a, b):
    it = iter(b)
    start, sub = next(it), []
    append = sub.append               # cache the bound method locally
    for ele in a:
        if start <= ele:
            yield sub
            start, sub = next(it), []
            append = sub.append       # rebind to the new bucket's append
        append(ele)
    yield sub

Runs in well under the time of the numpy solution:

In [47]: l1.sort()

In [48]: timeit list(split_l(l1,l2))
1000 loops, best of 3: 498 µs per loop

In [49]: timeit list(partition(l1,l2))
1000 loops, best of 3: 1.73 ms per loop

In [50]: timeit np.split(l1,np.searchsorted(l1,l2))
1000 loops, best of 3: 812 µs per loop
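
The gain comes from hoisting the bound-method lookup out of the loop. A quick way to measure that effect in isolation (my sketch; timings are machine-dependent):

from timeit import timeit

# attribute looked up on every call
print(timeit("sub.append(1)", setup="sub = []", number=10**6))
# bound method cached in a local name
print(timeit("append(1)", setup="sub = []; append = sub.append", number=10**6))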


0
l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
l2 = [0.5, 1.0, 1.5, 2.0]

def partition(values, indices):
    temp = []
    p_list = []
    for j in range(len(indices)):
        for i in range(len(values)):
            if indices[j] > values[i]:
                temp.append(values[i])
        p_list.append(temp)
        # values added to the partition are truncated from the list
        values = values[len(temp):]
        temp = []
    print(p_list)

partition(l1, l2)

[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]

1 Comment

This is just as bad as the OP's solution; you have an O(N*M) quadratic algorithm here.
