6

I have two lists: l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9] and l2 = [0.5, 1.0, 1.5, 2.0]. I want to split l1 into sublists defined by consecutive boundary values of l2, so that each sublist holds the elements of l1 that are less than the corresponding boundary and at least as large as the previous one. For this example the result would be [[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]].

Here is my solution:

l3 = []
b = 0
for i in l2:              # i: upper boundary of the current bucket
    temp = []
    for p in l1:          # rescans all of l1 for every boundary
        if b <= p < i:
            temp.append(p)
    l3.append(temp)
    b += 0.5              # lower boundary; assumes l2 steps by 0.5

This solution is a huge bottleneck in my code. Is there a faster way to do this?

2 Comments
  • So these are buckets. It's a histogram! Commented Sep 16, 2015 at 20:56
  • @PeterWood or a hash map! Or an interval tree! So many possibilities! Commented Sep 16, 2015 at 21:05

4 Answers

6

Your lists are sorted, so there is no need to do a double loop here.

The following generates the sublists based on the two lists as inputs:

def partition(values, indices):
    idx = 0
    for index in indices:            # upper boundary of the current bucket
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1                 # idx never resets, so each value is visited once
        if sublist:                  # empty buckets are skipped
            yield sublist

You can then iterate over partition(l1, l2) to get individual sublists, or call list() on it to produce the whole list of lists in one go:

>>> l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9] 
>>> l2 = [0.5, 1.0, 1.5, 2.0]
>>> list(partition(l1, l2))
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]
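
One behavioural detail worth noting (my reading of the generator, not something the answer states): empty buckets are skipped, and values at or above the last boundary are silently dropped:

>>> list(partition([0.1, 2.5], [0.5, 1.0, 1.5, 2.0]))
[[0.1]]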

4 Comments

Wouldn't this be O(n*m) like the OP's solution? Would there be a huge performance gain?
@taesu: the OP's is O(N*M).
Ahhh, you're right. Thanks for that, and also thanks for your participation at SO.
@taesu: index goes through the whole indices list, giving O(n). While iterating over indices, idx never decreases, and each iteration of the while loop increments idx by 1. This gives at most len(values) iterations, bumping the complexity to O(n + m). In short, both lists are traversed once.
3

For a fast approach on huge lists you can use NumPy:

>>> np.split(l1,np.searchsorted(l1,l2))
[array([ 0.   ,  0.002,  0.3  ]), array([ 0.5,  0.6,  0.9]), array([ 1.3]), array([ 1.9]), array([], dtype=float64)]

np.searchsorted finds the indices at which the items of l2 would have to be inserted to keep l1 sorted, and np.split then splits the list at those indices. The trailing empty array corresponds to the final boundary 2.0, whose insertion point is the end of the list.
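
If you want the same split points in pure Python, the standard bisect module computes what np.searchsorted returns; this is a sketch of mine, not part of the original answer:

from bisect import bisect_left

l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
l2 = [0.5, 1.0, 1.5, 2.0]

# insertion points for each boundary, same as np.searchsorted(l1, l2): [3, 6, 7, 8]
cuts = [bisect_left(l1, x) for x in l2]
print([l1[i:j] for i, j in zip([0] + cuts, cuts)])
# [[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]

Each bisect_left call is O(log n), so computing the cut points costs O(m log n) before the O(n) slicing.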

A benchmark against the accepted answer, on a list 1000 times bigger:

from timeit import timeit

s1="""

def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist

l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
l2 = [0.5, 1.0, 1.5, 2.0]
list(partition(l1, l2))

"""

s2="""
l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000
l2 = [0.5, 1.0, 1.5, 2.0]
np.split(l1,np.searchsorted(l1,l2))
   """

print('1st:', timeit(stmt=s1, number=10000))
print('2nd:', timeit(stmt=s2, number=10000, setup="import numpy as np"))

Result:

1st: 17.5872459412
2nd: 10.3306460381

9 Comments

You should really move the creation of l1 and l2 out of the timeit tests (as well as the definition of the partition() function).
I get 1.43 vs 1.1 for 1000 repetitions. Not bad for a pure-python implementation.
Ah, there is a problem. You cannot just multiply l1 by 1000, because my solution requires l1 to be sorted. Not that it helps my case if you do sort the list properly, as more results are produced and that makes my solution slightly slower.
Answered my own question: my solution using pure Python is faster.
@PadraicCunningham 10000000000000000 times larger! :-D
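As the comments point out, a fairer timing keeps the data construction and the function definition out of the timed statement. A minimal sketch of such a setup (the partition() generator is pasted into the setup string; absolute numbers will vary by machine and interpreter):

from timeit import timeit

setup = """
import numpy as np

def partition(values, indices):
    idx = 0
    for index in indices:
        sublist = []
        while idx < len(values) and values[idx] < index:
            sublist.append(values[idx])
            idx += 1
        if sublist:
            yield sublist

# sort once here so both solutions see valid input
l1 = sorted([0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9] * 1000)
l2 = [0.5, 1.0, 1.5, 2.0]
"""

print(timeit("list(partition(l1, l2))", setup=setup, number=1000))
print(timeit("np.split(l1, np.searchsorted(l1, l2))", setup=setup, number=1000))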
1
Advance an iterator over the boundaries while making a single pass over the values:

def split_l(a, b):
    it = iter(b)                      # boundary values, consumed lazily
    start, sub = next(it), []
    for ele in a:
        if ele >= start:              # crossed the current boundary
            yield sub                 # emit the finished bucket
            sub, start = [], next(it)
        sub.append(ele)
    yield sub                         # the last bucket

print(list(split_l(l1,l2)))
[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]
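
One caveat not raised in the thread (my note): if any value in a reaches the last boundary in b, the inner next(it) raises StopIteration, which PEP 479 (Python 3.7+) converts into a RuntimeError inside a generator. A hypothetical guarded variant, split_l_safe, pads the boundaries with infinity so leftover values land in one final bucket:

from itertools import chain

def split_l_safe(a, b):
    # pad the boundary stream with +inf so next(it) never raises
    it = chain(b, [float("inf")])
    start, sub = next(it), []
    for ele in a:
        if ele >= start:
            yield sub
            sub, start = [], next(it)
        sub.append(ele)
    yield sub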

Using Kasra's input, this beats both the accepted answer and the numpy solution:

In [14]: l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]*1000

In [15]: l1.sort()

In [16]: l2 = [0.5, 1.0, 1.5, 2.0]

In [17]: timeit list(partition(l1,l2))
1000 loops, best of 3: 1.53 ms per loop

In [18]: timeit list(split_l(l1,l2))
1000 loops, best of 3: 703 µs per loop

In [19]: timeit np.split(l1,np.searchsorted(l1,l2))
1000 loops, best of 3: 802 µs per loop

In [20]: list(split_l(l1,l2))  == list(partition(l1,l2))
Out[20]: True

Creating a local reference to append knocks even more off:

def split_l(a, b):
    it = iter(b)
    start, sub = next(it), []
    append = sub.append               # cache the bound method locally
    for ele in a:
        if start <= ele:
            yield sub
            start, sub = next(it), []
            append = sub.append       # rebind to the new bucket's append
        append(ele)
    yield sub

Runs in well under the time of the numpy solution:

In [47]: l1.sort()

In [48]: timeit list(split_l(l1,l2))
1000 loops, best of 3: 498 µs per loop

In [49]: timeit list(partition(l1,l2))
1000 loops, best of 3: 1.73 ms per loop

In [50]: timeit np.split(l1,np.searchsorted(l1,l2))
1000 loops, best of 3: 812 µs per loop
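
The gain comes from hoisting the bound-method lookup out of the loop. A quick way to measure that effect in isolation (my sketch; timings are machine-dependent):

from timeit import timeit

# attribute looked up on every call
print(timeit("sub.append(1)", setup="sub = []", number=10**6))
# bound method cached in a local name
print(timeit("append(1)", setup="sub = []; append = sub.append", number=10**6))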


0
l1 = [0, 0.002, 0.3, 0.5, 0.6, 0.9, 1.3, 1.9]
l2 = [0.5, 1.0, 1.5, 2.0]

def partition(values, indices):
    temp = []
    p_list = []
    for j in range(len(indices)):
        for i in range(len(values)):
            if indices[j] > values[i]:
                temp.append(values[i])
        p_list.append(temp)
        # values added to the partition are truncated from the list
        values = values[len(temp):]
        temp = []
    print(p_list)

partition(l1, l2)

[[0, 0.002, 0.3], [0.5, 0.6, 0.9], [1.3], [1.9]]

1 Comment

This is just as bad as the OP's solution; you have an O(N*M) quadratic algorithm here.
