Is there a pythonic way to sample N consecutive elements from a list or numpy array

Question

Is there a pythonic way to select N consecutive elements from a list or numpy array?

So suppose:

Choice = [1,2,3,4,5,6]

I would like to create a new list of length N by randomly choosing an index X into Choice and taking the element at X along with the N-1 consecutive elements that follow it, wrapping around to the start of Choice when the end is reached.

So if:

X = 4 
N = 4

The resulting list would be:

Selection = [5,6,1,2]

I think something similar to the following would work.

S = []
for i in range(X, X + N):
    S.append(Choice[i % len(Choice)])

But I was wondering whether there is a Python or NumPy function that can select the elements all at once, more efficiently.

why not just randomly choose a starting index from [0, len(choices) - N)?
– alkasm, Commented Jan 27, 2021 at 2:23
Check out the slice notation: my_list[1:3]. You will need to figure out the logic if you consider the list circular.
– naicolas, Commented Jan 27, 2021 at 2:23
Does this answer your question? Cycle through list starting at a certain element
– busybear, Commented Jan 27, 2021 at 2:26
@phntm yes, if N <= len(Choice) the answer can be much simpler. See the edit to my answer.
– Nick, Commented Jan 27, 2021 at 8:23
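alkasm's suggestion sidesteps wrap-around entirely by only picking starts where the whole window fits. A minimal sketch of that idea, assuming wrapping is not required; `sample_window` is a hypothetical name. Note that with the half-open range [0, len(choices) - N) the last valid start, len(choices) - N, would never be drawn, so the sketch uses len(choice) - n + 1 as the exclusive upper bound.

```python
import random

def sample_window(choice, n):
    """Return n consecutive items of choice from a random start, without wrapping.

    Valid start indices are 0 .. len(choice) - n inclusive, so the exclusive
    upper bound passed to randrange is len(choice) - n + 1.
    """
    if not 0 <= n <= len(choice):
        raise ValueError("n must be between 0 and len(choice)")
    start = random.randrange(len(choice) - n + 1)
    return choice[start:start + n]

print(sample_window([1, 2, 3, 4, 5, 6], 4))  # one of [1..4], [2..5], [3..6]
```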

chepner · Accepted Answer · 2021-01-27 02:27:43Z

10

Use itertools, specifically islice and cycle.

import random
from itertools import cycle, islice
start = random.randint(0, len(Choice) - 1)
Selection = list(islice(cycle(Choice), start, start + N))

cycle(Choice) is an infinite iterator that repeats your original list, so the slice start:start + N wraps around when necessary.
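To see the wrap-around with the question's fixed values (X = 4, N = 4), here is the same islice/cycle idea with the start passed in explicitly; `wrapped_slice` is a hypothetical helper name. Because cycle() never runs out, N may even exceed len(Choice), in which case the list simply repeats.

```python
from itertools import cycle, islice

Choice = [1, 2, 3, 4, 5, 6]

def wrapped_slice(seq, start, n):
    # cycle() repeats seq forever; islice() skips `start` items, then yields n of them
    return list(islice(cycle(seq), start, start + n))

print(wrapped_slice(Choice, 4, 4))  # [5, 6, 1, 2]
print(wrapped_slice(Choice, 4, 8))  # [5, 6, 1, 2, 3, 4, 5, 6]
```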

edited Jan 27, 2021 at 2:27

answered Jan 27, 2021 at 2:25

chepner


2 Comments

phntm Over a year ago

This is awesome, thanks! Is using itertools generally faster than NumPy?

ShadowRanger Over a year ago

@phntm: For large inputs, it's almost guaranteed to be slower than any other solution, as it's O(n) in both memory usage and processing time (islice can't skip values, and cycle stores them all until the input is exhausted). For small inputs, it hardly matters which solution you use.

Nick · Accepted Answer · 2021-01-27 08:22:45Z

4

You could use a list comprehension, using modulo operations on the index to keep it in range of the list:

Choice = [1,2,3,4,5,6] 
X = 4 
N = 4
L = len(Choice)
Selection = [Choice[i % L] for i in range(X, X+N)]
print(Selection)

Output

[5, 6, 1, 2]

Note that if N is less than or equal to len(Choice), you can greatly simplify the code:

Choice = [1,2,3,4,5,6] 
X = 4 
N = 4
L = len(Choice)
Selection = Choice[X:X+N] if X+N <= L else Choice[X:] + Choice[:X+N-L]
print(Selection)
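A quick way to convince yourself that the slicing shortcut agrees with the modulo comprehension is to compare both over every start index and every N <= len(Choice). A minimal sketch; `rotate_slice` and `rotate_modulo` are hypothetical names, and the slicing version assumes 0 <= x < len(choice) and n <= len(choice), as in the answer.

```python
def rotate_slice(choice, x, n):
    """Slicing variant; assumes 0 <= x < len(choice) and n <= len(choice)."""
    length = len(choice)
    end = x + n
    if end <= length:
        return choice[x:end]                   # window fits, no wrap needed
    return choice[x:] + choice[:end - length]  # tail of the list, then its head

def rotate_modulo(choice, x, n):
    length = len(choice)
    return [choice[i % length] for i in range(x, x + n)]

Choice = [1, 2, 3, 4, 5, 6]
# Every valid (x, n) pair should give the same result with both approaches.
assert all(rotate_slice(Choice, x, n) == rotate_modulo(Choice, x, n)
           for x in range(len(Choice)) for n in range(len(Choice) + 1))
```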

edited Jan 27, 2021 at 8:22

answered Jan 27, 2021 at 2:25

Nick


Marc · Accepted Answer · 2021-01-27 03:42:20Z

3

Since you are asking for the most efficient way, I created a small benchmark to test the solutions proposed in this thread.

I rewrote your current solution as:

def op(choice, x):
    n = len(choice)
    selection = []
    for i in range(x, x + n):
        selection.append(choice[i % n])
    return selection

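Where choice is the input list and x is the random start index. Note that n is set to len(choice), so every function returns the whole list rotated by x; the benchmark therefore covers only the case N = len(choice).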

These are the results when choice contains 1_000_000 random numbers (each function timed over a single call):

chepner: 0.10840400000000017 s
nick: 0.2066781999999998 s
op: 0.25887470000000024 s
fountainhead: 0.3679908000000003 s

Full code

import random
from itertools import cycle, islice
from time import perf_counter as pc
import numpy as np


def op(choice, x):
    n = len(choice)
    selection = []
    for i in range(x, x + n):
        selection.append(choice[i % n])
    return selection


def nick(choice, x):
    n = len(choice)
    return [choice[i % n] for i in range(x, x + n)]


def fountainhead(choice, x):
    n = len(choice)
    return np.take(choice, range(x, x + n), mode='wrap')


def chepner(choice, x):
    n = len(choice)
    return list(islice(cycle(choice), x, x + n))


results = []
n = 1_000_000
choice = random.sample(range(n), n)
x = random.randint(0, n - 1)

# Correctness
assert op(choice, x) == nick(choice,x) == chepner(choice,x) == list(fountainhead(choice,x))

# Benchmark
for f in op, nick, chepner, fountainhead:
    t0 = pc()
    f(choice, x)
    t1 = pc()
    results.append((t1 - t0, f))

for t, f in sorted(results):
    print(f'{f.__name__}: {t} s')
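Because the benchmark above times each function once and only for N = len(choice), a sketch of a less noisy comparison for a small N may be useful. It uses `timeit.repeat` and keeps the best run; `bench`, `modulo` and `cycle_islice` are hypothetical names, and no timings are claimed here. One thing worth watching: the cycle/islice version has to step over x items before yielding anything, so its cost grows with the start index, while the modulo comprehension only touches N indices.

```python
import random
import timeit
from itertools import cycle, islice

def modulo(choice, x, n):
    length = len(choice)
    return [choice[i % length] for i in range(x, x + n)]

def cycle_islice(choice, x, n):
    return list(islice(cycle(choice), x, x + n))

def bench(size=1_000_000, n=100, repeat=5, number=10):
    choice = random.sample(range(size), size)
    x = random.randrange(size)
    assert modulo(choice, x, n) == cycle_islice(choice, x, n)  # same output first
    results = {}
    for f in (modulo, cycle_islice):
        # best of `repeat` runs, each of `number` calls; reported per call in seconds
        times = timeit.repeat(lambda: f(choice, x, n), repeat=repeat, number=number)
        results[f.__name__] = min(times) / number
    return results

if __name__ == "__main__":
    for name, t in sorted(bench().items(), key=lambda kv: kv[1]):
        print(f"{name}: {t:.6f} s per call")
```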

edited Jan 27, 2021 at 3:42

answered Jan 27, 2021 at 3:27

Marc


3 Comments

Nick Over a year ago

Thanks for taking the time - it's always interesting to see performance results.

Nick Over a year ago

Interestingly on my computer, for 1,000,000 entries the advantage of chepner is only about 15-20%, and it decreases further as the list length gets longer. At shorter lengths (10k or less), chepner is more than 2x faster.

Marc Over a year ago

@Nick yes chepner solution seems the fastest for a list. For larger list size (>10^6) using an array as input (like array('i', choice)) is faster and takes less memory. If choice is a Numpy array then this solution is the fastest regardless of the input size.

fountainhead · Accepted Answer · 2021-01-27 05:15:09Z

3

If using a numpy array as the source, we could of course use numpy "fancy indexing".

So, if ChoiceArray is the numpy array equivalent of the list Choice, and if L is len(Choice) or len(ChoiceArray):

import numpy as np
Selection = ChoiceArray[np.arange(X, X + N) % L]
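The answer takes X as given. A sketch that also draws X at random, using NumPy's Generator API (`np.random.default_rng`), which is an assumption here rather than part of the answer:

```python
import numpy as np

rng = np.random.default_rng()        # NumPy's Generator API
ChoiceArray = np.array([1, 2, 3, 4, 5, 6])
N = 4
L = len(ChoiceArray)

X = rng.integers(L)                  # random start index in [0, L)
# Fancy indexing: the modulo folds indices past the end back to the start
Selection = ChoiceArray[np.arange(X, X + N) % L]
print(Selection)
```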

edited Jan 27, 2021 at 5:15

answered Jan 27, 2021 at 4:35

fountainhead


fountainhead · Accepted Answer · 2021-01-27 02:45:53Z

2

Here's a numpy approach:

import numpy as np

Selection = np.take(Choice, range(X,N+X), mode='wrap')

Works even if Choice is a Python list rather than a numpy array.
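Since np.take accepts an index array of any shape, the same mode='wrap' call can also produce several windows at once. A sketch, assuming you want three windows (the count is arbitrary) with start indices drawn from a seeded `np.random.default_rng`:

```python
import numpy as np

Choice = [1, 2, 3, 4, 5, 6]
N = 4
rng = np.random.default_rng(0)

print(np.take(Choice, range(4, 8), mode='wrap'))  # [5 6 1 2]

# Three random start indices, one per window
starts = rng.integers(len(Choice), size=3)
# Broadcast to a (3, N) index array: row r is starts[r], starts[r] + 1, ...
idx = starts[:, None] + np.arange(N)
windows = np.take(Choice, idx, mode='wrap')  # indices wrap modulo len(Choice)
print(windows)
```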

edited Jan 27, 2021 at 2:45

answered Jan 27, 2021 at 2:40

fountainhead


2 Comments

phntm Over a year ago

Thanks this is really helpful! Is there a difference in terms of using np.random.default_rng vs np.random.choice?

fountainhead Over a year ago

@phntm - Yes there are differences. I think this SO thread discusses that
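The linked thread is not preserved here. For the narrow purpose of picking the start index, a minimal sketch of the two APIs side by side: both draw uniformly from 0..L-1, but np.random.choice is one of the legacy module-level functions that share a single global RandomState, whereas np.random.default_rng returns an independent, seedable Generator object.

```python
import numpy as np

Choice = [1, 2, 3, 4, 5, 6]
L = len(Choice)

# Legacy API: uses the module's global RandomState
x_legacy = np.random.choice(L)       # uniform over range(L)

# Newer API: an explicit Generator that you create and pass around
rng = np.random.default_rng(seed=42)
x_new = rng.integers(L)              # uniform over range(L)

print(x_legacy, x_new)
```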
