With numpy.array_split, you can split an array into equal-sized chunks. Is there a way to split it into chunks whose sizes are given by a list?

How do I split this array into 4 chunks, where each chunk has the size given in chunk_size and consists of random values from the array? I tried:

import numpy as np
np.random.seed(13)
a = np.arange(20)
chunk_size = [10, 5, 3, 2]
dist = [np.random.choice(a, c) for c in chunk_size]
print(dist)

But I get multiple duplicates, as expected:

[array([18, 16, 10, 16,  6,  2, 12,  3,  2, 14]),
 array([ 5, 13, 10,  9, 11]), array([ 2,  0, 19]), array([19, 11])]

For example,

  • 16 is contained twice in the first chunk
  • 10 is contained in both the first and second chunks

With np.split, this is the result I get:

>>> for s in np.split(a, chunk_size):
...     print(s.shape)
...
(10,)
(0,)
(0,)
(0,)
(18,)

Using np.random.choice with replace=False still gives duplicate elements:

import numpy as np
np.random.seed(13)
a = np.arange(20)
chunk_size = [10, 5, 3, 2]
dist = [np.random.choice(a, c, replace=False) for c in chunk_size]
print(dist)

While each chunk no longer contains duplicates internally, nothing prevents a value from appearing in more than one chunk. For example, 7 is contained in both the first and second chunks:

[array([11, 12,  0,  1,  8,  5,  7, 15, 14, 13]),
 array([16,  7, 13,  9, 19]), array([1, 4, 2]), array([15, 12])]
  • Why did you use np.random? Do you want to get contiguous chunks or random elements from the original array? Commented Aug 24, 2020 at 14:26
  • Random elements. Even np.split gives contiguous chunks :-( Commented Aug 24, 2020 at 14:26
  • Use replace=False with it. Commented Aug 24, 2020 at 14:28
  • @Divakar: it is different iterations of the for loop, and thus will be replaced. Commented Aug 24, 2020 at 14:31
  • I would encourage you to put those together and post your own answer. Commented Aug 24, 2020 at 14:55

2 Answers

One way to ensure that every element of a is contained in exactly one chunk would be to create a random permutation of a first and then split it with np.split.

In order to get an array of splitting indices for np.split from chunk_size you can use np.cumsum.

Example

>>> import numpy as np
>>> np.random.seed(13)
>>> a = np.arange(20)
>>> b = np.random.permutation(a)
>>> b
array([11, 12,  0,  1,  8,  5,  7, 15, 14, 13,
        3, 17,  9,  4,  2,  6, 19, 10, 16, 18])

>>> chunk_size = [10, 5, 3, 2]
>>> np.cumsum(chunk_size)
array([10, 15, 18, 20])

>>> np.split(b, np.cumsum(chunk_size))
[array([11, 12,  0,  1,  8,  5,  7, 15, 14, 13]),
 array([ 3, 17,  9,  4,  2]), array([ 6, 19, 10]), array([16, 18]),
 array([], dtype=int64)]

You could avoid the trailing empty array by omitting the last value in chunk_size, as it is implied by the size of a and the sum of the previous values:

>>> np.split(b, np.cumsum(chunk_size[:-1]))  # [10, 5, 3] -- 2 is implied
[array([11, 12,  0,  1,  8,  5,  7, 15, 14, 13]),
 array([ 3, 17,  9,  4,  2]), array([ 6, 19, 10]), array([16, 18])]
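
If you need this more than once, the two steps above (permute, then split at the cumulative sizes) can be wrapped in a small helper. This is only a sketch: the name random_chunks and its seed parameter are made up for illustration, it assumes a 1-D input whose length equals the sum of the chunk sizes, and it uses np.random.default_rng (NumPy 1.17 or later) instead of the global np.random.seed, so the values will differ from the output shown above.

```python
import numpy as np

def random_chunks(values, chunk_sizes, seed=None):
    """Split `values` into disjoint random chunks of the given sizes.

    Hypothetical helper: assumes `values` is 1-D and that the sizes
    add up to len(values).
    """
    values = np.asarray(values)
    if sum(chunk_sizes) != len(values):
        raise ValueError("chunk sizes must add up to len(values)")
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(values)  # every element appears exactly once
    # Drop the last cumulative sum so np.split returns no trailing empty array.
    return np.split(shuffled, np.cumsum(chunk_sizes)[:-1])

chunks = random_chunks(np.arange(20), [10, 5, 3, 2], seed=13)
print([len(c) for c in chunks])  # [10, 5, 3, 2]
```

Because the chunks are slices of a single permutation, no value can appear in more than one chunk.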

Thanks to Divakar

import numpy as np
np.random.seed(13)
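values = np.arange(3286)
chunk_size = [975, 708, 515, 343, 269, 228, 77, 57, 42, 33, 11, 9, 7, 4, 3, 1, 1, 1, 1, 1]
# Caveat: each chunk is drawn independently from the full array, so
# replace=False only prevents duplicates within a chunk, not across chunks.
dist = [np.random.choice(values, c, replace=False) for c in chunk_size]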

5 Comments

@mkrieger1: I do not understand what I am missing then. Could you help me out?
Actually, if I think again about the advice given, I don't think it would work as expected.
Could you elaborate?
I think the idea was to remove the elements that were picked for one chunk from the array so that they can't be duplicated in the following chunks. But for that you would have to use the remaining items as input for the next iteration, not those that were picked. I'm not sure how I can explain it better, because I don't know what exactly is unclear to you.
Maybe an example helps: Suppose the array is [1, 2, 3, 4, 5] initially, and [2, 3] are picked for the first chunk. Then for the next iteration you would have to use the remaining items [1, 4, 5] as input to pick from. But the suggestion would have meant to use [2, 3] as input to pick from again.
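
The idea described in the last two comments (pick from the remaining items, then remove what was picked) could look roughly like the following. It is a sketch under assumptions, not the code from the answer: the names remaining and picked_idx are invented here, and it uses np.random.default_rng (NumPy 1.17 or later) to draw positions rather than values, so that np.delete can remove them.

```python
import numpy as np

rng = np.random.default_rng(13)
remaining = np.arange(20)  # pool of items that have not been picked yet
chunk_size = [10, 5, 3, 2]

chunks = []
for c in chunk_size:
    # Draw c distinct positions in the current pool.
    picked_idx = rng.choice(len(remaining), size=c, replace=False)
    chunks.append(remaining[picked_idx])
    # Remove the picked items so later chunks cannot reuse them.
    remaining = np.delete(remaining, picked_idx)

print([len(c) for c in chunks])  # [10, 5, 3, 2]
print(len(remaining))            # 0
```

This gives disjoint chunks, but it copies the pool on every iteration; permuting once and splitting, as in the other answer, does the same job in a single pass.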
