1

I'm making an array of sums of random choices from a negative binomial distribution (nbd), with each sum being of non-regular length. Right now I implement it as follows:

import numpy
from numpy.random import default_rng
rng = default_rng()

nbd = rng.negative_binomial(1, 0.5, int(1e6))
gmc = [12, 35, 4, 67, 2]
n_pp = np.empty(len(gmc))
for i in range(len(gmc)):
    n_pp[i] = np.sum(rng.choice(nbd, gmc[i]))

This works, but when I perform it over my actual data it's very slow (gmc is of dimension 1e6), and I would like to vary this for multiple values of n and p in the nbd (in this example they're set to 1 and 0.5, respectively).

I'd like to work out a pythonic way to do this which eliminates the loop, but I'm not sure it's possible. I want to keep default_rng for the better random generation than the older way of doing it (np.random.choice), if possible.

2
  • 2
    Is there a reason you're sampling from a large, previously-generated sample each iteration instead of making a new negative_binomial call each time? Commented Nov 19, 2021 at 1:00
  • Yeah, as the answer below shows, I should be approaching this differently. I thought making a large sample base would speed things up as it's a one time generation, but this (obviously) doesn't leverage broadcasting. Commented Nov 19, 2021 at 4:00

1 Answer 1

1

The distribution of the sum of m samples from the negative binomial distribution with parameters (n, p) is the negative binomial distribution with parameters (m*n, p). So instead of summing random selections from a large, precomputed sample of negative_binomial(1, 0.5), you can generate your result directly with negative_binomial(gmc, 0.5):

In [68]: gmc = [12, 35, 4, 67, 2]

In [69]: npp = rng.negative_binomial(gmc, 0.5)

In [70]: npp
Out[70]: array([ 9, 34,  1, 72,  7])

(The negative_binomial method will broadcast its inputs, so we can pass gmc as an argument to generate all the samples with one call.)

More generally, if you want to vary the n that is used to generate nbd, you would multiply that n by the corresponding element in gmc and pass the product to rng.negative_binomial.

Sign up to request clarification or add additional context in comments.

1 Comment

I did it my unnecessary loop way and this way and compared histograms, and this works a treat. I'll have to look at the math underlying it a bit more to be sure I get it, but this appears to do the trick. Thanks!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.