I need to estimate the size of a population, by finding the value of n which maximises scipy.misc.comb(n, a)/n**b where a and b are constants. n, a and b are all integers.

Obviously, I could just loop over range(SOME_HUGE_NUMBER), calculate the value for each n and break out of the loop once the value stops increasing, i.e. once the curve has passed its maximum. But I wondered if there was an elegant way of doing this with (say) numpy/scipy, or is there some other elegant way of doing this in pure Python (e.g. something like an integer equivalent of Newton's method)?

  • How large do you expect n to be? Of the order of 181 as in the linked answer, or more of the order of 7.5 billion humans on earth? Commented Oct 25, 2016 at 8:51
  • I would (gut-feel) expect n < 1000, and certainly << 10000, although until I run the real data I have absolutely no way of knowing! Commented Oct 25, 2016 at 9:36
  • You can convert comb into a function over the reals via the gamma function (or by approximating with Stirling's formula). Then you can do a numerical solution technique and then just check which nearby integer is max. Commented Oct 25, 2016 at 13:08

2 Answers


As long as your number n is reasonably small (smaller than approx. 1500), my guess for the fastest way to do this is to actually try all possible values. You can do this quickly by using numpy:

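import numpy as np
# scipy.misc.comb is gone from current SciPy releases; scipy.special.comb replaces it
from scipy.special import comb

nMax = 1000  # upper bound of the search range
a = 77
b = 100
# Candidate values of n as floats, so comb and n**b are evaluated in floating point
n = np.arange(1, nMax+1, dtype=np.float64)
val = comb(n, a)/n**b
print("Maximized for n={:d}".format(int(n[val.argmax()]+0.5)))
# Maximized for n=181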

This is not especially elegant, but it is fast for that range of n. The problem is that for n>1484 the numerator can already get too large to be stored in a float, so this method will fail with overflows. This is not only a problem of numpy.ndarray not working with Python integers. Even with them, you would not be able to compute:

scipy.special.comb(10000, 1000, exact=True)/10000**1001

because you want a float result from dividing two numbers that are both larger than the maximum a Python float can hold (max_exp = 1024, i.e. max_10_exp = 308, on my system; see sys.float_info). Looping over range would not help in that case either. If you really want to do something like that, you will have to take more care numerically.
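One way to take that extra care, offered here as a sketch rather than as part of the original answer, is to work with the logarithm of the expression. The log of comb(n, a) can be written with the log-gamma function, so neither the numerator nor the denominator is ever formed. The function names log_objective and argmax_n are hypothetical, and math.lgamma is swapped in for comb:

```python
import math

def log_objective(n, a, b):
    """Return log(comb(n, a) / n**b) without forming either huge number."""
    # log comb(n, a) = lgamma(n + 1) - lgamma(a + 1) - lgamma(n - a + 1)
    log_comb = math.lgamma(n + 1) - math.lgamma(a + 1) - math.lgamma(n - a + 1)
    return log_comb - b * math.log(n)

def argmax_n(a, b, n_max):
    """Brute-force search over n = a .. n_max (assumes n_max >= a >= 1)."""
    return max(range(a, n_max + 1), key=lambda n: log_objective(n, a, b))

print(argmax_n(77, 100, 1000))
```

Because the logarithm is monotonic, the maximiser is unchanged, and for a = 77, b = 100 this should reproduce the n = 181 found above. The same search keeps working for n_max well beyond the point where the direct float computation overflows.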

4 Comments

Thanks for that. I'd kind of assumed that brute force would be the most efficient way of doing it (if that's not a contradiction in terms). Gut feel from the data I have suggests that n will be somewhere between 100 and 500, although until I run the software I can't tell. But I just wanted to know if there was an easy/elegant way of doing it. (Curiously, last night I was watching a video on YouTube in which Brian Kernighan was talking about AMPL, which sounds like exactly the sort of thing that can solve this sort of problem.)
@TimGJ Probably, there is a more elegant solution. Anyway, you will have to be rather careful about your numerics with numbers that large, though.
You can calculate the function for larger numbers by avoiding calculating the numerator first and then dividing. If instead you use an iterative approach to calculating comb you can divide by n repeatedly as you go along to control the size of the intermediate value. In this way you can evaluate the function for arbitrarily big values of n as long as the overall result is not too large.
I've run a simulation of this, and it does seem to work, more or less, but (understandably) it is acutely sensitive to the number of repeat meetings. One minor thing, though, is that this sensitivity means that as n -> nMax the estimate can easily be > nMax, meaning that the val.argmax call will return 1. I need to play with the simulation further and will write my results up in due course.
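The iterative evaluation suggested in the comments above might look like the following sketch. It is only one possible reading of that comment: the name objective and the rule "divide by n whenever the running value exceeds 1" are assumptions made for illustration.

```python
def objective(n, a, b):
    """Evaluate comb(n, a) / n**b while keeping intermediate values moderate."""
    value = 1.0
    divisions_left = b  # how many factors of n still have to be divided out
    for i in range(a):
        # comb(n, a) is the product of (n - i) / (i + 1) for i = 0 .. a - 1
        value *= (n - i) / (i + 1)
        # Divide by n as we go, so the running product stays near 1
        while value > 1.0 and divisions_left > 0:
            value /= n
            divisions_left -= 1
    # Apply whatever divisions remain once the binomial product is complete
    for _ in range(divisions_left):
        value /= n
    return value

best_n = max(range(77, 1001), key=lambda n: objective(n, 77, 100))
print(best_n)
```

As the comment notes, this only helps while the final result itself fits in a float. If the result is extremely small, the value will underflow to 0.0.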

You essentially have a nicely smooth function of n that you want to maximise. n is required to be integral but we can consider the function instead to be a function of the reals. In this case, the maximising integral value of n must be close to (next to) the maximising real value.

We could convert comb to a real function by using the gamma function and use numerical optimisation techniques to find the maximum. Another approach is to replace the factorials with Stirling's approximation. This gives a moderately complicated but tractable algebraic expression. This expression is not hard to differentiate and set to zero to find the extrema.

I did this and obtained

n * (b + (n-a) * log((n-a)/n) ) = a * b - a/2

This is not straightforward to solve algebraically but easy enough numerically (e.g. using Newton's method, as you suggest).

I may have made a mistake in the algebra, but I typed the a = 77, b = 100 example into Wolfram Alpha and got 180.58 so the approach seems to work.
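As a rough sketch of the numerical step, the equation above can be solved by bracketing the root and bisecting, and the two neighbouring integers can then be compared. Bisection is used here in place of Newton's method purely for robustness. The function names, the starting bracket at n = a + 1, the requirement b > a, and the exact integer comparison in increases are all assumptions of this sketch, not part of the derivation:

```python
import math

def residual(n, a, b):
    """Left-hand side minus right-hand side of the stationarity equation."""
    return n * (b + (n - a) * math.log((n - a) / n)) - (a * b - a / 2)

def real_maximiser(a, b, tol=1e-9):
    """Bisect for the root of residual to the right of n = a + 1."""
    if b <= a:
        raise ValueError("this sketch assumes b > a")
    lo = a + 1.0
    if residual(lo, a, b) >= 0:
        raise ValueError("no sign change at n = a + 1; pick another bracket")
    hi = 2.0 * lo
    while residual(hi, a, b) < 0:  # grow the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid, a, b) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def increases(n, a, b):
    """True if f(n + 1) > f(n) for f(n) = comb(n, a) / n**b, in exact integers."""
    # f(n + 1) / f(n) = (n + 1) / (n + 1 - a) * (n / (n + 1))**b
    return (n + 1) * n**b > (n + 1 - a) * (n + 1)**b

def integer_maximiser(a, b):
    x = real_maximiser(a, b)
    lo, hi = math.floor(x), math.ceil(x)
    return hi if hi > lo and increases(lo, a, b) else lo

print(real_maximiser(77, 100), integer_maximiser(77, 100))
```

For a = 77, b = 100 the real root should come out close to the 180.58 quoted above. The final comparison uses Python's arbitrary-precision integers, so the choice between the two neighbouring integers does not depend on floating-point rounding.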
