
I want to generate n different numbers between 1 and N (of course n <= N). N could be very large. If n is very small, one efficient way is to generate a number and compare it against the set we have so far, to make sure it's new. That takes O(n^2) time and O(n) memory. If n is quite large, we can use the Fisher–Yates shuffle algorithm to generate a random permutation, stopping after n steps. That takes O(n) time, but it also needs O(N) memory.
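For concreteness, here is a minimal sketch of that partial-shuffle approach (Python, purely illustrative; the pool list is where the O(N) memory goes):

```python
import random

def sample_fisher_yates(n, N):
    """Draw n distinct values from 1..N via a partial Fisher-Yates shuffle."""
    pool = list(range(1, N + 1))        # the O(N) memory cost lives here
    for i in range(n):                  # only n swap steps: O(n) time
        j = random.randint(i, N - 1)    # uniform index into the unshuffled suffix
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:n]                     # the first n entries are the sample
```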

Here is the question: what can we do if we do not know how large n is? I hope for an algorithm that uses only O(n) memory and stops after O(n) time. Is that possible?

  • That's a pretty poor duplicate -- N there is 1000, here it could be "very large". Commented Oct 31, 2013 at 15:14
  • @j_random_hacker That uses O(N) memory (not O(n)). Commented Oct 31, 2013 at 15:14
  • @jrok: Fair enough, close vote retracted. I do note however an O(1)-space solution on that page: stackoverflow.com/a/202225/47984. Commented Oct 31, 2013 at 15:18
  • 1
    @Floris: The way I interpret it is that they want an online algorithm -- i.e. one where it's always possible to cheaply add a new, distinct sample later on. Commented Oct 31, 2013 at 15:28
  • 1
    @j_random_hacker: If you implement the set as a hash table you can get O(1) (at least for the expected time). Commented Oct 31, 2013 at 15:45

1 Answer


You can essentially do the same as for very small n, but make the duplicate check more efficient. The naïve way to check whether you've already generated a number is a linear search through the list of previously generated values. For an unknown n, you can instead keep the set of previously generated values sorted, so that a binary search identifies duplicates. With the naïve approach the algorithm takes O(n^2) time; the smarter search through previous results reduces that to O(n log n).
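A minimal sketch of that idea, assuming Python (bisect supplies the O(log n) duplicate search; note that inserting into a flat list is still O(n) per insert, a point the comments below pick up):

```python
import bisect
import random

def sample_sorted(n, N):
    """Draw n distinct values from 1..N, detecting duplicates by binary search."""
    chosen = []                               # kept sorted at all times
    while len(chosen) < n:
        x = random.randint(1, N)
        i = bisect.bisect_left(chosen, x)     # O(log n) duplicate check
        if i < len(chosen) and chosen[i] == x:
            continue                          # already drawn: redraw
        chosen.insert(i, x)                   # caveat: O(n) list insert
    return chosen
```

A balanced search tree (or, as discussed below, a hash table) would make the insertion step cheap as well.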


Comments

Inserting a value into a sorted array is O(n) time, though. And if n is a large enough fraction of N that duplicates turn up often, the expected number of redraws grows and comes to dominate the running time.
@j_random_hacker: The array doesn't need to be sorted. It could just as easily be a hash table.
@j_random_hacker so don't use an array. A tree can have O(log n) searching and insertion and is easy to keep sorted.
@bames53: I would suggest changing your answer to say that, but I don't think it addresses the main problem I identified -- the blow-up in running time that results when already-selected numbers become sufficiently dense.
@j_random_hacker: You can grow a hash table geometrically (rehashing on expansion), which gives amortized O(1) insertion, just like a vector. Random numbers are ideal hash-table keys, too.
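Putting the comment thread together, here is a sketch of the hash-set variant (assuming Python's built-in set; membership tests and inserts are expected O(1), so this achieves the hoped-for O(n) memory and expected O(n) time as long as n stays well below N -- once n approaches N the redraws dominate, as noted above, and the partial shuffle wins):

```python
import random

def sample_hashed(n, N):
    """Draw n distinct values from 1..N using a hash set to reject duplicates."""
    assert n <= N
    seen = set()
    while len(seen) < n:
        seen.add(random.randint(1, N))   # re-adding a duplicate is a no-op
    return list(seen)
```

In practice Python's standard library already embodies roughly this trade-off: random.sample(range(1, N + 1), n) switches between a selection-set strategy and a partial pool copy depending on the relative sizes of n and N.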
