
I want to evaluate a binary function at M different locations. The binary function has dimension D, so the space of inputs is exponential in size, 2^D (i.e. its inputs are all {0,1}^D "strings"). I want its evaluation at M unique/different points that are evenly/randomly distributed over the input space. I know how to generate all 2^D arrays if I wanted to (essentially just tack a 0 or a 1 onto every array generated so far; (pseudo)code in the appendix). However, that's exponential, and I only need a fraction of them, M points, not all 2^D. Is there a quick way to do it? I thought of generating the binary arrays randomly and, if there is a collision, trying another value until it is unique. As in:

import random

hash_table = set()                # set of arrays generated so far
while len(hash_table) < M:        # assumes M and D are already defined
    b_D = tuple(random.getrandbits(1) for _ in range(D))  # each bit is 0 or 1 with probability 1/2
    if b_D not in hash_table:     # unique so far; on a collision we just loop and try again
        hash_table.add(b_D)

However, according to the answer I got here, that algorithm seems to be exponential. Is there a better way to do this, where the runtime of my algorithm depends only on M instead of on something exponential in D?

Note that it's not acceptable to generate the arrays in a fixed order until we have M points, because I need the points to be randomly distributed among all the binary arrays. I.e. starting from [0,0,...,0], then [1,0,...,0], and so on until we have M is not what I want, because most of my arrays would end in 0's, and I want an unbiased distribution over them.


Appendix: code to generate all 2^D arrays:

def _generate_input_data_full(D):
    '''
    "full" means generate all 2^D arrays.
    (Note: this version encodes the bits as -1/1 rather than 0/1.)
    '''
    prev_all_permutations = [ [-1], [1] ]
    current_all_permutations = prev_all_permutations
    for k in range(D - 1):
        # extend every array generated so far by one more bit
        current_all_permutations = []
        for prev_perm in prev_all_permutations:
            for b in [-1, 1]:
                current_all_permutations.append(prev_perm + [b])
        prev_all_permutations = current_all_permutations
    return current_all_permutations
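
As a quick sanity check (assuming the return statement above): for D = 3 the function should produce all 2^3 = 8 arrays.

all_arrays = _generate_input_data_full(3)
print(len(all_arrays))   # 8
print(all_arrays[0])     # [-1, -1, -1]
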
    You'd better tell us your D and M. Commented Mar 2, 2017 at 4:24
  • @StefanPochmann Why does that matter? They are variables. Initially it will be D=8, M=60,000, but I might increase it to D=256 and M=120,000. Commented Mar 2, 2017 at 4:25
  • It matters a lot. Your D=8, M=60,000 doesn't even work (you can't find 60000 different points when there are only 256). And with D=256 and M=120,000 you're practically guaranteed to not have any collisions so it's O(MD). It only degenerates to being really bad if you're somewhat close to exhausting the space. Commented Mar 2, 2017 at 4:29
  • Yea, I guess that's right XD, D and M matter. But since D leads to an exponential 2^D, I tend to forget that small values of D actually lead to reasonable numbers (I was thinking too much about the rate of growth). Commented Mar 3, 2017 at 16:23
  • Another demonstration that D and M matter, or rather that their relationship matters, is in the random.sample source code. It picks one of two different algorithms, depending on how many elements are wanted and on how large the population is. Commented Mar 3, 2017 at 17:08
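
A minimal sketch of the random.sample idea from the last comment, assuming 2^D is small enough that the whole index range can be sampled directly (sizes below are illustrative only):

import random

D, M = 8, 60                                  # illustrative; requires M <= 2**D
indices = random.sample(range(2 ** D), M)     # M distinct integers, uniform, without replacement
arrays = [[(i >> k) & 1 for k in range(D)] for i in indices]  # unpack each integer into its D bits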

1 Answer


As mentioned in the comments, if you use something like your D=256 and M=120,000, then your generate-random-and-filter-collisions approach is practically O(MD), because you won't have any collisions at all.

You pretty much only have a chance of getting collisions if you ask for a huge number of arrays. As you might know from the birthday paradox, you need about sqrt(|space|) elements for a 50% chance of having any collision at all. So with D=256, you'd need to ask for about M = 2^128 = 340282366920938463463374607431768211456 arrays just for a 50% chance of a single collision.

Relevant news: just a few days ago the first public SHA-1 collision was published. That's only 160 bits, and many very smart people spent a lot of time and computing power to intentionally find one collision. You really don't have any practical chance of accidentally creating collisions with 256 random bits.
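
For concreteness, a minimal sketch of that regime (sizes are illustrative): generate each array directly and keep a set only as a safety net; with 256 random bits per draw the retry branch essentially never runs, so the expected cost is O(MD).

import random

D, M = 256, 120_000
seen = set()
while len(seen) < M:
    seen.add(random.getrandbits(D))           # one random D-bit integer; a collision is astronomically unlikely
arrays = [[(x >> k) & 1 for k in range(D)] for x in seen]  # expand each integer into its D bits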


4 Comments

"You really don't have any practical chance to accidentally create collisions with 256 random bits." - To the extent that the bits are random, he doesn't get any choice about whether there's a collision or not. :-)
@AustinHastings Not sure what you're saying. Is there something wrong with what I said?
Do I really have to code the binary-array generation myself? It seems that this functionality must be implemented somewhere already!
@CharlieParker What do you mean "do I really have to..."? Did someone say that? Anyway, there's the random module, there's os.urandom, and NumPy also offers random data functionality. I think those are the best options.
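
For example, a minimal NumPy sketch (this does not enforce uniqueness, but for large D duplicate rows are vanishingly unlikely):

import numpy as np

D, M = 256, 120_000
bits = np.random.randint(0, 2, size=(M, D), dtype=np.uint8)   # M rows of D random 0/1 bits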
