How to speedup binary transformation from integer values

Question

I wrote the following method (in python 2.7) that generates a set of integers and transform them into binary representation. It takes self-explanatory two parameters: total_num_nodes and dim. It returns numpy matrix-like containing the binary representation of all these integers:

def generate(total_num_nodes, dim):

    # Generate random nodes from the range (0, dim-1) 
    nodes_matrix = [random.randint(0, 2 ** dim - 1) for _ in range(total_num_nodes)]

    # Removes duplicates
    nodes_matrix = list(set(nodes_matrix))

    # Transforms each node from decimal to string representation
    nodes_matrix = [('{0:0' + str(dim) + 'b}').format(x) for x in nodes_matrix]

    # Transforms each bit into an integer.
    nodes_matrix = np.asarray([list(map(int, list(x))) for x in nodes_matrix], dtype=np.uint8)

    return nodes_matrix

The problem is that when I pass very large values, say total_num_nodes= 10,000,000 and dim=128, the generation time takes really long time. A friend of mine hinted me that the following line is actually a bottleneck and it is likely to be responsible for the majority of computation time:

# Transforms each node from decimal to string representation
nodes_matrix = [('{0:0' + str(dim) + 'b}').format(x) for x in nodes_matrix]

I cannot think of other faster method that can replce this line so that I get to speedup the generation time when it is running on a single processor. Any suggestion from you is really really appreciated.

Thank you

Instead of guessing, why don't you try benchmarking the program?\ — user202729
– user202729, Commented Mar 12, 2018 at 1:58
@user202729 Thank you very much for this. I never thought about that. I am still a bit intermediate programmer. I am surprised that the mapping is line that takes the majority of time!!! — Kristy
– Kristy, Commented Mar 12, 2018 at 2:09
I will write a complete answer later. For now, a hint (verified on TIO): Use ord. — user202729
– user202729, Commented Mar 12, 2018 at 2:12
@user202729 Thank you very much. I cannot wait to see your answer. — Kristy
– Kristy, Commented Mar 12, 2018 at 6:44

Mark Tolonen · Accepted Answer · 2018-03-12 03:12:07Z

1

Do it all in numpy and it will be faster.

The following generates total_num_nodes rows of dim columns of np.uint8 data, then keeps the unique rows by providing a view of the data suitable for np.unique, then translating back to a 2D array:

import numpy as np

def generate(total_num_nodes, dim):
    a = np.random.choice(np.array([0,1],dtype=np.uint8),size=(total_num_nodes,dim))
    dtype = a.dtype.descr * dim
    temp = a.view(dtype)
    uniq = np.unique(temp)
    return uniq.view(a.dtype).reshape(-1,dim)

answered Mar 12, 2018 at 3:12

Mark Tolonen

181k26 gold badges183 silver badges279 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Kristy Over a year ago

Impressive! Thank you so so much

user202729 Over a year ago

My method is slower anyway (about 2.5x), so I won't post. ( @Lauren) (using a != '0' seems to be slightly faster than ord(a) - 48)

Collectives™ on Stack Overflow

How to speedup binary transformation from integer values

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related