I wrote the following function (in Python 2.7) that generates a set of random integers and converts them to their binary representation. It takes two self-explanatory parameters, total_num_nodes and dim, and returns a NumPy array containing the binary representation of all these integers:
    import random
    import numpy as np

    def generate(total_num_nodes, dim):
        # Generate random nodes from the range [0, 2**dim - 1]
        nodes_matrix = [random.randint(0, 2 ** dim - 1) for _ in range(total_num_nodes)]
        # Removes duplicates
        nodes_matrix = list(set(nodes_matrix))
        # Transforms each node from decimal to string representation
        nodes_matrix = [('{0:0' + str(dim) + 'b}').format(x) for x in nodes_matrix]
        # Transforms each bit into an integer.
        nodes_matrix = np.asarray([list(map(int, list(x))) for x in nodes_matrix], dtype=np.uint8)
        return nodes_matrix
The problem is that when I pass very large values, say total_num_nodes = 10,000,000 and dim = 128, generation takes a really long time. A friend hinted that the following line is the bottleneck and is likely responsible for most of the computation time:
    # Transforms each node from decimal to string representation
    nodes_matrix = [('{0:0' + str(dim) + 'b}').format(x) for x in nodes_matrix]
I cannot think of a faster method to replace this line and speed up generation when running on a single processor. Any suggestions are much appreciated.
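For illustration, one direction that might skip the string step entirely is to draw the random bits as bytes and expand them with np.unpackbits. This is only a rough sketch, not a benchmarked solution. The name generate_bits and the seed parameter are hypothetical, it assumes dim is a multiple of 8, it needs NumPy 1.13 or later for np.unique with axis, and it uses NumPy's random generator instead of the random module, so the rows come back sorted rather than in set order:

```python
import numpy as np

def generate_bits(total_num_nodes, dim, seed=None):
    # Hypothetical helper (not part of the original code).
    # Assumption: dim is a multiple of 8, so each node is dim // 8 whole bytes.
    if dim % 8 != 0:
        raise ValueError("this sketch assumes dim is a multiple of 8")
    rng = np.random.RandomState(seed)
    # Each row holds one node as dim // 8 uniformly random bytes, which is
    # equivalent to drawing a uniform integer in [0, 2**dim - 1].
    raw = rng.randint(0, 256, size=(total_num_nodes, dim // 8), dtype=np.uint8)
    # Remove duplicate nodes (rows); np.unique returns the rows sorted.
    raw = np.unique(raw, axis=0)
    # Expand every byte into its 8 bits, most significant bit first,
    # giving one 0/1 uint8 column per bit.
    return np.unpackbits(raw, axis=1)

nodes_matrix = generate_bits(1000, 128, seed=0)
print(nodes_matrix.shape[1], nodes_matrix.dtype)
```

The result has the same layout as the output of generate (one row per node, one uint8 column per bit), but no Python-level loop over nodes or bits is involved.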
Thank you