I have about 4 million integer vectors that I need to convert to binary. My code is as follows:
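```python
import numpy as np
from tqdm import tqdm

def integer_vectors_to_binary(data, bits=16):
    bin_arr = []
    for arr in tqdm(data, desc="Processing", ncols=100):
        # format(x, '016b') gives a zero-padded binary string; split it into 0/1 ints
        binary_array = [list(map(int, format(x, f'0{bits}b'))) for x in arr]
        # Concatenate the bits of all elements of this vector into one row
        bin_arr.append(np.array(binary_array).flatten())
    return np.asarray(bin_arr)
```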
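The loop above formats every integer as a Python string, which is slow at this scale. A minimal vectorized sketch (the name `integer_vectors_to_binary_fast` is hypothetical) shifts all values at once with NumPy broadcasting and keeps the lowest bit of each shift. It assumes every vector has the same length and every value is a non-negative integer below `2**bits`.

```python
import numpy as np

def integer_vectors_to_binary_fast(data, bits=16):
    """Hypothetical vectorized variant: shift by (bits-1, ..., 0) and keep the lowest bit."""
    # Stack the vectors into one 2-D array of shape (n_vectors, n_elements).
    # Assumes all vectors have the same length.
    arr = np.asarray(data, dtype=np.int64)
    # Shift amounts ordered most significant bit first, matching format(x, '016b').
    shifts = np.arange(bits - 1, -1, -1)
    # Broadcasting (n, m, 1) against (bits,) gives shape (n_vectors, n_elements, bits).
    bit_planes = (arr[..., np.newaxis] >> shifts) & 1
    # Flatten each vector's bits into one row; uint8 keeps the result small.
    return bit_planes.reshape(arr.shape[0], -1).astype(np.uint8)

vec_a = np.asarray([12, 15, 14])
bits_a = integer_vectors_to_binary_fast([vec_a])  # bits_a.shape == (1, 48)
```

The intermediate `bit_planes` array is `int64`, so for the full data set it would be safer to call this on slices of rows rather than on all vectors at once.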
The problem is that this takes too long on my very large input set, so I was wondering whether the code could be optimized. Here's an example:
```python
import numpy as np

vec_a = np.asarray([12, 15, 14])
result = integer_vectors_to_binary([vec_a])
```
The returned `result` is:
```
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0]])
```
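The sample output is simply each value written as 16 bits, most significant bit first. For that 16-bit case, one possible sketch (the function name is hypothetical) reinterprets each value as two big-endian bytes and lets `np.unpackbits` expand the bytes. It assumes a 2-D input of equal-length vectors with values in 0 to 65535, and only handles `bits=16`.

```python
import numpy as np

def integer_vectors_to_binary_unpack(data):
    """Hypothetical 16-bit-only variant: view values as big-endian bytes, then unpack bits."""
    # '>u2' is a big-endian unsigned 16-bit integer, so the high byte comes first.
    # order='C' guarantees a contiguous array, which the byte view below needs.
    arr = np.asarray(data).astype('>u2', order='C')
    # Reinterpret the raw memory: shape (n_vectors, 2 * n_elements), dtype uint8.
    as_bytes = arr.view(np.uint8)
    # Expand each byte into 8 bits, most significant bit first (default bitorder='big').
    return np.unpackbits(as_bytes, axis=1)

bits_a = integer_vectors_to_binary_unpack([np.asarray([12, 15, 14])])  # shape (1, 48)
```

Because the work happens inside NumPy's C routines with no per-element Python calls, this avoids the string formatting entirely; values outside 0 to 65535 would silently wrap during the `astype`, so validate the input first if that can happen.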
In my real data, each vector has 254 elements, and each element is an integer from 0 to 65535 (so 16 bits per value are enough).
Is `tqdm` really relevant to your question? If your concern is speed, that would be the first thing to drop. If your question is only about the loop body, then just reduce the code to that part for this question.

`data` is an iterable of iterables (`arr`), so the call `integer_vectors_to_binary(vec_a)` will run into an error. How do you call your function?