I have a NumPy 2D array of n rows (observations) by m columns (features), where each element is the number of times that feature was observed. I need to convert it to a zero-padded 2D array of feature indices, where each feature index is repeated as many times as its count in the original array and shorter rows are padded with leading zeros.
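As a minimal sketch, a single row on its own expands with one `np.repeat` call. Here the indices run from 1 to m so that 0 stays free for padding. The awkward part of the 2D case is only that different rows expand to different lengths.

```python
import numpy as np

row = np.array([1, 2, 0, 1, 3])  # counts for features 1..5
# Repeat each feature index (1..5) by its count; zero counts drop out.
indices = np.repeat(np.arange(1, row.size + 1), row)
print(indices)  # [1 2 2 4 5 5 5]
```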
This seems like it should be a simple combo of np.where with np.repeat or just expansion using indexing, but I'm not seeing it. Here's a very slow, loopy solution (way too slow to use in practice):
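```python
import numpy as np

# Loopy solution (way too slow!)
def convert_2Dcountsarray_to_zeropaddedindices(countsarray2D):
    # np.repeat requires integer counts, so cast (np.zeros/np.hstack may produce floats)
    countsarray2D = np.asarray(countsarray2D).astype(int)
    rowsums = np.sum(countsarray2D, 1)
    max_rowsum = np.max(rowsums)
    out = []
    for row_idx, row in enumerate(countsarray2D):
        # Padding zeros so all out_rows are the same length
        out_row = [0] * int(max_rowsum - rowsums[row_idx])
        for ele_idx in range(len(row)):
            # Append feature index ele_idx once per observed count
            out_row.extend(np.repeat(ele_idx, row[ele_idx]))
        out.append(out_row)
    return np.array(out)
```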
```python
# Working example
countsarray2D = np.array([[1, 2, 0, 1, 3],
                          [0, 0, 0, 0, 3],
                          [0, 1, 1, 0, 0]])
# Shift all features up by 1 (i.e. add a dummy feature 0 we will use for padding).
# dtype=int keeps the array integer; np.zeros defaults to float64.
countsarray2D = np.hstack((np.zeros((len(countsarray2D), 1), dtype=int), countsarray2D))
print(convert_2Dcountsarray_to_zeropaddedindices(countsarray2D))
```
Desired result:

```
[[1 2 2 4 5 5 5]
 [0 0 0 0 5 5 5]
 [0 0 0 0 0 2 3]]
```
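One possible vectorized sketch of the "np.repeat plus indexing" idea, offered as an approach to try rather than a benchmarked answer. It builds every feature index at once by repeating a tiled `arange` by the flattened counts, then drops those values into the right-aligned slots of a zero array through a boolean mask. The name `counts_to_padded_indices` is hypothetical, and like the loopy version it assumes the dummy all-zero column 0 has already been prepended.

```python
import numpy as np

def counts_to_padded_indices(counts):
    """Hypothetical vectorized replacement for the loopy function.

    Expects a 2D array of non-negative integer counts (dummy column 0
    already prepended) and returns left-zero-padded feature indices.
    """
    counts = np.asarray(counts).astype(np.intp)  # np.repeat needs integer counts
    n_rows, n_feats = counts.shape
    rowsums = counts.sum(axis=1)
    width = rowsums.max() if n_rows else 0

    # Feature index for every observation, in row-major order:
    # tile [0..m-1] once per row, then repeat each entry by its count.
    feat_idx = np.repeat(np.tile(np.arange(n_feats), n_rows), counts.ravel())

    # True for the last rowsums[r] columns of each row, so the data is
    # right-aligned and the leading columns stay as zero padding.
    mask = np.arange(width) >= (width - rowsums)[:, None]

    out = np.zeros((n_rows, width), dtype=feat_idx.dtype)
    # Boolean-mask assignment fills True slots row by row, left to right,
    # which is exactly the order in which feat_idx was built.
    out[mask] = feat_idx
    return out

counts = np.array([[1, 2, 0, 1, 3],
                   [0, 0, 0, 0, 3],
                   [0, 1, 1, 0, 0]])
# Prepend the dummy all-zero feature 0, as in the question.
counts = np.hstack((np.zeros((len(counts), 1), dtype=int), counts))
result = counts_to_padded_indices(counts)
print(result)
```

The design relies on the number of True entries in row r of `mask` being exactly `rowsums[r]`, so the total matches `len(feat_idx)` and no Python-level loop over rows or features is needed. For the example above the printed array should match the desired result.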