I am using TensorFlow/Keras to create a deep learning model. The network is built as follows:
import tensorflow as tf
from tensorflow.keras.layers import (
    Input, Dense, BatchNormalization, LeakyReLU, Dropout, concatenate
)
from tensorflow.keras.models import Model

inps = []
features = []
for i in range(number_windows):
    inp = Input(shape=(window_length,), name=f"input_{i}")
    inps.append(inp)
    feat = Dense(25)(inp)
    feat = BatchNormalization()(feat)
    feat = LeakyReLU()(feat)
    features.append(feat)
comb = concatenate(features)
comb = Dropout(0.50)(comb)
top = Dense(512)(comb)
top = BatchNormalization()(top)
top = LeakyReLU()(top)
top = Dropout(0.40)(top)
top = Dense(256)(top)
emb = EmbeddingLayer()(top)  # custom layer, described below
top = BatchNormalization()(top)
top = LeakyReLU()(top)
top = Dropout(0.25)(top)
classification = Dense(n_classes, activation='softmax', name='classification')(top)
mdl = Model(inputs=inps, outputs=[emb, classification])
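For concreteness, a minimal version of the custom layer (a sketch only, assuming it simply applies tf.math.l2_normalize along the last axis; my real layer may differ in details) could look like:

```python
import tensorflow as tf

class EmbeddingLayer(tf.keras.layers.Layer):
    """Custom layer that L2-normalizes its input along the last axis."""
    def call(self, inputs):
        return tf.math.l2_normalize(inputs, axis=-1)
```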
The EmbeddingLayer is a custom layer that effectively returns an L2 normalization of the input. I have a data generating function:
def data_loading_generator(
    data_matrix: np.typing.NDArray,
    data_labels: np.typing.NDArray,
    window_length,
    dw
):
    num_rows = data_matrix.shape[0]
    # one-hot encode the binary labels: column 0 is the complement, column 1 the label
    y_onehot = np.stack(
        [1 - data_labels, data_labels],
        axis=1
    )
    data_segments = segment_data_batch(
        data_mat=data_matrix,
        w=window_length,
        dw=dw
    )
    for row_number in range(num_rows):
        yield (
            {f"input_{ii}": x[row_number, :] for ii, x in enumerate(data_segments)},
            {
                "embedding_layer": data_labels[row_number],
                "classification": y_onehot[row_number, :]
            }
        )
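For reference, segment_data_batch is essentially equivalent to this sketch built on numpy's sliding_window_view (a reconstruction of the behavior, not my exact implementation; it assumes dw is the step between window starts):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def segment_data_batch(data_mat, w, dw):
    """Return a list of (num_rows, w) arrays, one per window position."""
    # windows has shape (num_rows, num_windows, w), where
    # num_windows = (row_length - w) // dw + 1
    windows = sliding_window_view(data_mat, window_shape=w, axis=1)[:, ::dw, :]
    return [windows[:, i, :] for i in range(windows.shape[1])]
```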
The function segment_data_batch takes in a matrix and outputs a list of overlapping segments from each row of the matrix, each of length window_length and with overlap window_length - dw. I believe I can optimize this slightly by removing the segment_data_batch call and instead segmenting each row of the matrix as it is generated:
def data_loading_generator(
    data_matrix: np.typing.NDArray,
    data_labels: np.typing.NDArray,
    window_length,
    dw
):
    num_rows = data_matrix.shape[0]
    for row_number in range(num_rows):
        data_segments = segment_data(
            data_matrix[row_number, :], w=window_length, dw=dw
        )
        yield (
            {f"input_{ii}": data_segments[ii, :] for ii in range(data_segments.shape[0])},
            {
                "embedding_layer": data_labels[row_number],
                "classification": tf.one_hot(
                    data_labels[row_number], depth=2, dtype=tf.uint16
                )
            }
        )
The new function segment_data takes a single row of data_matrix and returns a numpy array of shape number_windows x window_length. However, I'm wondering whether I can make this more efficient using native TensorFlow functions.
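One candidate I'm aware of is tf.signal.frame, which produces overlapping frames of a tensor natively in TensorFlow. A sketch of a drop-in replacement for segment_data (assuming frame_step=dw reproduces the overlap of window_length - dw):

```python
import tensorflow as tf

def segment_row_tf(row, window_length, dw):
    # tf.signal.frame slides a window of length `window_length` along the
    # last axis with hop `dw`, yielding shape (num_windows, window_length)
    return tf.signal.frame(row, frame_length=window_length, frame_step=dw)
```

If this matches segment_data's behavior, the generator itself could potentially be replaced by a tf.data.Dataset built with from_tensor_slices plus a map of this function, so the slicing runs inside the TensorFlow graph rather than in Python.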