I am trying to train a CNN-LSTM model in Keras: a CNN (VGG16 without the top layer) reads a sequence of 6 frames at a time and passes the extracted features to an LSTM.
Since I need to feed 6 frames at a time, I have to reshape every group of 6 frames and add an extra (time) dimension. Also, since the labels are per frame, I need another array that holds only the label of the first frame of each sequence, and then I feed both to the model (code below).
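To make the reshaping concrete, here is roughly what I mean, as a toy sketch with dummy arrays in place of real frames (the number of sequences is just for illustration):

import numpy as np

n_frames = 6                  # frames per sequence
n_samples = 3                 # sequences in this toy example
H = W = 224
C = 3

imgs = np.zeros((n_samples * n_frames, H, W, C))   # 18 individual frames
labels = np.zeros((n_samples * n_frames, 7))       # one one-hot label per frame

# group every 6 consecutive frames into one sequence ...
frame_sequences = imgs.reshape(n_samples, n_frames, H, W, C)
# ... and keep only the label of the first frame of each sequence
y = labels[::n_frames]

print(frame_sequences.shape)  # (3, 6, 224, 224, 3)
print(y.shape)                # (3, 7)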
The problem is that the data becomes far too large for model.fit(), and even when I try it on a small subset of the data the results are very poor, so I am trying to use model.fit_generator to feed the input to the model in batches. But since I cannot feed the data straight from the dataset (I first need to reshape it as explained above), I am trying to write my own generator. However, it is not working and I keep getting errors saying 'tuple' is not an iterator. Does anyone know how I can fix the code to make it work?
import numpy as np
from keras.applications.vgg16 import VGG16
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Input, Dense, LSTM, TimeDistributed
from keras.models import Model
from keras.optimizers import Adam

train_batches = ImageDataGenerator().flow_from_directory(
    train_path, target_size=(224, 224), batch_size=18156, shuffle=False,
    classes=['Bark', 'Bitting', 'Engage', 'Hidden', 'Jump', 'Stand', 'Walk'])
valid_batches = ImageDataGenerator().flow_from_directory(
    valid_path, target_size=(224, 224), batch_size=6, shuffle=False,
    classes=['Bark', 'Bitting', 'Engage', 'Hidden', 'Jump', 'Stand', 'Walk'])
test_batches = ImageDataGenerator().flow_from_directory(
    test_path, target_size=(224, 224), batch_size=6, shuffle=False,
    classes=['Bark', 'Bitting', 'Engage', 'Hidden', 'Jump', 'Stand', 'Walk'])
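# For context, each call to next() on one of these iterators returns a batch of
# frames plus their one-hot labels (class_mode defaults to 'categorical'),
# which I then try to regroup into 6-frame sequences. For the training iterator,
# assuming the directory really contains at least 18156 frames, that would be:
#   imgs, labels = next(train_batches)
#   imgs.shape   -> (18156, 224, 224, 3)
#   labels.shape -> (18156, 7)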
def train_gen():
    n_frames = 6
    n_samples = 6  # to decide
    H = W = 224
    C = 3
    imgs, labels = next(train_batches)
    # keep only the label of the first frame of every 6-frame sequence
    y = np.empty((n_samples, 7))
    j = 0
    for i in range(n_samples):
        y[i] = labels[j]
        j += 6
    # group the individual frames into sequences of 6
    frame_sequence = imgs.reshape(n_samples, n_frames, H, W, C)
    return frame_sequence, y
def valid_gen():
    v_frames = 6
    v_samples = 1
    H = W = 224
    C = 3
    vimgs, vlabels = next(valid_batches)
    # same idea as train_gen: one label per 6-frame sequence
    y2 = np.empty((v_samples, 7))
    k = 0
    for l in range(v_samples):
        y2[l] = vlabels[k]
        k += 6
    valid_sequence = vimgs.reshape(v_samples, v_frames, H, W, C)
    return valid_sequence, y2
def main():
    cnn = VGG16(weights='imagenet', include_top=False, pooling='avg')
    cnn.layers.pop()
    print(cnn.summary())
    cnn.trainable = False

    video_input = Input(shape=(None, 224, 224, 3), name='video_input')
    print(video_input.shape)
    # run the CNN over every frame, then feed the sequence of feature vectors to the LSTM
    encoded_frame_sequence = TimeDistributed(cnn)(video_input)  # sequence of feature vectors
    encoded_video = LSTM(256)(encoded_frame_sequence)           # one vector per sequence
    output = Dense(7, activation='relu')(encoded_video)
    video_model = Model(inputs=[video_input], outputs=output)

    tr_data = train_gen()
    vd_data = valid_gen()
    print(video_model.summary())
    imgs, labels = next(train_batches)
    vimgs, vlabels = next(valid_batches)

    print("Training ...")
    video_model.compile(Adam(lr=.001), loss='categorical_crossentropy', metrics=['accuracy'])
    video_model.fit_generator(tr_data,
                              steps_per_epoch=1513,
                              validation_data=vd_data,
                              validation_steps=431,
                              epochs=1,
                              verbose=2)
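From the Keras documentation I gather that fit_generator expects a Python generator that yields batches indefinitely, not a function that returns a single tuple. So I suspect my functions should use yield inside an endless loop. A rough, untested sketch of what I think train_gen should look like (the label slicing is just my shorthand for the loop above; the leftover-frame handling is my own guess):

def train_gen():
    # loop forever and yield one (sequences, labels) pair per iteration,
    # since fit_generator expects a generator, not a single tuple
    n_frames = 6
    H = W = 224
    C = 3
    while True:
        imgs, labels = next(train_batches)        # a batch of individual frames
        n_samples = imgs.shape[0] // n_frames     # how many full 6-frame sequences fit
        usable = n_samples * n_frames             # drop frames that don't fill a sequence
        frame_sequence = imgs[:usable].reshape(n_samples, n_frames, H, W, C)
        y = labels[:usable:n_frames]              # label of the first frame of each sequence
        yield frame_sequence, y

With that change, tr_data = train_gen() would already be a generator object, so passing it to fit_generator should be what the API expects, and valid_gen could be rewritten the same way.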
Is there a mistake in the way I define the generators, and would a yield-based version like the sketch above be the right way to fix it?