I am trying to train a CNN-LSTM model in Keras: a CNN (VGG16 without the top layer) reads a sequence of 6 frames at a time and passes the extracted features to an LSTM.
Since I need to feed 6 frames at a time, I have to reshape every group of 6 frames and add an extra (time) dimension. Also, since the labels are per frame, I need another array that holds only the label of the first frame of each sequence, and then I feed both to the model (code below).
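To make the reshaping concrete, here is roughly what I mean, as a toy sketch with dummy arrays in place of real frames (the number of sequences is just for illustration):

import numpy as np

n_frames = 6                  # frames per sequence
n_samples = 3                 # sequences in this toy example
H = W = 224
C = 3

imgs = np.zeros((n_samples * n_frames, H, W, C))   # 18 individual frames
labels = np.zeros((n_samples * n_frames, 7))       # one one-hot label per frame

# group every 6 consecutive frames into one sequence ...
frame_sequences = imgs.reshape(n_samples, n_frames, H, W, C)
# ... and keep only the label of the first frame of each sequence
y = labels[::n_frames]

print(frame_sequences.shape)  # (3, 6, 224, 224, 3)
print(y.shape)                # (3, 7)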
The problem is that the data becomes far too large for model.fit(), and even when I try it on a small subset of the data the results are very poor, so I am trying to use model.fit_generator to feed the input to the model in batches. But since I cannot feed the data straight from the dataset (I first need to reshape it as explained above), I am trying to write my own generator. However, it is not working and I keep getting errors saying 'tuple' is not an iterator. Does anyone know how I can fix the code to make it work?
import numpy as np
from keras.applications.vgg16 import VGG16
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Input, Dense, LSTM, TimeDistributed
from keras.models import Model
from keras.optimizers import Adam

train_batches = ImageDataGenerator().flow_from_directory(
    train_path, target_size=(224, 224), batch_size=18156, shuffle=False,
    classes=['Bark', 'Bitting', 'Engage', 'Hidden', 'Jump', 'Stand', 'Walk'])
valid_batches = ImageDataGenerator().flow_from_directory(
    valid_path, target_size=(224, 224), batch_size=6, shuffle=False,
    classes=['Bark', 'Bitting', 'Engage', 'Hidden', 'Jump', 'Stand', 'Walk'])
test_batches = ImageDataGenerator().flow_from_directory(
    test_path, target_size=(224, 224), batch_size=6, shuffle=False,
    classes=['Bark', 'Bitting', 'Engage', 'Hidden', 'Jump', 'Stand', 'Walk'])
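# For context, each call to next() on one of these iterators returns a batch of
# frames plus their one-hot labels (class_mode defaults to 'categorical'),
# which I then try to regroup into 6-frame sequences. For the training iterator,
# assuming the directory really contains at least 18156 frames, that would be:
#   imgs, labels = next(train_batches)
#   imgs.shape   -> (18156, 224, 224, 3)
#   labels.shape -> (18156, 7)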
def train_gen():
    n_frames = 6
    n_samples = 6  # to decide
    H = W = 224
    C = 3
    imgs, labels = next(train_batches)
    # keep only the label of the first frame of every 6-frame sequence
    y = np.empty((n_samples, 7))
    j = 0
    for i in range(n_samples):
        y[i] = labels[j]
        j += 6
    # group the individual frames into sequences of 6
    frame_sequence = imgs.reshape(n_samples, n_frames, H, W, C)
    return frame_sequence, y
def valid_gen():
    v_frames = 6
    v_samples = 1
    H = W = 224
    C = 3
    vimgs, vlabels = next(valid_batches)
    # same idea as train_gen: one label per 6-frame sequence
    y2 = np.empty((v_samples, 7))
    k = 0
    for l in range(v_samples):
        y2[l] = vlabels[k]
        k += 6
    valid_sequence = vimgs.reshape(v_samples, v_frames, H, W, C)
    return valid_sequence, y2
def main():
    cnn = VGG16(weights='imagenet', include_top=False, pooling='avg')
    cnn.layers.pop()
    print(cnn.summary())
    cnn.trainable = False

    video_input = Input(shape=(None, 224, 224, 3), name='video_input')
    print(video_input.shape)
    # run the CNN over every frame, then feed the sequence of feature vectors to the LSTM
    encoded_frame_sequence = TimeDistributed(cnn)(video_input)  # sequence of feature vectors
    encoded_video = LSTM(256)(encoded_frame_sequence)           # one vector per sequence
    output = Dense(7, activation='relu')(encoded_video)
    video_model = Model(inputs=[video_input], outputs=output)

    tr_data = train_gen()
    vd_data = valid_gen()
    print(video_model.summary())
    imgs, labels = next(train_batches)
    vimgs, vlabels = next(valid_batches)

    print("Training ...")
    video_model.compile(Adam(lr=.001), loss='categorical_crossentropy', metrics=['accuracy'])
    video_model.fit_generator(tr_data,
                              steps_per_epoch=1513,
                              validation_data=vd_data,
                              validation_steps=431,
                              epochs=1,
                              verbose=2)
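From the Keras documentation I gather that fit_generator expects a Python generator that yields batches indefinitely, not a function that returns a single tuple. So I suspect my functions should use yield inside an endless loop. A rough, untested sketch of what I think train_gen should look like (the label slicing is just my shorthand for the loop above; the leftover-frame handling is my own guess):

def train_gen():
    # loop forever and yield one (sequences, labels) pair per iteration,
    # since fit_generator expects a generator, not a single tuple
    n_frames = 6
    H = W = 224
    C = 3
    while True:
        imgs, labels = next(train_batches)        # a batch of individual frames
        n_samples = imgs.shape[0] // n_frames     # how many full 6-frame sequences fit
        usable = n_samples * n_frames             # drop frames that don't fill a sequence
        frame_sequence = imgs[:usable].reshape(n_samples, n_frames, H, W, C)
        y = labels[:usable:n_frames]              # label of the first frame of each sequence
        yield frame_sequence, y

With that change, tr_data = train_gen() would already be a generator object, so passing it to fit_generator should be what the API expects, and valid_gen could be rewritten the same way.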
Is there a mistake in the way I define the generators, and would a yield-based version like the sketch above be the right way to fix it?