Using Tensorflow Dataset from_generator() to create multi Input/Output with Custom Generator and ImageDataGenerator

Question

I am trying to scale up my model, which uses a "cluster loss" extension. The implementation works so far on MNIST, but I would like to benefit from data augmentation and multi-processing on the real dataset.

In short, the network follows work done on the "centre loss", which somewhat resembles a Siamese network. The important part of the architecture is that the model has 2 inputs and 2 outputs. I therefore implemented a custom generator to feed the model as follows:

def my_generator(stop):
    i = 0
    while i < stop:
        batch = train_gen.next()
        img = batch[0]
        labels = batch[1]
        labels_size = np.shape(labels)
        cluster = np.zeros(labels_size)
        x = [img, labels]
        y = [labels, cluster]

        yield x, y
        i += 1

which calls the generator ("train_gen") defined as follows:

generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255, horizontal_flip=True)
train_gen = generator.flow_from_dataframe(df, x_col='img_path', y_col='label',
                                          class_mode='categorical',
                                          target_size=(32, 32),
                                          batch_size=batch_size)

The generator works if I set only one worker in the fit function, but that is painfully slow. So I tried the recommended tf.data API (tf.data.Dataset.from_generator) to fit my model, but when I set it up as follows,

ds = tf.data.Dataset.from_generator(my_generator,
                                    args=[num_iter],
                                    output_types=([tf.float32, tf.float32], [tf.float32, tf.float32]))

I got the following error:

TypeError: Cannot convert value [tf.float32, tf.float32] to a Tensorflow DType.

From there, I tried multiple things, following this post

For example, trying to return tuples instead of arrays:

x = (img, labels)
y = (labels, cluster)

But I got:

ValueError: as_list() is not defined on an unknown TensorShape

Does anyone have experience with this? I am not sure I understand the error. I am thinking that I could perhaps change the "output_types" argument, but TensorFlow has no "list" or "tuple" DType.

Here is a link to my code, which constructs a small image dataset from CIFAR-10 to feed a toy model.

Gerry P · Accepted Answer · 2020-12-09 22:33:04Z

I do not think your generator works as you expect. Each time it is called, it sets i = 0. In the lines

yield x, y
i += 1

the statement i += 1 never executes. Put a print statement after it, as below,

yield x, y
i += 1
print ('the value of i is ',i)

and you will see it never executes.

The above is true if you execute

x,y=next(my_generator(2))

which pulls a single item from a newly created generator. However, if you execute

x,y=my_generator(2)

then the i += 1 statement does execute, because unpacking iterates the generator to the end. Normally you consume a generator by calling next() on the generator object. I believe model.fit gets each batch by calling next() on the generator you specify.
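To make the difference concrete, here is a minimal pure-Python sketch. The name count_batches and its log argument are hypothetical; the function stands in for my_generator and yields plain integers rather than image batches. Calling the generator function creates a fresh generator object each time, so next(count_batches(...)) always returns the first item, whereas keeping one generator object and iterating it resumes after each yield and does reach the i += 1 line.

```python
def count_batches(stop, log):
    """Hypothetical stand-in for my_generator: yields i instead of a batch."""
    i = 0
    while i < stop:
        yield i
        # Reached only when the same generator object is resumed after the yield.
        i += 1
        log.append(i)


# A new generator object per call: each one is dropped at its first yield.
log_fresh = []
first = next(count_batches(2, log_fresh))
second = next(count_batches(2, log_fresh))

# One generator object, iterated to exhaustion: the loop body resumes each time.
log_reused = []
gen = count_batches(2, log_reused)
items = list(gen)
```

In the first case nothing is appended to log_fresh; in the second, log_reused records every increment.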

edited Dec 9, 2020 at 22:33

answered Dec 9, 2020 at 15:53

6 Comments

Michael Stettler Over a year ago

Uhh! You are probably right! I blindly copied these lines as they were from the official TensorFlow documentation (tensorflow.org/guide/data#consuming_python_generators). But this would have put me in an infinite loop, no? Apparently what I am getting is more of a dimension error from outputting an array.

Michael Stettler Over a year ago

Well, actually it does print it :)

Gerry P Over a year ago

Hmm I copied it and added the print statement and it did not print. Let me try it again!!!

Gerry P Over a year ago

I see what the difference is. See my modified answer.

Michael Stettler Over a year ago

Alright, I see. Thanks for this; I moved the line before the yield. However, it still doesn't fix the issue when using the tf.data.Dataset.from_generator function. I still don't see how I should set the output_types argument.
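On the open question of how to describe the two inputs and two outputs: tf.data expects the element structure as nested tuples, since a list is treated as a single value to convert rather than as a container. The sketch below is one possible setup, not a confirmed fix for the code in the question. It assumes 10 classes and a batch size of 4, and toy_generator is a hypothetical stand-in that yields random arrays instead of ImageDataGenerator batches. It uses output_signature, available in newer TensorFlow 2 releases; on older versions the same nesting would go into output_types=((tf.float32, tf.float32), (tf.float32, tf.float32)) together with a matching output_shapes. Declaring the shapes appears to be what avoids the "unknown TensorShape" error.

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 10  # assumption: the question does not state the class count
BATCH_SIZE = 4    # assumption: small batch for illustration


def toy_generator(stop):
    """Hypothetical stand-in for my_generator, fed with random data."""
    for _ in range(int(stop)):
        img = np.random.rand(BATCH_SIZE, 32, 32, 3).astype(np.float32)
        ids = np.random.randint(0, NUM_CLASSES, size=BATCH_SIZE)
        labels = np.eye(NUM_CLASSES, dtype=np.float32)[ids]  # one-hot labels
        cluster = np.zeros_like(labels)
        # Tuples, not lists: tf.data does not treat a list as a nested structure.
        yield (img, labels), (labels, cluster)


# None leaves the batch dimension free, since a final batch may be smaller.
img_spec = tf.TensorSpec(shape=(None, 32, 32, 3), dtype=tf.float32)
vec_spec = tf.TensorSpec(shape=(None, NUM_CLASSES), dtype=tf.float32)

ds = tf.data.Dataset.from_generator(
    toy_generator,
    args=[3],
    output_signature=((img_spec, vec_spec), (vec_spec, vec_spec)),
)

# Each element unpacks into the (inputs, targets) pair a two-input,
# two-output model would receive.
(x_img, x_lab), (y_lab, y_cluster) = next(iter(ds))
```

The generator already yields whole batches here, so the dataset is not batched again before being passed to fit.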
