How to use python generators with neural networks that take in data with x_train and y_train variables?

Question

I have used Keras' ImageDataGenerator to create labelled data by following the example in Ch. 5 of Francois Chollet's book "Deep Learning with Python." As an example, I subdivided my training directory into cat and dog subdirectories and populated them with images. Using the following code, I created a variable that I believe contains both the images and the labels.

from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
   train_dir,
   target_size=(150, 150),
   batch_size=20,
   class_mode='binary')

Later on, after defining a model, you would use the following code to run the model:

history = model.fit_generator(
   train_generator,
   steps_per_epoch=100,
   epochs=30,
   validation_data=validation_generator,
   validation_steps=50)

Many online examples of neural networks use separate variables to hold the training and test data (e.g. x_train, y_train, x_test, y_test). This seems to be the most popular method. As an example:

(x_train, y_train), (x_test, y_test) = mnist.load_data()

And you would run the model with the following code:

history = model.fit(x_train, y_train, batch_size=128, epochs=5, verbose=False, validation_split=.1)
loss, accuracy  = model.evaluate(x_test, y_test, verbose=False)

Is there a way to convert the data created with the ImageDataGenerator into correctly formatted x_train, y_train, x_test, and y_test arrays? Thanks

Dharman · Accepted Answer · 2020-09-09 16:32:04Z

Disclaimer: I have never used Keras' ImageDataGenerator before, but from the code you provided, I'm guessing you would have to create separate generator instances for the training, validation, and test sets:

train_generator = train_datagen.flow_from_directory(
   train_dir,
   target_size=(150, 150),
   batch_size=20,
   class_mode='binary')
valid_generator = train_datagen.flow_from_directory(
   valid_dir,
   target_size=(150, 150),
   batch_size=20,
   class_mode='binary')

and so on. Also, note that model.fit_generator() is deprecated; in recent versions of Keras, model.fit() accepts generators directly.
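If plain x_train and y_train arrays are really needed, one option is to pull a fixed number of batches from a generator like the ones above and stack them. The following is only a minimal sketch of that idea: collect_batches and fake_batches are hypothetical names, and fake_batches is a NumPy stand-in for train_generator so that the example runs without any image files. It assumes each call to next() returns an (images, labels) pair.

```python
import numpy as np

def collect_batches(batch_iter, num_batches):
    """Pull num_batches (x, y) pairs from an iterator and stack them into two arrays."""
    xs, ys = [], []
    for _ in range(num_batches):
        x_batch, y_batch = next(batch_iter)
        xs.append(x_batch)
        ys.append(y_batch)
    return np.concatenate(xs), np.concatenate(ys)

def fake_batches(batch_size=20):
    """Hypothetical stand-in for train_generator: yields (images, labels) forever."""
    while True:
        images = np.zeros((batch_size, 150, 150, 3), dtype=np.float32)
        labels = np.ones(batch_size, dtype=np.float32)
        yield images, labels

# Five batches of 20 images each give 100 samples in total.
x_train, y_train = collect_batches(fake_batches(), num_batches=5)
print(x_train.shape, y_train.shape)
```

With a real Keras directory generator, len(train_generator) gives the number of batches in one pass over the data, so that value would be a natural choice for num_batches. Keep in mind that this loads every image into memory at once, which is exactly what generators are meant to avoid.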

The best workflow in my experience is to write the data generator yourself. There are a lot of examples of this, such as https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly. Basically, instead of a function that loops over the entire dataset and hands everything back with return, you write a function that loops over the dataset one batch at a time and yields each batch.
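The batch-by-batch idea can be sketched with a plain Python generator function. This is an illustration under stated assumptions, not the approach from the linked post: batch_generator is a hypothetical name, and the data here is a small in-memory NumPy array, whereas a real generator would typically load each batch from disk.

```python
import numpy as np

def batch_generator(x, y, batch_size):
    """Yield (x_batch, y_batch) pairs forever, reshuffling at the start of each pass."""
    num_samples = len(x)
    while True:
        order = np.random.permutation(num_samples)
        for start in range(0, num_samples, batch_size):
            idx = order[start:start + batch_size]
            yield x[idx], y[idx]

# Toy data: 10 samples with one feature each; the label equals the feature value.
x = np.arange(10, dtype=np.float32).reshape(10, 1)
y = np.arange(10)

gen = batch_generator(x, y, batch_size=4)
first_x, first_y = next(gen)
print(first_x.shape, first_y.shape)
```

Because this generator never stops, a training call that consumes it needs to be told how many batches make up one epoch, which is the role steps_per_epoch plays in the question's code.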

edited Sep 9, 2020 at 16:32

Dharman♦

answered Sep 9, 2020 at 16:26

Dawei Wang

3 Comments

SP00N Over a year ago

Thanks for trying to answer my question. I understand the need to have different instances of ImageDataGenerators for train, test, and validation. The part that I'm confused about is that an ImageDataGenerator holds within it both x and y (the features and the ground truth). How do you extract those from an ImageDataGenerator instance? For example, say I've created train_generator. How do I divide it into x_train and y_train?

Dawei Wang Over a year ago

Hi! Apparently, keras has a really good example: keras.io/api/preprocessing/image/#imagedatagenerator-class. Maybe this is what you are looking for?

SP00N Over a year ago

Thank you David, greatly appreciate the links and the help
