I used Keras's ImageDataGenerator to create labelled data, following the example in Chapter 5 of Francois Chollet's book "Deep Learning with Python." I split my training directory into cat and dog subdirectories and populated them with images. With the following code, I created a variable that I believe contains both the images and the labels.

from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale = 1./255)
train_generator = train_datagen.flow_from_directory(
   train_dir,
   target_size = (150, 150),
   batch_size = 20,
   class_mode = 'binary')

Later, after defining a model, you would train it with the following code:

history = model.fit_generator(
   train_generator,
   steps_per_epoch = 100,
   epochs = 30,
   validation_data = validation_generator,
   validation_steps = 50)

Many online neural network examples keep the training and test data in separate variables (e.g. x_train, y_train, x_test, y_test), which seems to be the most popular approach. For example:

from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

And you would run the model with the following code:

history = model.fit(x_train, y_train, batch_size=128, epochs=5, verbose=False, validation_split=.1)
loss, accuracy = model.evaluate(x_test, y_test, verbose=False)

Is there a way to convert the data produced by ImageDataGenerator into correctly formatted x_train, y_train, x_test, and y_test arrays? Thanks.

1 Answer

Disclaimer: I have never used Keras's ImageDataGenerator, but from the code you provided, I'm guessing you would have to create separate generator instances for train, valid, and test:

train_generator = train_datagen.flow_from_directory(
   train_dir,
   target_size = (150, 150),
   batch_size = 20,
   class_mode = 'binary')
valid_generator = train_datagen.flow_from_directory(
   valid_dir,
   target_size = (150, 150),
   batch_size = 20,
   class_mode = 'binary')

and so on. Also, model.fit_generator() is deprecated in recent TensorFlow/Keras versions; model.fit() accepts generators directly.
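To get back to the x_train, y_train layout from the question, one option is to pull batches out of a generator and concatenate them. The sketch below assumes the generator behaves like the one returned by flow_from_directory: len(generator) gives the number of batches in one pass, and next(generator) returns an (images, labels) tuple. FakeBatchIterator and collect_arrays are hypothetical names; the fake iterator only stands in for train_generator so the example runs without an image directory.

```python
import numpy as np


class FakeBatchIterator:
    """Hypothetical stand-in for train_generator: yields (images, labels) batches forever."""

    def __init__(self, x, y, batch_size):
        self.x, self.y, self.batch_size = x, y, batch_size
        self.pos = 0

    def __len__(self):
        # Number of batches needed to see every sample once.
        return int(np.ceil(len(self.x) / self.batch_size))

    def __iter__(self):
        return self

    def __next__(self):
        if self.pos >= len(self.x):
            self.pos = 0  # wrap around, like a generator that never stops
        start = self.pos
        self.pos += self.batch_size
        return self.x[start:self.pos], self.y[start:self.pos]


def collect_arrays(generator):
    """Draw exactly one pass of batches and stack them into two arrays."""
    xs, ys = [], []
    for _ in range(len(generator)):
        x_batch, y_batch = next(generator)
        xs.append(x_batch)
        ys.append(y_batch)
    return np.concatenate(xs), np.concatenate(ys)


# Small fake dataset: 50 images of 8x8 pixels with 3 channels, binary labels.
images = np.random.rand(50, 8, 8, 3).astype("float32")
labels = np.arange(50) % 2

x_train, y_train = collect_arrays(FakeBatchIterator(images, labels, batch_size=20))
print(x_train.shape, y_train.shape)
```

With a real generator you would call collect_arrays(train_generator), and repeat for the test generator to get x_test, y_test. This loads every image into memory, which is what the generator is designed to avoid, so it only makes sense for datasets that fit in RAM. Note that flow_from_directory shuffles by default, so the order will differ from the file order, but each image stays paired with its label.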

The best workflow in my experience is to write the data generator yourself. There are many examples of this, such as https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly. Basically, instead of loading the entire dataset and handing it back from a function with return, you loop over the dataset batch by batch and yield each batch from the function.
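As a rough illustration of the yield idea (not the approach from the linked post), here is a minimal sketch of a hand-written batch generator. batch_generator is a hypothetical name, and the sketch assumes the data already sits in NumPy arrays; for images on disk you would load the files for each batch inside the loop instead.

```python
import numpy as np


def batch_generator(x, y, batch_size):
    """Yield (x_batch, y_batch) tuples forever, reshuffling on every pass."""
    n = len(x)
    while True:
        order = np.random.permutation(n)  # new sample order for each pass
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield x[idx], y[idx]


# Tiny fake dataset: 10 samples shaped like 4x4 RGB images.
x = np.zeros((10, 4, 4, 3), dtype="float32")
y = np.arange(10)

gen = batch_generator(x, y, batch_size=4)
first_x, first_y = next(gen)
print(first_x.shape, first_y.shape)
```

Because the generator never stops on its own, the training call has to be told how many batches make up one epoch, which is the role steps_per_epoch plays in the question's code.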

3 Comments

Thanks for trying to answer my question. I understand the need to have different instances of ImageDataGenerators for train, test, and validation. The part I'm confused about is that the generator holds both x and y (the features and the ground truth). How do you extract them from it? For example, say I've created train_generator. How do I divide it into x_train and y_train?
Hi! Apparently, Keras has a really good example: keras.io/api/preprocessing/image/#imagedatagenerator-class. Maybe this is what you are looking for?
Thank you David, greatly appreciate the links and the help
