How to detect multiple objects using my classification network?

Question

I have created a simple convolutional network using the Keras API that ships with TensorFlow. I have trained the model and the accuracy looks good.

I have trained the network on 10 different classes. The network is able to differentiate between each of the 10 classes with an accuracy of 0.93.

Now, it is very much possible that there are multiple classes in the same image. Is there a way I could use my trained network to detect multiple objects in the same image? The best thing would be to get the coordinates/bounding-box around the objects detected, so that it is easier to test/visualize.

Here is how I wrote the network:

import numpy as np
import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.BatchNormalization(input_shape=x_train.shape[1:]))
model.add(tf.keras.layers.Conv2D(64, (5, 5), padding='same', activation='elu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(tf.keras.layers.Dropout(0.25))

model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Conv2D(128, (5, 5), padding='same', activation='elu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Dropout(0.25))

model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Conv2D(256, (5, 5), padding='same', activation='elu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(tf.keras.layers.Dropout(0.25))

model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(256))
model.add(tf.keras.layers.Activation('elu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(10))
model.add(tf.keras.layers.Activation('softmax'))

model.compile(
    optimizer=tf.train.AdamOptimizer(learning_rate=1e-3),
    loss=tf.keras.losses.sparse_categorical_crossentropy,
    metrics=['sparse_categorical_accuracy']
)

def train_gen(batch_size):
    while True:
        offset = np.random.randint(0, x_train.shape[0] - batch_size)
        yield x_train[offset:offset + batch_size], y_train[offset:offset + batch_size]


model.fit_generator(
    train_gen(512),
    epochs=15,
    steps_per_epoch=100,
    validation_data=(x_valid, y_valid)
)

This works fine. How could I use this network to detect multiple objects from the 10 classes? Would I have to retrain the network in some way?

@orde Thank you. Is this not the correct way to ask? Does the question not make any sense? — Amanda
– Amanda, Commented Apr 19, 2019 at 6:23
You don't have a specific issue with your code to solve. You may want to try SO's sister site: codereview.stackexchange.com. It's likely you'll get better feedback/guidance there. Good luck! — orde
– orde, Commented Apr 19, 2019 at 6:26
@orde This question is not about code-review but seeks an extension to what I have already done which might include modifications to my code. I find it perfectly suited for SO. — Amanda
– Amanda, Commented Apr 19, 2019 at 6:31
You need to use an object detector, like Faster R-CNN, SSD, or YOLO. — Dr. Snoopy
– Dr. Snoopy, Commented Apr 19, 2019 at 6:53

Mario Meissner · Accepted Answer · 2019-04-19 08:01:05Z

2

In order to teach your model to detect more than one class per image, you will need to make a few changes to your model and data, and retrain it.

Your final activation will now need to be a sigmoid, since you are no longer predicting a single probability distribution over the classes. Each output neuron should instead predict a value between 0 and 1, and more than one neuron can have a value close to 1.
Your loss function should now be binary_crossentropy, since each output neuron is treated as an independent prediction that is compared against the true label for that class.
Since you have been using sparse_categorical_crossentropy, I assume your labels are integers. You will need to change the label encoding to a multi-hot vector (like one-hot, but possibly with several 1s): each label has length num_classes, with 1s at the positions of the classes present in the image and 0s everywhere else.
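The label change described in the last point can be sketched in a few lines of NumPy. This is only an illustration, not part of the original answer: the helper name to_multi_hot and the sample labels are made up, and it assumes each image's labels are available as a list of integer class ids.

```python
import numpy as np

def to_multi_hot(label_lists, num_classes):
    """Turn per-image lists of class ids into 0/1 target vectors.

    `label_lists` is a hypothetical input format: one list of integer
    class ids per image (a list may hold several ids, or none).
    """
    targets = np.zeros((len(label_lists), num_classes), dtype=np.float32)
    for row, class_ids in enumerate(label_lists):
        for class_id in class_ids:
            targets[row, class_id] = 1.0  # mark every class present in this image
    return targets

# Three made-up images: classes {0, 3}, class {7}, and no known class.
y_multi = to_multi_hot([[0, 3], [7], []], num_classes=10)
print(y_multi.shape)
print(y_multi[0])
```

Each row is then the target for one image, with one entry per output neuron.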

With these changes, you can now re-train your model to learn to predict more than one class per image.
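Put together, the changed output layer and loss might look like the sketch below. It is an example under assumptions rather than the asker's exact model: the convolutional stack is shortened to a single block, the 32x32x3 input shape is a placeholder, binary_accuracy is just one possible metric, and the code uses the tf.keras API as found in TensorFlow 2.x.

```python
import tensorflow as tf

NUM_CLASSES = 10
INPUT_SHAPE = (32, 32, 3)  # placeholder; use x_train.shape[1:] in practice

model = tf.keras.Sequential([
    tf.keras.Input(shape=INPUT_SHAPE),
    tf.keras.layers.Conv2D(64, (5, 5), padding='same', activation='elu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='elu'),
    # One independent sigmoid per class instead of a single softmax.
    tf.keras.layers.Dense(NUM_CLASSES, activation='sigmoid'),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='binary_crossentropy',  # each output is scored as its own yes/no decision
    metrics=['binary_accuracy'],
)
```

The targets passed to training would then be the 0/1 vectors of length NUM_CLASSES rather than integer class ids.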

As for predicting bounding boxes around the objects, that is a very different and much more challenging task. Dedicated object-detection models such as YOLO or the R-CNN family can do this, but their structure is much more complex.


2 Comments

Amanda Over a year ago

I tried softmax as the final activation function and categorical_crossentropy as the loss. Does that make any sense?

Mario Meissner Over a year ago

@Amanda The softmax activation function is only suited to single-label classification problems (exactly one class per image), since it outputs a probability distribution that sums to one. It is not suited for multi-label classification, where several classes can be present at once. categorical_crossentropy is likewise a single-label loss, and requires the labels to be one-hot encoded. If you wish to predict several classes per image, you have to use the settings that I posted in my answer.
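A small numeric check makes the difference concrete. The sketch below is an illustration with made-up logits, not output from the asker's network, and the 0.5 threshold is an arbitrary choice: softmax makes the scores compete for a total of one, whereas independent sigmoids let two classes both score high.

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def sigmoid(logits):
    return 1.0 / (1.0 + np.exp(-logits))

# Made-up logits for a 4-class example where classes 0 and 2 are both present.
logits = np.array([4.0, -3.0, 4.0, -2.0])

soft = softmax(logits)
sig = sigmoid(logits)

print(soft.round(3))  # sums to 1, so the two present classes split the mass
print(sig.round(3))   # each value is computed independently of the others

# With sigmoid outputs, thresholding selects every class that scores high.
predicted = np.flatnonzero(sig > 0.5)
print(predicted)
```

With softmax, neither of the two present classes can exceed 0.5 here, which is why it cannot express "both are in the image".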
