Recognizing multiple objects in an image with convolutional neural networks

Question

I've seen quite a few CNN code examples for ID'ing images, but they generally relate to a 1-to-1 input to target relationship (like the MNISt handwritten numerals set), and most seem to use similar image dimensions (pixels) for the input image and training images.

So...what is the usual approach for identifying multiple objects in one image? (like several people, or any other relatively complex scene). I've seen it done often enough, but haven't seen design approaches mentioned. Does this require some type of preprocessing or can this be handled directly by a CNN?

WonderWorker · Accepted Answer · 2017-06-07 12:28:56Z

3

I would say the most known family of techniques to retrieve multiple objects from an images would be the Detection family.

With Detection, the basic idea is to have one or more Proposal windows of different sizes and ratios within an image, generated with either a calculated or random array of algorithms.

For each Proposal window, the Classification algorithm is then executed to reveal what that specific area of the image represents.

The next step would usually be to run a Merge process to combine all neighbouring areas into one single classification output.

Note: A None class is often also used to represent an area with no specific class found.

edited Jun 7, 2017 at 12:28

WonderWorker

9,2025 gold badges70 silver badges75 bronze badges

answered Jun 7, 2017 at 12:02

LFX

275 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Anton Panchishin Over a year ago

The Merge process can instead be a reverse-to-the-pixel function, such that each Proposal window marks the related pixels as probable candidates for that classification then after all the Proposal windows mark the related pixels some function like max is applied to each pixel classification. This can create contoured outlines of classified objects and not just boxes.

StringTheory Over a year ago

Great! As often happens, a couple keywords were valuable for bootstrapping web searches. For the benefit of anyone else searching: One good keyword is "R-CNN", with variants by Facebook's Ross Gershick and others, including "Fast R-CNN" and "Faster R-CNN." It appears that the early approaches relied on multiple passes, and that significant effort has gone into coalescing into a single process. (I'll need to research that further).

StringTheory Over a year ago

@wontonimo: Do you know of examples where contoured outlines are used? Everything I've seen so far uses bounding boxes.

Anton Panchishin Over a year ago

@Hugh sure, a google image search of "convnet contouring objects" will get that for you. graphics.ethz.ch/~perazzif/masktrack/index.html

StringTheory Over a year ago

Excellent! Thanks all. This is exactly what I was looking for. Now if I can find some Tensorflow code, I'll be on track.

Collectives™ on Stack Overflow

Recognizing multiple objects in an image with convolutional neural networks

1 Answer 1

5 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

5 Comments

Your Answer

Sign up or log in

Post as a guest

Related