I've seen quite a few CNN code examples for ID'ing images, but they generally relate to a 1-to-1 input-to-target relationship (like the MNIST handwritten digits set), and most seem to use similar image dimensions (in pixels) for the input image and the training images.

So...what is the usual approach for identifying multiple objects in one image? (like several people, or any other relatively complex scene). I've seen it done often enough, but haven't seen design approaches mentioned. Does this require some type of preprocessing or can this be handled directly by a CNN?

1 Answer

I would say the best-known family of techniques for retrieving multiple objects from an image is the Detection family.

With Detection, the basic idea is to have one or more Proposal windows of different sizes and aspect ratios within an image, generated by one or more algorithms that can be either calculated (deterministic) or random.
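As a rough illustration of the "calculated" case, here is a minimal sliding-window sketch. The function name `sliding_windows`, the default sizes, the aspect ratios and the stride are all assumptions made for this example; real detectors use more refined proposal methods.

```python
from typing import Iterator, Tuple

# A box is (x1, y1, x2, y2) in pixels, with x2 and y2 exclusive.
Box = Tuple[int, int, int, int]


def sliding_windows(
    img_w: int,
    img_h: int,
    sizes=(64, 128),          # hypothetical base sizes (side of a square window)
    ratios=(0.5, 1.0, 2.0),   # hypothetical width/height aspect ratios
    stride_frac: float = 0.5,  # step between windows, as a fraction of window size
) -> Iterator[Box]:
    """Yield proposal windows of several sizes and aspect ratios."""
    for size in sizes:
        for ratio in ratios:
            # Keep the area close to size*size while changing the shape.
            w = int(round(size * ratio ** 0.5))
            h = int(round(size / ratio ** 0.5))
            if w > img_w or h > img_h:
                continue  # this shape does not fit in the image
            step_x = max(1, int(w * stride_frac))
            step_y = max(1, int(h * stride_frac))
            for y in range(0, img_h - h + 1, step_y):
                for x in range(0, img_w - w + 1, step_x):
                    yield (x, y, x + w, y + h)


windows = list(sliding_windows(256, 256))
```

Each yielded box would then be cropped from the image and resized to whatever input size the classifier expects, which is how a fixed-size CNN can be applied to regions of a larger scene.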

For each Proposal window, the Classification algorithm is then executed to reveal what that specific area of the image represents.

The next step would usually be to run a Merge process to combine all neighbouring areas into one single classification output.
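The answer does not name a specific merge method. One common choice, used here as an assumption, is greedy non-maximum suppression (NMS): among overlapping windows with the same label, keep only the highest-scoring one. The `(box, label, score)` tuple layout and the 0.5 overlap threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over a list of (box, label, score) tuples.

    Visits detections from highest to lowest score and keeps one only if
    it does not overlap an already-kept detection of the same label.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        box, label, _ = det
        overlaps_kept = any(
            k[1] == label and iou(box, k[0]) >= iou_threshold for k in kept
        )
        if not overlaps_kept:
            kept.append(det)
    return kept


detections = [
    ((0, 0, 10, 10), "person", 0.9),
    ((1, 1, 11, 11), "person", 0.8),    # heavily overlaps the first box
    ((50, 50, 60, 60), "person", 0.7),  # a separate person elsewhere
]
merged = nms(detections)
```

Note that NMS discards the weaker overlapping boxes rather than averaging them; a merge that blends neighbouring boxes together would be a different design.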

Note: A None class is often also used to represent an area with no specific class found.
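A minimal sketch of how such a None class might be used: the classifier outputs one extra score for "none", and windows whose best class is "none" are dropped before merging. The label set `CLASSES` and the raw score lists below are made up for illustration; in practice the scores would come from the CNN.

```python
import math

# Hypothetical label set; "none" stands for "no object in this window".
CLASSES = ["none", "person", "dog"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_window(logits, classes=CLASSES):
    """Return (label, probability) for the most likely class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs[best]


def keep_foreground(windows_with_logits, classes=CLASSES):
    """Drop every window whose most likely class is "none"."""
    results = []
    for box, logits in windows_with_logits:
        label, score = label_window(logits, classes)
        if label != "none":
            results.append((box, label, score))
    return results


# Made-up classifier outputs for two proposal windows.
scored = [
    ((0, 0, 10, 10), [5.0, 0.1, 0.2]),   # mostly "none"
    ((5, 5, 20, 20), [0.1, 4.0, 0.3]),   # mostly "person"
]
foreground = keep_foreground(scored)
```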

5 Comments

The Merge process can instead be a reverse-to-the-pixel function: each Proposal window marks the pixels it covers as probable candidates for its classification, and once all the Proposal windows have marked their pixels, some function like max is applied to each pixel's classification. This can create contoured outlines of classified objects and not just boxes.
Great! As often happens, a couple of keywords were valuable for bootstrapping web searches. For the benefit of anyone else searching: one good keyword is "R-CNN", with variants by Facebook's Ross Girshick and others, including "Fast R-CNN" and "Faster R-CNN". It appears that the early approaches relied on multiple passes, and that significant effort has gone into coalescing them into a single process. (I'll need to research that further.)
@wontonimo: Do you know of examples where contoured outlines are used? Everything I've seen so far uses bounding boxes.
@Hugh Sure, a Google image search for "convnet contouring objects" will get that for you. graphics.ethz.ch/~perazzif/masktrack/index.html
Excellent! Thanks all. This is exactly what I was looking for. Now if I can find some TensorFlow code, I'll be on track.
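
The per-pixel merge described in the first comment can be sketched as follows. This is one possible reading of it, not a reference implementation: every window writes its score onto the pixels it covers, keeping the maximum per class, and each pixel then takes the class with the highest score. The function name `pixel_label_map` and the `(box, label, score)` layout are assumptions.

```python
def pixel_label_map(width, height, detections, classes):
    """Build a per-pixel label map from (box, label, score) detections.

    Returns a height x width grid whose entries are a class name, or
    None where no window covered the pixel.
    """
    # One score grid per class, initialised to 0 (meaning "not marked").
    scores = {c: [[0.0] * width for _ in range(height)] for c in classes}

    # Each window marks its pixels; overlapping marks keep the max score.
    for (x1, y1, x2, y2), label, score in detections:
        grid = scores[label]
        for y in range(max(0, y1), min(height, y2)):
            row = grid[y]
            for x in range(max(0, x1), min(width, x2)):
                if score > row[x]:
                    row[x] = score

    # For every pixel, pick the class with the highest recorded score.
    label_map = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best = max(classes, key=lambda c: scores[c][y][x])
            if scores[best][y][x] > 0:
                label_map[y][x] = best
    return label_map


detections = [
    ((0, 0, 4, 4), "person", 0.6),
    ((2, 2, 6, 6), "dog", 0.9),  # overlaps the person window
]
labels = pixel_label_map(6, 6, detections, ["person", "dog"])
```

With only two rectangular windows the result is still blocky; contour-like outlines emerge when many small, densely overlapping windows contribute.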
