I'm new to neural networks. A CNN can be trained to detect a single object in an image. But what if any image in a dataset could contain an arbitrary number n of objects? Doesn't this pose a problem for CNNs, since the output dense layer has to be a fixed size? How would you solve this problem?
For example, let's say I randomly sample 2 images from this set. Image 1 has 2 objects and image 2 has 5 objects. The y label for image 1 would contain the bounding box coordinates for 2 objects; the y label for image 2 would contain the coordinates for 5 objects, a much larger y vector than for image 1.
A possible solution?
I would need to find the image with the largest number of objects (call this value M). Let's also say each object has 4 bounding box coordinates. If M = 5, I would need a y vector of length 20 (5 × 4). If an image has 1 object, its y vector would contain 4 non-zero values and 16 zeros. The 4 non-zero values would represent that object's coordinates, and the 16 zeros would stand in for the coordinates of the 4 non-existent objects.
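
To make the idea concrete, here is a minimal sketch of the zero-padding scheme in NumPy. The function name `pad_labels`, the sample boxes, and the (x_min, y_min, x_max, y_max) coordinate ordering are assumptions made up for illustration; this only builds the fixed-size label array and says nothing about whether a network would train well on it.

```python
import numpy as np

COORDS_PER_BOX = 4  # assumed layout: (x_min, y_min, x_max, y_max)

def pad_labels(boxes_per_image, max_objects=None):
    """Flatten each image's boxes and zero-pad to length max_objects * 4."""
    if max_objects is None:
        # M: the largest number of objects found in any single image
        max_objects = max(len(boxes) for boxes in boxes_per_image)
    y = np.zeros((len(boxes_per_image), max_objects * COORDS_PER_BOX),
                 dtype=np.float32)
    for i, boxes in enumerate(boxes_per_image):
        flat = np.asarray(boxes, dtype=np.float32).reshape(-1)
        y[i, :flat.size] = flat  # real coordinates first, zeros after
    return y

# Hypothetical data: one image with 1 object, one with 5 objects (so M = 5)
img_a = [[10, 20, 50, 60]]
img_b = [[5, 5, 15, 15], [20, 20, 40, 40], [30, 10, 60, 35],
         [1, 2, 3, 4], [70, 70, 90, 95]]

y = pad_labels([img_a, img_b])
print(y.shape)  # (2, 20)
print(y[0])     # 4 coordinates followed by 16 zeros
```

With M = 5, every row has length 20, so the first image's row holds its 4 coordinates followed by 16 zeros, matching the description above.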
