I'm practicing computer vision in general, and the TensorFlow Object Detection API in particular, and there are a few things I don't really understand yet.
I'm trying to re-train an SSD model to detect one class of custom objects (guitars).
I've been using the ssd_mobilenet_v1_coco and ssd_mobilenet_v2_coco models, with a dataset of roughly 1,000 pre-labeled images downloaded from the Open Images dataset. I used the standard configuration file, modifying only the necessary parts.
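For reference, these are the kinds of edits I made to the pipeline config (the paths and names here are placeholders, not my actual values):

```
model {
  ssd {
    num_classes: 1  # only one class: guitar
    ...
  }
}
train_config {
  fine_tune_checkpoint: "path/to/model.ckpt"  # pre-trained COCO checkpoint
  ...
}
train_input_reader {
  tf_record_input_reader {
    input_path: "path/to/train.record"
  }
  label_map_path: "path/to/label_map.pbtxt"
}
```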
I'm getting somewhat unsatisfactory detections on small objects, which is apparently to be expected with SSD models. Here on Stack Overflow I've seen people suggest cropping the image into smaller frames, but I'm having trouble understanding a few things:
According to the .config file and the SSD paper, images are resized to a fixed 300x300 pixels (I assume this holds both when training the model and when running inference). So I guess the original size of the training and test/evaluation images doesn't matter, because they're always resized to 300x300 anyway? If so, I don't understand why many people suggest using images of the same size as the ones the model was trained on. Does it matter or not?
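This is the block I mean (from the stock ssd_mobilenet_v1_coco sample config, unless I'm misreading it):

```
image_resizer {
  fixed_shape_resizer {
    height: 300
    width: 300
  }
}
```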
Also, it's not really clear to me what "small objects" means in the first place.
Does it refer to the size ratio between the object and the whole image? That is, is a small object one that covers, say, less than 5% of the total image?
Or does it refer to the absolute number of pixels that make up the object?
In the first case, cropping the image around the object would make sense. In the second case, I'd expect it not to work, because the number of useful pixels identifying the object stays the same.
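To make my confusion concrete, here's the back-of-the-envelope arithmetic I'm picturing for the resize step (all sizes are made-up numbers, not from my dataset):

```python
# Made-up example: a guitar bounding box in a large photo, and what
# happens to its size once everything is squashed into the 300x300 input.

orig_w, orig_h = 3000, 2000      # original photo
box_w, box_h = 150, 300          # guitar bounding box in the original

# Case A: feed the whole image. It gets resized to 300x300, so the box
# shrinks by the same factors as the image.
a_w = box_w * 300 / orig_w       # 15 px
a_h = box_h * 300 / orig_h       # 45 px

# Case B: crop a 600x600 window around the guitar first, then resize
# that crop to 300x300.
b_w = box_w * 300 / 600          # 75 px
b_h = box_h * 300 / 600          # 150 px

print(f"full image -> {a_w:.0f}x{a_h:.0f} px at the network input")
print(f"crop       -> {b_w:.0f}x{b_h:.0f} px at the network input")
```

So in terms of pixels at the network input, the crop clearly makes the object bigger, but the original pixels that actually contain the guitar are the same 150x300 either way, and that's the part I can't reconcile with the two definitions above.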
Thanks!