
I'm practicing with computer vision in general and specifically with the TensorFlow object detection API, and there are a few things I don't really understand yet.

I'm trying to re-train an SSD model to detect one class of custom objects (guitars).
I've been using the ssd_mobilenet_v1_coco and ssd_mobilenet_v2_coco models, with a dataset of 1000K pre-labeled images downloaded from the Open Images dataset. I used the standard configuration file, modifying only the necessary parts.

I'm getting slightly unsatisfactory detections on small objects, which is supposedly normal when using SSD models. Here on Stack Overflow I've seen people suggest cropping the image into smaller frames, but I'm having trouble understanding a few things:

  1. According to the .config file and the SSD papers, images are resized to the fixed dimension of 300x300 pixels (I'm assuming this holds both when training the model and when using it for inference). So I guess the original size of training and test/evaluation images doesn't matter, because they're always resized to 300x300 anyway? If so, I don't understand why many people suggest using images of the same size as the ones the model has been trained on... does it matter or not?

  2. It's not really clear to me what "small objects" means in the first place.
    Does it refer to the size ratio between the object and the whole image? So a small object is one that covers, say, less than 5% of the total image?
    Or does it refer to the number of pixels forming the object?
    In the first case, cropping the image around the object would make sense. In the second case, it shouldn't work, because the number of useful pixels identifying the object stays the same.
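To make the two readings concrete, here is a small arithmetic sketch (plain Python; the image and object sizes are made-up numbers for illustration, not from the question):

```python
# Hypothetical numbers: a 1200x1200 photo containing a 60x60-pixel guitar,
# fed to an SSD whose fixed_shape_resizer input is 300x300.
full_size = 1200   # original image side length (px)
obj_size = 60      # object side length in the original image (px)
input_size = 300   # SSD fixed input resolution (px)

# Reading 1: relative size. A uniform resize preserves the area ratio.
ratio_before = (obj_size / full_size) ** 2                       # 0.25% of the image
obj_size_resized = obj_size * input_size / full_size             # 15 px per side
ratio_after = (obj_size_resized / input_size) ** 2               # still 0.25%
assert ratio_before == ratio_after

# Reading 2: absolute pixels. The same resize shrinks the pixel count.
print(ratio_before, obj_size_resized)  # 0.0025 15.0
```

Under a fixed resize, the relative size never changes, but the absolute number of object pixels reaching the network does, which is why the two readings lead to different conclusions about cropping.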

Thanks!

2 Answers


I am not sure about the answer I am giving below, but it worked for me. As you correctly said, images are resized to 300x300 per the config file of ssd_mobilenet_v2. This resizing compresses the image down to 300x300, losing important features, which adversely affects objects that are small in size, since they have the most to lose. Depending on the GPU power you have, you can make some changes in the config file:

1st: change the resizer to image_resizer { fixed_shape_resizer { height: 600 width: 600 } }, giving the network double the resolution to work with.

2nd: the above change can throw your GPU out of memory, so you need to reduce the batch size from 24 to 12 or 8, which can lead to overfitting, so do check the regularization parameters too.
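For reference, the relevant fragment of the pipeline .config after both changes might look like this (a sketch only; the rest of the file stays unchanged, and 600x600 / batch_size 8 are just the values suggested above, to be tuned to your GPU):

```
model {
  ssd {
    image_resizer {
      fixed_shape_resizer {
        height: 600   # doubled from 300 to keep more small-object detail
        width: 600
      }
    }
  }
}
train_config {
  batch_size: 8   # reduced from 24 to fit GPU memory at 600x600
}
```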

3rd (optional): comment out the options shown in the screenshot (image missing from this copy).

This helps a lot and reduces training time by almost half. The trade-off is that if an image is not aligned like your training data, the model's confidence will drop, and it may completely fail to recognize, say, an inverted cat.


2 Comments

Can you add an image description?
My dataset varied from 600 x 900 to 1300 x 1000; generally it is said to keep image sizes consistent, but I still got good results, 22 fps.
  1. I do not see why one would get better results by keeping the image size the SSD model was trained on. SSD detectors are fully convolutional, and convolutions are not concerned with image sizes.
  2. "Small objects" refers to the number of pixels containing information about the object. Here is how it makes sense to crop images to improve performance on small objects: the TensorFlow object detection API performs data augmentations before resizing images (check the inputs.transform_input_data docstrings), so cropping and then resizing the cropped image will preserve more information than resizing the full image, because the downsizing factor is smaller for the cropped image than for the full image.
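The downsizing-factor argument can be checked with a few lines of plain Python (hypothetical image, object, and crop sizes chosen for illustration; no TensorFlow needed):

```python
# Hypothetical scenario: a 1200x1200 image whose object occupies 100x100 px.
full_side = 1200   # original image side length (px)
obj_side = 100     # object side length in the original image (px)
input_side = 300   # SSD fixed input resolution (px)

# Pipeline A: resize the full image straight to the SSD input resolution.
factor_full = input_side / full_side            # downsizing factor 0.25
obj_after_full = obj_side * factor_full         # 25 px per side

# Pipeline B: crop a 600x600 window around the object, then resize that.
crop_side = 600
factor_crop = input_side / crop_side            # downsizing factor 0.5
obj_after_crop = obj_side * factor_crop         # 50 px per side

# The crop-first pipeline keeps 2x the object pixels per side (4x by area).
assert obj_after_crop > obj_after_full
print(obj_after_full, obj_after_crop)  # 25.0 50.0
```

Same object, same network input size, but the smaller downsizing factor of the cropped window leaves the detector far more object pixels to work with.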

