I am trying to use the data_augmentation_options in the .config files to train a network, specifically a ssd_mobilenet_v1, but when I activate the option random_adjust_brightness, I get the error message pasted below very quickly (I activate the option after the step 110000).
I tried reducing the default value:
optional float max_delta=1 [default=0.2];
But the result was the same.
Any idea why? The images are RGB from png files (from the Bosch Small Traffic Lights Dataset).
INFO:tensorflow:global step 110011: loss = 22.7990 (0.357 sec/step)
INFO:tensorflow:global step 110012: loss = 47.8811 (0.401 sec/step)
2017-11-16 11:02:29.114785: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: LossTensor is inf or nan. : Tensor had NaN values
[[Node: CheckNumerics = CheckNumerics[T=DT_FLOAT, message="LossTensor is inf or nan.", _device="/job:localhost/replica:0/task:0/device:CPU:0"](total_loss)]]
2017-11-16 11:02:29.114895: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: LossTensor is inf or nan. : Tensor had NaN values
[[Node: CheckNumerics = CheckNumerics[T=DT_FLOAT, message="LossTensor is inf or nan.", _device="/job:localhost/replica:0/task:0/device:CPU:0"](total_loss)]]
2017-11-16 11:02:29.114969: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: LossTensor is inf or nan. : Tensor had NaN values
[[Node: CheckNumerics = CheckNumerics[T=DT_FLOAT, message="LossTensor is inf or nan.", _device="/job:localhost/replica:0/task:0/device:CPU:0"](total_loss)]]
2017-11-16 11:02:29.115043: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: LossTensor is inf or nan. : Tensor had NaN values
[[Node: CheckNumerics = CheckNumerics[T=DT_FLOAT, message="LossTensor is inf or nan.", _device="/job:localhost/replica:0/task:0/device:CPU:0"](total_loss)]]
2017-11-16 11:02:29.115112: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: LossTensor is inf or nan. : Tensor had NaN values
...
Edit: The workaround I have found is this. The inf or nan is in the loss, so checking the function in /object_detection/core/preprocessor.py doing the brightness randomization:
def random_adjust_brightness(image, max_delta=0.2):
"""Randomly adjusts brightness.
Makes sure the output image is still between 0 and 1.
Args:
image: rank 3 float32 tensor contains 1 image -> [height, width, channels]
with pixel values varying between [0, 1].
max_delta: how much to change the brightness. A value between [0, 1).
Returns:
image: image which is the same shape as input image.
boxes: boxes which is the same shape as input boxes.
"""
with tf.name_scope('RandomAdjustBrightness', values=[image]):
image = tf.image.random_brightness(image, max_delta)
image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)
return image
It is assuming that the image values must be between 0.0 and 1.0. Is it possible that the images are actually arriving with 0 mean and even a different range? In that case, the clipping is corrupting them and leading to the fail. Long story short: I commented out the clipping line and it is working (we will see the results).