Error using data augmentation options in the Object Detection API

Question

I am trying to use data_augmentation_options in the .config file to train a network, specifically ssd_mobilenet_v1. When I enable the random_adjust_brightness option, I get the error message pasted below almost immediately (I enabled the option after step 110000).

I tried reducing the default value of max_delta:

optional float max_delta=1 [default=0.2];

But the result was the same.

Any idea why? The images are RGB, loaded from PNG files (from the Bosch Small Traffic Lights Dataset).

INFO:tensorflow:global step 110011: loss = 22.7990 (0.357 sec/step)
INFO:tensorflow:global step 110012: loss = 47.8811 (0.401 sec/step)
2017-11-16 11:02:29.114785: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: LossTensor is inf or nan. : Tensor had NaN values
     [[Node: CheckNumerics = CheckNumerics[T=DT_FLOAT, message="LossTensor is inf or nan.", _device="/job:localhost/replica:0/task:0/device:CPU:0"](total_loss)]]
2017-11-16 11:02:29.114895: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: LossTensor is inf or nan. : Tensor had NaN values
     [[Node: CheckNumerics = CheckNumerics[T=DT_FLOAT, message="LossTensor is inf or nan.", _device="/job:localhost/replica:0/task:0/device:CPU:0"](total_loss)]]
2017-11-16 11:02:29.114969: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: LossTensor is inf or nan. : Tensor had NaN values
     [[Node: CheckNumerics = CheckNumerics[T=DT_FLOAT, message="LossTensor is inf or nan.", _device="/job:localhost/replica:0/task:0/device:CPU:0"](total_loss)]]
2017-11-16 11:02:29.115043: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: LossTensor is inf or nan. : Tensor had NaN values
     [[Node: CheckNumerics = CheckNumerics[T=DT_FLOAT, message="LossTensor is inf or nan.", _device="/job:localhost/replica:0/task:0/device:CPU:0"](total_loss)]]
2017-11-16 11:02:29.115112: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: LossTensor is inf or nan. : Tensor had NaN values
...

Edit: I have found a workaround. The inf or NaN shows up in the loss, so I checked the function in /object_detection/core/preprocessor.py that performs the brightness randomization:

def random_adjust_brightness(image, max_delta=0.2):
  """Randomly adjusts brightness.

  Makes sure the output image is still between 0 and 1.

  Args:
    image: rank 3 float32 tensor contains 1 image -> [height, width, channels]
           with pixel values varying between [0, 1].
    max_delta: how much to change the brightness. A value between [0, 1).

  Returns:
    image: image which is the same shape as input image.
    boxes: boxes which is the same shape as input boxes.
  """
  with tf.name_scope('RandomAdjustBrightness', values=[image]):
    image = tf.image.random_brightness(image, max_delta)
    image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)
    return image

The function assumes that the image values lie between 0.0 and 1.0. Is it possible that the images actually arrive with zero mean, or even in a different range? In that case, the clipping would corrupt them and cause the failure. Long story short: I commented out the clipping line and training now runs (we will see the results).
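
To make the hypothesis above concrete, here is a minimal pure-Python sketch of what clipping to [0, 1] does to pixel values that are not in that range. The clip helper is a hypothetical stand-in for tf.clip_by_value, and the three input ranges are assumptions for illustration; the thread does not confirm which range the images actually have at this point in the pipeline.

```python
def clip(values, lo=0.0, hi=1.0):
    """Hypothetical pure-Python stand-in for tf.clip_by_value on a flat list."""
    return [min(max(v, lo), hi) for v in values]


# Made-up pixel values in three possible input ranges.
unit_range = [0.0, 0.25, 0.5, 1.0]       # the range the function expects
zero_mean = [-1.0, -0.5, 0.0, 1.0]       # assumed [-1, 1] normalization
byte_range = [0.0, 64.0, 128.0, 255.0]   # assumed raw [0, 255] values

clipped_unit = clip(unit_range)          # values pass through unchanged
clipped_zero_mean = clip(zero_mean)      # every negative value becomes 0.0
clipped_byte = clip(byte_range)          # every value above 1 becomes 1.0

print(clipped_unit)
print(clipped_zero_mean)
print(clipped_byte)
```

If the inputs were in either of the two assumed non-unit ranges, most of the image contrast would be lost after the clip, which would be consistent with the behaviour described, though it does not prove that this is the cause.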

Adpon · Accepted Answer · 2017-11-23 19:35:42Z

The error LossTensor is inf or nan. : Tensor had NaN values is often caused by an error in the bounding boxes / annotations (source: https://github.com/tensorflow/models/issues/1881).

I know that the Bosch Small Traffic Light Dataset has some annotations that extend outside of the image dimensions. For example, the height of an image in that dataset is 720 pixels, but some bounding boxes have a y coordinate greater than 720. This is common because whenever the car recording the sequence passes under a traffic light, part of the traffic light is visible and part of it is cut off.

I know this isn't an exact answer to your question, but hopefully it provides insight into a possible reason for the problem. Perhaps removing annotations that extend outside of the image dimensions will help. However, I'm dealing with the same problem even though I am not using image preprocessing: on the same dataset, I encounter the LossTensor is inf or nan. : Tensor had NaN values error every ~8000 steps.
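
A minimal sketch of the removal idea suggested above, assuming each annotation is a dict with x_min, x_max, y_min and y_max keys in pixel coordinates. The 720 pixel height comes from this answer; the 1280 pixel width is taken from the script posted further down this thread. The function names and the sample boxes are hypothetical.

```python
IMAGE_WIDTH = 1280   # width used by the script later in this thread
IMAGE_HEIGHT = 720   # height stated in this answer


def box_inside_image(box, width=IMAGE_WIDTH, height=IMAGE_HEIGHT):
    """Return True if all four box coordinates lie within the image."""
    return (0 <= float(box['x_min']) <= width and
            0 <= float(box['x_max']) <= width and
            0 <= float(box['y_min']) <= height and
            0 <= float(box['y_max']) <= height)


def drop_out_of_bounds(boxes, width=IMAGE_WIDTH, height=IMAGE_HEIGHT):
    """Keep only the boxes that lie entirely inside the image."""
    return [b for b in boxes if box_inside_image(b, width, height)]


# Made-up annotations: the second box extends below the bottom edge.
boxes = [
    {'x_min': 600.0, 'x_max': 620.0, 'y_min': 300.0, 'y_max': 350.0},
    {'x_min': 600.0, 'x_max': 620.0, 'y_min': 700.0, 'y_max': 735.5},
]
kept = drop_out_of_bounds(boxes)
print(len(kept), "of", len(boxes), "boxes kept")
```

Dropping a box discards a partially visible traffic light entirely; whether that is preferable to keeping it in some adjusted form is a judgment call this sketch does not settle.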

Berker Logoglu · Accepted Answer · 2017-12-18 06:22:38Z

In addition to annotations that extend outside the image dimensions, the Bosch Traffic Light detection training dataset also has one image where x_max < x_min and y_max < y_min, which yields a negative width and height. This causes the "LossTensor is inf or nan. : Tensor had NaN values" error every ~8000 steps. I had the same error; removing the problematic entries resolved the issue.
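
As a rough sketch of removing such entries, the helper below drops boxes whose width or height is not positive. It assumes the annotations are a list of examples, each a dict with a 'boxes' list whose items carry x_min, x_max, y_min and y_max; the function names and sample data are hypothetical, not part of the dataset tooling.

```python
def has_positive_size(box):
    """Return True if the box has strictly positive width and height."""
    return (float(box['x_max']) > float(box['x_min']) and
            float(box['y_max']) > float(box['y_min']))


def remove_degenerate_boxes(examples):
    """Return (cleaned examples, number of boxes removed).

    The input list is left unmodified; each example is shallow-copied
    with its 'boxes' list replaced by the filtered one.
    """
    cleaned = []
    removed = 0
    for example in examples:
        good = [b for b in example['boxes'] if has_positive_size(b)]
        removed += len(example['boxes']) - len(good)
        cleaned.append(dict(example, boxes=good))
    return cleaned, removed


# Made-up example: the second box has x_max < x_min and y_max < y_min.
examples = [{
    'path': 'example.png',
    'boxes': [
        {'x_min': 100.0, 'x_max': 110.0, 'y_min': 50.0, 'y_max': 75.0},
        {'x_min': 110.0, 'x_max': 100.0, 'y_min': 75.0, 'y_max': 50.0},
    ],
}]
cleaned, removed = remove_degenerate_boxes(examples)
print("Removed", removed, "degenerate boxes")
```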

Mike Brown · Accepted Answer · 2018-01-17 19:57:10Z

I also ran into this and ended up writing a quick-and-dirty script to find the bad eggs. I don't know if the image set changes over time, but the set I downloaded had three badly annotated images:

./rgb/train/2015-10-05-11-26-32_bag/105870.png
./rgb/train/2015-10-05-11-26-32_bag/108372.png
./rgb/train/2015-10-05-14-40-46_bag/462350.png

For those interested, here's my script:

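import yaml

INPUT_YAML = "train.yaml"
IMAGE_WIDTH = 1280   # image width in pixels
IMAGE_HEIGHT = 720   # image height in pixels

# safe_load is used because recent PyYAML versions require an explicit
# Loader argument for yaml.load; the annotation file is plain YAML.
with open(INPUT_YAML, 'rb') as f:
  examples = yaml.safe_load(f)
print("Loaded", len(examples), "examples")

for example in examples:
  for box in example['boxes']:
    xmin = float(box['x_min'])
    xmax = float(box['x_max'])
    ymin = float(box['y_min'])
    ymax = float(box['y_max'])
    # Flag boxes with negative width or that lie past the right edge.
    if xmax < xmin or xmax > IMAGE_WIDTH or xmin > IMAGE_WIDTH:
      print("INVALID IMAGE:", example['path'], "X_MAX =", xmax)
    # Flag boxes with negative height or that lie past the bottom edge.
    if ymax < ymin or ymax > IMAGE_HEIGHT or ymin > IMAGE_HEIGHT:
      print("INVALID IMAGE:", example['path'], "Y_MAX =", ymax)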
