0

I have been running this object detection model evaluation on google colab without errors. Now suddenly it does not work anymore but when running the script.

# RUN MODEL EVALUATION
PIPELINE_CONFIG_PATH="./object_detection/checkpoints/detection/{}/pipeline.config".format(selected_model)
MODEL_DIR="./object_detection/checkpoints/detection/{}/checkpoint/".format(selected_model)
CHECKPOINT_DIR="./object_detection/checkpoints/detection/{}/checkpoint/".format(selected_model)

!python ./object_detection/model_main_tf2.py \
  --pipeline_config_path={PIPELINE_CONFIG_PATH} \
  --model_dir={MODEL_DIR} \
  --checkpoint_dir={CHECKPOINT_DIR} \
  --eval_timeout=5 \
  --alsologtostderr

It comes with the following errors:

I1112 16:05:22.433352 139759485175680 checkpoint_utils.py:149] Found new checkpoint at ./object_detection/checkpoints/detection/ssd_mobilenet_v2_320x320_coco17_tpu-8/checkpoint/ckpt-0
/usr/local/lib/python3.7/dist-packages/keras/backend.py:401: UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
  warnings.warn('`tf.keras.backend.set_learning_phase` is deprecated and '
2021-11-12 16:05:22.520333: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
INFO:tensorflow:depth of additional conv before box predictor: 0
I1112 16:05:31.542140 139759485175680 convolutional_keras_box_predictor.py:154] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I1112 16:05:31.542605 139759485175680 convolutional_keras_box_predictor.py:154] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I1112 16:05:31.542898 139759485175680 convolutional_keras_box_predictor.py:154] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I1112 16:05:31.543214 139759485175680 convolutional_keras_box_predictor.py:154] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I1112 16:05:31.543522 139759485175680 convolutional_keras_box_predictor.py:154] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I1112 16:05:31.543864 139759485175680 convolutional_keras_box_predictor.py:154] depth of additional conv before box predictor: 0
2021-11-12 16:06:17.471428: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-11-12 16:06:17.474623: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
INFO:tensorflow:Encountered 2 root error(s) found.
  (0) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node ssd_mobile_net_v2keras_feature_extractor/model/Conv1/Conv2D (defined at /usr/local/lib/python3.7/dist-packages/object_detection/models/ssd_mobilenet_v2_keras_feature_extractor.py:161) ]]
     [[Identity_18/_1166]]
  (1) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node ssd_mobile_net_v2keras_feature_extractor/model/Conv1/Conv2D (defined at /usr/local/lib/python3.7/dist-packages/object_detection/models/ssd_mobilenet_v2_keras_feature_extractor.py:161) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_compute_eval_dict_24301]

Errors may have originated from an input operation.
Input Source operations connected to node ssd_mobile_net_v2keras_feature_extractor/model/Conv1/Conv2D:
 features_1 (defined at /usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py:932)

Input Source operations connected to node ssd_mobile_net_v2keras_feature_extractor/model/Conv1/Conv2D:
 features_1 (defined at /usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py:932)

Function call stack:
compute_eval_dict -> compute_eval_dict
 exception.
I1112 16:06:19.558837 139759485175680 model_lib_v2.py:934] Encountered 2 root error(s) found.
  (0) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node ssd_mobile_net_v2keras_feature_extractor/model/Conv1/Conv2D (defined at /usr/local/lib/python3.7/dist-packages/object_detection/models/ssd_mobilenet_v2_keras_feature_extractor.py:161) ]]
     [[Identity_18/_1166]]
  (1) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node ssd_mobile_net_v2keras_feature_extractor/model/Conv1/Conv2D (defined at /usr/local/lib/python3.7/dist-packages/object_detection/models/ssd_mobilenet_v2_keras_feature_extractor.py:161) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_compute_eval_dict_24301]

Errors may have originated from an input operation.
Input Source operations connected to node ssd_mobile_net_v2keras_feature_extractor/model/Conv1/Conv2D:
 features_1 (defined at /usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py:932)

Input Source operations connected to node ssd_mobile_net_v2keras_feature_extractor/model/Conv1/Conv2D:
 features_1 (defined at /usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py:932)

Function call stack:
compute_eval_dict -> compute_eval_dict
 exception.
INFO:tensorflow:A replica probably exhausted all examples. Skipping pending examples on other replicas.
I1112 16:06:19.559331 139759485175680 model_lib_v2.py:935] A replica probably exhausted all examples. Skipping pending examples on other replicas.
Traceback (most recent call last):
  File "./object_detection/model_main_tf2.py", line 115, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "./object_detection/model_main_tf2.py", line 90, in main
    wait_interval=300, timeout=FLAGS.eval_timeout)
  File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py", line 1157, in eval_continuously
    global_step=global_step,
  File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py", line 1001, in eager_eval_loop
    for evaluator in evaluators:
TypeError: 'NoneType' object is not iterable

It was still working last week but for some reason not anymore. Anyone else struggling with this same problem? Some problems with Colab environment I guess but don't know what I should change. TF2 object detection API installed and tested that it is working

Tensorflow 2.6.2
Found GPU at: /device:GPU:0
1
  • thinking about stop developing in Colab. Running into too many of this unstable conditions which I do not have any control. Very poor maintainence job by Colab. Commented Feb 9, 2022 at 17:39

1 Answer 1

2

The error was occurring because of the wrong cuDNN version on Google Colab.

I was able to fix it by downloading the correct version of cuDNN from the NVidia developer site, and then installing it into Google Colab. I first copied the cuDNN package into my Google Colab notebook from Google Drive, and then installed it using the following:

!dpkg -i libcudnn8_8.1.0.77-1+cuda11.2_amd64.deb
# Check if package has been installed
!ls -l /usr/lib/x86_64-linux-gnu/libcudnn.so.*
Sign up to request clarification or add additional context in comments.

1 Comment

Wouldn't this be deleted after some time?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.