When I run a TensorFlow image-training job in the tensorflow/tensorflow:latest-gpu container, it fails.

Error message:

Cannot assign a device for operation InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D: Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0 ]. Make sure the device specification refers to a valid device.
[[node InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D (defined at /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py:1057)  = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/device:GPU:0"](fifo_queue_Dequeue, InceptionV3/Conv2d_1a_3x3/weights/read)]]

GPU info (nvidia-smi):

Mon Nov 26 07:48:59 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 630      Off  | 00000000:01:00.0 N/A |                  N/A |
| 25%   47C    P0    N/A /  N/A |      0MiB /  1998MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

  • Are you loading a previously trained model? Commented Nov 26, 2018 at 8:08
  • Here are my steps:
      mkdir download_data_flower
      mkdir train_output
      python download_and_convert_data.py --dataset_name=flowers --dataset_dir=download_data_flower
      python train_image_classifier.py --batch_size=64 --model_name=inception_v3 --dataset_name=flowers --dataset_split_name=train --dataset_dir=download_data_flower --train_dir=train_output
    Commented Nov 26, 2018 at 9:44

1 Answer

It seems that TensorFlow is not detecting any GPU as available, yet the operation is explicitly assigned to GPU:0. First, try this:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

This prints the available devices. Is /device:GPU:0 among them?
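To make that check concrete, here is a minimal sketch in plain Python (no TensorFlow required) that tests whether a given device appears in a list of device names. The helpers `parse_device` and `has_device` are hypothetical names made up for this illustration, not part of TensorFlow. The device names are copied from the error message in the question; with TensorFlow installed, the list could instead be built from the `name` attribute of each entry returned by `device_lib.list_local_devices()`.

```python
import re

# Matches the trailing "device:<TYPE>:<INDEX>" part of a TensorFlow device name.
_DEVICE_RE = re.compile(r"device:([A-Za-z_]+):(\d+)$")


def parse_device(name):
    """Return (device_type, index) for a device name, or None if it does not parse."""
    match = _DEVICE_RE.search(name.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def has_device(available, device_type, index=0):
    """True if `available` contains exactly the requested device type and index."""
    return (device_type, index) in {parse_device(n) for n in available}


# Device names copied from the error message in the question.
available = [
    "/job:localhost/replica:0/task:0/device:CPU:0",
    "/job:localhost/replica:0/task:0/device:XLA_CPU:0",
    "/job:localhost/replica:0/task:0/device:XLA_GPU:0",
]

print(has_device(available, "GPU"))      # False
print(has_device(available, "XLA_GPU"))  # True
```

Note that XLA_GPU:0 in that list is a different device from GPU:0, so an operation pinned to /device:GPU:0 cannot be placed on it, which matches the error above.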
