Failing to fetch TensorFlow's convolution algorithm

Question

I am getting the following error message:

UnknownError                              Traceback (most recent call last)
<ipython-input-11-e73400b11710> in <module>()
      1 earlystopper = EarlyStopping(patience=6, verbose=1)
----> 2 history = parallel_model.fit(X_train, Y_train, validation_split=0.25, batch_size = 16, verbose=1, epochs=30, callbacks=[earlystopper])
      3 model_out = parallel_model.layers[-2]
      4 model_out.save_weights(filepath="./multi_class.hdf5")

~/anaconda/lib/python3.6/site-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1037                                         initial_epoch=initial_epoch,
   1038                                         steps_per_epoch=steps_per_epoch,
-> 1039                                         validation_steps=validation_steps)
   1040 
   1041     def evaluate(self, x=None, y=None,

~/anaconda/lib/python3.6/site-packages/keras/engine/training_arrays.py in fit_loop(model, f, ins, out_labels, batch_size, epochs, verbose, callbacks, val_f, val_ins, shuffle, callback_metrics, initial_epoch, steps_per_epoch, validation_steps)
    197                     ins_batch[i] = ins_batch[i].toarray()
    198 
--> 199                 outs = f(ins_batch)
    200                 outs = to_list(outs)
    201                 for l, o in zip(out_labels, outs):

~/anaconda/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in __call__(self, inputs)
   2713                 return self._legacy_call(inputs)
   2714 
-> 2715             return self._call(inputs)
   2716         else:
   2717             if py_any(is_tensor(x) for x in inputs):

~/anaconda/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in _call(self, inputs)
   2673             fetched = self._callable_fn(*array_vals, run_metadata=self.run_metadata)
   2674         else:
-> 2675             fetched = self._callable_fn(*array_vals)
   2676         return fetched[:len(self.outputs)]
   2677 

~/anaconda/lib/python3.6/site-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
   1437           ret = tf_session.TF_SessionRunCallable(
   1438               self._session._session, self._handle, args, status,
-> 1439               run_metadata_ptr)
   1440         if run_metadata:
   1441           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/anaconda/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    526             None, None,
    527             compat.as_text(c_api.TF_Message(self.status.status)),
--> 528             c_api.TF_GetCode(self.status.status))
    529     # Delete the underlying status object from memory otherwise it stays alive
    530     # as there is a reference to status from this from the traceback due to

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node replica_0/model_1/conv2d_1/convolution}} = Conv2D[T=DT_FLOAT, _class=["loc:@train...propFilter"], data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adamax/gradients/replica_0/model_1/conv2d_1/convolution_grad/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer, conv2d_1/kernel/read)]]
     [[{{node training/Adamax/gradients/conv2d_transpose_5_1/concat_grad/Slice_2/_1191}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:2", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_3830_training/Adamax/gradients/conv2d_transpose_5_1/concat_grad/Slice_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:2"]()]]

I searched for the error message but couldn't find much information. Is this because cuDNN wasn't downloaded properly, or is this a different issue?

I'm using the following commands to download and set up cuDNN on my cluster. This was working perfectly fine until a week ago.

curl -O http://developer.download.nvidia.com/compute/redist/cudnn/v7.0.5/cudnn-9.0-linux-x64-v7.tgz &&
tar -xzvf cudnn-9.0-linux-x64-v7.tgz &&
mkdir /usr/local/cuda/include &&
cp cuda/include/cudnn.h /usr/local/cuda/include &&
cp cuda/lib64/libcudnn* /usr/local/cuda/lib64 &&
chmod a+r /usr/local/cuda/include/cudnn.h &&
chmod a+r /usr/local/cuda/lib64/libcudnn* &&
cp -P cuda/include/cudnn.h /usr/include &&
cp -P cuda/lib64/libcudnn* /usr/lib/x86_64-linux-gnu/ &&
chmod a+r /usr/lib/x86_64-linux-gnu/libcudnn* &&
rm -r cuda

RuQ Zhou · Accepted Answer · 2018-12-12 01:34:22Z

1

The required cuDNN version has changed; see https://www.tensorflow.org/install/gpu

cuDNN version >= 7.2

I updated cuDNN and it works well now.
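To check which cuDNN version is actually installed, one option is to read the version macros from cudnn.h. The sketch below is only an illustration: it assumes the header sits at /usr/local/cuda/include/cudnn.h (the path used by the install commands in the question) and that it defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as cuDNN 7 headers do. The helper names parse_cudnn_version and meets_minimum are made up for this example, and the (7, 2) minimum is the figure quoted above.

```python
import re
from pathlib import Path

# Minimum cuDNN version quoted in this answer (assumption: it applies to your TensorFlow build).
MIN_CUDNN = (7, 2)


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from the text of cudnn.h, or None if a macro is missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 7".
        match = re.search(r"^\s*#\s*define\s+" + name + r"\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def meets_minimum(version, minimum=MIN_CUDNN):
    """Compare the leading components of version against the minimum."""
    return version[:len(minimum)] >= minimum


if __name__ == "__main__":
    # Hypothetical location, taken from the install commands in the question.
    header = Path("/usr/local/cuda/include/cudnn.h")
    if not header.is_file():
        print(f"{header} not found")
    else:
        version = parse_cudnn_version(header.read_text(errors="replace"))
        if version is None:
            print("Could not find the cuDNN version macros in the header")
        else:
            status = "OK" if meets_minimum(version) else "too old"
            print("cuDNN", ".".join(map(str, version)), status)
```

With the v7.0.5 archive from the question, this would report a version below 7.2, which is consistent with the error going away after an upgrade. Newer cuDNN releases may keep these macros in a different header, so treat a "not found" result as inconclusive.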


3 Comments

Jonathan Over a year ago

Yep, I realized the same thing. Forgot to update this post, thank you!

Jonathan Over a year ago

Would you happen to have the curl link to that .tgz file, so I can just run a command from the command line instead of downloading it manually through the site?

RuQ Zhou Over a year ago

My environment is Windows 10 with CUDA 9. You can register for free on developer.nvidia.com and then download the cuDNN archive that matches your environment.
