I am getting the following error message:

UnknownError                              Traceback (most recent call last)
<ipython-input-11-e73400b11710> in <module>()
      1 earlystopper = EarlyStopping(patience=6, verbose=1)
----> 2 history = parallel_model.fit(X_train, Y_train, validation_split=0.25, batch_size = 16, verbose=1, epochs=30, callbacks=[earlystopper])
      3 model_out = parallel_model.layers[-2]
      4 model_out.save_weights(filepath="./multi_class.hdf5")

~/anaconda/lib/python3.6/site-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1037                                         initial_epoch=initial_epoch,
   1038                                         steps_per_epoch=steps_per_epoch,
-> 1039                                         validation_steps=validation_steps)
   1040 
   1041     def evaluate(self, x=None, y=None,

~/anaconda/lib/python3.6/site-packages/keras/engine/training_arrays.py in fit_loop(model, f, ins, out_labels, batch_size, epochs, verbose, callbacks, val_f, val_ins, shuffle, callback_metrics, initial_epoch, steps_per_epoch, validation_steps)
    197                     ins_batch[i] = ins_batch[i].toarray()
    198 
--> 199                 outs = f(ins_batch)
    200                 outs = to_list(outs)
    201                 for l, o in zip(out_labels, outs):

~/anaconda/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in __call__(self, inputs)
   2713                 return self._legacy_call(inputs)
   2714 
-> 2715             return self._call(inputs)
   2716         else:
   2717             if py_any(is_tensor(x) for x in inputs):

~/anaconda/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in _call(self, inputs)
   2673             fetched = self._callable_fn(*array_vals, run_metadata=self.run_metadata)
   2674         else:
-> 2675             fetched = self._callable_fn(*array_vals)
   2676         return fetched[:len(self.outputs)]
   2677 

~/anaconda/lib/python3.6/site-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
   1437           ret = tf_session.TF_SessionRunCallable(
   1438               self._session._session, self._handle, args, status,
-> 1439               run_metadata_ptr)
   1440         if run_metadata:
   1441           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/anaconda/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    526             None, None,
    527             compat.as_text(c_api.TF_Message(self.status.status)),
--> 528             c_api.TF_GetCode(self.status.status))
    529     # Delete the underlying status object from memory otherwise it stays alive
    530     # as there is a reference to status from this from the traceback due to

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node replica_0/model_1/conv2d_1/convolution}} = Conv2D[T=DT_FLOAT, _class=["loc:@train...propFilter"], data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adamax/gradients/replica_0/model_1/conv2d_1/convolution_grad/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer, conv2d_1/kernel/read)]]
     [[{{node training/Adamax/gradients/conv2d_transpose_5_1/concat_grad/Slice_2/_1191}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:2", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_3830_training/Adamax/gradients/conv2d_transpose_5_1/concat_grad/Slice_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:2"]()]]

I searched for the error message but couldn't find much information. Is this because cuDNN wasn't installed properly, or is it a different issue?

I'm using the following command to download and set up cuDNN on my cluster. This was working perfectly fine until a week ago.

    curl -O http://developer.download.nvidia.com/compute/redist/cudnn/v7.0.5/cudnn-9.0-linux-x64-v7.tgz && \
    tar -xzvf cudnn-9.0-linux-x64-v7.tgz && \
    mkdir /usr/local/cuda/include && \
    cp cuda/include/cudnn.h /usr/local/cuda/include && \
    cp cuda/lib64/libcudnn* /usr/local/cuda/lib64 && \
    chmod a+r /usr/local/cuda/include/cudnn.h && \
    chmod a+r /usr/local/cuda/lib64/libcudnn* && \
    cp -P cuda/include/cudnn.h /usr/include && \
    cp -P cuda/lib64/libcudnn* /usr/lib/x86_64-linux-gnu/ && \
    chmod a+r /usr/lib/x86_64-linux-gnu/libcudnn* && \
    rm -r cuda

1 Answer

The cuDNN version requirement has changed: https://www.tensorflow.org/install/gpu

cuDNN version >= 7.2

I updated cuDNN and it works well now.
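
To confirm which cuDNN version is actually installed before and after updating, something like the following sketch may help. It reads the version macros from the header that the question's install command copies to /usr/local/cuda/include/cudnn.h and compares them with the 7.2 minimum quoted above. The function names are made up for illustration, and the header path is an assumption based on the question; newer cuDNN releases (8 and later) keep these macros in cudnn_version.h instead, so adjust the path if needed.

```python
import re
from pathlib import Path

# Assumed location, taken from the install command in the question.
CUDNN_HEADER = Path("/usr/local/cuda/include/cudnn.h")

# Minimum version quoted in the answer above.
MIN_VERSION = (7, 2)


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the #define lines of a cuDNN header."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            raise ValueError("{} not found in header".format(name))
        parts.append(int(match.group(1)))
    return tuple(parts)


def meets_minimum(version, minimum=MIN_VERSION):
    """Compare version tuples element by element, e.g. (7, 0, 5) vs (7, 2)."""
    return tuple(version) >= tuple(minimum)


if __name__ == "__main__":
    if CUDNN_HEADER.exists():
        installed = parse_cudnn_version(CUDNN_HEADER.read_text())
        status = "OK" if meets_minimum(installed) else "too old"
        print("cuDNN {}.{}.{}: {}".format(*installed, status))
    else:
        print("Header not found at {}".format(CUDNN_HEADER))
```

With the v7.0.5 archive from the question, the parsed version would be (7, 0, 5), which falls below the 7.2 minimum.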

3 Comments

Yep, I realized the same thing. Forgot to update this post, thank you!
Would you happen to have the curl link to that .tgz file, so I can just run a command from the command line instead of manually downloading it through the site?
My environment is Windows 10 with CUDA 9. You can register for free on developer.nvidia.com and then download the cuDNN archive that matches your environment.
