I'm running Ubuntu on a laptop with a dGPU.

I installed TensorFlow using Docker: docker pull tensorflow/tensorflow:latest. TensorFlow (2.19.0) can see an XLA CPU device and a GPU.

I'm following this tutorial: https://www.tensorflow.org/tutorials/load_data/video.

When running the tutorial, I hit this error message: "XLA compilation requires a fixed tensor list size. Set the max number of elements. This could also happen if you're using a TensorArray in a while loop that does not have its maximum_iteration set, you can fix this by setting maximum_iteration to a suitable value."

The tutorial calls model.fit, which triggers the error. According to the Python call stack, the problem is around this line:

Stack trace for op definition: 
...
File "usr/local/lib/python3.11/dist-packages/keras/src/utils/traceback_utils.py", line 117, in error_handler
File "usr/local/lib/python3.11/dist-packages/keras/src/backend/tensorflow/trainer.py", line 371, in fit
File "usr/local/lib/python3.11/dist-packages/keras/src/backend/tensorflow/trainer.py", line 219, in function

> vi /usr/local/lib/python3.11/dist-packages/keras/src/backend/tensorflow/trainer.py +371
    def fit(
        ...
        self.stop_training = False
        self.make_train_function()
        callbacks.on_train_begin()
        training_logs = None
        logs = {}
        initial_epoch = self._initial_epoch or initial_epoch
        for epoch in range(initial_epoch, epochs):
            self.reset_metrics()
            callbacks.on_epoch_begin(epoch)
            with epoch_iterator.catch_stop_iteration():
                for step, iterator in epoch_iterator:
                    callbacks.on_train_batch_begin(step)
                    logs = self.train_function(iterator)  # <= error raised here

This problem is described here: https://android.googlesource.com/platform/external/tensorflow/+/refs/heads/android-s-beta-5/tensorflow/compiler/xla/g3doc/known_issues.md#tensorflow-while-loops-need-to-be-bounded-or-have-backprop-disabled and the solution seems to be to set maximum_iterations.
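As far as I understand, maximum_iterations is an argument of tf.while_loop rather than of fit. Here is a minimal sketch of what the known-issues page seems to describe. The helper sum_first_n and the bound max_steps are hypothetical names I made up for illustration; they do not come from the tutorial or from Keras.

```python
import tensorflow as tf


def sum_first_n(n, max_steps=100):
    """Sum the integers 0..n-1 with a bounded tf.while_loop.

    `max_steps` is a hypothetical upper bound chosen for illustration.
    """
    i0 = tf.constant(0)
    total0 = tf.constant(0)

    def cond(i, total):
        # Keep looping while the counter is below n.
        return i < n

    def body(i, total):
        # Advance the counter and accumulate the running sum.
        return i + 1, total + i

    # maximum_iterations caps the trip count of the loop.
    _, total = tf.while_loop(
        cond, body, (i0, total0), maximum_iterations=max_steps
    )
    return total


print(int(sum_first_n(tf.constant(5))))               # 10
print(int(sum_first_n(tf.constant(5), max_steps=3)))  # 3 (loop stops early)
```

This only shows where the argument lives. In my case the loop is created somewhere inside the model or the Keras train step, so I don't see where I could set it.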

I tried to pass maximum_iterations to fit, but I hit this: TypeError: TensorFlowTrainer.fit() got an unexpected keyword argument 'maximum_iterations'

How can I fix this error if I can't pass arguments from model.fit to self.train_function?

On a dGPU with XLA, how should I deal with this error?
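One thing I am considering, assuming the error comes from Keras compiling the train step with XLA, is to turn XLA off through the jit_compile argument of model.compile. Below is a minimal sketch with a toy model and random data, both placeholders and not the tutorial's video model. I have not confirmed that this resolves the error in the tutorial.

```python
import numpy as np
import tensorflow as tf

# Toy stand-in model; the tutorial's video model is not reproduced here.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])

# jit_compile=False asks Keras not to compile the train step with XLA.
model.compile(optimizer="adam", loss="mse", jit_compile=False)

# Random placeholder data, only to show that fit runs.
x = np.random.rand(8, 4).astype("float32")
y = np.random.rand(8, 1).astype("float32")

history = model.fit(x, y, epochs=1, batch_size=4, verbose=0)
print(history.history["loss"])
```

If this is the right knob, it would avoid the need to pass anything through fit, at the cost of losing XLA compilation for training.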
