I'm running Ubuntu on a laptop with a dGPU.
I installed TensorFlow using Docker: docker pull tensorflow/tensorflow:latest.
TensorFlow (2.19.0) can use an XLA CPU device and a GPU.
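For reference, a minimal sketch of how the visible devices can be listed from inside the container. It only uses tf.config.list_physical_devices; what it prints depends on the image and drivers, so no output is shown here.

```python
import tensorflow as tf

# List every physical device TensorFlow can see in this environment.
devices = tf.config.list_physical_devices()
for device in devices:
    print(device.device_type, device.name)

# Narrow the query to GPUs only; an empty list means no GPU is visible.
gpus = tf.config.list_physical_devices("GPU")
print("GPU count:", len(gpus))
```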
I'm following this tutorial: https://www.tensorflow.org/tutorials/load_data/video.
When running the tutorial, I hit this message: "XLA compilation requires a fixed tensor list size. Set the max number of elements. This could also happen if you're using a TensorArray in a while loop that does not have its maximum_iteration set, you can fix this by setting maximum_iteration to a suitable value."
The tutorial calls model.fit, which triggers the error. According to the Python call stack, the problem is around this line:
Stack trace for op definition:
...
File "usr/local/lib/python3.11/dist-packages/keras/src/utils/traceback_utils.py", line 117, in error_handler
File "usr/local/lib/python3.11/dist-packages/keras/src/backend/tensorflow/trainer.py", line 371, in fit
File "usr/local/lib/python3.11/dist-packages/keras/src/backend/tensorflow/trainer.py", line 219, in function
> vi /usr/local/lib/python3.11/dist-packages/keras/src/backend/tensorflow/trainer.py +371
def fit(
    ...
    self.stop_training = False
    self.make_train_function()
    callbacks.on_train_begin()
    training_logs = None
    logs = {}
    initial_epoch = self._initial_epoch or initial_epoch
    for epoch in range(initial_epoch, epochs):
        self.reset_metrics()
        callbacks.on_epoch_begin(epoch)
        with epoch_iterator.catch_stop_iteration():
            for step, iterator in epoch_iterator:
                callbacks.on_train_batch_begin(step)
                logs = self.train_function(iterator)  # <= ERROR SPAWNED
This problem is referenced here: https://android.googlesource.com/platform/external/tensorflow/+/refs/heads/android-s-beta-5/tensorflow/compiler/xla/g3doc/known_issues.md#tensorflow-while-loops-need-to-be-bounded-or-have-backprop-disabled and the solution seems to be to set maximum_iterations.
I tried to pass maximum_iterations to fit, but I hit this: TypeError: TensorFlowTrainer.fit() got an unexpected keyword argument 'maximum_iterations'
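For context, maximum_iterations is a parameter of tf.while_loop, so it applies to the loop itself rather than to the training call. A minimal sketch of a bounded loop compiled with XLA follows. The function name sum_to and the bound of 100 are made up for illustration, and this is a forward-only toy, not the tutorial's model or the loop that actually fails there.

```python
import tensorflow as tf


@tf.function(jit_compile=True)  # ask XLA to compile this function
def sum_to(n):
    """Sum the integers 0 .. n-1 with a bounded tf.while_loop."""
    i0 = tf.constant(0)
    total0 = tf.constant(0)
    _, total = tf.while_loop(
        cond=lambda i, total: i < n,
        body=lambda i, total: (i + 1, total + i),
        loop_vars=(i0, total0),
        # Static upper bound on the trip count (hypothetical value).
        maximum_iterations=100,
    )
    return total


result = sum_to(tf.constant(5))
print(int(result))  # 0 + 1 + 2 + 3 + 4 = 10
```

The loop still stops as soon as the condition is false; maximum_iterations only caps the number of iterations.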
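Since fit does not accept that argument, one possible workaround is to disable XLA compilation of the training step through the jit_compile argument of model.compile. The sketch below assumes the error comes from Keras compiling the train function with XLA, which is not confirmed for the tutorial's model. The tiny Dense model and random data are placeholders.

```python
import numpy as np
import tensorflow as tf

# Placeholder model and data, only to make the sketch self-contained.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])

# jit_compile=False asks Keras not to compile the train step with XLA.
model.compile(optimizer="adam", loss="mse", jit_compile=False)

x = np.random.rand(8, 4).astype("float32")
y = np.random.rand(8, 1).astype("float32")
history = model.fit(x, y, epochs=1, batch_size=4, verbose=0)
print(sorted(history.history.keys()))
```

If this avoids the error, the cost is that the training step runs as a regular TensorFlow graph instead of an XLA-compiled one.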
How can I fix this error if I can't pass arguments from model.fit to self.train_function?
On a dGPU with XLA, how should I deal with this kind of error?