I'm using sagemaker 2.5.1 and tensorflow 2.3.0. The weird part is that the same code worked before; the only change I can think of is the new release of the two libraries.
2 Answers
This appears to be a bug with SageMaker.
I'm assuming you are using a TensorFlow estimator to train the model. Something like this:
estimator = TensorFlow(
    entry_point='script.py',
    role=role,
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',
    framework_version='2.3.0',
    py_version='py37',
    script_mode=True,
    hyperparameters={
        'epochs': 100,
        'batch-size': 256,
        'learning-rate': 0.001
    }
)
If that's the case, either TensorFlow 2.2 or TensorFlow 2.3 is causing this error when debugger callbacks are enabled. To fix the issue, you can set debugger_hook_config to False:
estimator = TensorFlow(
    entry_point='script.py',
    role=role,
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',
    framework_version='2.3.0',
    py_version='py37',
    script_mode=True,
    debugger_hook_config=False,
    hyperparameters={
        'epochs': 100,
        'batch-size': 256,
        'learning-rate': 0.001
    }
)
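If you'd rather keep the workaround easy to switch off once the bug is fixed, one option is to build the estimator arguments in a small helper. This is only a sketch: `estimator_kwargs` and its `disable_debugger` flag are hypothetical names of mine, not part of the SageMaker SDK, and the argument names are copied from the snippet above. SageMaker SDK 2.x renamed `train_instance_count` and `train_instance_type` to `instance_count` and `instance_type`, so you may need to adjust them for your installed version.

```python
def estimator_kwargs(role, disable_debugger=True):
    """Hypothetical helper: build keyword arguments for the TensorFlow estimator.

    The values mirror the snippet in this answer; only debugger_hook_config
    is toggled by the flag.
    """
    kwargs = {
        "entry_point": "script.py",
        "role": role,
        "train_instance_count": 1,
        "train_instance_type": "ml.p3.2xlarge",
        "framework_version": "2.3.0",
        "py_version": "py37",
        "script_mode": True,
        "hyperparameters": {
            "epochs": 100,
            "batch-size": 256,
            "learning-rate": 0.001,
        },
    }
    if disable_debugger:
        # Workaround from this answer: turn off the debugger hook.
        kwargs["debugger_hook_config"] = False
    return kwargs


# Usage (requires the sagemaker package and a valid IAM role):
# from sagemaker.tensorflow import TensorFlow
# estimator = TensorFlow(**estimator_kwargs(role))
```

Calling `estimator_kwargs(role, disable_debugger=False)` leaves `debugger_hook_config` unset, so you can re-test with the SDK's default debugger behaviour after upgrading.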