I'm using sagemaker 2.5.1 and tensorflow 2.3.0. The weird part is that the same code worked before; the only change I can think of is the new release of the two libraries.

2 Answers

This appears to be a bug with SageMaker.

I'm assuming you are using a TensorFlow estimator to train the model. Something like this:

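from sagemaker.tensorflow import TensorFlow

# `role` is the IAM role ARN that SageMaker assumes for the training job.
estimator = TensorFlow(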
    entry_point='script.py',   
    role=role,  
    train_instance_count=1,   
    train_instance_type='ml.p3.2xlarge',  
    framework_version='2.3.0',   
    py_version='py37',  
    script_mode=True,
    hyperparameters={
        'epochs': 100,  
        'batch-size': 256,  
        'learning-rate': 0.001
    } 
)

If that's the case, either TensorFlow 2.2 or TensorFlow 2.3 is causing this error when debugger callbacks are enabled. To work around the issue, you can set debugger_hook_config to False:

estimator = TensorFlow(
    entry_point='script.py',   
    role=role,  
    train_instance_count=1,   
    train_instance_type='ml.p3.2xlarge',  
    framework_version='2.3.0',   
    py_version='py37',  
    script_mode=True,
    # Disable the SageMaker Debugger hook to work around the error.
    debugger_hook_config=False,
    hyperparameters={
        'epochs': 100,  
        'batch-size': 256,  
        'learning-rate': 0.001
    } 
)
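
If the same launcher code is shared across several framework versions, the workaround can be applied only where it is needed. The following is a minimal sketch and not part of the SageMaker SDK: the helper name `build_estimator_kwargs` and the `AFFECTED_MINOR_VERSIONS` tuple are hypothetical, and the list of affected versions is an assumption based on this thread rather than a confirmed compatibility matrix.

```python
# Hypothetical helper: builds keyword arguments for the TensorFlow estimator
# and disables the Debugger hook only for versions assumed to be affected.
AFFECTED_MINOR_VERSIONS = ("2.2", "2.3")  # assumption taken from this thread


def build_estimator_kwargs(framework_version, **overrides):
    """Return estimator kwargs, adding the workaround for affected versions."""
    kwargs = {
        "entry_point": "script.py",
        "framework_version": framework_version,
        "py_version": "py37",
    }
    # Compare on "major.minor" so that e.g. "2.20.0" is not mistaken for "2.2".
    major_minor = ".".join(framework_version.split(".")[:2])
    if major_minor in AFFECTED_MINOR_VERSIONS:
        kwargs["debugger_hook_config"] = False
    # Explicit overrides from the caller always win.
    kwargs.update(overrides)
    return kwargs
```

The result can then be unpacked into the estimator call alongside the remaining arguments, for example `TensorFlow(role=role, **build_estimator_kwargs('2.3.0'))` plus the instance settings and hyperparameters shown above.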

The problem actually comes from smdebug version 0.9.1. Downgrading to 0.8.1 solves the issue.
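
To check whether an environment has the affected release, something like the sketch below may help. It relies on the standard-library `importlib.metadata` module (Python 3.8+); the function names `installed_version` and `pin_advice` are hypothetical, and the two version constants simply restate what this answer reports.

```python
from importlib import metadata  # standard library, Python 3.8+

AFFECTED_SMDEBUG = "0.9.1"    # version reported as broken in this answer
KNOWN_GOOD_SMDEBUG = "0.8.1"  # version reported as working in this answer


def installed_version(package="smdebug"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pin_advice(version):
    """Return a pip requirement line if `version` is the affected one, else None."""
    if version == AFFECTED_SMDEBUG:
        return f"smdebug=={KNOWN_GOOD_SMDEBUG}"
    return None
```

Calling `pin_advice(installed_version())` yields a requirement line that can be passed to `pip install` or added to a `requirements.txt`. The answer does not state where the pin has to be applied (the local or notebook environment versus the training image), so that part is left open here.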
