I'm experiencing a strange issue with an AWS Lambda function that uses a Docker image stored in Amazon ECR. The function has an alias named stable pointing to a published version.
Observed behavior:
- Everything works fine right after deployment.
- After a few hours (typically 6–24), the Lambda invoked via the stable alias starts failing with:
CodeArtifactUserFailedException: Failed to restore the function xxx: The function does not have permission to access the specified image.
The AWS Console also shows the message: "Failed to restore the function xxx: The function does not have permission to access the specified image."
If I create a new version of the Lambda (with the same image, configuration, and role) and invoke $LATEST, it works perfectly.
Context:
The image is stored in a private ECR repository.
There’s a lifecycle policy in place to retain only the last 5 images.
A Lambda warmer runs every 5 minutes to prevent cold starts.
The Lambda has the standard permissions (AWSLambdaBasicExecutionRole + AmazonEC2ContainerRegistryReadOnly).
The Lambda becomes inactive:
"State": "Inactive",
"StateReason": "The function does not have permission to access the specified image.",
"StateReasonCode": "ImageAccessDenied",
Hypotheses:
- Could it be that the published Lambda version points to an ECR image digest that gets deleted by the lifecycle policy, making it inaccessible?
- But if that’s the case, why does $LATEST still work fine using the same image?
- Could the pulled image be getting corrupted, with the warmer preventing the Lambda service from discarding the corrupted copy? We disabled the warmer, but the stable alias still does not work.
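
One way to test the first hypothesis would be to look up the image digest that the stable alias actually resolves to and ask ECR whether that digest still exists. The following is only a minimal sketch, assuming boto3-style clients are passed in; the helper names parse_resolved_image_uri and digest_still_in_ecr are hypothetical, and it relies on the ResolvedImageUri field returned by Lambda's GetFunction and on ECR's BatchGetImage call.

```python
import re

# Expected shape of the resolved URI, for example:
#   123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-repo@sha256:<64 hex chars>
_RESOLVED_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com(\.cn)?/"
    r"(?P<repo>[^@]+)@(?P<digest>sha256:[0-9a-f]{64})$"
)


def parse_resolved_image_uri(uri):
    """Split a digest-pinned ECR image URI into (account, region, repo, digest)."""
    match = _RESOLVED_URI.match(uri)
    if match is None:
        raise ValueError(f"not a digest-pinned ECR image URI: {uri!r}")
    return match["account"], match["region"], match["repo"], match["digest"]


def digest_still_in_ecr(lambda_client, ecr_client, function_name, qualifier):
    """Return True if the image digest behind function_name:qualifier is still in ECR.

    lambda_client and ecr_client are assumed to be boto3 clients for "lambda"
    and "ecr" (or objects exposing the same methods).
    """
    # The qualifier can be an alias (such as "stable") or a version number.
    code = lambda_client.get_function(
        FunctionName=function_name, Qualifier=qualifier
    )["Code"]
    account, _region, repo, digest = parse_resolved_image_uri(code["ResolvedImageUri"])

    # BatchGetImage lists missing digests under "failures" rather than raising,
    # so an empty "images" list means the digest is gone from the repository.
    response = ecr_client.batch_get_image(
        registryId=account,
        repositoryName=repo,
        imageIds=[{"imageDigest": digest}],
    )
    return bool(response.get("images"))
```

With boto3 installed and credentials configured, this could be called as digest_still_in_ecr(boto3.client("lambda"), boto3.client("ecr"), "my-function", "stable"), where "my-function" is a placeholder name. Running the same check with the qualifier "$LATEST" would show whether the two resolve to different digests, which might explain why one works and the other does not.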