aws_iam_role_policy_attachment.this missing lifecycle { create_before_destroy = true } in eks-managed-node-group #3696

@dsantanu

Description

Changing kubernetes_groups on an aws_eks_access_entry causes assume_role_policy on the node IAM role to become (known after apply). This cascades to policy_arn = (known after apply) on aws_iam_role_policy_attachment.this, forcing a +/- destroy+recreate. The destroy half completes successfully, but the recreate half silently fails to complete, leaving node roles with no core policies attached (AmazonEKSWorkerNodePolicy, AmazonEC2ContainerRegistryReadOnly, AmazonEKS_CNI_IPv6_Policy). This breaks the entire cluster: nodes lose ECR access and aws-cni fails.

  • ✋ I have searched the open/closed issues and my issue is not listed.

Cluster impact

Nodes lose ECR pull access, the aws-cni gRPC connection fails, and new pods cannot be scheduled.

Versions

  • Module version [Required]: 21.19.0
  • Terraform version: v1.11.0
  • Provider version(s): v6.23.0

Reproduction Code [Required]

  1. Have an EKS cluster managed by this module with at least one managed node group
  2. Have aws_eks_access_entry resources with kubernetes_groups configured, e.g.:
resource "aws_eks_access_entry" "this" {
  for_each          = var.eks_cluster_ops
  cluster_name      = module.cluster["eks"].cluster_name
  principal_arn     = each.value.principal_arn
  kubernetes_groups = each.value.k8s_groups
  type              = "STANDARD"
}
  3. Change the value of kubernetes_groups on any access entry, e.g. rename a group from cluster:admins to cluster:eksadmins
  4. Run terraform plan and observe the cascade:
  • aws_eks_access_entry → in-place update (expected)
  • module.nodegroups[*].aws_iam_role.this → assume_role_policy becomes known after apply
  • module.nodegroups[*].aws_iam_role_policy_attachment.this[*] → +/- destroy+recreate because policy_arn = (known after apply)
  5. Run terraform apply; it reports success with no errors
  6. Result: the destroy half of aws_iam_role_policy_attachment.this completes, but the recreate half silently does not. Node roles are left with no core policies attached.
  7. Verify with:
aws iam list-attached-role-policies \
  --role-name <node-group-role-name> \
  --query 'AttachedPolicies[].PolicyName'
# Returns only additional policies e.g. AmazonEBSCSIDriverPolicy
# Core policies AmazonEKSWorkerNodePolicy, AmazonEC2ContainerRegistryReadOnly,
# AmazonEKS_CNI_IPv6_Policy are missing
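
The manual check above could be scripted so the missing policies are listed explicitly. This is a minimal sketch, not part of the module: the function name missing_core_policies and the CORE_POLICIES variable are hypothetical, and the expected list is taken from the three policies named in this report (an IPv4 cluster would likely expect a different CNI policy).

```shell
#!/bin/sh
# Core policies named in this report. Assumption: adjust this list for your
# cluster, e.g. IPv4 clusters would not use the IPv6 CNI policy.
CORE_POLICIES="AmazonEKSWorkerNodePolicy AmazonEC2ContainerRegistryReadOnly AmazonEKS_CNI_IPv6_Policy"

# Reads attached policy names on stdin (whitespace-separated, as printed by
# the AWS CLI with --output text) and prints each core policy that is not
# in that list, one per line. Prints nothing when all are attached.
missing_core_policies() {
  attached=$(cat)
  for policy in $CORE_POLICIES; do
    found=no
    for name in $attached; do
      if [ "$name" = "$policy" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      echo "$policy"
    fi
  done
}

# Usage (role name is a placeholder, as in the command above):
#   aws iam list-attached-role-policies \
#     --role-name <node-group-role-name> \
#     --query 'AttachedPolicies[].PolicyName' --output text \
#     | missing_core_policies
```

Empty output means every core policy is still attached; any printed name is a policy that was dropped from the role.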

Expected behavior

lifecycle { create_before_destroy = true } on aws_iam_role_policy_attachment.this (as is already done on aws_launch_template.this and aws_eks_node_group.this) would prevent the gap.

Actual behavior

Previously attached core policies are randomly missing from the node roles after apply.

Additional context

Add this to fix:

resource "aws_iam_role_policy_attachment" "this" {
  # ...
  lifecycle {
    create_before_destroy = true
  }
}
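
Until a change like this is in the module, a plan-time guard could flag the replacement before apply. The sketch below is an assumption-laden illustration: the function name replaced_policy_attachments is hypothetical, and it relies on the human-readable plan output containing a "# <address> must be replaced" line for each replaced resource. Parsing the JSON plan from terraform show -json would be more robust.

```shell
#!/bin/sh
# Reads `terraform plan -no-color` output on stdin and prints the address of
# every aws_iam_role_policy_attachment.this that the plan intends to replace.
# Assumption: replaced resources appear as "# <address> must be replaced".
replaced_policy_attachments() {
  grep -E '^[[:space:]]*# .*aws_iam_role_policy_attachment\.this.* must be replaced' \
    | sed -E 's/^[[:space:]]*# (.*) must be replaced.*$/\1/'
}

# Usage:
#   terraform plan -no-color | replaced_policy_attachments
```

Any address printed here is a policy attachment that would go through the destroy+recreate cycle described in this issue, so non-empty output is a signal to stop before running terraform apply.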
