
Basic troubleshooting

In this section, we'll use Kiro CLI and the MCP server for Amazon EKS to troubleshoot issues in the EKS cluster.

Let's start by deploying a failing pod in your cluster, which we'll then troubleshoot using Kiro CLI.

~/environment/eks-workshop/modules/aiml/kiro-cli/troubleshoot/failing-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: failing-pod
  namespace: default
  labels:
    app: volume-demo
spec:
  containers:
    - name: main-container
      image: busybox:1.37.0-glibc
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 200m
          memory: 256Mi
      volumeMounts:
        # Persistent volume claim - persistent storage
        - name: persistent-storage
          mountPath: /data
  volumes:
    # Persistent Volume Claim
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: my-pvc
  restartPolicy: Always
  serviceAccountName: default

Apply the manifest:

~$kubectl apply -f ~/environment/eks-workshop/modules/aiml/kiro-cli/troubleshoot/failing-pod.yaml

Check the status of the pod:

~$kubectl get pods -n default
NAME          READY   STATUS    RESTARTS   AGE
failing-pod   0/1     Pending   0          5m29s

As you can see, there's a pod in a pending state in the cluster. Let's use Kiro CLI to investigate the cause.
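If you'd like to confirm the scheduling failure yourself first, you can describe the pod manually. Expect a FailedScheduling warning in the Events section referencing the missing PersistentVolumeClaim; the exact wording varies by Kubernetes version:

~$kubectl describe pod failing-pod -n default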

Start a new Kiro CLI session:

~$kiro-cli chat

Ask Kiro CLI to help troubleshoot the issue by entering the following question:

I have a pod stuck in a pending state in my eks-workshop cluster. Find the cause of the failure and provide me with a summary of the approach to solve it.

To address the prompt, Kiro CLI will use a variety of tools from the MCP server. Some of the steps it may take include (a rough manual equivalent using kubectl is sketched after this list):

  • Identifying the failing pod in the cluster using the list_k8s_resources tool
  • Fetching details of the pod using the manage_k8s_resource tool
  • Inspecting the Kubernetes event history for the pod using the get_k8s_events tool
  • Fetching details of related Kubernetes resources using the manage_k8s_resource tool
  • Pulling and referencing the EKS troubleshooting guide using the search_eks_troubleshoot_guide tool
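For comparison, the first four steps correspond roughly to kubectl commands you could run by hand. Which resources Kiro CLI actually inspects may differ from this sketch:

# list pending pods (list_k8s_resources)
~$kubectl get pods -n default --field-selector=status.phase=Pending
# fetch the pod's full spec and status (manage_k8s_resource)
~$kubectl get pod failing-pod -n default -o yaml
# event history for the pod (get_k8s_events)
~$kubectl get events -n default --field-selector involvedObject.name=failing-pod
# related resources, here the referenced PVC (manage_k8s_resource)
~$kubectl get pvc my-pvc -n default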

Kiro CLI will provide an analysis based on the data it gathers from the cluster.

Sample response:
## Summary

Root Cause: The pod failing-pod is stuck in Pending state because it references a PersistentVolumeClaim named my-pvc that doesn't exist.

Error Message: persistentvolumeclaim "my-pvc" not found

Approach to Solve:

1. Create the missing PVC - You need to create a PersistentVolumeClaim named my-pvc in the default namespace with appropriate storage class and size requirements

2. Alternative: Update the pod - If the volume isn't actually needed, remove the volume mount and volume definition from the pod spec and recreate it

3. Verify storage class availability - Before creating the PVC, ensure your cluster has a storage class configured (check with kubectl get storageclass)

The pod cannot be scheduled until the PVC exists because Kubernetes needs to ensure the storage is available before placing the pod on a node.
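For reference, a PersistentVolumeClaim along the following lines would satisfy the pod's volume claim. This is only a sketch: it assumes a StorageClass named gp2 exists in your cluster (verify with kubectl get storageclass) and that 1Gi is an acceptable size. In this workshop we'll simply delete the failing pod in the next step rather than fix it.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2   # assumption: replace with a StorageClass present in your cluster
  resources:
    requests:
      storage: 1Gi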

To exit the Kiro CLI session, enter:

/quit

Now, remove the failing pod:

~$kubectl delete -f ~/environment/eks-workshop/modules/aiml/kiro-cli/troubleshoot/failing-pod.yaml --ignore-not-found

In the next section, we'll explore a more complex troubleshooting scenario.