Saturday, October 5, 2024

Kubernetes crash recovery commands I use 99% of the time:

1. kubectl get pods --all-namespaces: Check the status of all pods across namespaces to identify failures.

2. kubectl describe pod pod_name: Gather detailed information about a failed pod.

3. kubectl logs pod_name -c container_name: View logs of a specific container inside a pod to troubleshoot issues.

4. kubectl get events --all-namespaces --sort-by='.metadata.creationTimestamp': Review recent events for clues on crashes and errors.

5. kubectl get nodes: Verify the status of nodes in the cluster, checking for node failures.

6. kubectl drain node_name --ignore-daemonsets: Safely evacuate and cordon a node for recovery operations.

7. kubectl cordon node_name: Mark a node as unschedulable to prevent new pods from being scheduled during recovery.

8. kubectl delete pod pod_name --grace-period=0 --force: Forcefully delete a crashed pod so that its controller (a Deployment, for example) recreates it, or to clear it for recovery.

9. kubectl rollout undo deployment deployment_name: Roll back a deployment in case a new rollout causes crashes.

10. kubectl exec -it pod_name -- /bin/sh: Access a container to debug and resolve application issues directly inside the pod.

11. kubectl get componentstatuses: Check the health of core control-plane components (scheduler, controller-manager, and etcd). This API has been deprecated since Kubernetes v1.19, so newer clusters may not report it reliably.

12. kubectl top nodes: Monitor node resource usage to detect resource exhaustion causing crashes (requires the metrics-server add-on).

13. kubectl top pods --all-namespaces: Check pod resource usage across namespaces, identifying bottlenecks leading to crashes.

14. kubectl delete node node_name: Remove a failed node from the cluster to allow recovery operations.

15. etcdctl --endpoints=https://etcd-server:2379 snapshot restore backup.db: Restore etcd from a snapshot in case of etcd failure. The restore works on the local snapshot file and writes a new data directory (set with --data-dir), so --endpoints is not actually used by this subcommand.

16. kubectl apply -f backup.yaml: Reapply configurations from a backup manifest during recovery.

17. kubectl taint nodes node_name key=value:NoSchedule: Prevent scheduling on a node experiencing issues during recovery.

18. kubectl get endpoints service_name: Verify a service's endpoints during recovery to confirm it still has healthy pods behind it.
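
The first few commands are usually chained: list pods, pick out the unhealthy ones, then describe them and read their logs. Here is a minimal sketch of that flow. The function names unhealthy_pods and inspect_pod are made up for illustration, and the filter assumes the default column layout of kubectl get pods --all-namespaces --no-headers (STATUS in the fourth column).

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods --all-namespaces --no-headers`
# output on stdin and prints "namespace/pod status" for every pod whose
# STATUS column is neither Running nor Completed.
unhealthy_pods() {
    awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Hypothetical helper: prints the describe output and the last 50 log lines
# of every container in one pod. Arguments: namespace, pod name.
inspect_pod() {
    kubectl describe pod "$2" --namespace "$1"
    kubectl logs "$2" --namespace "$1" --all-containers --tail=50
}

# Usage against a live cluster (not executed by this file):
#   kubectl get pods --all-namespaces --no-headers | unhealthy_pods
#   inspect_pod kube-system some-pod-name
```

Keeping the filter separate from the kubectl call means it can be tried on saved output first, before pointing it at a real cluster.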

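The node commands (cordon, drain) also tend to run as a sequence. Below is a rough sketch with a dry-run switch so the commands can be reviewed before anything touches the cluster. The names run, evacuate_node, restore_node, and the DRY_RUN variable are assumptions for this example, not part of kubectl. kubectl uncordon, which is not in the list above, is the standard counterpart that makes a node schedulable again.

```shell
#!/bin/sh
# Hypothetical wrapper: prints the command when DRY_RUN=1, otherwise runs it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: take a node out of service before repairing it.
# drain also cordons the node; the explicit cordon just stops new
# scheduling right away. Argument: node name.
evacuate_node() {
    run kubectl cordon "$1" &&
    run kubectl drain "$1" --ignore-daemonsets
}

# Hypothetical helper: return a repaired node to service.
# Argument: node name.
restore_node() {
    run kubectl uncordon "$1"
}

# Usage (not executed by this file):
#   DRY_RUN=1 evacuate_node worker-1   # only prints the commands
#   evacuate_node worker-1             # actually cordons and drains
#   restore_node worker-1
```
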