Saturday, October 5, 2024

 Kubernetes crash recovery commands I used 99% of the time:

1. 𝗸𝘂𝗯𝗲𝗰𝘁𝗹 𝗴𝗲𝘁 𝗽𝗼𝗱𝘀 --𝗮𝗹𝗹-𝗻𝗮𝗺𝗲𝘀𝗽𝗮𝗰𝗲𝘀: Check the status of all pods across namespaces to identify failures.

2. 𝗸𝘂𝗯𝗲𝗰𝘁𝗹 𝗱𝗲𝘀𝗰𝗿𝗶𝗯𝗲 𝗽𝗼𝗱 𝗽𝗼𝗱_𝗻𝗮𝗺𝗲: Gather detailed information about a failed pod.

3. 𝗸𝘂𝗯𝗲𝗰𝘁𝗹 𝗹𝗼𝗴𝘀 𝗽𝗼𝗱_𝗻𝗮𝗺𝗲 -𝗰 𝗰𝗼𝗻𝘁𝗮𝗶𝗻𝗲𝗿_𝗻𝗮𝗺𝗲: View logs of a specific container inside a pod to troubleshoot issues.

4. 𝗸𝘂𝗯𝗲𝗰𝘁𝗹 𝗴𝗲𝘁 𝗲𝘃𝗲𝗻𝘁𝘀 --𝗮𝗹𝗹-𝗻𝗮𝗺𝗲𝘀𝗽𝗮𝗰𝗲𝘀 --𝘀𝗼𝗿𝘁-𝗯𝘆='.𝗺𝗲𝘁𝗮𝗱𝗮𝘁𝗮.𝗰𝗿𝗲𝗮𝘁𝗶𝗼𝗻𝗧𝗶𝗺𝗲𝘀𝘁𝗮𝗺𝗽': Review recent events for clues on crashes and errors.

5. 𝗸𝘂𝗯𝗲𝗰𝘁𝗹 𝗴𝗲𝘁 𝗻𝗼𝗱𝗲𝘀: Verify the status of nodes in the cluster, checking for node failures.

6. 𝗸𝘂𝗯𝗲𝗰𝘁𝗹 𝗱𝗿𝗮𝗶𝗻 𝗻𝗼𝗱𝗲_𝗻𝗮𝗺𝗲 --𝗶𝗴𝗻𝗼𝗿𝗲-𝗱𝗮𝗲𝗺𝗼𝗻𝘀𝗲𝘁𝘀: Safely evacuate and cordon a node for recovery operations.

7. 𝗸𝘂𝗯𝗲𝗰𝘁𝗹 𝗰𝗼𝗿𝗱𝗼𝗻 𝗻𝗼𝗱𝗲_𝗻𝗮𝗺𝗲: Mark a node as unschedulable to prevent new pods from being scheduled during recovery.

8. 𝗸𝘂𝗯𝗲𝗰𝘁𝗹 𝗱𝗲𝗹𝗲𝘁𝗲 𝗽𝗼𝗱 𝗽𝗼𝗱_𝗻𝗮𝗺𝗲 --𝗴𝗿𝗮𝗰𝗲-𝗽𝗲𝗿𝗶𝗼𝗱=0 --𝗳𝗼𝗿𝗰𝗲: Forcefully delete a crashed pod to restart it or clear it for recovery.

9. 𝗸𝘂𝗯𝗲𝗰𝘁𝗹 𝗿𝗼𝗹𝗹𝗼𝘂𝘁 𝘂𝗻𝗱𝗼 𝗱𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁 𝗱𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁_𝗻𝗮𝗺𝗲: Roll back a deployment in case a new rollout causes crashes.

10. 𝗸𝘂𝗯𝗲𝗰𝘁𝗹 𝗲𝘅𝗲𝗰 -𝗶𝘁 𝗽𝗼𝗱_𝗻𝗮𝗺𝗲 -- /𝗯𝗶𝗻/𝘀𝗵: Access a container to debug and resolve application issues directly inside the pod.

11. 𝗸𝘂𝗯𝗲𝗰𝘁𝗹 𝗴𝗲𝘁 𝗰𝗼𝗺𝗽𝗼𝗻𝗲𝗻𝘁𝘀𝘁𝗮𝘁𝘂𝘀𝗲𝘀: Check the health of core cluster components like etcd, kube-apiserver, and more.

12. 𝗸𝘂𝗯𝗲𝗰𝘁𝗹 𝘁𝗼𝗽 𝗻𝗼𝗱𝗲𝘀: Monitor node resource usage to detect resource exhaustion causing crashes.

13. 𝗸𝘂𝗯𝗲𝗰𝘁𝗹 𝘁𝗼𝗽 𝗽𝗼𝗱𝘀 --𝗮𝗹𝗹-𝗻𝗮𝗺𝗲𝘀𝗽𝗮𝗰𝗲𝘀: Check pod resource usage across namespaces, identifying bottlenecks leading to crashes.

14. 𝗸𝘂𝗯𝗲𝗰𝘁𝗹 𝗱𝗲𝗹𝗲𝘁𝗲 𝗻𝗼𝗱𝗲 𝗻𝗼𝗱𝗲_𝗻𝗮𝗺𝗲: Remove a failed node from the cluster to allow recovery operations.

15. 𝗲𝘁𝗰𝗱𝗰𝘁𝗹 --𝗲𝗻𝗱𝗽𝗼𝗶𝗻𝘁𝘀=𝗵𝘁𝘁𝗽𝘀://𝗲𝘁𝗰𝗱-𝘀𝗲𝗿𝘃𝗲𝗿:2379 𝘀𝗻𝗮𝗽𝘀𝗵𝗼𝘁 𝗿𝗲𝘀𝘁𝗼𝗿𝗲 𝗯𝗮𝗰𝗸𝘂𝗽.𝗱𝗯: Restore etcd from a snapshot in case of etcd failure.

16. 𝗸𝘂𝗯𝗲𝗰𝘁𝗹 𝗮𝗽𝗽𝗹𝘆 -𝗳 𝗯𝗮𝗰𝗸𝘂𝗽.𝘆𝗮𝗺𝗹: Reapply configurations from a backup manifest during recovery.

17. 𝗸𝘂𝗯𝗲𝗰𝘁𝗹 𝘁𝗮𝗶𝗻𝘁 𝗻𝗼𝗱𝗲𝘀 𝗻𝗼𝗱𝗲_𝗻𝗮𝗺𝗲 𝗸𝗲𝘆=𝘃𝗮𝗹𝘂𝗲:𝗡𝗼𝗦𝗰𝗵𝗲𝗱𝘂𝗹𝗲: Prevent scheduling on a node experiencing issues during recovery.

18. 𝗸𝘂𝗯𝗲𝗰𝘁𝗹 𝗴𝗲𝘁 𝗲𝗻𝗱𝗽𝗼𝗶𝗻𝘁𝘀 𝘀𝗲𝗿𝘃𝗶𝗰𝗲_𝗻𝗮𝗺𝗲: Verify service endpoints during recovery to ensure services are resolving correctly.

Tuesday, July 23, 2024

 


Omg Shit!!! Git???

Git is hard: screwing up is easy, and figuring out how to fix your mistakes is fucking impossible. Git documentation has this chicken and egg problem where you can't search for how to get yourself out of a mess, unless you already know the name of the thing you need to know about in order to fix your problem.

Git does have several powerful features that can help you "go back in time" or undo changes. Here are some common scenarios and how you can resolve them.

So here are some bad situations I've gotten myself into, and how I eventually got myself out of them in plain English.

Oh shit, I did something terribly wrong, please tell me git has a magic time machine!?!

git reflog
# you will see a list of every thing you've
# done in git, across all branches!
# each one has an index HEAD@{index}
# find the one before you broke everything
git reset HEAD@{index}
# magic time machine

You can use this to get back stuff you accidentally deleted, or just to remove some stuff you tried that broke the repo, or to recover after a bad merge, or just to go back to a time when things actually worked. I use reflog A LOT. Mega hat tip to the many many many many many people who suggested adding it!

Oh shit, I committed and immediately realized I need to make one small change!

# make your change
git add . # or add individual files
git commit --amend --no-edit
# now your last commit contains that change!
# WARNING: never amend public commits

This usually happens to me if I commit, then run tests/linters... and FML, I didn't put a space after an equals sign. You could also make the change as a new commit and then do rebase -i in order to squash them both together, but this is about a million times faster.

Warning: You should never amend commits that have been pushed up to a public/shared branch! Only amend commits that only exist in your local copy or you're gonna have a bad time.

Oh shit, I need to change the message on my last commit!

git commit --amend
# follow prompts to change the commit message

Stupid commit message formatting requirements.

Oh shit, I accidentally committed something to master that should have been on a brand new branch!

# create a new branch from the current state of master
git branch some-new-branch-name
# remove the last commit from the master branch
git reset HEAD~ --hard
git checkout some-new-branch-name
# your commit lives in this branch now :)

Note: this doesn't work if you've already pushed the commit to a public/shared branch, and if you tried other things first, you might need to git reset HEAD@{number-of-commits-back} instead of HEAD~. Infinite sadness. Also, many many many people suggested an awesome way to make this shorter that I didn't know myself. Thank you all!

Oh shit, I accidentally committed to the wrong branch!

# undo the last commit, but leave the changes available
git reset HEAD~ --soft
git stash
# move to the correct branch
git checkout name-of-the-correct-branch
git stash pop
git add . # or add individual files
git commit -m "your message here";
# now your changes are on the correct branch

A lot of people have suggested using cherry-pick for this situation too, so take your pick on whatever one makes the most sense to you!

git checkout name-of-the-correct-branch
# grab the last commit to master
git cherry-pick master
# delete it from master
git checkout master
git reset HEAD~ --hard

Oh shit, I tried to run a diff but nothing happened?!

If you know that you made changes to files, but diff is empty, you probably add-ed your files to staging and you need to use a special flag.

git diff --staged

File under ¯\_(ツ)_/¯ (yes, I know this is a feature, not a bug, but it's fucking baffling and non-obvious the first time it happens to you!)

Oh shit, I need to undo a commit from like 5 commits ago!

# find the commit you need to undo
git log
# use the arrow keys to scroll up and down in history
# once you've found your commit, save the hash
git revert [saved hash]
# git will create a new commit that undoes that commit
# follow prompts to edit the commit message
# or just save and commit

Turns out you don't have to track down and copy-paste the old file contents into the existing file in order to undo changes! If you committed a bug, you can undo the commit all in one go with revert.

You can also revert a single file instead of a full commit! But of course, in true git fashion, it's a completely different set of fucking commands...

Oh shit, I need to undo my changes to a file!

# find a hash for a commit before the file was changed
git log
# use the arrow keys to scroll up and down in history
# once you've found your commit, save the hash
git checkout [saved hash] -- path/to/file
# the old version of the file will be in your index
git commit -m "Wow, you don't have to copy-paste to undo"

When I finally figured this out it was HUGE. HUGE. H-U-G-E. But seriously though, on what fucking planet does checkout -- make sense as the best way to undo a file? :shakes-fist-at-linus-torvalds:

Fuck this noise, I give up.

cd ..
sudo rm -r fucking-git-repo-dir
git clone https://some.github.url/fucking-git-repo-dir.git
cd fucking-git-repo-dir

Thanks to Eric V. for this one. All complaints about the use of sudo in this joke can be directed to him.

For real though, if your branch is sooo borked that you need to reset the state of your repo to be the same as the remote repo in a "git-approved" way, try this, but beware these are destructive and unrecoverable actions!

# get the lastest state of origin
git fetch origin
git checkout master
git reset --hard origin/master
# delete untracked files and directories
git clean -d --force
# repeat checkout/reset/clean for each borked branch

*Disclaimer*: This site is not intended to be an exhaustive reference. And yes, there are other ways to do these same things with more theoretical purity or whatever.