
Kubernetes

Init container: check that a service is alive before continuing

      initContainers:
        - name: check-mongo-status
          image: busybox:1.28
          command:
          - "/bin/sh"
          - "-c"
          - |
            # retry until mongo-service accepts a TCP connection on 27017
            i=0
            while true; do
              echo -n "Attempt $((++i))... "
              echo 'quit' | nc -w 3 mongo-service 27017 && echo "OK" && exit 0
              sleep 10
            done

Delete all Calico network policies in a namespace

NAMESPACE=resource-sandbox
calicoctl get networkpolicies -n ${NAMESPACE} -o json | jq -r '.items[].metadata.name' | xargs -I {} calicoctl delete networkpolicy {} -n ${NAMESPACE}

In case of disk pressure

Most likely the Docker overlay storage is using too much space. Prune it.

sudo docker system prune -a -f
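
To see what is actually consuming the space (images, containers, volumes, build cache):

sudo docker system df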

Docker Hub: too many pull requests (rate limiting)

@ref: https://devops.stackexchange.com/questions/16140/how-do-i-get-k3s-to-authenticate-with-docker-hub

Update your /etc/rancher/k3s/registries.yaml to

configs:
  registry-1.docker.io:
    auth:
      username: <YOUR_DOCKERHUB_USERNAME>
      password: <YOUR_DOCKERHUB_PASSWORD>

Restart k3s:

sudo systemctl force-reload k3s

You can confirm the changes were accepted by checking that your key exists in /var/lib/rancher/k3s/agent/etc/containerd/config.toml.

Content of config.toml

...
[plugins.cri.registry.configs."registry-1.docker.io".auth]
  username = "<YOUR_DOCKERHUB_USERNAME>"
  password = "<YOUR_DOCKERHUB_PASSWORD>"
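
A quick sanity check that authenticated pulls now work end to end is to pull an image through k3s's bundled crictl:

sudo k3s crictl pull docker.io/library/busybox:latest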

Good container image to try

netshoot - debug k8s networking
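
A minimal way to launch a throwaway netshoot pod (image nicolaka/netshoot, per the project README):

kubectl run tmp-shell --rm -it --image nicolaka/netshoot -- bash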

Force delete a stuck namespace

Run this if a namespace is stuck in Terminating. It starts a kubectl proxy, clears the namespace's finalizers, and PUTs the object back through the finalize endpoint (requires the Python requests package).

python3 -c "namespace='postgres-ha';import atexit,subprocess,json,requests,sys;proxy_process = subprocess.Popen(['kubectl', 'proxy']);atexit.register(proxy_process.kill);p = subprocess.Popen(['kubectl', 'get', 'namespace', namespace, '-o', 'json'], stdout=subprocess.PIPE);p.wait();data = json.load(p.stdout);data['spec']['finalizers'] = [];requests.put('http://127.0.0.1:8001/api/v1/namespaces/{}/finalize'.format(namespace), json=data).raise_for_status()"

TIP

If the namespace was marked for deletion without uninstalling cert-manager first, the namespace may become stuck in a terminating state. This is typically because the APIService resource still exists while the webhook backing it is no longer running, so the webhook is unreachable.

Label node

@ref: https://medium.com/kubernetes-tutorials/learn-how-to-assign-pods-to-nodes-in-kubernetes-using-nodeselector-and-affinity-features-e62c437f3cf8
@ref: https://tachingchen.com/blog/kubernetes-assigning-pod-to-nodes/

kubectl label nodes host02 disktype=ssd
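
The label can then be targeted from a pod spec via nodeSelector, e.g.:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  nodeSelector:
    disktype: ssd
  containers:
  - name: nginx
    image: nginx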

Tolerate

Add to the deployment's pod spec so pods are evicted quickly (after 10 seconds instead of the default 300) when their node becomes unreachable or not-ready:

tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 10
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 10

One Pod per Node

# prefer to run one replica per node (soft rule; see the hard variant below)
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - traefik
        topologyKey: kubernetes.io/hostname
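
preferredDuringSchedulingIgnoredDuringExecution is only a soft preference; if one-pod-per-node must be guaranteed, use the required variant (same selector, no weight):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app.kubernetes.io/name
          operator: In
          values:
          - traefik
      topologyKey: kubernetes.io/hostname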

Resource Limit

spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"

Wait for Deployment/Daemonset

@ref: https://www.starkandwayne.com/blog/silly-kubectl-trick-5-waiting-for-things-to-finish-up-2/
@ref: https://linuxhint.com/wait-for-condition-kubectl/

kubectl rollout status daemonset -n kube-system rke2-ingress-nginx-controller -w --timeout 60s

or omit --timeout to wait indefinitely until the rollout succeeds:

kubectl rollout status daemonset -n kube-system rke2-ingress-nginx-controller -w

You can also use kubectl wait for deployments:

$ kubectl wait deploy/slow --for condition=available
deployment.apps/slow condition met

Note that the condition for a deployment is available, not ready.
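
kubectl wait also works on pods, where the condition is Ready. A sketch, reusing the traefik label from above as an example selector:

kubectl wait pod -l app.kubernetes.io/name=traefik --for condition=Ready --timeout=60s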

Delete Evicted Pods from One Namespace

NAMESPACE=YOUR_NAMESPACE
kubectl get pod -n ${NAMESPACE} | grep Evicted | awk '{print $1}' | xargs kubectl delete pod -n ${NAMESPACE}

Delete Evicted Pods from All Namespaces

Method 1

#!/bin/bash
# -t strips the trailing newline from each array element
readarray -t ns <<< $(kubectl get ns | awk '{print $1}' | tail -n +2)

for i in "${ns[@]}"
do
  kubectl get pod -n $i | grep Evicted | awk '{print $1}' | xargs kubectl delete pod -n $i
done

Method 2

#!/bin/bash

# Evicted pods are left in status.phase=Failed, so they can all be
# deleted in a single call; note this also removes any other Failed pods
kubectl delete pods --all-namespaces --field-selector=status.phase=Failed

Configure Node Eviction Hard Thresholds

@ref: https://github.com/kubernetes/kubeadm/issues/1464

For k8s bootstrap using kubeadm:

  1. Edit the kubelet ConfigMap (typically kubelet-config in the kube-system namespace; older kubeadm versions append the Kubernetes minor version to the name)

  2. Add evictionHard map. See https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/ for all fields supported.

    ...
    clusterDomain: cluster.local
    cpuManagerReconcilePeriod: 0s
    evictionPressureTransitionPeriod: 0s
    evictionHard:
      nodefs.available: "5%"
    fileCheckFrequency: 0s
    healthzBindAddress: 127.0.0.1
    ...
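
  3. For the change to reach an existing node, re-sync the kubelet config on that node and restart the kubelet (a sketch; exact steps can vary by kubeadm version):

    sudo kubeadm upgrade node phase kubelet-config
    sudo systemctl restart kubelet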

Get Pod by Label

LABEL_KEY_VALUE="app.kubernetes.io/name=api-gateway" \
NAMESPACE="ota" \
kubectl exec -it $(kubectl get pods --selector=$LABEL_KEY_VALUE -n $NAMESPACE --no-headers=true | awk '{print $1}') -n $NAMESPACE -- cat /var/log/traefik.log
LABEL_KEY_VALUE="app.kubernetes.io/name=deployments" \
NAMESPACE="ota" \
kubectl logs -f $(kubectl get pods --selector=$LABEL_KEY_VALUE -n $NAMESPACE --no-headers=true | awk '{print $1}') -n $NAMESPACE
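
An alternative that avoids parsing column output is jsonpath:

kubectl get pods -n $NAMESPACE --selector=$LABEL_KEY_VALUE -o jsonpath='{.items[0].metadata.name}'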

Get pods from a specific node

NAMESPACE=namespace
HOSTNAME=hostname
kubectl -n ${NAMESPACE} get pod --field-selector spec.nodeName=${HOSTNAME}

Get the logs for a pod running on a specific node

NAMESPACE=namespace
HOSTNAME=hostname
POD_ID=$(kubectl -n ${NAMESPACE} get pod --field-selector spec.nodeName=${HOSTNAME} | tail -n 1 | awk '{print $1}')
kubectl -n ${NAMESPACE} logs -f ${POD_ID}

Modify disk pressure

TIP

The default disk-pressure threshold in k3s is 85% disk usage. Once usage reaches this limit, the node reports DiskPressure and no more pods will be scheduled onto it.

Edit kubelet arguments in /etc/systemd/system/k3s-agent.service

ExecStart=/usr/local/bin/k3s \
    agent \
        '--data-dir=/opt/rancher/k3s' \
        '--server=https://192.168.10.236:6443' \
        '--token=K109bdf5e62167d4b4c51ab9b2163d87d7244c81bbabe016498444fe6dca338794a::server:21bb7a3fd2a408f647ec59e13fba09b4' \
        '--node-ip=192.168.10.237' \
        '--kubelet-arg=eviction-hard=imagefs.available<5%,memory.available<100Mi,nodefs.available<2%' \
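
After editing the unit file, reload systemd and restart the agent so the new kubelet arguments take effect:

sudo systemctl daemon-reload
sudo systemctl restart k3s-agent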

Route nodeport port to only specific node

@ref: https://stackoverflow.com/questions/46456239/how-to-expose-a-headless-service-for-a-statefulset-externally-in-kubernetes

How to preserve the source IP in Kubernetes:
https://blog.cptsai.com/2020/11/15/k8s-external-traffic-policy/
https://blog.getambassador.io/externaltrafficpolicy-local-on-kubernetes-e66e498212f9

Use externalTrafficPolicy: Local in the Service spec, even for a NodePort service. With Local, only nodes that run a backing pod answer on the node port, and the client source IP is preserved.

e.g.

apiVersion: v1
kind: Service
metadata:
  name: fecb-device-api
  namespace: fecb-system
spec:
  externalTrafficPolicy: Local
  type: NodePort
  selector:
    app: fecb-device-api
  ports:
    - name: http
      protocol: TCP
      port: 9017
      targetPort: 46490
      nodePort: 30894

Get Helm manifest for a release

helm get manifest <release-name> -n <namespace>
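
Related: helm get values shows the user-supplied values for a release:

helm get values <release-name> -n <namespace>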