Troubleshooting

Common issues, their root causes, and how to fix them. Each entry includes the symptom, underlying cause, and step-by-step resolution.

"Connection refused" on service access

Symptom: Intermittent Connection refused errors when accessing a Kubernetes Service via its ClusterIP.

Cause: The kube-proxy flushes and rebuilds all iptables rules on every sync cycle. During the brief window between flushing the old rules and installing the new ones, connections are refused. This was fixed with hash-based state comparison — kube-proxy now computes an order-independent hash (XOR) of the current state and skips the flush+rebuild if nothing changed.
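The comparison can be sketched like this (an illustrative sketch, not the project's actual kube-proxy code; the rule representation is hypothetical):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// XOR-combine the hash of each rule. XOR is commutative, so the result is
/// independent of iteration order: two rule sets with the same members
/// always produce the same hash. (Caveat: duplicate entries cancel out
/// under XOR, which is fine when rules form a set.)
fn state_hash<T: Hash>(rules: &[T]) -> u64 {
    rules.iter().fold(0u64, |acc, rule| {
        let mut h = DefaultHasher::new();
        rule.hash(&mut h);
        acc ^ h.finish()
    })
}

fn main() {
    let a = ["rule-a", "rule-b", "rule-c"];
    let b = ["rule-c", "rule-a", "rule-b"]; // same rules, different order
    assert_eq!(state_hash(&a), state_hash(&b));

    // A changed rule set produces a different hash, triggering flush+rebuild:
    assert_ne!(state_hash(&a), state_hash(&["rule-a", "rule-d"]));
}
```

When the hash of the desired state equals the hash of the last applied state, kube-proxy skips the flush entirely, so no connection-refused window occurs.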

Fix:

Update kube-proxy to a build that includes the hash-based sync, then rebuild the image and restart the service (e.g. podman compose -f compose.yml up -d --build kube-proxy).

"Watch failed: context canceled"

Symptom: Watch connections drop immediately with context canceled errors. Controllers fail to receive events, and kubectl get -w exits unexpectedly.

Cause: HTTP/2 ALPN was not configured on the TLS server. Go's client-go library requires the server to negotiate HTTP/2 via ALPN. Without it, the client falls back to HTTP/1.1 and watches fail.

Fix:

Configure the TLS server to advertise HTTP/2 ("h2") in its ALPN protocol list, then restart the API server.
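If the server's TLS stack is rustls-based, the key step looks like the following sketch (illustrative only; the builder API varies across rustls versions, and cert_chain / private_key are assumed to be loaded elsewhere):

```rust
// Sketch, assuming a rustls ServerConfig; not the project's actual code.
let mut config = rustls::ServerConfig::builder()
    .with_no_client_auth()
    .with_single_cert(cert_chain, private_key)?;

// Advertise HTTP/2 via ALPN, with HTTP/1.1 as a fallback, so client-go
// can negotiate h2 for long-lived watch connections.
config.alpn_protocols = vec![b"h2".to_vec(), b"http/1.1".to_vec()];
```

Verify the negotiation with curl -vk https://localhost:6443/healthz and look for a line like "ALPN: server accepted h2" in the handshake output.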

LIST resourceVersion

A related issue was LIST operations returning timestamps instead of etcd mod_revisions as the resourceVersion. This caused 1123+ watch failures in conformance testing. Ensure your storage backend returns proper revision numbers.

DNS not working

Symptom: Pods cannot resolve service names. nslookup kubernetes.default fails from inside a pod.

Cause: One of two issues:

  1. CoreDNS was not deployed — the bootstrap script was not run
  2. The br_netfilter kernel module is not loaded, preventing bridged traffic from being processed by iptables

Fix:

    # Run the bootstrap script to deploy CoreDNS
    bash scripts/bootstrap-cluster.sh

    # Verify CoreDNS is running
    kubectl get pods -n kube-system -l k8s-app=kube-dns

    # Load br_netfilter kernel module (Linux only)
    sudo modprobe br_netfilter
    sudo sysctl net.bridge.bridge-nf-call-iptables=1

Pods stuck in Pending

Symptom: Pods remain in Pending phase indefinitely. No events show scheduling attempts.

Cause: Several possible reasons:

  1. No node is in Ready state to accept pods
  2. Node taints with no matching tolerations on the pod
  3. The scheduler is not running, so no scheduling attempts are made

Fix:

    # Check node status
    kubectl get nodes

    # Check for taints on nodes
    kubectl describe node node-1 | grep -A5 Taints

    # Check scheduler is running
    podman compose -f compose.yml ps scheduler
    podman compose -f compose.yml logs scheduler

    # Describe the pod to see scheduling events
    kubectl describe pod <pod-name>

Container restart loops

Symptom: Containers keep restarting. Pod shows high restart count in kubectl get pods.

Cause: Several possibilities:

  1. The application inside the container crashes on startup
  2. A liveness probe is failing, so the kubelet kills and restarts the container
  3. The container is OOM-killed for exceeding its memory limit

Fix:

    # Check current container logs
    kubectl logs <pod-name>

    # Check previous container logs (before restart)
    kubectl logs <pod-name> --previous

    # Check pod events for restart reasons
    kubectl describe pod <pod-name>

    # Check container exit code and reason
    kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState}'

PVC stuck in Pending

Symptom: PersistentVolumeClaim stays in Pending status. Pods that reference it cannot start.

Cause: One of the following:

  1. The StorageClass referenced by the claim does not exist
  2. No PersistentVolume matches the claim's requested size and access mode
  3. The claim specifies no StorageClass and no default StorageClass is set

Fix:

    # Check StorageClasses
    kubectl get storageclasses

    # Check available PersistentVolumes
    kubectl get pv

    # Describe the PVC for events
    kubectl describe pvc <pvc-name>

    # Check if there is a default StorageClass
    kubectl get sc -o jsonpath='{.items[?(@.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")].metadata.name}'

"Port already in use" on startup

Symptom: podman compose -f compose.yml up fails with bind: address already in use on port 6443 or 2379.

Cause: Another process (minikube, kind, a real Kubernetes cluster, or a previous Rūsternetes instance) is already listening on the same port.

Fix:

    # Find what is using port 6443
    lsof -i :6443

    # Find what is using port 2379 (etcd)
    lsof -i :2379

    # Stop a previous Rūsternetes cluster
    podman compose -f compose.yml down

    # Or change the port mapping in compose.yml
    # ports: "7443:6443"

"Cannot connect to API server"

Symptom: kubectl commands return Unable to connect to the server.

Cause: The API server is not running, TLS certificates are missing or invalid, or the kubeconfig is not set.

Fix:

    # Check all services are running
    podman compose -f compose.yml ps

    # Check API server logs
    podman compose -f compose.yml logs api-server

    # Verify TLS certs exist
    ls -la .rusternetes/certs/

    # Verify KUBECONFIG is set
    echo $KUBECONFIG  # Should be: ~/.kube/rusternetes-config

    # Full restart
    podman compose -f compose.yml down
    podman compose -f compose.yml up -d
    bash scripts/bootstrap-cluster.sh

Build is slow

Symptom: podman compose -f compose.yml build takes 10–15 minutes. Test binary compilation takes 5–10 minutes.

Cause: This is expected for the first build. Rust compilation of the full workspace (216,000+ lines) is CPU-intensive. Subsequent builds use layer caching and only recompile changed crates.

Fix:

None needed for the first build; the duration is expected. For faster iteration, rebuild only the service you changed (podman compose -f compose.yml build <service>) so layer caching skips the unchanged crates.

Podman: permission denied

Symptom: kube-proxy fails to start with permission errors related to iptables. Volume mounts fail with EACCES.

Cause: kube-proxy requires CAP_NET_ADMIN capability for iptables manipulation. Rootless Podman does not grant this by default.

Fix:

    # Run with rootful Podman
    sudo podman-compose -f compose.yml up -d

    # Or set Podman Machine to rootful mode
    podman machine set --rootful
    podman machine stop
    podman machine start

Podman Machine fails on macOS

Symptom: Podman Machine fails to start with VZErrorDomain Code=1 or similar virtualization errors on macOS Sequoia 15.7+.

Cause: Compatibility issues between Podman Machine's virtualization framework and newer macOS versions.

Fix:

Update Podman to the latest release, then recreate the machine (podman machine rm, followed by podman machine init and podman machine start). If the error persists, check the Podman issue tracker for reports matching your macOS version.

Debugging Commands

A reference of useful commands for diagnosing issues:

    # Describe a resource for events and status
    kubectl describe pod <name>
    kubectl describe node <name>
    kubectl describe svc <name>

    # View pod logs (current and previous container)
    kubectl logs <pod-name>
    kubectl logs <pod-name> --previous
    kubectl logs <pod-name> -c <container-name>

    # View cluster events sorted by time
    kubectl get events --sort-by='.lastTimestamp'
    kubectl get events -A --sort-by='.lastTimestamp'

    # View compose service logs
    podman compose -f compose.yml logs -f api-server
    podman compose -f compose.yml logs -f scheduler
    podman compose -f compose.yml logs -f controller-manager
    podman compose -f compose.yml logs -f kubelet
    podman compose -f compose.yml logs -f kube-proxy

    # Enable debug logging for a service
    RUST_LOG=debug podman compose -f compose.yml up api-server

    # Check service health
    podman compose -f compose.yml ps
    curl -k https://localhost:6443/healthz

    # Check iptables rules (kube-proxy)
    podman compose -f compose.yml exec kube-proxy iptables -t nat -L KUBE-SERVICES

    # Check etcd health
    podman compose -f compose.yml exec etcd etcdctl endpoint health