# chaoskube
[![Build Status](https://travis-ci.org/linki/chaoskube.svg?branch=master)](https://travis-ci.org/linki/chaoskube)
[![Coverage Status](https://coveralls.io/repos/github/linki/chaoskube/badge.svg?branch=master)](https://coveralls.io/github/linki/chaoskube?branch=master)
[![GitHub release](https://img.shields.io/github/release/linki/chaoskube.svg)](https://github.com/linki/chaoskube/releases)
[![Docker Repository on Quay](https://quay.io/repository/linki/chaoskube/status "Docker Repository on Quay")](https://quay.io/repository/linki/chaoskube)
[![go-doc](https://godoc.org/github.com/linki/chaoskube/chaoskube?status.svg)](https://godoc.org/github.com/linki/chaoskube/chaoskube)

`chaoskube` periodically kills random pods in your Kubernetes cluster.

## Why

Test how your system behaves under arbitrary pod failures.

## Example

Running it will kill a pod in any namespace every 10 minutes by default.

```console
$ chaoskube
INFO[0000] starting up dryRun=true interval=10m0s version=v0.21.0
INFO[0000] connecting to cluster master="https://kube.you.me" serverVersion=v1.10.5+coreos.0
INFO[0000] setting pod filter annotations= labels= minimumAge=0s namespaces=
INFO[0000] setting quiet times daysOfYear="[]" timesOfDay="[]" weekdays="[]"
INFO[0000] setting timezone location=UTC name=UTC offset=0
INFO[0001] terminating pod name=kube-dns-v20-6ikos namespace=kube-system
INFO[0601] terminating pod name=nginx-701339712-u4fr3 namespace=chaoskube
INFO[1201] terminating pod name=kube-proxy-gke-earthcoin-pool-3-5ee87f80-n72s namespace=kube-system
INFO[1802] terminating pod name=nginx-701339712-bfh2y namespace=chaoskube
INFO[2402] terminating pod name=heapster-v1.2.0-1107848163-bhtcw namespace=kube-system
INFO[3003] terminating pod name=l7-default-backend-v1.0-o2hc9 namespace=kube-system
INFO[3603] terminating pod name=heapster-v1.2.0-1107848163-jlfcd namespace=kube-system
INFO[4203] terminating pod name=nginx-701339712-bfh2y namespace=chaoskube
INFO[4804] terminating pod name=nginx-701339712-51nt8 namespace=chaoskube
...
```

`chaoskube` lets you filter target pods [by namespaces, labels, annotations and age](#filtering-targets) and [exclude certain weekdays, times of day and days of a year](#limit-the-chaos) from chaos.

## How

### Helm

You can install `chaoskube` with [`Helm`](https://github.com/kubernetes/helm). Follow [Helm's Quickstart Guide](https://helm.sh/docs/intro/quickstart/) and then install the `chaoskube` chart.

```console
$ helm install stable/chaoskube
```

Refer to [chaoskube on kubeapps.com](https://kubeapps.com/charts/stable/chaoskube) to learn how to configure it and to find other useful Helm charts.

### Raw manifest

Refer to the [example manifest](./examples/). Be sure to give `chaoskube` appropriate permissions using the provided ClusterRole.

### Configuration

By default `chaoskube` will be friendly and not kill anything. Once you have validated your target cluster, you can disable dry-run mode by passing the flag `--no-dry-run`. You can also specify a more aggressive interval and other supported flags for your deployment.

If you're running in a Kubernetes cluster and want to target the same cluster, this is all you need to do.

If you want to target a different cluster or run it locally, specify your cluster via the `--master` flag or provide a valid kubeconfig via the `--kubeconfig` flag. By default, it uses the standard kubeconfig path in your home directory, so whatever context is current there will be targeted.

If you want to increase or decrease the amount of chaos, change the interval between killings with the `--interval` flag. Alternatively, you can increase the number of replicas of your `chaoskube` deployment.

Remember that `chaoskube` by default kills any pod in all your namespaces, including system pods and itself.

`chaoskube` provides a simple HTTP endpoint that can be used to check that it is running. This can be used for [Kubernetes liveness and readiness probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/). By default, this listens on port 8080. To disable it, pass `--metrics-address=""` to `chaoskube`.

## Filtering targets

You can limit the search space of `chaoskube` by providing label, annotation, and namespace selectors, pod name include/exclude patterns, as well as a minimum age setting.

```console
$ chaoskube --labels 'app=mate,chaos,stage!=production'
...
INFO[0000] setting pod filter labels="app=mate,chaos,stage!=production"
```

This selects all pods that have the label `app` set to `mate`, the label `chaos` set to anything, and the label `stage` either unset or set to something other than `production`.

You can filter target pods by namespace selector as well.

```console
$ chaoskube --namespaces 'default,testing,staging'
...
INFO[0000] setting pod filter namespaces="default,staging,testing"
```

This will filter for pods in the three namespaces `default`, `staging` and `testing`.

Namespaces can additionally be filtered by a namespace label selector.

```console
$ chaoskube --namespace-labels='!integration'
...
INFO[0000] setting pod filter namespaceLabels="!integration"
```

This will exclude all pods from namespaces with the label `integration`.

You can filter target pods by [OwnerReference's](https://godoc.org/k8s.io/apimachinery/pkg/apis/meta/v1#OwnerReference) kind selector.

```console
$ chaoskube --kinds '!DaemonSet,!StatefulSet'
...
INFO[0000] setting pod filter kinds="!DaemonSet,!StatefulSet"
```

This will exclude any `DaemonSet` and `StatefulSet` pods.

```console
$ chaoskube --kinds 'DaemonSet'
...
INFO[0000] setting pod filter kinds="DaemonSet"
```

This will include only `DaemonSet` pods.

Please note: any `include` filter will automatically exclude all pods with no OwnerReference defined.

You can filter pods by name:

```console
$ chaoskube --included-pod-names 'foo|bar' --excluded-pod-names 'prod'
...
INFO[0000] setting pod filter excludedPodNames=prod includedPodNames="foo|bar"
```

This targets only pods whose name contains `foo` or `bar` and does _not_ contain `prod`.

You can also exclude namespaces and mix and match with the label and annotation selectors.

```console
$ chaoskube \
  --labels 'app=mate,chaos,stage!=production' \
  --annotations '!scheduler.alpha.kubernetes.io/critical-pod' \
  --namespaces '!kube-system,!production'
...
INFO[0000] setting pod filter annotations="!scheduler.alpha.kubernetes.io/critical-pod" labels="app=mate,chaos,stage!=production" namespaces="!kube-system,!production"
```

This further limits the search space of the above label selector by also excluding any pods in the `kube-system` and `production` namespaces, as well as ignoring all pods that are marked as critical.

The annotation selector can also be used to run `chaoskube` as a cluster addon and allow pods to opt in to being terminated as you see fit. For example, you could run `chaoskube` like this:

```console
$ chaoskube --annotations 'chaos.alpha.kubernetes.io/enabled=true' --debug
...
INFO[0000] setting pod filter annotations="chaos.alpha.kubernetes.io/enabled=true"
DEBU[0000] found candidates count=0
DEBU[0000] no victim found
```

Unless you already use that annotation somewhere, this will initially ignore all of your pods (you can see the number of candidates in debug mode). You could then selectively opt individual deployments in to chaos mode by annotating their pods with `chaos.alpha.kubernetes.io/enabled=true`.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  template:
    metadata:
      annotations:
        chaos.alpha.kubernetes.io/enabled: "true"
    spec:
      ...
```

You can exclude pods that h