Target Practice for IrisClusters with KWOK
KWOK (Kubernetes WithOut Kubelet) is a lightweight tool that simulates nodes and pods, without running real workloads, so you can quickly test and scale IrisCluster behavior, scheduling, and zone assignment. For those of you wondering what the value is without the IRIS workload actually running, you will realize it quickly the next time you are playing with your desk toys waiting for nodes and pods to come up, or when you get the bill for the expensive disk provisioned behind the PVCs for no other reason than to validate your topology.
Here we will use it to simulate an IrisCluster and target a topology across 4 zones: high availability mirroring across zones, disaster recovery to an alternate zone, and horizontal ephemeral compute (ECP) in a zone of its own. All of this is done locally, is suitable for repeatable testing, and is a valuable validation checkmark on the road to production.
Goal
The graphic above sums it up, but let's put it in a list for attestation at the end of the exercise.
⬜ HA Mirror Set, Geographically situated > 500 miles as the crow flies (Boston, Chicago)
⬜ DR Async, Geographically situated > 1000 miles as an Archer eVTOL flies (Seattle)
⬜ Horizontal Compute situated across the pond for print (London)
⬜ Pod Counts Match Topology
lesgo...
Setup
We are using KWOK "in-cluster"; although KWOK can also create clusters itself, here there will be a single control plane and a single initial worker node, with the KWOK and IKO charts installed on top.
Cluster
Let's provision a local kind cluster with a single "physical" worker node and a control plane.
cat <<EOF | kind create cluster --name ikoplus-kwok --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
networking:
  disableDefaultCNI: false
EOF
Should be a simple cluster with a control plane and a worker node.
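A quick sanity check with plain kubectl; the node names below are roughly what kind generates for a cluster named ikoplus-kwok:
kubectl get nodes
# NAME                         STATUS   ROLES           AGE   VERSION
# ikoplus-kwok-control-plane   Ready    control-plane   1m    v1.xx.x
# ikoplus-kwok-worker          Ready    <none>          1m    v1.xx.x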
Install IKO Chart
Install the Operator under test.
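# the "." installs the chart from the current directory, i.e. the unpacked IKO chart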
helm install iris-operator . --namespace ikoplus
Install KWOK
In-cluster, per the docs.
helm repo add kwok https://kwok.sigs.k8s.io/charts/
helm upgrade --namespace kube-system --install kwok kwok/kwok
helm upgrade --install kwok kwok/stage-fast
helm upgrade --install kwok kwok/metrics-usage
Charts looking good:
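If you want the same check at the command line, plain helm will do:
helm list --all-namespaces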
Configure KWOK
KWOK configuration for this distraction comes in two parts:
- Create KWOK Nodes
- Tell KWOK which ones to Manage
Data Nodes
Here I create three specific KWOK nodes to host the data nodes for HA/DR and the mirrorMap topology. The nodes are regular Kubernetes Node objects, with specific annotations and labels identifying them as "fake". It is important to note that all of the fields are fully configurable: you can set resources such as CPU and memory to test pod requests, and the phase values are fully configurable as well.
Full YAML is in the spoiler above, but here are the annotations and labels that are important to call out (a minimal sketch follows the list):
- kwok annotation
- iko zone
- type: kwok
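Something like the following per node; the node name and resource numbers are illustrative, but the kwok.x-k8s.io/node annotation, the topology.kubernetes.io/zone label, and the type: kwok label are the pieces called out above:
apiVersion: v1
kind: Node
metadata:
  name: kwok-data-boston                  # illustrative name, one per zone
  annotations:
    kwok.x-k8s.io/node: fake              # marks the node as simulated by KWOK
  labels:
    type: kwok                            # our marker label for fake nodes
    topology.kubernetes.io/zone: boston   # the zone IKO preferredZones matches on
status:
  allocatable:
    cpu: "8"                              # resources are fully configurable
    memory: 32Gi
    pods: "110"
  capacity:
    cpu: "8"
    memory: 32Gi
    pods: "110"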
Apply the nodes to the cluster and take a peek; you should see your 3 fake data nodes.
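Assuming the three node manifests are saved in one file (the filename here is mine), that looks like:
kubectl apply -f kwok-data-nodes.yaml
kubectl get nodes -l type=kwok -L topology.kubernetes.io/zone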
Compute Nodes
Now, let's provision many compute nodes, because we can. For this, you can use the script below to prompt for a count and loop over the creation.
Let's create 128 compute nodes, which gets us close to a reasonable ECP limit.
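A minimal sketch of such a loop; the node naming is mine, and the london zone matches the compute preferredZones further down:
#!/usr/bin/env bash
# Create N fake KWOK compute nodes, all labeled for the london zone.
read -r -p "How many compute nodes? " COUNT
for i in $(seq 1 "$COUNT"); do
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Node
metadata:
  name: kwok-compute-london-${i}
  annotations:
    kwok.x-k8s.io/node: fake
  labels:
    type: kwok
    topology.kubernetes.io/zone: london
status:
  allocatable:
    cpu: "4"
    memory: 16Gi
    pods: "110"
  capacity:
    cpu: "4"
    memory: 16Gi
    pods: "110"
EOF
done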
The cluster should now be at 133 nodes: 1 control plane, 1 worker, 3 data nodes, and 128 compute.
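Quick count to confirm:
kubectl get nodes --no-headers | wc -l
# expect 133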
Now hang this kwok command in another terminal so it perpetually manages the annotated fake nodes.
sween@pop-os-cosmic:~$ kwok --manage-nodes-with-annotation-selector=kwok.x-k8s.io/node=fake
The result should be a pretty busy loop.
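With the controller running, the fake nodes should all settle into a Ready status:
kubectl get nodes -l type=kwok
# the STATUS column should read Ready for all 131 fake nodes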
Apply IrisCluster SUT
Here is our system under test, the IrisCluster itself.
# full IKO documentation:
# https://docs.intersystems.com/irislatest/csp/docbook/Doc.View.cls?KEY=PAGE_deployment_iko
apiVersion: intersystems.com/v1alpha1
kind: IrisCluster
metadata:
  name: ikoplus-kwok
  namespace: ikoplus
  labels:
    type: kwok
spec:
  volumes:
    - name: foo
      emptyDir: {}
  topology:
    data:
      image: containers.intersystems.com/intersystems/iris-community:2025.1
      mirrorMap: primary,backup,drasync
      mirrored: true
      podTemplate:
        spec:
          preferredZones:
            - boston
            - chicago
            - seattle
      webgateway:
        replicas: 1
        alternativeServers: LoadBalancing
        applicationPaths:
          - /*
        ephemeral: true
        image: containers.intersystems.com/intersystems/webgateway-lockeddown:2025.1
        loginSecret:
          name: webgateway-secret
        type: apache-lockeddown
      volumeMounts:
        - name: foo
          mountPath: "/irissys/foo/"
      storageDB:
        mountPath: "/irissys/foo/data/"
      storageJournal1:
        mountPath: "/irissys/foo/journal1/"
      storageJournal2:
        mountPath: "/irissys/foo/journal2/"
      storageWIJ:
        mountPath: "/irissys/foo/wij/"
    arbiter:
      image: containers.intersystems.com/intersystems/arbiter:2025.1
    compute:
      image: containers.intersystems.com/intersystems/irishealth:2025.1
      ephemeral: true
      replicas: 128
      preferredZones:
        - london
Explanations
The emptyDir {} volume maneuver is pretty clutch to this test with IKO. The opinionated workflow for a data node would be to use external storage through a CSI driver or similar, but that is counterintuitive for this test. Thank you to @Steve Lubars from ISC for the hint on this; I was pretty stuck for a bit trying to make the IrisCluster all ephemeral.
The mirrorMap and preferredZones fields specify the topology for the data and compute nodes, targeting the zones where they should land and declaring the mirror roles.
Setting the ephemeral flag on the compute nodes enforces the same thing as the emptyDir {} does for the data nodes, and replicas: 128 matches the number of compute nodes we provisioned.
lesgo...
The apply...
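Assuming the manifest above is saved as ikoplus-kwok.yaml (my name for it):
kubectl apply -f ikoplus-kwok.yaml
kubectl get pods -n ikoplus -o wide -w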
Attestation
One command to check them all.
kubectl get pods -A -l 'intersystems.com/role=iris' -o json \
| jq -r --argjson zmap "$(kubectl get nodes -o json \
| jq -c 'reduce .items[] as $n ({}; .[$n.metadata.name] = (
$n.metadata.labels["topology.kubernetes.io/zone"]
// $n.metadata.labels["failure-domain.beta.kubernetes.io/zone"]
// "unknown"))')" '
[ .items[] | {zone: ($zmap[.spec.nodeName] // "unknown")} ]
| group_by(.zone)
| map({zone: .[0].zone, total: length})
| (["ZONE","TOTAL"], (.[] | [ .zone, ( .total|tostring ) ]))
| @tsv' | column -t
ZONE TOTAL
boston 1
chicago 1
london 19
seattle 1
Results
✅ HA Mirror Set, Geographically situated > 500 miles as the crow flies (Boston, Chicago)
✅ DR Async, Geographically situated > 1000 miles as an Archer eVTOL flies (Seattle)
✅ Horizontal Compute situated across the pond for print (London)
❌ Pod Counts Match Topology
So after launching this over and over, it never managed to schedule all 128 pods onto the available compute nodes, and that is due to the KWOK nodes going NotReady at a random point somewhere around 20-50 nodes in. I think this is because the ECP nodes rely on a healthy cluster to connect to before IKO will keep provisioning, so I have some work to do to see if the IKO readiness checks can be faked as well. The other thing I did was cordon the other "worker" to speed things up, so the arbiter went to a fake node; I was able to get more compute nodes in with that arrangement. I suppose I could have targeted the topology for the arbiter too, but it seems arbiters are basically anywhere these days and can even be out of cluster.
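For reference, the cordon mentioned above is just this (node name per your kind cluster):
kubectl cordon ikoplus-kwok-worker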
Fun and useful for 3/4 use cases above at this point for sure.