Previously…
I shared some tricks I used to make our Windows EKS nodes start in 90s rather than the 5 minutes a standard EKS Windows node takes, and introduced the git repo eks-windows-bootstrapper: an application that replaces the AWS-provided configuration scripts for EKS Windows.
In today’s story, I will walk you through how to make your Windows nodes take ~30s to start!
In the previous story, we got to the point where our start times looked like this:
Let’s take a closer look at our latencies now:
To get better launch times, there are now two areas where we can still improve: the Windows boot itself (our AMI) and the bootstrapping process.
As we are using AWS Image Builder (https://aws.amazon.com/image-builder/) to build our custom AMIs, as discussed in the previous story, any changes to our image need to be made there. Apart from EC2 Fast Launch, there really isn't much we can do to speed up the Windows boot process. If you are able to, uninstall Windows Defender; it massively improves start performance.
Side note: in our use case, we found we don't need Fast Launch at all, because we don't need to run the OOBE (Out-of-Box Experience) process. We can set our culture using Set-Culture, etc. inside one of our components, and as long as we remove the OOBE-related sections from Unattend.xml, Windows will skip much of the setup process that EC2 Fast Launch deals with. The resulting AMI starts as quickly as a Fast Launch AMI, but without the added expense of pre-provisioned EBS snapshots.
The downside to this approach is that you can't configure Windows settings on first boot: what you bake into the AMI is what you start with. So make sure you set culture/timezone information correctly in one of your Image Builder components.
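To make "remove the OOBE-related sections from Unattend.xml" concrete, here is a minimal C# sketch, assuming those sections are the `<settings pass="oobeSystem">` passes of a standard Windows answer file (namespace `urn:schemas-microsoft-com:unattend`). The `UnattendOobe` class and its methods are hypothetical and not part of eks-windows-bootstrapper; in an Image Builder component you would more likely do the same edit in PowerShell, and you should check where the answer file lives on your own image.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not from eks-windows-bootstrapper): strips the
// oobeSystem configuration pass from a Windows answer file (unattend.xml).
public static class UnattendOobe
{
    // Default namespace used by Windows answer files
    private static readonly XNamespace UnattendNs = "urn:schemas-microsoft-com:unattend";

    // True if the document contains a <settings pass="oobeSystem"> element
    public static bool HasOobePass(string unattendXml) =>
        OobePasses(XDocument.Parse(unattendXml)).Any();

    // Returns the answer file with every oobeSystem pass removed; other passes are kept
    public static string RemoveOobePass(string unattendXml)
    {
        XDocument doc = XDocument.Parse(unattendXml);
        // Materialize first: removing nodes while lazily enumerating them is unsafe
        foreach (XElement pass in OobePasses(doc).ToList())
        {
            pass.Remove();
        }
        return doc.ToString();
    }

    // All top-level <settings> elements whose pass attribute is "oobeSystem"
    private static IEnumerable<XElement> OobePasses(XDocument doc) =>
        doc.Root.Elements(UnattendNs + "settings")
           .Where(s => (string)s.Attribute("pass") == "oobeSystem");
}
```

The `specialize` and other passes are left untouched, so anything you still need Windows to apply during sysprep keeps working; only the first-boot OOBE pass goes away.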
For reference:
So with all avenues to make Windows do less stuff on boot exhausted, there was only one choice left…
Changes were made to the EKS Windows Bootstrapper so it can run as a Windows Service. The idea is to squeeze out a few more seconds by running our code while Windows is starting services at boot, instead of waiting for the userdata script to fire. This worked, but it broke the AWS SSM Agent, which also starts early. The installation script included in the repo works around this by setting the SSM Agent's startup type to Automatic (Delayed).
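The repo's install script already handles the SSM Agent change; purely as an illustration, here is one way to make the same change from C# by shelling out to `sc.exe`. This is a sketch, assuming the SSM Agent's service name is `AmazonSSMAgent` (verify with `sc.exe query` on your image); the `ServiceStartup` class is hypothetical and Windows-only.

```csharp
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical sketch: switch a Windows service to "Automatic (Delayed Start)".
public static class ServiceStartup
{
    // sc.exe syntax quirk: the space after "start=" is required
    public static string DelayedAutoArgs(string serviceName) =>
        $"config {serviceName} start= delayed-auto";

    // Runs sc.exe hidden and returns its exit code (0 means success). Windows-only.
    public static async Task<int> SetDelayedAutoAsync(string serviceName)
    {
        using var process = new Process();
        process.StartInfo.FileName = "sc.exe";
        process.StartInfo.Arguments = DelayedAutoArgs(serviceName);
        process.StartInfo.UseShellExecute = false;
        process.StartInfo.CreateNoWindow = true;
        process.Start();
        await process.WaitForExitAsync();
        return process.ExitCode;
    }
}
```

Usage would be `await ServiceStartup.SetDelayedAutoAsync("AmazonSSMAgent");`, run once while building the image rather than on every boot.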
This mostly involved translating the remaining code from the standard AWS bootstrapping script, EKS-StartupTask.ps1, into C#:
Set up the container network
...
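// P/Invoke into the Host Networking Service (HNS) API exported by vmcompute.dll
// (requires: using System.Runtime.InteropServices;)
[DllImport("vmcompute.dll")]
static extern void HNSCall([MarshalAs(UnmanagedType.LPWStr)] string method,
    [MarshalAs(UnmanagedType.LPWStr)] string path,
    [MarshalAs(UnmanagedType.LPWStr)] string request,
    [MarshalAs(UnmanagedType.LPWStr)] out string response);
// Create the container network: POST its JSON definition to the HNS /networks endpoint;
// HNS writes its JSON reply into response
HNSCall("POST", "/networks", jsonString, out response);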
...
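The snippet leaves out how `jsonString` is built and what is done with `response`. The sketch below fills that gap under loudly stated assumptions: the field names (`Name`, `Type`, `NetworkAdapterName`, `Subnets`, `AddressPrefix`, `GatewayAddress`) and the reply's `Success`/`Error` properties are modeled on the HNS v1 JSON that Microsoft's `hns.psm1` helper sends and checks, and the `L2Bridge` type and every value passed in are placeholders, not what the bootstrapper actually uses. `HnsJson` is a hypothetical helper.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: build an HNS network request and validate the HNS reply.
public static class HnsJson
{
    // ASSUMED schema (HNS v1, as used by hns.psm1); values are placeholders
    public static string BuildL2BridgeNetwork(string name, string addressPrefix,
                                              string gateway, string adapterName)
    {
        var network = new
        {
            Name = name,
            Type = "L2Bridge",
            NetworkAdapterName = adapterName,
            Subnets = new[] { new { AddressPrefix = addressPrefix, GatewayAddress = gateway } }
        };
        return JsonSerializer.Serialize(network);
    }

    // ASSUMED reply shape: { "Success": true|false, "Error": "...", ... }
    // Throws if the call did not report success.
    public static void EnsureSuccess(string response)
    {
        using JsonDocument doc = JsonDocument.Parse(response);
        JsonElement root = doc.RootElement;
        bool ok = root.TryGetProperty("Success", out JsonElement success)
                  && success.ValueKind == JsonValueKind.True;
        if (!ok)
        {
            string error = root.TryGetProperty("Error", out JsonElement e)
                ? e.GetString()
                : "unknown error";
            throw new InvalidOperationException($"HNS call failed: {error}");
        }
    }
}
```

Checking the reply matters because `HNSCall` itself returns `void`: a failed network creation only shows up in the JSON it writes to `response`.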
Add routes to the created NIC
...
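// One /32 host route per IP address, bound to the vNIC's interface index
routeAddCommands.Append($"route ADD {ipAddrs[i]} MASK 255.255.255.255 0.0.0.0 IF {vNICIndex}");
...
// Run all accumulated route commands in a single hidden cmd.exe process
// (requires: using System.Diagnostics; WaitForExitAsync needs .NET 5+)
using var process = new Process();
process.StartInfo.FileName = "cmd.exe";
process.StartInfo.Arguments = $"/C {routeAddCommands}";
process.StartInfo.RedirectStandardOutput = true;
process.StartInfo.UseShellExecute = false;
process.StartInfo.CreateNoWindow = true;
process.Start();
await process.WaitForExitAsync();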
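The snippet elides how the individual `route ADD` commands are separated inside the single `cmd.exe /C` call. A minimal sketch, assuming they are chained with `&` (run each command regardless of the previous one's result); `HostRoutes` is a hypothetical helper, and `RunAsync` is Windows-only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: build and run one chained route command line.
public static class HostRoutes
{
    // One /32 host route per IP via gateway 0.0.0.0 on the given interface index,
    // chained with " & " so cmd.exe runs them one after another
    public static string BuildCommand(IEnumerable<string> ipAddrs, int interfaceIndex) =>
        string.Join(" & ", ipAddrs.Select(ip =>
            $"route ADD {ip} MASK 255.255.255.255 0.0.0.0 IF {interfaceIndex}"));

    // Runs the chained command in a hidden cmd.exe; throws on a non-zero exit code
    public static async Task RunAsync(string command)
    {
        using var process = new Process();
        process.StartInfo.FileName = "cmd.exe";
        process.StartInfo.Arguments = $"/C {command}";
        process.StartInfo.RedirectStandardOutput = true;
        process.StartInfo.UseShellExecute = false;
        process.StartInfo.CreateNoWindow = true;
        process.Start();
        // Drain the redirected stdout so the child can never block on a full pipe
        string output = await process.StandardOutput.ReadToEndAsync();
        await process.WaitForExitAsync();
        if (process.ExitCode != 0)
            throw new InvalidOperationException(
                $"route commands failed (exit {process.ExitCode}): {output}");
    }
}
```

Two design notes: with `&`, the exit code reflects only the last command, so use `&&` instead if you want the chain to stop at the first failure. And once stdout is redirected, reading it (as above) avoids a potential deadlock if the child writes more than the pipe buffer holds.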
Start kube-proxy and kubelet in parallel once all configuration files have been written (by all the previously converted code)
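Starting the two services in parallel boils down to `Task.WhenAll`. This is a minimal sketch, not the bootstrapper's actual code: it assumes the Windows service names are `kubelet` and `kube-proxy`, and the `ParallelStart` class and its `sc.exe`-based starter are hypothetical. The starter is passed in as a delegate so the orchestration can be exercised off Windows.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: start several Windows services concurrently.
public static class ParallelStart
{
    // Kicks off every start at once and completes when all of them have completed
    public static Task StartAllAsync(IEnumerable<string> services, Func<string, Task> start) =>
        Task.WhenAll(services.Select(start));

    // Hypothetical starter using `sc.exe start <name>` (Windows-only).
    // sc.exe returns once the start request is accepted, not when the service is fully up.
    public static async Task ScStartAsync(string serviceName)
    {
        using var process = new Process();
        process.StartInfo.FileName = "sc.exe";
        process.StartInfo.Arguments = $"start {serviceName}";
        process.StartInfo.UseShellExecute = false;
        process.StartInfo.CreateNoWindow = true;
        process.Start();
        await process.WaitForExitAsync();
        if (process.ExitCode != 0)
            throw new InvalidOperationException(
                $"sc.exe start {serviceName} exited with {process.ExitCode}");
    }
}
```

Usage on a node would look like `await ParallelStart.StartAllAsync(new[] { "kubelet", "kube-proxy" }, ParallelStart.ScStartAsync);`. Neither service waits for the other, so the slower of the two sets the time to ready rather than their sum.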
To install the bootstrapper, add this to a component in Image Builder:
- name: InstallEksWindowsBootstrapper
  action: ExecutePowerShell
  inputs:
    commands:
      - |
        Invoke-WebRequest -Uri 'https://github.com/atg-cloudops/eks-windows-bootstrapper/releases/download/v1.29.1/Install-Service.ps1' -OutFile 'Install-Service.ps1';
        .\Install-Service.ps1;
        Remove-Item 'Install-Service.ps1';
Putting everything together, I tested the output image using these settings:
gp3: 120 GB, 600 MiB/s throughput, 3,600 IOPS
Testing with these settings got me start times of around 55s! We are finally under a minute. With the 5-minute penalty down to 1 minute, spot instances may now be viable for Windows EKS nodes.
A 30s start time is possible! It just costs a lot. 😢
To find out how fast the software alone could go, I eliminated as many hardware bottlenecks as I could:
Switched from gp3 → io2 with 30K IOPS
Switched from m4.xlarge → c6a.4xlarge
Required the Nitro hypervisor
This is easy to change in Karpenter: set the storage options in the EC2NodeClass, then add the relevant requirement keys to the NodePool, e.g.:
# EC2NodeClass (spec)
blockDeviceMappings:
  - deviceName: /dev/sda1
    ebs:
      iops: 30000
      volumeSize: 120Gi
      volumeType: io2
# NodePool (spec.template.spec.requirements)
- key: karpenter.k8s.aws/instance-hypervisor
  operator: In
  values:
    - 'nitro'
- key: karpenter.k8s.aws/instance-cpu
  operator: In
  values:
    - '8'
The beast Karpenter produced was visible in Kubernetes within 32s. With antivirus disabled, it would be under 30s.
Our final tuning produces nodes in around 45s (using gp3). You can tune cost versus performance for your own scenario.
EKS Windows Bootstrapper is a simple app that writes config and starts the Kubernetes services. It is currently set up for EKS 1.29. As Kubernetes moves through 1.30, 1.31, 1.32 and beyond, it will need to be kept up to date, and I am looking for help with this. 🙂