An anti-pattern is a high-risk solution to a problem that usually turns out to be ineffective. At first glance, such a solution appears appropriate and effective. However, any gains tend to be short-term, and the consequences show anti-patterns to be more trouble than they're worth.
When a developer adopts an anti-pattern, there's usually an intention to come back and do it properly down the road, but other, more pressing needs get in the way (and they always do). Deviating from best-practice design patterns creates technical debt that, sooner or later, must be paid, either in the time and effort of refactoring or in the cost of system unavailability.
Anti-patterns exist in the Kubernetes world, too. Container orchestrators, and Kubernetes in particular, are designed with cloud-native workloads in mind and work best when those workloads follow certain design patterns.
In this article, we’ll cover some common Kubernetes anti-patterns, some design patterns to use instead, and options for implementing the recommended design patterns.
Cloud-native workloads—particularly those running on Kubernetes—assume elasticity (for example, horizontal scaling), fault tolerance, and highly heterogeneous environments for maintainability and debugging. Developing new distributed systems, or migrating legacy systems to Kubernetes, without care can lead to several anti-patterns. Here are some you should be aware of:
A health check allows you to validate the status of a service. It helps assess key information like service availability, system metrics, or available database connections. A service can report its status through health endpoints like healthz, livez, or readyz.
Kubernetes supports container probes, namely livenessProbe, readinessProbe, and startupProbe, which allow the orchestrator to monitor services and take action based on the probe results. Services without health monitoring cannot take advantage of much of the functionality that orchestrators can provide automatically.
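For illustration, here's a minimal Pod spec wiring such endpoints to probes. The container name, image, port, paths, and timing values are hypothetical; adjust them to your service:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-service
spec:
  containers:
    - name: my-service
      image: example.com/my-service:1.0.0   # hypothetical image
      ports:
        - containerPort: 8080
      # Restart the container if it stops responding on /livez.
      livenessProbe:
        httpGet:
          path: /livez
          port: 8080
        periodSeconds: 10
        failureThreshold: 3
      # Only send traffic once /readyz reports success.
      readinessProbe:
        httpGet:
          path: /readyz
          port: 8080
        periodSeconds: 5
      # Give slow-starting containers time before the liveness probe kicks in.
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        failureThreshold: 30
        periodSeconds: 2
```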
Most application owners prefer zero downtime for change deployments. This is a necessity for most mission-critical applications. Kubernetes allows you to define Recreate and RollingUpdate deployment strategies. Recreate will kill all the Pods before creating new ones, while RollingUpdate will update Pods in a rolling fashion and permits configuring maxUnavailable and maxSurge to control the process.
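As a minimal sketch, the strategy lives on the Deployment itself. The Deployment name and image below are hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 4
  selector:
    matchLabels:
      app: my-service
  strategy:
    type: RollingUpdate   # Recreate would instead kill all Pods first (and takes no rollingUpdate block)
    rollingUpdate:
      maxUnavailable: 1   # at most one Pod below the desired replica count during the rollout
      maxSurge: 1         # at most one extra Pod above the desired replica count
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: example.com/my-service:1.0.0   # hypothetical image
```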
While these deployment strategies can serve many use cases, they also have limitations. For example, Recreate causes downtime, while RollingUpdate can make rollbacks harder. Neither method allows rapid experimentation and feedback on new versions of your services.
A blue/green deployment is a deployment model that creates copies of your service—the old version being blue and the new version being green—with both services running in parallel. Once you’re confident the new version (green) is ready for release, you can route all production traffic to the new version while keeping the old version (blue) up and running. If there are issues, you can quickly route the traffic back to the previous version, preventing user dissatisfaction or downtime. After some time, with everything running as expected, you can remove the old version of your service.
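Outside of any particular tooling, one Kubernetes-native way to sketch blue/green is to run both Deployments in parallel and flip a Service's label selector between them. The names and labels below are hypothetical:

```yaml
# Two Deployments, my-service-blue and my-service-green, run in parallel,
# labeled version: blue and version: green respectively.
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-service
    version: blue   # switch to "green" to cut traffic over; revert to roll back
  ports:
    - port: 80
      targetPort: 8080
```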
A canary deployment is a technique that routes traffic to the new version of a service for only a subset of users. This pattern allows you to introduce a new version of your service in production while closely monitoring its behavior. If successful, you can expose the new version to more users, eventually migrating everyone to the new code version.
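A rough, Kubernetes-native approximation of a canary (before any gateway-level traffic splitting) is a small canary Deployment whose Pods match the stable Service's selector, so the traffic split roughly follows the replica ratio. The names, labels, and image below are hypothetical and assume the Service selects on app: my-service:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service-canary
spec:
  replicas: 1   # e.g. 1 canary Pod alongside 9 stable Pods ~ roughly 10% of traffic
  selector:
    matchLabels:
      app: my-service
      track: canary
  template:
    metadata:
      labels:
        app: my-service   # matches the existing Service selector
        track: canary     # distinguishes canary Pods from the stable Deployment
    spec:
      containers:
        - name: my-service
          image: example.com/my-service:2.0.0   # hypothetical new version
```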
Both techniques are essential in cloud-native environments since they increase service reliability and enable rapid experimentation and development.
Services running on a Kubernetes cluster talk to one another by making remote calls. Because it’s not uncommon for these services to run on different machines, these remote calls are more prone to failures or unresponsiveness. This can lead to issues like cascading failures.
In electronics, a circuit breaker is a switch designed to protect an electrical circuit from damage. An excess of electric current through the circuit breaker will cause the circuit to break, preventing an overload or short circuit. The goal of the circuit breaker is to prevent failure after identifying a fault. Similarly, in software, a circuit breaker monitors a service for failures and, once a failure is identified, prevents further calls to that service. This allows systems to deal with the failure and route requests to healthy instances of the same service.
Without such mechanisms in place, a Kubernetes cluster running a distributed, service-based application will be prone to failures.
Observability is key to understanding system behavior, and effective observability depends on the proper collection of metrics. Metrics provide the information you need when you want to know what your services are doing, how well they're performing, why something went wrong, and, possibly, how to debug the issue. In complex distributed systems, metrics coupled with other forms of observability (such as traces) allow you to understand your systems holistically through a single pane of glass.
A lack of key metrics will severely limit your ability to understand how your services are performing and if they're performing at the desired level. As the complexity of your system increases, it’s necessary to collect more data points from more endpoints.
One way to implement these patterns is by using an Ingress controller such as the Kong Ingress Controller (KIC). KIC is an Ingress implementation for Kubernetes. It enables configuration of routing, health checks, and load balancing, and it supports a variety of plugins that provide advanced functionality.
KIC can help address the anti-patterns we’ve discussed above.
The KIC can be configured for passive health checks and active health checks. Passive health checks will monitor your services on each request and, upon a certain number of failures, will short-circuit requests to the failing Pods. Active health checks will periodically monitor services at predefined intervals, marking failing Pods as unavailable.
Passive and active health checks will increase the reliability of your system by running regular health checks and taking proactive measures. KIC will use the information from the health checks to efficiently route requests to healthy replicas of the service.
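As an illustrative sketch, KIC versions that support the KongIngress override resource express Kong's active and passive upstream health checks roughly as follows, attached to a Service via the konghq.com/override annotation. The resource and field names vary across KIC versions (newer releases use KongUpstreamPolicy instead), so treat this as an assumption to verify against the Kong documentation:

```yaml
apiVersion: configuration.konghq.com/v1
kind: KongIngress
metadata:
  name: health-checks
upstream:
  healthchecks:
    active:
      http_path: /healthz      # endpoint Kong probes periodically
      healthy:
        interval: 5
        successes: 3           # successes before a target is marked healthy
      unhealthy:
        interval: 5
        http_failures: 3       # failures before a target is marked unhealthy
    passive:
      healthy:
        successes: 3
      unhealthy:
        http_failures: 3       # short-circuit after repeated failed proxied requests
---
apiVersion: v1
kind: Service
metadata:
  name: my-service             # hypothetical Service wiring the override in
  annotations:
    konghq.com/override: health-checks
spec:
  selector:
    app: my-service
  ports:
    - port: 80
      targetPort: 8080
```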
Code reviews and testing can only give you so much confidence that your services will work correctly; you can't test every use case or boundary condition. Furthermore, many bugs are only found once the code is in the wild and your application starts accepting real user traffic.
Blue/green and canary deployments reduce deployment risk by enabling fast rollbacks and reducing the impact of undesired results.
By deploying both the old and new versions of your service behind the KIC, you can easily route traffic to the new version and roll back to the previous version if necessary. Similarly, using the Canary Release plugin enables you to roll out new changes in a phased manner.
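For example, with both versions exposed as separate Services, a standard Ingress handled by KIC can simply be repointed from one backend to the other. The Service names here are hypothetical:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-service
spec:
  ingressClassName: kong
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service-green   # switch back to my-service-blue to roll back
                port:
                  number: 80
```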
You can configure KIC with checks that will monitor service performance. Based on the results, it can mark unhealthy or unresponsive service replicas as unavailable, preventing any further requests from being routed to those service replicas. Instead, subsequent requests will be routed to healthy copies of the service. Once fixed, the original service replica can be brought back online.
The KIC easily integrates with Prometheus and Grafana—two industry-standard monitoring solutions—giving you visibility into how your services respond to traffic. Moreover, access to these metrics doesn't require any service instrumentation. Prometheus metrics can be exposed for service requests and configuration updates.
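As a sketch, enabling Kong's bundled Prometheus plugin cluster-wide is typically a single resource, after which Prometheus can scrape the Kong proxy's metrics endpoint. The exact labels and annotations can vary by KIC version, so treat the details as assumptions:

```yaml
apiVersion: configuration.konghq.com/v1
kind: KongClusterPlugin
metadata:
  name: prometheus
  annotations:
    kubernetes.io/ingress.class: kong
  labels:
    global: "true"    # apply the plugin to all services proxied by Kong
plugin: prometheus
```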
Re-architecting entire Kubernetes solutions is time-consuming and not always a viable option. But with a little diligence, you'll be able to undo the effects of previously adopted anti-patterns by replacing them with implementations that adhere to Kubernetes best-practice design patterns.