The `up` metric has the value 1 when Prometheus can reach the pod to scrape its metrics. It might be useful for monitoring a pod's readiness (in some cases) if the scraping is done through a Kubernetes service, but it causes a false positive when Prometheus scrapes directly from the pods.

This is the request flow when metric scraping is done via a Kubernetes service:
Here, the Kubernetes service works as a load balancer and routes each scrape request to one of the pods. So on each scrape, Prometheus collects metrics from just one pod, and this setup cannot tell how many pods are ready.
That's where scraping directly from the pods comes into the picture.
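For reference, here is a minimal sketch of what a per-pod scrape job can look like with plain `kubernetes_sd_configs`; the job name, the `app` label, and the metrics path are my assumptions, and a Prometheus Operator `PodMonitor` would be an equivalent way to set this up:

```yaml
scrape_configs:
  - job_name: my-app-pods               # hypothetical job name
    metrics_path: /actuator/prometheus  # assumed metrics endpoint
    kubernetes_sd_configs:
      - role: pod                       # one target per pod, ready or not
    relabel_configs:
      # Keep only the pods of the application we care about (assumed `app` label).
      - action: keep
        source_labels: [__meta_kubernetes_pod_label_app]
        regex: my-app
```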
With this topology, Prometheus can reach all the pods, and the `up` metric of each pod will have the value 1 even when the pods are not in the ready state or their readiness probes are failing. This does not happen when scraping through the Kubernetes service, because the service won't route requests to un-ready pods; the scrape gets a 503 instead.

To avoid this false positive, we need to introduce a custom gauge metric that indicates the readiness of the pod. I chose the descriptive name `pod_readiness` for it. But how do we update the value of this metric?

In the picture above, I use a servlet filter to catch the HTTP response from the actuator and set the `pod_readiness` metric's value accordingly.
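Below is a minimal sketch of such a filter for a Spring Boot application with Micrometer. The class name, the readiness endpoint path (`/actuator/health/readiness`), and the use of the `jakarta.servlet` API are assumptions, not the exact code from the original setup:

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

import io.micrometer.core.instrument.MeterRegistry;
import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.springframework.stereotype.Component;

@Component
public class PodReadinessFilter implements Filter {

    // Backing value for the pod_readiness gauge: 1 = ready, 0 = not ready.
    private final AtomicInteger readiness = new AtomicInteger(0);

    public PodReadinessFilter(MeterRegistry registry) {
        // Register the custom gauge once; Micrometer reads the AtomicInteger on every scrape.
        registry.gauge("pod_readiness", readiness);
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Let the actuator handle the request first, then look at the response it produced.
        chain.doFilter(request, response);

        if (request instanceof HttpServletRequest req && response instanceof HttpServletResponse res) {
            // Assumed readiness endpoint path; adjust to match your actuator configuration.
            if (req.getRequestURI().startsWith("/actuator/health/readiness")) {
                readiness.set(res.getStatus() == HttpServletResponse.SC_OK ? 1 : 0);
            }
        }
    }
}
```

Since the gauge is only updated when the readiness endpoint is called, its value reflects the result of the most recent kubelet readiness probe.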
Once the metric is collected from the pods, we can write Prometheus queries to monitor the number of ready pods and fire alerts if necessary.
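As an illustration (the `app` label and the threshold below are assumptions about the setup, not the original queries), the number of ready pods and a simple alert condition could look like this:

```promql
# Number of ready pods for the application (assumed `app` label).
sum(pod_readiness{app="my-app"})

# Possible alert expression: fire when fewer than 2 pods are ready.
sum(pod_readiness{app="my-app"}) < 2
```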
Cheers