Knative Scale to Zero: Running Low-Traffic Services Efficiently
Knative Serving provides a simple way to run HTTP workloads on Kubernetes that automatically scale based on traffic. One of the most useful capabilities is scale to zero. When a service receives no traffic, the pods disappear completely. When a request arrives, Knative starts the workload again and routes the request to it.
Motivation
Recently, I discussed this problem with a friend who hosts many small client services on a public cloud platform. These services are used infrequently, some only a few times per month, yet they incur continuous hosting charges. He wanted to reduce costs without sacrificing flexibility. Moving everything back to on-premises hardware seemed like an option, but it would sacrifice the ability to scale globally or easily migrate back to the cloud. Knative with scale-to-zero offers a middle ground: run a small platform at home, host services that consume resources only when actually invoked, and maintain the option to move workloads back to public cloud infrastructure if needs change. For anyone managing many lightweight internal services, especially those used infrequently, this pattern can dramatically reduce infrastructure costs while preserving operational flexibility.
This post explores the concept and mechanics of Knative’s scale-to-zero behavior. It uses NGINX only as a simple example container. A key finding from setting this up is that Knative services must be addressed by their fully qualified DNS name to work reliably within the cluster.
Knative must already be installed in the cluster. If you need installation instructions, refer to the official documentation at https://knative.dev/docs/install/
How Knative Scale-to-Zero Works
When traffic reaches a Knative service, it passes through several platform components before reaching your container. This is how Knative manages autoscaling and cold starts.
flowchart TD
A[Client request] --> B[Service DNS]
B --> C[Knative route]
C --> D{Scaled to zero?}
D -- Yes --> E[Activator]
D -- No --> F[Revision pod]
E --> F
If the service has scaled down, the Activator temporarily receives the request while Knative starts a new pod. Once the pod is ready, traffic flows directly to it.
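How long the Activator stays in the request path after scale-up can be tuned per revision with the target-burst-capacity annotation. The sketch below shows only the relevant fields; the value is illustrative and should be adjusted to your workload.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service
spec:
  template:
    metadata:
      annotations:
        # "0" removes the Activator from the data path once pods are running;
        # "-1" keeps it in the path permanently as a request buffer.
        autoscaling.knative.dev/target-burst-capacity: "0"
```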
Inside the revision pod, Knative injects the queue-proxy container alongside your application.
flowchart TD
subgraph RevisionPod["Revision Pod"]
A[queue-proxy]
B[user container]
A --> B
end
The queue-proxy is responsible for request buffering, concurrency control and metrics collection used by the Knative autoscaler.
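Concurrency behavior can be configured per revision. As a sketch (the numbers are illustrative, not recommendations), a soft autoscaling target and a hard per-pod limit look like this:

```yaml
spec:
  template:
    metadata:
      annotations:
        # Soft target: the autoscaler aims for ~10 in-flight requests per pod
        autoscaling.knative.dev/target: "10"
    spec:
      # Hard limit: queue-proxy forwards at most 20 concurrent requests
      # to the user container; excess requests are queued
      containerConcurrency: 20
      containers:
        - image: nginx:stable
```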
A minimal Knative service example
Below is a simple example service. The container image itself is not important. NGINX is used here only because it is a convenient lightweight HTTP server.
In this example, the service is marked cluster-local (reachable only from inside the cluster), but Knative services can also be public-facing if needed.
apiVersion: v1
kind: Namespace
metadata:
  name: knative-demo
---
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service
  namespace: knative-demo
  labels:
    networking.knative.dev/visibility: cluster-local
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/max-scale: "3"
        autoscaling.knative.dev/window: "30s"
    spec:
      containers:
        - image: nginx:stable
          ports:
            - containerPort: 80
Apply the manifest:
kubectl apply -f service.yaml
Knative will automatically create the underlying revision and deployment. You never define the Deployment yourself.
Invoking the service
Knative services are addressed by their DNS name. For cluster-local services, use the internal DNS:
example-service.knative-demo.svc.cluster.local
Requests can come from any pod in the cluster. A simple way to test this is to run a temporary container and call the service.
kubectl run curl --rm -it --image=curlimages/curl -- \
curl http://example-service.knative-demo.svc.cluster.local
The first request may take noticeably longer because the service has to scale up from zero (a cold start).
Observing scale to zero
You can watch the revision pods while sending traffic.
kubectl get pods -n knative-demo -w
The behaviour typically looks like this:
stateDiagram-v2
[*] --> Idle
Idle --> ZeroPods: service ready
ZeroPods --> TrafficArrives: request received
TrafficArrives --> PodStarting: autoscaler scales up
PodStarting --> Running: pod ready
Running --> TrafficStops: no traffic
TrafficStops --> ZeroPods: grace period expires
This allows clusters to host many small services without keeping them permanently running.
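The grace period before idle pods are removed is a cluster-wide setting in the config-autoscaler ConfigMap in the knative-serving namespace. The default is 30s; the value below is illustrative.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  enable-scale-to-zero: "true"       # the default; set "false" to disable
  scale-to-zero-grace-period: "60s"  # illustrative; default is 30s
```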
Note on Istio and Knative
Knative always injects the queue-proxy container into revision pods to handle request queuing, concurrency control, and metrics collection. If Istio is installed in the cluster with sidecar injection enabled for your namespace, Istio will add a second proxy container (istio-proxy) for traffic management and observability. This results in two proxies in the request path: traffic flows through istio-proxy first, then queue-proxy, before reaching your application. Both proxies serve important functions and work together without conflict.