From push-and-pray to complete feedback loop: dual sidecar ACK and TLS
Post 03 introduced an out-of-band ack-egress sidecar. It gave the producer a way to ask “has my message been safely persisted?” and turned the unreliable TCP boundary into an application-level at-least-once guarantee. Still missing was the consumer side. A producer that needed evidence that a message had been fully processed, not just stored, remained in the dark.
This post closes that final gap. It adds a second sidecar (ack-ingress) that answers a different question: “has my message been handled by a consumer?” The two together form a complete pull-based feedback loop. Because every byte of the loop now carries sensitive application data, the post also describes how TLS was bolted onto all three broker TCP endpoints without modifying a single line of AWK in the hot path and without removing the plaintext listeners that make local debugging trivial.
Two questions, two sidecars
| Sidecar | Plaintext port | TLS port | Question | Source file |
|---|---|---|---|---|
| ack-egress | 5675 | 54435 | Has message sequence N been persisted? | append.log |
| ack-ingress | 5676 | 54436 | Has message sequence N been processed? | processed.log |
The first sidecar was described in the earlier post. The second is constructed identically. A socat listener forks an AWK script that scans a single log file in O(n) time using getline. No shell, no system(), read-only PVC mount.
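In sketch form, the listener wiring looks like this (the script path and log path are illustrative, not taken from the repo; only the port and scanned file differ from the ack-egress sidecar):

```shell
# Plaintext ack-ingress listener: socat forks one awk process per
# connection, with no shell in between. fork = one child per client,
# reuseaddr = fast restart after a crash.
PROCESSED_LOG=/data/processed.log \
socat TCP-LISTEN:5676,fork,reuseaddr \
      EXEC:"awk -f /scripts/ack-ingress.awk"
```

This is a long-running listener command, shown as a configuration sketch rather than something to execute inline.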
The consumer (or a test harness) appends a line to processed.log after finishing its work: the same message line plus a processing timestamp. The ack-ingress script checks for the presence of the requested sequence number and returns PROCESSED or PENDING.
# ack-ingress.awk - processing-status query endpoint
BEGIN { LOG = ENVIRON["PROCESSED_LOG"] }
{
    # Trim leading/trailing whitespace from the request line.
    gsub(/^[[:space:]]+|[[:space:]]+$/, "")
    if ($1 == "GET_STATUS" && NF >= 2) {
        target_seq = $2
        found = 0
        # O(n) scan; the first |-separated field is the sequence number.
        while ((getline line < LOG) > 0) {
            split(line, f, "|")
            if (f[1] == target_seq) { found = 1; break }
        }
        close(LOG)
        print (found ? "PROCESSED" : "PENDING")
        fflush()
    }
    # One request per connection: answer the first line and exit.
    exit
}
With both endpoints available, a producer can answer the two essential questions without ever inspecting the broker’s internals. The responsibility for validating delivery status and deciding when to retry stays in the client. The broker remains an honest, inspectable storage pipeline.
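A producer-side wait built on that endpoint can be as small as a retry loop. This is a hypothetical sketch, not code from the repo: function name, retry budget, and the injected query command are all illustrative.

```shell
# Poll the ack-ingress endpoint until the sequence number reports
# PROCESSED, or give up after a small retry budget.
# $2 is the command that carries the request, injected so the loop is
# transport-agnostic (plaintext socat, TLS socat, or a test fake).
wait_processed() {
    seq=$1
    query=$2
    for _ in 1 2 3; do
        status=$(printf 'GET_STATUS %s\n' "$seq" | $query)
        [ "$status" = "PROCESSED" ] && return 0
        sleep 1
    done
    return 1   # caller decides whether to resend
}

# Real use would inject socat, e.g. (pod IP illustrative):
#   wait_processed 42 "socat - TCP:10.244.1.230:5676"
```

Keeping the retry decision in this loop, rather than in the broker, is exactly the division of responsibility described above.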
The complete loop
producer broker pod consumer
| | |
| 1. send ---------->| internal-ingress |
| | | |
| | v |
| | pipeline -> append.log (PVC) |
| | |
| | egress -----------------------> reads message
| | | does work
| | | |
| | processed.log (PVC) <---------|<----+
| | ^ ^ |
| | | | reads only
| | | |
| 2. GET_ACK ------->| ack-egress
| <- ACK <seq> ------| (durability)
| |
| 3. GET_STATUS ---->| ack-ingress
| <- PROCESSED ------| (consumer processing)
| |
The pipeline writes append.log. The consumer writes processed.log. Both files live on the same PVC. Both ACK sidecars read those files only. The hot path knows nothing about delivery semantics.
Same broker, two transports
TLS is added as a parallel listener set, not a replacement. For every plaintext container a sibling exists with a different port and a static socat OPENSSL-LISTEN command. The choice between plain and encrypted is fixed at manifest-apply time, not at runtime. No shell-wrapper evaluation, no environment switch.
The private key and certificate come from a step-ca instance running on the worker. They are stored in a Kubernetes Secret named kmq-tls and mounted read-only at /etc/tls. The CA root certificate is distributed to clients, which use socat OPENSSL with the cafile option to verify the broker. Connecting by pod IP currently requires the verify=0 flag because the certificate’s SAN does not include dynamic pod IPs.
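Concretely, the encrypted pair of commands looks roughly like this. The certificate filenames, script path, and pod IP are assumptions for illustration; the ports and the cafile/verify=0 behavior are from the setup described above.

```shell
# Broker side: static TLS listener in front of the same awk script.
# verify=0 here means the broker does not demand a client certificate.
socat OPENSSL-LISTEN:54436,fork,reuseaddr,cert=/etc/tls/tls.crt,key=/etc/tls/tls.key,verify=0 \
      EXEC:"awk -f /scripts/ack-ingress.awk"

# Client side: check the broker's certificate against the step-ca root.
# verify=0 is still required when dialing the pod IP directly, because
# the certificate's SAN does not include dynamic pod IPs.
printf 'GET_STATUS 42\n' | \
  socat - OPENSSL:10.244.1.230:54436,cafile=/etc/tls/root_ca.crt,verify=0
```

Both are configuration sketches of long-running or cluster-bound commands, not something to run inline.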
The plaintext endpoints stay fully operational. Existing scenarios continue to pass without modification. A developer can bypass TLS entirely during local debugging. When TLS is needed, clients connect to the encrypted port with cafile. The AWK scripts are unchanged. The pipeline is unchanged.
Validation status
The continuous validation suite (run-all-scenarios.sh) covers both the plaintext and the TLS path. A clean run on the homelab cluster:
ts=2026-05-09T21:04:00Z runner=all-scenarios lvl=INFO script=scenarios/scenario-resume.sh msg=start
ts=2026-05-09T21:04:34Z runner=all-scenarios lvl=PASS script=scenarios/scenario-resume.sh msg=exit 0
ts=2026-05-09T21:04:34Z runner=all-scenarios lvl=INFO script=scenarios/scenario-routing.sh msg=start
ts=2026-05-09T21:04:40Z runner=all-scenarios lvl=PASS script=scenarios/scenario-routing.sh msg=exit 0
ts=2026-05-09T21:04:40Z runner=all-scenarios lvl=INFO script=scenarios/scenario-dlq-replay.sh msg=start
ts=2026-05-09T21:04:45Z runner=all-scenarios lvl=PASS script=scenarios/scenario-dlq-replay.sh msg=exit 0
ts=2026-05-09T21:04:45Z runner=all-scenarios lvl=INFO script=scenarios/scenario-ack-delivery.sh msg=start
ts=2026-05-09T21:04:48Z runner=all-scenarios lvl=PASS script=scenarios/scenario-ack-delivery.sh msg=exit 0
ts=2026-05-09T21:04:48Z runner=all-scenarios lvl=INFO script=scenarios/scenario-e2e-ack.sh msg=start
ts=2026-05-09T21:04:57Z runner=all-scenarios lvl=PASS script=scenarios/scenario-e2e-ack.sh msg=exit 0
ts=2026-05-09T21:04:57Z runner=all-scenarios lvl=INFO script=scenarios/scenario-ack-retry.sh msg=start
ts=2026-05-09T21:05:28Z runner=all-scenarios lvl=PASS script=scenarios/scenario-ack-retry.sh msg=exit 0
ts=2026-05-09T21:05:28Z runner=all-scenarios lvl=INFO script=scenarios/scenario-backpressure.sh msg=start
ts=2026-05-09T21:05:51Z runner=all-scenarios lvl=PASS script=scenarios/scenario-backpressure.sh msg=exit 0
ts=2026-05-09T21:05:51Z runner=all-scenarios lvl=INFO script=scenarios/scenario-backpressure-block.sh msg=start
ts=2026-05-09T21:06:27Z runner=all-scenarios lvl=PASS script=scenarios/scenario-backpressure-block.sh msg=exit 0
ts=2026-05-09T21:06:27Z runner=all-scenarios lvl=INFO script=scenarios/scenario-tls-full.sh msg=start
ts=2026-05-09T21:06:38Z runner=all-scenarios lvl=PASS script=scenarios/scenario-tls-full.sh msg=exit 0
ts=2026-05-09T21:06:38Z runner=all-scenarios lvl=INFO script=scenarios/scenario-e2e-ack-tls.sh msg=start
ts=2026-05-09T21:06:52Z runner=all-scenarios lvl=PASS script=scenarios/scenario-e2e-ack-tls.sh msg=exit 0
ts=2026-05-09T21:06:52Z runner=all-scenarios lvl=SUMMARY script=all-scenarios msg=pass=10 fail=0
Ten scenarios. Zero failures. Total wall time 172 seconds.
The dual-ACK feedback loop described in this post is exercised by scenario-e2e-ack.sh (plaintext, 9 seconds) and scenario-e2e-ack-tls.sh (TLS, 14 seconds). Both send 50 messages, consume them through the egress service, write each consumed line to processed.log and poll ack-ingress until every sequence number returns PROCESSED. The TLS variant connects to ports 54433 and 54436 with socat OPENSSL and the CA file. scenario-tls-full.sh adds a durability-ACK step over port 54435 in the same encrypted style.
A reviewer who wants to run the suite without setting up step-ca can comment the last two run_scenario lines. The first eight scenarios run independently of TLS and continue to validate the full plaintext pipeline including durability, routing, dead-letter replay, backpressure block policy and ACK retry.
Throughput
A direct burst from WORKER_NODE to the broker pod: 1000 messages of approximately 30 bytes each, no consumer pressure, the pipeline freshly started but with its ring buffers already populated:
ts=2026-05-09T21:08:51Z host=WORKER_NODE svc=kmq-throughput lvl=INFO phase=start msg_count=1000 broker=10.244.1.230:5673
ts=2026-05-09T21:08:51Z host=WORKER_NODE svc=kmq-throughput lvl=INFO broker_ready=true
ts=2026-05-09T21:08:51Z host=WORKER_NODE svc=kmq-throughput lvl=INFO queue_log_cleared
ts=2026-05-09T21:08:51Z host=WORKER_NODE svc=kmq-throughput lvl=INFO phase=drain_start
ts=2026-05-09T21:08:51Z host=WORKER_NODE svc=kmq-throughput lvl=INFO phase=ready_to_send
ts=2026-05-09T21:08:51Z host=WORKER_NODE svc=kmq-throughput lvl=INFO phase=send_start
ts=2026-05-09T21:08:52Z host=WORKER_NODE svc=kmq-throughput lvl=INFO phase=send_done send_elapsed_ms=1
ts=2026-05-09T21:08:52Z host=WORKER_NODE svc=kmq-throughput lvl=INFO phase=wait_flush
ts=2026-05-09T21:08:52Z host=WORKER_NODE svc=kmq-throughput lvl=INFO phase=flush_done arrived=1000 flush_ms=1 throughput_msg_s=500000
ts=2026-05-09T21:08:52Z host=WORKER_NODE svc=kmq-throughput lvl=INFO phase=end result=PASS throughput_msg_s=500000
The script computes 500 000 messages per second. Read that number with the conditions attached:
- Single-node deployment. Producer and broker pod are on the same worker (WORKER_NODE).
- Payload is one routing key plus one timestamp plus one sequence number plus the literal msg. About 30 bytes per line.
- The pipeline lives mostly in shared memory: two ring buffers in /dev/shm, one FIFO, one PVC for append.log and test.log.
- send_elapsed_ms=1 and flush_ms=1 are at the millisecond clock granularity. The reported throughput hits the upper bound imposed by the timer resolution at this batch size.
- The +1 in the script’s denominator is a divide-by-zero guard, which makes the number conservative when the elapsed time rounds to 1 ms.
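Those last two conditions fully account for the headline figure. A sketch of the arithmetic, with variable names assumed rather than taken from the script:

```shell
# throughput_msg_s = msg_count * 1000 / (flush_ms + 1)
# The +1 prevents division by zero when the elapsed time rounds to 0 ms,
# and halves the reported rate when it rounds to 1 ms.
msg_count=1000
flush_ms=1
echo $(( msg_count * 1000 / (flush_ms + 1) ))   # -> 500000
```

So the reported 500 000 msg/s is the timer-resolution ceiling for this batch size, not a measured steady-state rate.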
What this measures honestly is that the pipeline is fast enough that a 1000-message burst at small payload size is below the resolution of the wall clock used for the test. What it does not measure is sustained throughput under continuous load with realistic payloads, cross-node traffic, or contention with consumers.
For a headline number on the niche this broker serves, 500 k msg/s on a 4-core mini PC at 1.5 GHz is honest. Anyone reproducing the run on the same hardware should see the same order of magnitude.
What was added across the series
The journey from “push and pray” to a fully accountable, encrypted message path took three additions on top of the original pipeline.
- A storage-level ACK that proves durability. Post 03.
- A processing-level ACK that proves the consumer did its work. This post.
- Transport encryption that keeps the whole exchange private. This post.
All three were plugged into the broker using the same sidecar pattern. No hot-path changes. No workflow interruptions. Zero-loss rollouts. The broker still knows nothing about delivery guarantees. It holds the evidence and serves queries. The guarantees are built on top, in the test harness and in the producer’s retry logic, exactly where they belong.
What remains
What the broker can prove today is bounded. The boundaries are visible.
- Durability is single-node. append.log lives on one PVC on one worker. If that worker’s disk dies, history dies with it. An off-node backup or a replicated WAL is the next durability step.
- Availability is single-replica. The broker pod is one replica with strategy: Recreate. Loss of the worker means loss of the broker. For a lab that is acceptable. For anything else it is not.
- No protocol-level backpressure at the TCP boundary. The block policy plus the ACK retry loop hide the gap, but the TCP ingress still drops messages under extreme load before block engages. A proper producer-side window or HTTP/2 flow control would close this permanently rather than mask it.
- No access control. Anyone reachable on the cluster network can produce messages or query ACKs. Mutual TLS for client identity, or an application-level shared secret on the ACK channels, is the next hardening step.
- Manual certificate rotation. step-ca issues 24-hour certificates. A scheduled job that requests a fresh certificate and updates the Secret before expiration is queued.
- Unbounded log growth. Both append.log and processed.log grow forever. The CRD already has retention fields. The rotation implementation is not yet in tree.
- Observability is tail -f and wc -l. Honest but limited. A small Prometheus exporter reading ring-buffer depth and ACK latency would not require pipeline changes.
For the niche it serves, KMQ is a lab-grade broker with explicit reliability boundaries, a complete inspectable feedback loop and a transport encryption layer that does not interfere with any of the above. Each guarantee is backed by an executable scenario that exits 0 or fails loud. The series closes here. The gaps above are each their own scope.