A message broker from FIFOs and awk
KMQ is a homelab project with a deliberately narrow premise: build a message broker out of named pipes, awk and Kubernetes primitives. No Kafka. No NATS. No RabbitMQ. FIFOs and a few hundred lines of awk wired together by container manifests.
The point is not production. The point is implementing the patterns by hand, in a setting where the cost of a wrong abstraction is a few seconds of debugging and the reward is a working artifact instead of a cited definition.
This post documents what KMQ does, what KMQ does not do, and four scenarios run against it. Two passed cleanly. Two failed in unanticipated ways. The failures carry most of the diagnostic weight.
Why a broker, by hand
Event-driven systems have a mature vocabulary. Backpressure, ordering guarantees, durability tiers, single semantic authority, append-only logs with offset replay. The terms are everywhere. Pub/Sub flow control, Kafka consumer groups, AMQP exchange bindings, SQS DLQ redrive: every serious message broker has documentation for each, and a long history of decisions encoded in their implementations.
Using those terms without ever implementing one of them from primitives leaves a gap. KMQ is the attempt to close it.
The bet: a broker built from Unix primitives exposes the patterns more honestly than any production system. A FIFO is not a metaphor for backpressure. A FIFO is the kernel-level mechanism backpressure is built on. An append-only log is not a buzzword. An append-only log is a file that responds to tail. A dead letter queue is not a distinguished concept in the protocol. A dead letter queue is a destination file in a routing table written in awk.
If those primitives behave correctly under load and under failure, the patterns become visible from the bottom up. If they fail, the failures are specific and inspectable.
The shape of KMQ
KMQ runs as a single Kubernetes Deployment: one Pod holding the four pipeline containers, an init container that creates the FIFOs, and a metrics sidecar, plus stateless ingress and egress Deployments in front.
The pipeline inside the broker pod is four awk processes connected by three named pipes:
Producer
   |
   | TCP :30672 (NodePort)
   v
+-----------------------------+
| ingress Deployment (n=2)    |  stateless
| socat :5672 to broker:5673  |  (replicable)
+-----------------------------+
   |
   | TCP :5673 (ClusterIP)
   v
+==========================================================+
| broker Pod (replicas=1, single semantic authority)       |
|                                                          |
|  init: mkfifo /pipes/{raw, framed, durable}              |
|                                                          |
|  +-------------+  raw   +--------+  framed  +--------+   |
|  | ingress.awk |------->| framer |--------->| dur.   |   |
|  | (socat fork)|  FIFO  | .awk   |   FIFO   | .awk   |   |
|  +-------------+        +--------+          +---+----+   |
|                          seq + ts               |        |
|                          resume reading     tee |        |
|                          append.log on      +---+        |
|                          startup            |   |        |
|                                             v   |        |
|                                  +------------+ |        |
|                                  | append.log | |        |
|                                  | (WAL)      | |        |
|                                  +------------+ |        |
|                                                 |        |
|                                    durable FIFO |        |
|                                                 v        |
|                                             +--------+   |
|                                             | router |   |
|                                             | .awk   |   |
|                                             +---+----+   |
|                                                 |        |
|                                                 v        |
|  +----------------------------------------------------+  |
|  | /logs (hostPath, persistent, RO mount for metrics) |  |
|  |   test.log                                         |  |
|  |   jobs.process.log                                 |  |
|  |   jobs.notify.log                                  |  |
|  |   dead_letter.log (unmatched routing keys)         |  |
|  |   append.log (full audit, source of truth)         |  |
|  +----------------------------------------------------+  |
|                           ^                              |
|                           | RO                           |
|  +-----------------------+----------------------------+  |
|  | metrics sidecar (Prometheus :9090, observer)       |  |
|  +----------------------------------------------------+  |
+==========================================================+
   |
   | tail /logs/{queue}.log via shared volume
   v
+-----------------------------+
| egress Deployment (n=2)     |  stateless
| socat /logs/X.log to client |
+-----------------------------+
   |
   | TCP :30674 (NodePort)
   v
Consumer
Each container does one thing.
The framer assigns a monotonically increasing sequence number to each message, prepends a millisecond timestamp, and writes to the next pipe. On startup, the framer reads the persistent append log and resumes numbering from the last seq found. This is the offset-replay primitive. Without it, a pod restart resets seq to 1, and downstream consumers cannot tell the new message 1 from the original message 1.
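The resume logic is the whole trick. A minimal sketch of the stage (the field layout, the paths, and the use of gawk's bundled time extension are assumptions here, not the actual KMQ source):

awk
# framer.awk -- sketch: resume seq from the WAL, then stamp and forward.
# Assumes |-delimited records with seq as field 1 of append.log.
@load "time"                  # gawk extension: gettimeofday(), sub-second

BEGIN {
    FS = OFS = "|"
    wal = "/logs/append.log"
    seq = 0
    # Offset replay: walk the persistent log, keep the highest seq seen.
    while ((getline line < wal) > 0) {
        split(line, f, "|")
        if (f[1] + 0 > seq) seq = f[1] + 0
    }
    close(wal)
    print "framer: resuming from seq " seq > "/dev/stderr"
}

{   # one record per line from /pipes/raw; stdout is the framed FIFO
    print ++seq, int(gettimeofday() * 1000), $0
}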
The durability stage forks the stream: appends every record to append.log on a hostPath volume and forwards the same record downstream. Four persistence tiers exist (none, batched, per-message, per-message+sync). Tier 1 batches every 100 messages and is the default.
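Tier 1 is small enough to show whole. A sketch of its shape (paths and the exact batching mechanics are assumptions):

awk
# durability.awk -- tier-1 sketch: persist, batch-flush, forward.
BEGIN { wal = "/logs/append.log" }     # hostPath-backed, survives restarts
{
    print >> wal                       # append the record to the WAL
    if (NR % 100 == 0) close(wal)      # tier 1: flush in batches of 100;
                                       # close pushes the stdio buffer to
                                       # disk, the next print reopens in
                                       # append mode
    print                              # forward the same record downstream
}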
The router matches each record’s routing key against a small set of prefixes (test.*, jobs.process.*, jobs.notify.*) and writes the record to the corresponding queue log. Anything unmatched goes to dead_letter.log. This is the broker’s exchange-to-queue binding, written as five lines of awk.
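Roughly this, assuming the routing key sits in the third |-delimited field, after the framer's seq and timestamp (a sketch, not the exact source):

awk
# router.awk -- the binding table as a pattern ladder; $3 is the key.
BEGIN { FS = "|"; dir = "/logs" }
$3 ~ /^test\./          { print >> (dir "/test.log");         next }
$3 ~ /^jobs\.process\./ { print >> (dir "/jobs.process.log"); next }
$3 ~ /^jobs\.notify\./  { print >> (dir "/jobs.notify.log");  next }
                        { print >> (dir "/dead_letter.log") }  # default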
Ingress and egress Deployments are stateless socat processes: inbound, :5672 forwarded to broker-svc:5673; outbound, queue logs served to consumers on :5674. No state to lose, so each runs two replicas.
Total: about 200 lines of awk, eight YAML manifests, three container images. The only state surviving a pod restart lives in the append and queue logs on disk. Everything else is ephemeral by design.
Four scenarios, four named patterns
Four scripts, one per scenario. Each produces a logfmt artifact in a results/ directory. Each tests a pattern with a name in the literature and an implementation here.
Artifacts are versioned by UTC timestamp. The four runs feeding this writeup happened between 09:42 and 10:13 UTC on a Sunday morning.
Scenario 1: sequence resume across broker restart
Pattern: event sourcing offset replay. Kafka consumers, Pub/Sub ordering guarantees, write-ahead-log recovery in databases. The append log is the durable source of truth. The processes around it are stateless executors.
Test: send 500 messages. Wait for append.log to settle at 500 records. Delete the broker pod. Wait for Kubernetes to schedule a replacement under the Recreate strategy. Send 500 more. Verify the resulting log has 1000 records with seq monotonically increasing and zero gaps.
Result: clean. The framer’s stderr line framer: resuming from seq 500 appears in the artifact. The five-record window straddling the kill boundary shows seq 499 and 500 with payload batch1, then seq 501, 502, 503 with payload batch2. The gap checker reads 1000 records, first=1, last=1000, PASS.
text
499|1777889157430|test.probe|1777889154142|499|batch1
500|1777889157430|test.probe|1777889154143|500|batch1
501|1777889168897|test.probe|1777889168897|1|batch2
502|1777889168898|test.probe|1777889168898|2|batch2
503|1777889168899|test.probe|1777889168899|3|batch2
The framer-side timestamp jumps by eleven seconds across the boundary, which matches the time the broker pod took to be deleted, rescheduled and become Ready. Sequence numbers do not jump. The new framer instance reads the existing append.log, walks line by line, finds seq=500, continues from 501.
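The gap check itself is a few lines of awk over field 1. A sketch of the idea (not the exact script from the results run):

awk
# gap-check.awk -- verify seq is contiguous, report range and verdict.
BEGIN { FS = "|" }
NR == 1                       { first = $1 }
prev != "" && $1 != prev + 1  { gaps++ }
                              { prev = $1 }
END { printf "records=%d first=%d last=%d %s\n",
             NR, first, prev, (gaps ? "FAIL" : "PASS") }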
The takeaway: Kafka’s offset replay performs the same operation as the framer, with substantially more code and substantially more correctness guarantees. Reading the log at startup, the seq increment, the persistence: those operations exist in any system claiming at-least-once delivery with order. KMQ has them in 25 lines of awk.
Scenario 2: routing and dead letter cardinality
Pattern: AMQP-style exchange-to-queue binding. SNS topic-to-subscription routing. Pub/Sub pull subscriptions filtered by attribute. The router is a function from message to destination, with a default catching everything else.
Test: build a stream of 100 messages mixing five categories. Forty with test.probe, twenty with jobs.process.user_*, twenty with jobs.notify.email_*, ten with random.unknown_*, ten with bar.baz_*. Shuffle so routing is per-record, not per-batch. Send. Verify exact cardinality per destination file.
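The generator is a one-shot script along these lines (a sketch; the payload format is invented for illustration):

awk
# gen-mixed.awk -- emit the 100-key mix, shuffled so routing decisions
# alternate per record rather than per batch.
BEGIN {
    for (i = 1; i <= 40; i++) k[n++] = "test.probe"
    for (i = 1; i <= 20; i++) k[n++] = "jobs.process.user_" i
    for (i = 1; i <= 20; i++) k[n++] = "jobs.notify.email_" i
    for (i = 1; i <= 10; i++) k[n++] = "random.unknown_" i
    for (i = 1; i <= 10; i++) k[n++] = "bar.baz_" i
    srand()
    for (i = n - 1; i > 0; i--) {      # Fisher-Yates shuffle
        j = int(rand() * (i + 1))
        t = k[i]; k[i] = k[j]; k[j] = t
    }
    for (i = 0; i < n; i++) print k[i] "|payload_" i
}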
Result: exact. 40 in test.log, 20 in jobs.process.log, 20 in jobs.notify.log, 20 in dead_letter.log. The DLQ sample shows three lines from unmatched categories preserved with original routing keys intact.
logfmt
step=measure file=test.log observed=40 expected=40
step=measure file=jobs.process.log observed=20 expected=20
step=measure file=jobs.notify.log observed=20 expected=20
step=measure file=dead_letter.log observed=20 expected=20
The takeaway: routing is the cheap part of a broker. Twenty lines of awk. The cost lives elsewhere: persistence, ordering, delivery guarantees, ack semantics. Routing always works. The hard part is what happens after the routing decision.
Scenario 3: backpressure boundary measurement
First scenario where a clean pass was expected and a measurement appeared instead.
Pattern: kernel-enforced flow control via pipe(7) blocking write semantics. When a downstream reader stops consuming, the kernel pipe buffer fills, writers block in a write() syscall until space opens. No userspace memory growth. No application-level protocol. The kernel handles the bookkeeping.
Test: pause the router with kill -STOP against its gawk PID inside the container’s PID namespace. Burst 20 000 messages from a TCP client. Measure how many reach append.log while the consumer is paused. Read the broker pod’s cgroup memory.current to verify pipe buffering lives in kernel space, not in process address space. Resume the router. Wait for the pipeline to drain. Count survivors.
Expected: the sender blocks at some point. Memory does not grow. After resume, all 20 000 messages drain to append.log.
Observed:
logfmt
step=verdict
sent=20000
delivered=9142
lost=10858 loss_pct=54
delivered_seq_integrity=PASS
rss_pre_mb=2 rss_paused_mb=9 rss_during_mb=5 rss_post_mb=7
The sender did not block. socat completed the burst in five seconds and exited cleanly. Memory stayed below 10 MB throughout, which confirms whatever buffering happened was kernel-side. More than half the messages were lost. What survived was perfectly ordered with zero gaps.
That result shape revealed a missing piece in the model. KMQ has kernel-enforced backpressure inside the pipeline. The FIFOs do block. The gawk processes do wait. The buffering does happen at the kernel level. What KMQ does not have is a backpressure protocol at the TCP boundary. The internal-ingress is a socat process forking a gawk per connection. When the FIFO fills, the gawk blocks, but socat does not propagate that block back to the TCP sender, by zero-window or otherwise, in any way that prevents data loss. Kernel TCP buffers absorb what they can. The rest is dropped on the floor.
The internal pipeline is honest. The boundary at the wire is not.
This distinction is exactly what protocol-level flow control addresses. gRPC stream window updates. HTTP/2 connection-level flow control. Kafka’s produce request acknowledgement loop. The Pub/Sub client library’s flow_control option. Each of those exists because TCP backpressure alone is not sufficient when the application layer wants a delivery guarantee.
KMQ phase 1 does not make that guarantee. KMQ phase 1 tries, gets close inside the pipeline, and falls over at the boundary. Measuring the loss rate makes the boundary visible. The number, 54% lost under a paused-consumer burst, is the size of the gap between “blocking writes inside the system” and “delivery semantics at the protocol level”.
Scenario 4: dead letter replay
Pattern: SQS DLQ redrive. Kafka Streams DLQ recovery. Azure Service Bus dead-letter handling. The poison message workflow. A dead letter queue is not a graveyard. A dead letter queue is a buffer of messages with indeterminate destination, plus a recovery operation that reads the DLQ, repairs the routing key per a remediation policy, and reinjects through the normal ingress.
Test: send 70 messages with known routing keys and 30 with unknown.thing_* keys routing to dead_letter.log. Verify routing cardinality. Snapshot the dead_letter.log content with sha256. Read it line by line, rewrite each unknown.thing_N key to test.recovered_N, and reinject through the ingress. Wait. Verify recovered records show up in test.log, that dead_letter.log on disk is unchanged (audit trail preserved), and that the full append.log shows continuous sequence numbering across the original send and the recovery.
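The remediation step reduces to a key rewrite over the DLQ snapshot, roughly (a sketch; the field positions are assumptions):

awk
# redrive.awk -- repair dead-lettered keys and emit records for
# reinjection through the normal ingress. The broker-assigned seq and
# timestamp (fields 1-2) are dropped so the framer stamps fresh ones.
BEGIN { FS = OFS = "|" }
{
    sub(/^unknown\.thing_/, "test.recovered_", $3)
    out = $3
    for (i = 4; i <= NF; i++) out = out OFS $i
    print out
}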
Phase 1 of the test passed. The original 100 messages routed correctly. test.log = 70, dead_letter.log = 30, append.log = 100, sequence integrity PASS.
Phase 2 failed. The 30 reinjected messages did not arrive.
logfmt
step=verdict
test_log_growth=0
expected_growth=30
dead_letter_audit_preserved=true
sequence_integrity=PASS
final_records=100
expected_final=130
socat reported the messages were sent. The TCP write completed. The records never appeared in append.log. The records never reached durability.
First hypothesis: a buffering bug in the durability stage. KMQ’s tier-1 persistence closes and reopens the log file every 100 records; the close is what flushes gawk’s stdio buffer to disk. Phase 1 had hit that boundary exactly, so its records got flushed. Phase 2’s 30 messages would never reach the boundary, leaving them in the buffer indefinitely. The attempted remedy: add fflush(logfile) per record, rebuild the broker image, push to the registry, force-pull, restart the broker, retry.
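In the durability loop, the change looked like this (sketch):

awk
# tier-1 loop with the per-record flush added:
BEGIN { wal = "/logs/append.log" }
{
    print >> wal
    fflush(wal)    # push the stdio buffer to the file on every record,
                   # instead of waiting for the 100-record close boundary
    print
}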
Same failure. Same shape. 30 messages lost.
Second hypothesis: a race in the container lifecycle. The framer container’s command is gawk -f /svc/framer.awk /pipes/raw. gawk reads the FIFO as an ordinary input file. When the FIFO’s last writer closes (the ingress.awk process that handled phase 1’s connection, exiting on the TCP close), the framer reads EOF and exits. kubelet sees a container terminated with exit 0 and restarts it. The same cascade applies to durability and router downstream.
kubectl describe pod showed it directly:
shell
Restart Count: 0 # init-pipes
Restart Count: 0 # internal-ingress
Restart Count: 1 # framer
Restart Count: 1 # durability
Restart Count: 1 # router
A cascading restart of the three awk-based containers happened between phase 1 and phase 2, and phase 2’s TCP connection landed in the restart window. The new ingress.awk process forked by socat tried to open /pipes/raw for write. No reader on the other side, so the open blocked. The client-side socat timed out after 500 ms (its default close timeout), tore down the connection, and the in-flight 30 messages went with it.
The 504 ms duration_ms field in the artifact was the smoking gun, in retrospect. Not the time to send 30 messages: the time the socat client took to give up.
A different failure than scenario 3, pointing to the same architectural gap. KMQ phase 1 treats each TCP connection as an independent session with no continuity guarantee. Scenario 3 demonstrated loss at the ingress under load. Scenario 4 demonstrated loss at the ingress under restarts. Two different load profiles, same boundary, same gap.
The fix is the same in both cases. A broker wanting delivery semantics needs producer-side acks with retry from the last persisted offset. The producer holds outstanding messages until an ack arrives. The broker assigns a seq, persists, acks. On restart or boundary loss, the producer retries from the last unacked seq. This is the at-least-once delivery contract Kafka, Pub/Sub and AMQP all implement.
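The broker side of that loop stays small even in this register. A hypothetical sketch (phase 2 does not exist; the ack channel back to the producer is hand-waved as stdout, and resume-from-WAL is omitted, see the framer above):

awk
# ack-loop sketch: assign, persist, then ack. The producer holds each
# message until its ACK arrives and retries anything unacked on reconnect.
BEGIN { FS = "|"; wal = "/logs/append.log" }
{
    print ++seq "|" $0 >> wal
    fflush(wal)             # persist BEFORE acking, or the ack is a lie
    print "ACK " seq        # reverse channel back to the producer
}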
KMQ phase 1 has the substrate (the append log, the seq numbering, the persistent file). KMQ phase 1 does not have the protocol. Phase 2 is where that protocol lives.
What the scenarios show, taken together
The two passes are pleasant. Sequence resume works because every record in the append log carries its seq, and reading the log back on startup is mechanical. Routing works because awk’s ~ operator and a five-line if/else ladder are exactly the right primitives for prefix dispatch.
The two failures are the structural part of this writeup. Both hit the same boundary from different angles. The first under load: a paused consumer plus a 20 000-message burst exposed that the ingress does not propagate FIFO blocks back to TCP senders. The second under restarts: a container lifecycle event in the awk pipeline opened a 500 ms window in which new connections could not complete, and any in-flight messages in that window were lost.
Two paths to the same gap is more useful than one. The boundary is not a coincidence of one specific test setup. The boundary is the architecture. The fix is not a tweak to ingress.awk or a buffer size adjustment. The fix is a delivery protocol with acks.
This is the point where vague analogies stop being good enough. Reciting the at-least-once guarantee from documentation is not equivalent to reproducing its absence in a working system, twice, under measurably different conditions, with the size of the gap stated in concrete numbers (54% loss under burst, 100% loss under restart race).
What this is, and what this is not
KMQ is not a production message broker and will not become one. Real production brokers exist, written by people who have spent years inside the consequences of these decisions. KMQ is not competing with them.
KMQ is an instrument. The kind of system built to expose what an understanding of a topic is actually made of. The patterns in event-driven systems are presented in documentation and books as nouns: backpressure, durability, ordering, delivery semantics. Building one of these systems from primitives turns them into verbs. Backpressure is what kernel pipe writes do. Durability is what print >> logfile; close(logfile) does on a hostPath volume. Ordering is what a single-writer append log produces by construction. Delivery semantics is what does not come for free.
The artifacts on disk are not benchmarks. Not throughput numbers. Four files in a results directory, two PASS, two FAIL, all four reproducible from a clean cluster in about thirty minutes. The failures arrive with measured boundaries and a clear architectural diagnosis.
A working production broker handles every one of these patterns and many more. KMQ does not compete with that. KMQ is the version where the failure point of each simple primitive is exact and visible.
What is queued
A phase 2 with producer acks and replay-from-offset semantics. The substrate exists. The protocol does not.
A daemon-shaped pipeline where the awk processes do not exit on EOF. Either by wrapping each in a while true; do gawk ...; done loop in the container command, or by restructuring the FIFOs so no close cascades through the chain. Both are small changes. Either eliminates the 500 ms restart race in scenario 4.
A backpressure protocol at the TCP boundary. socat-fork-exec is the wrong primitive. A custom ingress that holds the connection while the FIFO blocks, signaling the client with a TCP zero-window (by not reading) or an application-level NACK, would close the gap measured in scenario 3.
None of those are weekend projects. Phase 1 was. Phase 1 produced four data points and a diagnosis of where the system stops being trustworthy.