Real-Time Messaging with ZeroMQ: Patterns That Actually Work

December 22, 2015 · 30 MIN READ

People keep asking me why I chose ZeroMQ instead of a "proper" message queue. The answer is straightforward: I needed to push events from backend services through a processing pipeline into Elasticsearch for indexing and simultaneously fan them out via Socket.IO to connected browser clients - all in real time, all under ten milliseconds end-to-end. A broker sitting in the middle of that path, adding milliseconds per hop and consuming memory just to exist, was not an option. The budget was three t2.medium EC2 instances. There was no room for an Erlang runtime eating 800 MB of RAM to babysit queues I didn't need.

What follows is everything I've learned - the patterns that work, the patterns that don't, the production gotchas that cost me weekends, and enough code and architectural detail that you could rebuild the whole thing from this post. This is the guide I wish existed when I started.

Grab coffee. This is going to be long.

The system we're building

Before diving into ZeroMQ specifics, let me show you what we're actually building. This is a real-time notification and event processing system for a platform with a few thousand concurrent users. The requirements:

Backend services (Node.js) generate events - user actions, system alerts, status changes, transactional notifications
Events need to be routed through a processing pipeline - enrichment, deduplication, transformation, filtering
Processed events get indexed into Elasticsearch for search, aggregation, and historical queries
Simultaneously, relevant events get pushed to connected browser clients via Socket.IO in real time - sub-second from event generation to browser notification
The system must handle bursty traffic - quiet periods punctuated by spikes when something happens across the platform
The whole thing runs on three AWS t2.medium instances because that's the budget

Here's the high-level view:

Events flow top to bottom: API servers generate them, the processing tier transforms them, the delivery tier pushes them to Elasticsearch and Socket.IO simultaneously. ZeroMQ is the nervous system connecting all of it. No broker in sight.

Why ZeroMQ and not a broker

I evaluated RabbitMQ, Redis Pub/Sub, and a couple of others before choosing ZeroMQ. Here's the honest assessment:

RabbitMQ is excellent software. I have nothing against it. But it needs the Erlang runtime, the broker process alone wants somewhere between 512 MB and 1 GB of memory, and adding it means another piece of infrastructure to deploy, monitor, and maintain on EC2. On a t2.medium (4 GB RAM), that is up to a quarter of my memory budget gone before my application even starts. When you're running Node.js processes alongside Elasticsearch on the same instance (yes, I know, I'll explain), every megabyte counts.

Redis Pub/Sub was tempting because we already had a Redis instance for caching. But Redis Pub/Sub has no persistence, no acknowledgment, no message buffering - if a subscriber is offline when a message arrives, it's gone. That's also true of ZeroMQ's PUB/SUB, but ZeroMQ also gives me PUSH/PULL for the pipeline stages, where each event has to be handed to exactly one worker and a full queue blocks the sender instead of dropping messages, and the library adds almost no overhead of its own. In a C++ benchmark comparison, ZeroMQ PUB/SUB delivered roughly 410,000 msgs/sec versus Redis Pub/Sub at 59,000 - about a 7× advantage. At the scale I'm working at, the difference doesn't matter much, but the architectural flexibility does.

Kafka would be the right choice if I needed a distributed commit log with replay capability. I don't. My events are ephemeral notifications, not a permanent event stream. Kafka also wants ZooKeeper, which wants its own JVM, which wants its own memory, which wants its own EC2 instance. The budget for this project is three instances, not six.

So: ZeroMQ. No broker process. No runtime dependency. No daemon to manage. I npm install zmq in my Node.js services, create sockets, connect them, and start sending messages. The library handles reconnection, buffering, routing, and I/O threading internally. Total additional memory overhead: roughly 3–5 MB per process.

ZeroMQ is not a message queue

Despite having "MQ" in its name, ZeroMQ is not a message queue. It's a messaging library - a C library (libzmq) that you link against, which gives your application superpowers for moving discrete messages between processes and machines. There's no server to install. No daemon to configure. No broker process sitting between your services.

The "zero" originally meant "zero broker" and "zero latency." Pieter Hintjens, ZeroMQ's creator and the author of the indispensable ZeroMQ Guide (read it, all of it, cover to cover - it's the best technical writing I've encountered in years), expanded the meaning to include zero administration, zero cost, and zero waste.

His philosophy, and I'm paraphrasing from the zguide here: traditional message brokers are greedy central intermediaries that become too complex, too stateful, and eventually a problem. ZeroMQ inverts this - smart endpoints, dumb network. The intelligence lives at the edges, not in a central server. You choose the topology. Nothing is imposed on you.

The current stable release is libzmq 4.1.x (I'm running 4.1.4 on all instances). Version 4.0 introduced CURVE security - end-to-end encryption built on Daniel Bernstein's Curve25519 elliptic curve cryptography, baked right into the wire protocol (ZMTP 3.0). No TLS wrapper, no stunnel, no nginx reverse proxy. This matters when you're sending events between EC2 instances across a VPC - even within AWS, encrypt your inter-service traffic.

The mental model shift: these aren't TCP sockets

This took me the longest to internalize and it will save you weeks of confusion: ZeroMQ sockets are fundamentally different from TCP sockets.

TCP sockets give you byte streams. You push bytes in one end, pull bytes out the other, and you're responsible for framing, delimiting, parsing partial reads, managing reconnection, and handling every edge case yourself. Every Node.js developer has written the data += chunk; if (data.indexOf('\n') !== -1) { ... } dance at least once.
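For comparison, here is roughly what that dance looks like when written out properly. This is a minimal sketch using nothing but plain JavaScript; the helper name `createLineFramer` and the choice of a newline delimiter are mine, purely for illustration.

```javascript
// Newline-delimited framing over a byte stream: the bookkeeping that
// ZeroMQ's built-in message framing makes unnecessary.
// createLineFramer is a hypothetical helper name used for illustration.
function createLineFramer(onMessage) {
  var buffered = '';
  return function onChunk(chunk) {
    // Assumes a chunk never splits a multi-byte character; real code
    // would run chunks through string_decoder first.
    buffered += chunk.toString();
    var newlineAt;
    // One chunk may contain zero, one, or many complete messages.
    while ((newlineAt = buffered.indexOf('\n')) !== -1) {
      onMessage(buffered.slice(0, newlineAt));
      buffered = buffered.slice(newlineAt + 1);
    }
    // Whatever is left is a partial message; keep it for the next chunk.
  };
}

// With a TCP socket this would be wired up as:
//   socket.on('data', createLineFramer(handleMessage));
```

And that is only framing. Reconnection, backoff, and the connection table are still on you.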

ZeroMQ sockets give you discrete, framed messages. You send a message object, the other side receives that exact message - complete, framed, no partial reads, no delimiter parsing. A single ZeroMQ socket can be connected to multiple peers simultaneously. Reconnection is automatic. Routing (round-robin, fair-queuing, pub/sub filtering) is built into the socket type. The ZMTP wire protocol handles framing at the transport layer.

In Node.js terms: where you'd normally build a TCP server with net.createServer(), handle the 'data' events, buffer partial messages, parse your framing protocol, manage a connection table, implement reconnection with backoff - ZeroMQ does all of that internally. Your code deals with complete messages and nothing else.

The other critical mental shift: bind vs. connect is about stability, not client/server roles. In traditional networking, the server binds and the client connects. In ZeroMQ, the stable node binds and the dynamic nodes connect. A worker that spins up and down connects to the long-lived service that binds. This means subscribers can bind and publishers can connect - the reverse of what you'd expect - and it works perfectly. On EC2, my processing proxy binds on fixed ports; the API servers and delivery services connect to it. When I scale up API server instances behind an ELB, the new instances just connect.
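A minimal sketch of that rule, assuming the `zmq` npm binding. The function names and the `10.0.1.20` address are hypothetical, and the `zmq` module is passed in as a parameter only so the wiring can be tried against a stub; port 5557 is the event-pipeline port from my port map.

```javascript
// Stable node binds, dynamic nodes connect.
// Pass require('zmq') as the first argument in a real service.

// Long-lived processing node: binds on a fixed, well-known port.
function startProcessingNode(zmq, endpoint, onEvent) {
  var intake = zmq.socket('pull');
  intake.on('message', onEvent);
  intake.bindSync(endpoint);
  return intake;
}

// API server that may come and go behind the ELB: connects.
function startApiServer(zmq, endpoint) {
  var out = zmq.socket('push');
  out.connect(endpoint); // returns immediately; libzmq retries in the background
  return out;
}

// Example wiring (addresses are illustrative):
//   startProcessingNode(require('zmq'), 'tcp://*:5557', handleEvent);
//   startApiServer(require('zmq'), 'tcp://10.0.1.20:5557').send('event');
```

Because `connect` does not need the other side to be up yet, the start order of the two processes does not matter.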

The six socket patterns and when to use each one

ZeroMQ's power comes from a small set of socket types implementing distinct messaging patterns. Each enforces specific send/receive behavior. Choosing the right pattern is the most important architectural decision you'll make.

Here's the cheat sheet I keep taped to my monitor (literally, printed out, stuck to the bezel with masking tape):

Pattern	Socket Types	Direction	At HWM
REQ/REP	REQ ↔ REP	Synchronous RPC	Block
PUB/SUB	PUB → SUB	One-to-many broadcast	Drop (PUB)
PUSH/PULL	PUSH → PULL	One-to-one distribution	Block (PUSH)
DEALER/ROUTER	DEALER ↔ ROUTER	Async request-reply	Mixed*
PAIR	PAIR ↔ PAIR	Exclusive bidirectional	Block
XPUB/XSUB	XPUB → XSUB	Proxy-aware pub/sub	Drop (XPUB)

* DEALER blocks at HWM; ROUTER drops. Know this distinction. It will matter at 3 AM.

REQ/REP - synchronous RPC and its fatal flaw

The simplest pattern. REQ sends a message, blocks until it gets a reply. REP waits for a request, processes it, sends back a response. The lock-step send→recv→send→recv cycle is enforced by the internal state machine - try to send twice without receiving and you get an EFSM error.

Under the hood, REQ prepends an empty delimiter frame (the "envelope") that enables routing through intermediaries. REP strips this on receive, saves it, and re-wraps the reply. When a REQ socket is connected to several REP servers, it round-robins requests across them; a REP socket fair-queues requests from all of its clients.
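To make the envelope concrete, here is a toy model of the frame handling in plain JavaScript. No sockets are involved, frames are modelled as an array of strings, and `reqWrap`, `repUnwrap`, and `repWrap` are hypothetical names. It is a sketch of the behaviour described above, not libzmq's implementation.

```javascript
// Toy model of the REQ/REP envelope. Frames are plain strings here.

function reqWrap(body) {
  // REQ prepends an empty delimiter frame before the application data.
  return [''].concat(body);
}

function repUnwrap(frames) {
  // REP saves everything up to and including the empty delimiter and
  // hands only the remaining frames to the application.
  var delimiterAt = frames.indexOf('');
  return {
    envelope: frames.slice(0, delimiterAt + 1),
    body: frames.slice(delimiterAt + 1)
  };
}

function repWrap(envelope, replyBody) {
  // The saved envelope goes back in front of the reply.
  return envelope.concat(replyBody);
}
```

With a direct REQ-to-REP connection the envelope is just the empty frame. When an intermediary such as a ROUTER sits in the path, it adds an identity frame in front of the delimiter, and the same save-and-rewrap logic carries it back so the reply can find its way to the right client.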

In Node.js with the zmq binding:

server.js

// server.js - REP
var zmq = require('zmq');
var responder = zmq.socket('rep');

responder.bind('tcp://*:5555', function(err) {
  if (err) throw err;
  console.log('REP server listening on 5555');
});

responder.on('message', function(msg) {
  console.log('Received: ' + msg.toString());
  setTimeout(function() {
    responder.send('World');
  }, 100); // Simulate async work
});

client.js

// client.js - REQ
var zmq = require('zmq');
var requester = zmq.socket('req');

requester.connect('tcp://localhost:5555');

var count = 0;
requester.on('message', function(msg) {
  console.log('Got reply ' + count + ': ' + msg.toString());
  count++;
  if (count < 10) {
    requester.send('Hello');
  } else {
    requester.close();
  }
});

requester.send('Hello');

Clean. Simple. And fatally flawed for production use: if the server dies mid-request, the REQ client hangs forever. There's no built-in timeout. The internal state machine is stuck waiting for a reply that will never come, and there is no way to reset it without destroying the socket entirely. In Node.js this is especially nasty because the hung socket silently blocks that communication channel - no error event, no timeout callback, nothing.

I don't use raw REQ/REP in production. Ever. You need the Lazy Pirate pattern or DEALER/ROUTER instead; both are covered later in this post.
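The core idea of Lazy Pirate fits in a few lines: put a timer around the request, and when it fires, throw the stuck socket away and try again with a fresh one. What follows is a rough sketch of that idea rather than a production implementation. `lazyPirateRequest`, `makeSocket`, and the `opts` fields are hypothetical names, and the socket factory is supplied by the caller so the retry logic does not depend on a live server.

```javascript
// Lazy Pirate sketch: retry a REQ round-trip by discarding the stuck
// socket and starting over with a new one.
// makeSocket is a caller-supplied factory, for example:
//   function () { return require('zmq').socket('req'); }
// opts.retries is the number of retries after the first attempt.
function lazyPirateRequest(makeSocket, endpoint, payload, opts, done) {
  var retriesLeft = opts.retries;

  function attempt() {
    var socket = makeSocket();

    var timer = setTimeout(function () {
      // No reply in time. The REQ state machine is stuck waiting,
      // so close this socket instead of trying to reuse it.
      socket.close();
      if (retriesLeft-- > 0) {
        attempt();
      } else {
        done(new Error('no reply from ' + endpoint));
      }
    }, opts.timeoutMs);

    socket.on('message', function (reply) {
      clearTimeout(timer);
      socket.close();
      done(null, reply);
    });

    socket.connect(endpoint);
    socket.send(payload);
  }

  attempt();
}
```

In real use you would probably also set LINGER to zero on each socket the factory creates, so that closing a socket with an unsent request does not hold up the process. Note that a retried request may be processed twice if the first one did reach the server, so the handler on the other end should be idempotent.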

Behavior	Socket Types	What Happens
DROP	PUB, ROUTER, XPUB	Messages silently disappear. Zero notification.
BLOCK	PUSH, DEALER, REQ, PAIR	`zmq_send()` blocks until the queue drains. In Node.js, BLOCK means the send call blocks the event loop. Bad.
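The difference between the two rows can be modelled with a bounded queue. This is a toy simulation in plain JavaScript, not how libzmq manages its pipes; `BoundedQueue` and its `policy` argument are hypothetical names.

```javascript
// Toy model of a socket's outbound queue with a high-water mark (HWM).
function BoundedQueue(hwm, policy) {
  this.hwm = hwm;
  this.policy = policy; // 'drop' (PUB-like) or 'block' (PUSH-like)
  this.items = [];
  this.dropped = 0;
}

// Returns 'queued', 'dropped', or 'would-block'.
BoundedQueue.prototype.offer = function (msg) {
  if (this.items.length < this.hwm) {
    this.items.push(msg);
    return 'queued';
  }
  if (this.policy === 'drop') {
    this.dropped++; // the sender is never told about this
    return 'dropped';
  }
  // A real blocking send would wait here until the queue drains.
  return 'would-block';
};
```

The return value exists only so the model can be inspected. With a real DROP socket there is no such signal, which is exactly why the loss goes unnoticed.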

Transport	Typical Latency	Notes
`inproc://`	< 1 μs	Memory copy
`ipc://`	~10–15 μs	Unix domain socket
`tcp://` (localhost)	~25–30 μs	Loopback TCP
`tcp://` (same AZ)	~200–500 μs	Network + TCP
`tcp://` (cross-AZ)	~500–2000 μs	Depends on AZ pair

Type	Protocol	Port Range	Source
Custom TCP	TCP	5555-5565	sg-zmq-internal
Custom TCP	TCP	6379	sg-zmq-internal (Redis)
Custom TCP	TCP	9200	sg-zmq-internal (ES)
Custom TCP	TCP	9300	sg-zmq-internal (ES)

Type	Protocol	Port Range	Source
HTTPS	TCP	443	0.0.0.0/0 (ELB)
Custom TCP	TCP	3000	ELB security group

Percentile	Latency
p50 (median)	4.2 ms
p95	8.7 ms
p99	14.3 ms
p99.9 (worst during load spikes)	42 ms

Metric	Value
Sustained event rate (typical)	2K–5K/sec
Peak burst rate (handled without loss)	~20K/sec
ES bulk indexing rate	8K/sec
Socket.IO concurrent connections	~2,000
PUB/SUB message fan-out per event	2–3 subs

Resource	Usage
Total RAM used (all 3 instances)	~6.5 GB
ZeroMQ memory overhead (all processes)	~25 MB
CPU utilization (typical)	15–30%
CPU utilization (peak bursts)	60–75%
Network bandwidth (inter-instance)	2–10 Mbps

	Brokerless Libraries	Traditional Brokers	Log-based Systems
Examples	ZeroMQ, nanomsg	RabbitMQ, ActiveMQ	Kafka
Throughput	Millions/sec	10K–50K/sec	100K+/sec
Latency	μs range	ms range	ms range
Persistence	None	Yes	Yes (disk)
Guaranteed delivery	Build it yourself	Built-in (ACKs)	Built-in (offsets)
Infra overhead	Zero (library)	Broker + deps	ZK + brokers
Memory cost	3–5 MB	500 MB–1 GB	1+ GB (JVM)
Best for	Real-time, low-latency, tight budget	Complex routing, guaranteed delivery	Event sourcing, stream replay, big data

Port	Protocol	Instance
5555	REQ/REP health check	All
5557	PUSH/PULL event pipeline	Instance 2
5558	PUSH/PULL worker output	Instance 2
5559	XSUB proxy (pub input)	Instance 2
5560	XPUB proxy (sub output)	Instance 2
5561	PUB processed events	Instance 2
3000	Express API (HTTP)	Instance 1
3001	Socket.IO (WS)	Instance 3
9200	Elasticsearch (HTTP)	Instance 3
6379	Redis	Instance 2

From \ To	REQ	REP	PUB	SUB	DEALER	ROUTER	PUSH	PULL
REQ		✓				✓
REP	✓				✓
PUB				✓
SUB			✓
DEALER		✓			✓	✓
ROUTER	✓				✓	✓
PUSH								✓
PULL							✓
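The valid pairings are easy to encode as a lookup table, which is handy as a sanity check in config validation or tests. A minimal sketch; `COMPATIBLE` and `canConnect` are hypothetical names, and the table covers only the eight socket types listed here (PUSH pairs with PULL and nothing else).

```javascript
// Valid socket pairings as a lookup table.
var COMPATIBLE = {
  REQ: ['REP', 'ROUTER'],
  REP: ['REQ', 'DEALER'],
  PUB: ['SUB'],
  SUB: ['PUB'],
  DEALER: ['REP', 'DEALER', 'ROUTER'],
  ROUTER: ['REQ', 'DEALER', 'ROUTER'],
  PUSH: ['PULL'],
  PULL: ['PUSH']
};

// Returns true if a socket of type `from` can talk to one of type `to`.
// Accepts the lowercase names used by zmq.socket() as well.
function canConnect(from, to) {
  var peers = COMPATIBLE[from.toUpperCase()] || [];
  return peers.indexOf(to.toUpperCase()) !== -1;
}
```

For example, `canConnect('push', 'pull')` is true while `canConnect('pub', 'req')` is false. ZeroMQ itself will not raise an error for a mismatched pair; messages simply never arrive, so a check like this can save some head-scratching.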
