Scaling a Real-Time Trading Platform with Elixir and OTP
Some of the most demanding work I've done is the kind I can't show you the code for — it lives in private client repos under NDA. But the architecture is mine to talk about, and the decisions behind it say more about how I work than any code dump would. This is one of those systems: the real-time backbone of a crypto and forex trading platform.
On a trading platform, "real-time" is literal. A position value that's a few seconds stale, or a liquidation-risk alert that arrives late, is a real loss for the user. That single constraint shaped every decision below.
The problem
Thousands of trading accounts, each with its own live connection to the exchange, all needing sub-second updates of positions, PnL, and order books fanned out to web and mobile clients. Three things made it hard:
- Isolation. One account's flaky exchange stream must never degrade anyone else's experience.
- Throughput. Hundreds of thousands of market and account events per second, recomputed into risk and pushed to clients continuously.
- Hard external limits. The exchange enforces weight-based API rate limits; blow through them during volatility and you get throttled exactly when users need the data most.
The shape of the system
The backend is an Elixir umbrella. Three apps carry the real-time load: one ingests and maintains live account state from the exchange, one is the API gateway that fans data out to clients over WebSockets, and one watches the market and fires alerts. They talk to each other over a clustered Phoenix.PubSub topic, not HTTP.
Why Elixir and OTP
This is the problem the BEAM was built for: millions of cheap, isolated processes, supervision trees that heal themselves, and soft-real-time scheduling. Instead of fighting a runtime to get concurrency and fault tolerance, I got to lean on them. The architecture below is really just OTP primitives applied with intent.
One supervision tree per account
The core decision: every trading account runs as its own isolated OTP process tree under a dynamic supervisor. A single AccountManager fetches the account list, manages each account's exchange listen key (the token that authorizes its private data stream), keeps those keys alive, and monitors every subtree. For each account it starts a small supervisor with three children.
If an account's connection drops or a process crashes, the supervisor restarts just that subtree — everyone else is untouched. That's the difference between "a bug took down one user's stream for two seconds" and "a bug took down the platform." At thousands of concurrent accounts, that isolation is the whole ball game.
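A minimal sketch of that shape, not the production code. The module names (Sketch.Worker, Sketch.AccountTree, Sketch.Accounts) are hypothetical, the worker is a stand-in for the three real children, and the :rest_for_one strategy is an assumption, since the post does not name the strategy in use:

```elixir
defmodule Sketch.Worker do
  # Stand-in for the real children (socket client, event handler,
  # risk broadcaster); it only holds its role and account id.
  use GenServer

  def start_link({role, account_id}) do
    GenServer.start_link(__MODULE__, {role, account_id})
  end

  @impl true
  def init(state), do: {:ok, state}
end

defmodule Sketch.AccountTree do
  # One small supervisor per account, with three children.
  use Supervisor

  def start_link(account_id) do
    Supervisor.start_link(__MODULE__, account_id)
  end

  @impl true
  def init(account_id) do
    children =
      for role <- [:socket_client, :event_handler, :risk_broadcaster] do
        Supervisor.child_spec({Sketch.Worker, {role, account_id}}, id: role)
      end

    # Assumed strategy: a crashed child also restarts the children
    # started after it, and nothing outside this account's subtree.
    Supervisor.init(children, strategy: :rest_for_one)
  end
end

defmodule Sketch.Accounts do
  # Top-level dynamic supervisor; account subtrees are added at runtime.
  def start_link do
    DynamicSupervisor.start_link(name: __MODULE__, strategy: :one_for_one)
  end

  def start_account(account_id) do
    DynamicSupervisor.start_child(__MODULE__, {Sketch.AccountTree, account_id})
  end
end
```

Killing any process inside one account's tree triggers a restart within that tree only; the subtrees of other accounts keep the same pids.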
Inside one account's tree
Each account's subtree is three processes with one job each:
- A WebSocket client holds the account's private user-data stream from the exchange and forwards raw events — order fills, balance and position updates — to the handler.
- An event handler (GenServer) owns the account's hot state in three ETS tables — positions, balances, and position risk. It writes them on every incoming event, and a 1s timer reconciles against the REST API as a safety net, with retry and consecutive-failure tracking so a transient exchange hiccup doesn't corrupt state.
- A risk broadcaster (GenServer) wakes on a 1s tick, reads the position-risk table, recomputes PnL and liquidation risk against the latest mark price, and pushes the result to clients.
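The broadcaster's tick can be sketched as below. This is an illustration under assumptions, not the real module: the three callbacks (read_positions, mark_price, publish) are hypothetical stand-ins for the ETS read, the shared market feed, and the PubSub broadcast, the PnL formula assumes a linear contract with signed quantity, and the liquidation-risk calculation is left out because it is exchange-specific.

```elixir
defmodule Sketch.Risk do
  # Unrealized PnL for a linear contract: signed quantity times price move.
  # Positive qty is long, negative qty is short (convention assumed here).
  def unrealized_pnl(%{entry_price: entry, qty: qty}, mark_price) do
    (mark_price - entry) * qty
  end
end

defmodule Sketch.RiskBroadcaster do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, Map.new(opts))

  @impl true
  def init(state) do
    # Default to the 1s tick described in the text.
    state = Map.put_new(state, :interval_ms, 1_000)
    schedule(state)
    {:ok, state}
  end

  @impl true
  def handle_info(:tick, state) do
    risk =
      for pos <- state.read_positions.() do
        mark = state.mark_price.(pos.symbol)
        %{symbol: pos.symbol, pnl: Sketch.Risk.unrealized_pnl(pos, mark)}
      end

    state.publish.(risk)
    # Re-arm the timer only after the work is done, so ticks never pile up.
    schedule(state)
    {:noreply, state}
  end

  defp schedule(state), do: Process.send_after(self(), :tick, state.interval_ms)
end
```

Scheduling the next tick at the end of handle_info means a slow recomputation delays the following tick rather than queuing a backlog of them.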
ETS is the quiet hero here. Position and risk data is read and rewritten constantly; putting it behind a single process would make that process a bottleneck. Instead each account's tables are created with read and write concurrency enabled, so the broadcaster reads the tables directly rather than queuing behind the handler's mailbox, and the handler's writes and the broadcaster's reads rarely contend. ETS still takes fine-grained locks internally, so it isn't strictly lock-free, but it is low-contention shared state that's awkward elsewhere and native on the BEAM.
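As a rough sketch of the table setup, assuming the handler process owns the tables and is the only writer (which is why :protected access is used here; the real access level is not stated). The module and function names are hypothetical:

```elixir
defmodule Sketch.AccountTables do
  # Creates the three per-account tables named in the text. The calling
  # process (the event handler) becomes the owner and the only writer;
  # :protected still lets any other process read.
  def create do
    opts = [:set, :protected, read_concurrency: true, write_concurrency: true]

    # Tables are unnamed, so every account can create its own set.
    %{
      positions: :ets.new(:positions, opts),
      balances: :ets.new(:balances, opts),
      position_risk: :ets.new(:position_risk, opts)
    }
  end

  # Called by the owner on every incoming event.
  def put_position(tables, symbol, position) do
    :ets.insert(tables.positions, {symbol, position})
  end

  # Safe to call from any process, e.g. the risk broadcaster.
  def get_position(tables, symbol) do
    case :ets.lookup(tables.positions, symbol) do
      [{^symbol, position}] -> {:ok, position}
      [] -> :error
    end
  end
end
```

The reader never sends a message to the handler, so a burst of incoming events cannot delay the broadcaster's reads behind a mailbox.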
Share the public streams, isolate the private ones
Private user-data streams are per-account by necessity — each needs its own authenticated connection. But public market data (mark price, tickers) is the same for everyone, so opening one socket per account would be thousands of redundant connections and a fast trip to the exchange's connection limits. Public data instead flows through a small number of shared sockets that feed every account's risk calculation. Per-account where it must be, shared where it can be.
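One way to picture the shared side is a single publisher with many subscribers. The sketch below uses Elixir's built-in Registry as a stand-in for whatever mechanism the real system uses to feed accounts (the post does not say), and the module name and message shape are hypothetical:

```elixir
defmodule Sketch.MarkPrices do
  @registry Sketch.MarkPrices.Registry

  def start_link do
    # :duplicate keys let many processes register under the same symbol.
    Registry.start_link(keys: :duplicate, name: @registry)
  end

  # Called by each account process that needs a symbol's mark price.
  def subscribe(symbol) do
    {:ok, _} = Registry.register(@registry, symbol, nil)
    :ok
  end

  # Called by the one shared socket process when a price arrives.
  def publish(symbol, price) do
    Registry.dispatch(@registry, symbol, fn entries ->
      for {pid, _value} <- entries, do: send(pid, {:mark_price, symbol, price})
    end)
  end
end
```

With this layout the number of exchange connections for public data stays constant as accounts are added; only the local subscriber list grows.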
Fanning out to clients
The gateway exposes the data over Phoenix Channels — around a dozen topic-specific channels (order book, mark price, orders, notifications, charts, and more). Clients subscribe to exactly the streams they care about, so a user watching one symbol isn't paying for every other symbol's traffic. The state engine broadcasts updates onto the clustered PubSub topic; the gateway relays them to subscribed sockets.
Alerts: a tree of GenServers, not a pipeline
The alert engine is deliberately simple: a set of per-type GenServer notifiers (PnL, price levels, funding rate, funding interval, category changes), each subscribing to the relevant PubSub topic or market socket and evaluating its condition in handle_info as data arrives. No stream-processing framework in the matching path; just supervised processes holding the alert rules and reacting to events. It's easy to reason about and trivially parallel: one process per concern.
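A price-level notifier in that style might look roughly like this. It is a sketch under assumptions: the rule shape, the {:mark_price, symbol, price} message, the notify callback, and the one-shot behavior (a rule is dropped once it fires) are all hypothetical, and the real notifier would subscribe to its topic in init rather than being sent messages directly:

```elixir
defmodule Sketch.PriceLevelNotifier do
  use GenServer

  # rules: list of %{id: _, symbol: _, direction: :above | :below, level: _}
  def start_link(opts), do: GenServer.start_link(__MODULE__, Map.new(opts))

  @impl true
  def init(%{rules: rules, notify: notify}) do
    {:ok, %{rules: rules, notify: notify}}
  end

  @impl true
  def handle_info({:mark_price, symbol, price}, state) do
    # Split rules into those that fire on this price and those still pending.
    {fired, pending} = Enum.split_with(state.rules, &triggered?(&1, symbol, price))
    Enum.each(fired, fn rule -> state.notify.(rule, price) end)
    {:noreply, %{state | rules: pending}}
  end

  def handle_info(_other, state), do: {:noreply, state}

  # The repeated `symbol` variable only matches rules for the incoming symbol.
  defp triggered?(%{symbol: symbol, direction: :above, level: level}, symbol, price),
    do: price >= level

  defp triggered?(%{symbol: symbol, direction: :below, level: level}, symbol, price),
    do: price <= level

  defp triggered?(_rule, _symbol, _price), do: false
end
```

All evaluation state lives in one process per alert type, so inspecting why a rule did or did not fire means looking at a single GenServer's state.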
The one place I reach for Broadway is the last mile: delivering push notifications. When alerts fire, the notifiers hand batches to a small Broadway pipeline (fed by a custom GenStage producer) that controls concurrency and batches the actual FCM sends. That's the right tool for fan-out delivery to an external API, and keeping it out of the evaluation path is what keeps the engine debuggable.
Weight-aware rate limiting
To stay under the exchange's API limits, a manager distributes outbound REST calls across a pool of HTTP connections by request weight, not just request count — because the exchange charges different endpoints differently. During volatile periods, when call volume is highest, this is what keeps the platform from throttling itself into a brownout exactly when users need fresh data.
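The idea of budgeting by weight rather than by count can be sketched as a small pure data structure. Everything here is an assumption for illustration: the module name, the fixed-window reset, the non-empty pool, and above all the premise that each pooled connection has its own weight budget. The real manager's policy is not described in detail.

```elixir
defmodule Sketch.WeightBudget do
  # Tracks weight spent per pooled connection inside one rate-limit window.
  defstruct limit: 0, used: %{}

  # `connections` must be non-empty; each starts the window at zero weight.
  def new(connections, limit) do
    %__MODULE__{limit: limit, used: Map.new(connections, &{&1, 0})}
  end

  # Picks the connection with the most headroom. If even that one cannot
  # absorb the request's weight, the call is refused instead of sent.
  def checkout(%__MODULE__{limit: limit, used: used} = budget, weight) do
    {conn, spent} = Enum.min_by(used, fn {_conn, spent} -> spent end)

    if spent + weight <= limit do
      {:ok, conn, %{budget | used: Map.put(used, conn, spent + weight)}}
    else
      {:error, :over_budget}
    end
  end

  # Called when the exchange's rate-limit window rolls over.
  def reset(%__MODULE__{used: used} = budget) do
    %{budget | used: Map.new(used, fn {conn, _spent} -> {conn, 0} end)}
  end
end
```

A count-based limiter would treat a weight-1 call and a weight-10 call as equal; tracking weight means one expensive endpoint cannot quietly consume the budget that ten cheap calls were counting on.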
A real-world wrinkle
Architecture diagrams are tidy; exchanges are not. At one point the exchange moved its private user-data streams to a new routed endpoint and decommissioned the old URL, and the failure mode was nasty: a client still pointed at the old URL connected successfully but silently received no data. No error, no crash, just stale positions. The only reason it was a quick fix rather than a mystery outage was telemetry: the "events per second" metric for those sockets flatlined while the connections looked healthy. You catch the silent failures by measuring throughput, not just liveness.
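The throughput-versus-liveness distinction can be sketched with a per-socket event counter and a classifier. The module names, the sampling approach, and the threshold are hypothetical; a real threshold has to account for accounts that are legitimately quiet.

```elixir
defmodule Sketch.EventMeter do
  # A single unsigned atomic counter per socket.
  def new, do: :atomics.new(1, signed: false)

  # Called once for every event the socket delivers.
  def record(meter), do: :atomics.add(meter, 1, 1)

  # Returns the events seen since the previous sample and resets the count
  # in one atomic step, so no event is lost between read and reset.
  def sample(meter), do: :atomics.exchange(meter, 1, 0)
end

defmodule Sketch.StallDetector do
  # Classifies a socket from two signals: whether it is connected, and how
  # many events it delivered in the last sampling window.
  def classify(connected?, events_in_window, min_events \\ 1)

  def classify(false, _events, _min), do: :disconnected
  def classify(true, events, min) when events >= min, do: :healthy
  def classify(true, _events, _min), do: :stalled
end
```

A liveness check alone only distinguishes the first case from the other two; the :stalled result is exactly the "connected but silent" state described above.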
Clustered and observable
The system runs across multiple nodes clustered with libcluster, with PubSub spanning the cluster so any node can serve any client. Every layer emits telemetry to Prometheus — process counts, event throughput, delivery latency, socket health. When you're promising sub-second delivery, you don't get to guess whether you're hitting it; you measure it.
What I take away
The headline numbers (sustaining 200k+ concurrent real-time events with reliable sub-second delivery) matter less than the shape of the decisions that got there: isolate aggressively, keep hot state in concurrent ETS tables rather than behind a single process, share what's public, and measure what you promise. None of it is exotic; it's OTP used the way OTP wants to be used. That's usually what "scalable" really means: not a clever trick, but a handful of boring decisions made consistently and a discipline about which tool belongs where.
I can't open the repo, but I'm always happy to walk through the architecture in more depth on a call.