Notebook 002 — The stream we thought we had
There are two Alpaca websocket surfaces we care about.
One is market data: trades, quotes, bars. That's the stream that tells us what the market is doing.
The other is trade updates: fills, partial fills, cancels, rejects. That's the stream that tells us what happened to our own orders.
They are both "websockets," but they are not the same dependency. One tells us about the world. The other tells us about ourselves. Alpaca treats them as separate endpoints with different protocol shapes; the base URL reference lists them apart from each other.
That distinction matters.
If the market data stream drops, our signal quality degrades for as long as it's down. Bad, but recoverable. The next tick arrives, the chart catches up, the scanner warms back up. In a pinch, REST polling can keep parts of the system moving.
If the trade updates stream drops, the failure mode is different. A fill may happen while we are not listening. A cancel may complete. An order may reject. Alpaca still knows the truth, but our local state may not — and the documented websocket flow exposes listen/subscribe behavior, not replay or resume for missed trade_updates.
That was the thing we went looking for.
A small clarification first
There are at least three things people may call "streaming" in this system:
- Alpaca's market-data stream — `StockDataStream` subscribed to bars, trades, and quotes. We have this.
- Our own `/ws/stream` browser feed — pushes state changes from the backend to the dashboard. We have this.
- Alpaca's `TradingStream` subscribed to `trade_updates` — broker order lifecycle events. We don't have this.
This post is about the third one.
The discovery
We expected to find a TradingStream somewhere — alpaca-py's class for the trade updates websocket — with reconnect logic to audit.
There isn't one.
Our live stream code instantiates StockDataStream and subscribes to bars, trades, and quotes. That stream powers the dashboard, ticker state, intraday bars, scanner cadence, smart-money tape, and price-trigger checks. It does not subscribe to broker trade updates.
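The shape of that wiring looks roughly like the sketch below. This is a minimal illustration assuming alpaca-py's `StockDataStream` class; the `build_market_stream` helper and the handler names are hypothetical, not our actual code, and a tiny stand-in class lets the sketch load even without the SDK installed.

```python
# Sketch of the stream we do have: StockDataStream subscribed to bars,
# trades, and quotes. Keys and symbols are placeholders.
try:
    from alpaca.data.live import StockDataStream
except ImportError:
    class StockDataStream:  # minimal stand-in so the sketch runs without the SDK
        def __init__(self, api_key, secret_key):
            self.subscriptions = []
        def subscribe_bars(self, handler, *symbols):
            self.subscriptions.append(("bars", symbols))
        def subscribe_trades(self, handler, *symbols):
            self.subscriptions.append(("trades", symbols))
        def subscribe_quotes(self, handler, *symbols):
            self.subscriptions.append(("quotes", symbols))

async def on_bar(bar):      # feeds intraday bars and scanner cadence
    ...

async def on_trade(trade):  # feeds the tape and price-trigger checks
    ...

async def on_quote(quote):  # feeds ticker state
    ...

def build_market_stream(api_key, secret_key, symbols):
    stream = StockDataStream(api_key, secret_key)
    stream.subscribe_bars(on_bar, *symbols)
    stream.subscribe_trades(on_trade, *symbols)
    stream.subscribe_quotes(on_quote, *symbols)
    return stream  # caller invokes stream.run() to block on the websocket
```

Note what is absent: nothing here touches order lifecycle. Every subscription is about the world, none about ourselves.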
So the original question was slightly wrong. It wasn't:
What happens if the trade stream drops?
It was:
What are we relying on instead of a trade stream?
What we have instead
REST reconciliation. Good, boring machinery, in fact.
Positions sync every 60 seconds. Open-order fill monitoring runs every 10 seconds. The swing runtime has a reconcile_state_with_broker path that pulls current positions, account state, and open orders, then compares them against local pending and runtime state.
That means we are not flying blind. The broker is still the source of truth, and we ask the broker regularly.
But it also means our current system is not event-first for order lifecycle. It is polling-first, with market-data websocket events around it. That is a different architecture than the one we thought we had.
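The polling-first shape can be reduced to a two-cadence loop. The 60-second positions sync and 10-second fill check come from the description above; the scheduler itself is a hypothetical sketch, not our runtime's actual loop.

```python
# Hypothetical sketch of the polling-first order-lifecycle cadence:
# positions sync every 60s, open-order fill checks every 10s.
POSITIONS_INTERVAL_S = 60
FILL_CHECK_INTERVAL_S = 10

def due_tasks(now: float, last_run: dict) -> list:
    """Return which reconciliation tasks are due at time `now` (seconds)."""
    due = []
    if now - last_run.get("positions_sync", 0.0) >= POSITIONS_INTERVAL_S:
        due.append("positions_sync")
    if now - last_run.get("fill_check", 0.0) >= FILL_CHECK_INTERVAL_S:
        due.append("fill_check")
    return due
```

Between polls, the system is blind to its own orders by construction — a fill at second 1 is invisible until the second-10 check fires. That gap is the latency this post is about.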
Why we don't have TradingStream yet
We built streaming around market data first. That stream had to exist for the scanner, the bars, the ticker state, the tape, and the price triggers — without it, the trading side of the app doesn't function. So it got built, hardened, and threaded through the whole system.
Order lifecycle evolved separately, through REST. Polling positions and open orders, and reconciling against runtime state, was good enough to keep us correct. The polling was already there before we ever asked the question this post is asking, so adding TradingStream always felt like an optimization rather than a fix.
It's also not "subscribe to one more channel." Alpaca's trade updates are a separate endpoint, a separate class, a separate auth/listen flow. Adding it means a real lifecycle: health surface, event handling, idempotency on retries, reconnect behavior, tests. That's a piece of work, not a one-line change.
So the absence is explainable. It's not negligence; it's sequencing.
What TradingStream would actually give us
Faster awareness of our own orders, in concrete ways:
- Immediate fill awareness. Buys and sells update local state and the UI as Alpaca emits the fill, not on the next 10-second poll.
- Partial-fill visibility. A 1,000-share order that fills 400 then 600 currently surfaces as "still open" until the next poll. With trade updates we'd see each fill as it happens.
- Cancel and reject visibility. `canceled`, `rejected`, `order_cancel_rejected`, `order_replace_rejected` — visible immediately rather than inferred when an open order disappears from the next polling pass.
- Cleaner dashboard alerts. Fill, cancel, and reject notifications become event-driven rather than diff-driven.
- Lower latency state transitions. Our order-fill monitor cadence is 10 seconds. Trade updates would compress that toward round-trip latency.
- A natural trigger for reconciliation. On stream connect or reconnect, run REST reconciliation; during normal operation, let stream events drive state transitions and treat polling as the safety net.
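Routing those events into the same state-transition path the polling monitor uses might look like the sketch below. The event names follow Alpaca's `trade_updates` lifecycle; `route_trade_update` and the `apply_transition` callback are hypothetical stand-ins for whatever the monitor already calls.

```python
# Hypothetical routing of trade-update events into one shared
# state-transition path. Terminal events close the order's local state;
# partial fills update quantity; benign lifecycle events are ignored.
TERMINAL = {
    "fill": "filled",
    "canceled": "canceled",
    "rejected": "rejected",
    "expired": "expired",
}

def route_trade_update(event: dict, apply_transition) -> str:
    etype = event["event"]
    order_id = event["order"]["id"]
    if etype in TERMINAL:
        apply_transition(order_id, TERMINAL[etype])
        return TERMINAL[etype]
    if etype == "partial_fill":
        apply_transition(order_id, "partially_filled",
                         filled_qty=event["order"].get("filled_qty"))
        return "partially_filled"
    # new, pending_new, replace chatter, etc. leave local state alone.
    return "ignored"
```

The point of funneling stream events through the same transition function the poller uses is idempotency: whichever path notices a fill first wins, and the other becomes a no-op.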
What it would not give us is truth on its own. The SDK's internal `_run_forever` loop catches `WebSocketException`, sleeps about 0.01s, and re-enters — confirmed in alpaca-py's source. That handles the connection. It does not handle the gap. Events fired during a disconnect are gone unless we fetch them ourselves.
The fix is not "trust reconnect harder." The fix is reconciliation.
The rule
Whenever broker-event delivery is interrupted, or whenever we establish a broker-event stream connection, we reconcile before trusting local state.
Pull open orders. Pull positions. Pull account state where needed. Compare that broker snapshot against our in-memory and persisted runtime state. Close pending rows that are no longer pending. Mark fills we missed. Notice position sizes that moved. Record the reconciliation time. Then continue.
The stream is latency. The broker snapshot is truth.
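A reconciliation pass following that rule can be sketched as below. The data shapes (`pending_orders`, `positions`) are illustrative stand-ins, not our actual runtime models; the logic is just the steps listed above with the broker snapshot winning every comparison.

```python
from datetime import datetime, timezone

# Hypothetical reconciliation pass: the broker snapshot is truth. Local
# pending rows the broker no longer lists as open are closed out, and
# position sizes are overwritten from the snapshot.
def reconcile(broker_open_orders, broker_positions, local_state):
    open_ids = {o["id"] for o in broker_open_orders}
    actions = []
    for order_id in list(local_state["pending_orders"]):
        if order_id not in open_ids:
            # Broker no longer shows it open: it filled, canceled, or
            # rejected while we weren't listening. Close the pending row.
            local_state["pending_orders"].remove(order_id)
            actions.append(("close_pending", order_id))
    for symbol, qty in broker_positions.items():
        if local_state["positions"].get(symbol) != qty:
            local_state["positions"][symbol] = qty
            actions.append(("position_updated", symbol))
    local_state["last_reconciled"] = datetime.now(timezone.utc)
    return actions
```

Recording `last_reconciled` matters as much as the fixes themselves: it is the number the health surface reports, and it is how we know the safety net actually fired.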
The next work
- Add a thin trade-update supervisor around Alpaca's `TradingStream`.
- Subscribe to `trade_updates` and route fill, cancel, and reject events into the same state-transition path the polling monitor already uses.
- On stream start and reconnect, run `reconcile_state_with_broker` before treating the stream as healthy.
- Add bounded backoff around the stream lifecycle instead of relying on the SDK's tight reconnect loop.
- Expose broker-event stream health separately from market-data stream health: last trade update received, last reconnect, reconnect count, last reconciliation, reconciliation age.
- Keep existing REST polling as the safety net, not as the primary event path.
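The supervisor's core loop — bounded backoff around connect attempts, reconcile before healthy — can be sketched as below. `connect` and `reconcile` are injected stand-ins for the real `TradingStream` lifecycle and `reconcile_state_with_broker`; the delays and attempt count are illustrative, not decided.

```python
import time

def backoff_delays(base=1.0, cap=30.0, attempts=6):
    """Bounded exponential backoff schedule: 1, 2, 4, ... capped at `cap`."""
    return [min(base * (2 ** i), cap) for i in range(attempts)]

def supervise_once(connect, reconcile, sleep=time.sleep):
    """One supervision pass over the broker-event stream lifecycle."""
    for delay in backoff_delays():
        if connect():
            reconcile()   # truth first: broker snapshot before trusting events
            return "healthy"
        sleep(delay)      # bounded backoff, not the SDK's ~0.01s tight loop
    return "gave_up"
```

The ordering is the whole design: the stream is only declared healthy after reconciliation succeeds, so a reconnect can never silently paper over events that fired during the gap.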
The dashboard already surfaces market-data stream age. What it doesn't yet have is broker-event stream age, because that stream doesn't exist in our app yet.
The lesson
We went looking for a reconnect problem and found a state-truth problem. Better still: we found that our current system is more honest than it is fast. It polls the broker often enough to recover, but it doesn't yet listen to broker events directly.
This is important to fix, but it is not an emergency. The correctness backstop is real — REST polling plus broker reconciliation means we're not relying on local memory as truth. What we'd be buying with TradingStream is speed, not correctness, and a system that feels and behaves more like a real trading engine instead of a polling loop with good manners.
So the lesson is not that websockets are fragile, though they are. The lesson is that a trading system should never confuse delivery with truth.
The broker is truth. The trade update stream is the fastest notification path. REST reconciliation is how we stay honest when notification fails.