Notebook 022 - Geometry Won the Week
We promised an honest FOMC scorecard. The numbers said lead with levels, not direction — so we rebuilt the desk that way.
Notebook 021 ended with a promise: we would publish the next score after FOMC the same way — honestly, with the number at the top, and with a clearer split between what we forecast and what we could actually trade.
We did. Then we spent the rest of the week building the system the scorecard said we should have had all along.
Wednesday was FOMC. Thursday was the post-FOMC repair rip. Friday we finished cutting the production desk over to strategic always-on positioning — quarterly investment plans, zone maps, tranche ladders, and a posture light that turns green, yellow, or red without anyone typing bin/onyx live at 9:44.
The headline numbers did not get prettier. Direction match landed around 55% on FOMC day and 46% once we expanded the book. Geometry — void lines, chase caps, entry zones, target touches — scored 83% on the session that mattered most. The desk stayed flat through the event and avoided every chase-miss trap the Lab had flagged.
That is not a model victory. It is an architecture lesson. As always: this is a trading system and a process journal, not trading advice.
FOMC Performance Review
Review of trading execution quality across three key dimensions during the latest FOMC cycle.
Direction — premarket up/sideways/down vs tape. Geometry — void, chase, entry, and target vs replay. Desk — authority, homework, and fills vs the written plan.
| Session | Direction | Geometry | Desk |
|---|---|---|---|
| FOMC Eve (Tue) | 36% | 73% | Pass |
| FOMC Day (Wed) | 55% | 27% | A |
| Post-FOMC (Thu) | 46% | 83% | D |
Metrics reflect replay analysis of plan adherence during the FOMC cycle.
Wed geometry was dragged down by limit-fill-no-target replay — void and chase calls were still right. Thu desk D: PLAN LIVE approved in the journal; bin/onyx live never ran.
Direction is research honesty. Geometry is what the trader actually uses. Desk is whether the human layer did what the journal said.
On the three sessions that defined the week, either geometry or desk process outscored direction every time, including the day we got direction mostly right.
FOMC day: the confirm ladder beat the model
Premarket read: mixed fragile tape, every watchlist name WATCH-only, negative trade-probability edge across the book. The Lab leaned sideways or down on most names. QQQ was classified up premarket. That was wrong by the close.
The event itself was the familiar trap: hold in line, statement hawkish-neutral, dot plot showing no cuts in 2026, an initial dip that reclaimed VWAP around 14:13, then distribution into the bell. SPY and QQQ finished below session VWAP. Volatility bid. Semis failed to hold the post-release pop.
The desk finished flat. Zero fills. Zero fresh orders. Not because the system was off — because the confirm ladder failed at 15:05. Both index proxies were below VWAP and below the opening range. Volatility was rising. The conditional post-FOMC scout package we had drafted for a handful of leaders was never promoted.
That ladder is not in the Lab score. It should be. It was the best evaluation tool we had on Wednesday — better than asking whether Markov's BEAR lean matched the afternoon bucket.
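The 15:05 check can be sketched as a small ordered gate. This is a minimal illustration, not the desk's actual ladder: the `IndexSnapshot` fields, the `confirm_ladder` name, and the exact pass conditions are assumptions built only from the three conditions named above (VWAP, opening range, volatility). The prices are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexSnapshot:
    """Hypothetical snapshot of one index proxy at the check time."""
    symbol: str
    last: float
    vwap: float
    opening_range_low: float


def confirm_ladder(proxies: list[IndexSnapshot], vol_now: float, vol_prev: float) -> list[str]:
    """Return the failed rungs; an empty list means the ladder confirms."""
    failures = []
    for p in proxies:
        if p.last < p.vwap:
            failures.append(f"{p.symbol} below VWAP")
        if p.last < p.opening_range_low:
            failures.append(f"{p.symbol} below opening range")
    if vol_now > vol_prev:
        failures.append("volatility rising")
    return failures


# Illustrative numbers only, not Wednesday's actual prints.
snapshots = [
    IndexSnapshot("SPY", last=99.0, vwap=100.0, opening_range_low=99.5),
    IndexSnapshot("QQQ", last=98.0, vwap=100.0, opening_range_low=99.0),
]
failed = confirm_ladder(snapshots, vol_now=21.0, vol_prev=19.5)
promote_scout_package = not failed  # any failed rung keeps the package unpromoted
```

Returning the list of failed rungs rather than a bare boolean keeps the reason visible, which is what a score line for the ladder would need.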
Lab direction match: 6 of 11 (55%). Executable geometry was weaker on paper (27% credit score) because most names touched their entry zones without filling — limit-fill-no-target dominated the archetype counts. But the actionable calls were right:
- Chase-miss at the open on the names that gapped firm — predicted and occurred.
- Void breaches on the weak mega caps — flagged before any fresh-risk story.
- Flat book on a distribution day — exactly what negative edge and WATCH-only authority required.
The premarket regime read was a C. The process grade was an A.
Thursday Performance Review
Post-FOMC repair session — tape risk-on, book still flat.
Thursday was the repair session the model did not prewrite. Premarket assumed post-FOMC selectivity and risk-off. The tape closed risk-on: broad indices above VWAP, semis strong, volatility soft. The broad gate components that had failed at 15:05 on Wednesday would have passed at 16:00 on Thursday. We still took zero fresh entries — negative trade-probability edge across the book, board watch-only, authority target-management.
Direction — premarket probability bucket vs close. Geometry — chase, void, entry touch vs replay. Desk — what the written plan required.
| Symbol / Group | Direction | Geometry | Desk |
|---|---|---|---|
| AMD / ARM | ✓ | Chase-miss | OR only |
| AAPL | ✓ | No fill | Wait |
| SPCX | ✗ | Void breach | Hold |
| MSFT / AMZN / RKLB | ✓ | Void breach | Avoid |
Quick reference of plan vs. actual execution on the day.
Book totals: Direction 55% · Geometry 83% on the core eleven-name roster. Chase-miss and void layers matched what we would have traded if armed — and why we were not.
What we shipped (the Notebook 021 backlog, done)
Notebook 021 listed eight improvements. Most of them landed this week:
1. Two scores, not one — plus a third
Direction match and executable-outcome score now persist separately in lab_session_scores. We added geometry_match_rate — partial credit across void, chase, entry touch, and target components — and a desk grade that checks homework loaded, flat-when-unarmed discipline, and whether bin/onyx live actually ran when the journal says PLAN LIVE was approved.
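As a rough sketch of how the two rates can sit side by side: the function names, the equal weighting of components, and the use of `None` for a component that does not apply are assumptions for illustration, not the contents of `lab_session_scores`.

```python
from typing import Optional

# The four components named above; equal weighting is an assumption.
COMPONENTS = ("void", "chase", "entry_touch", "target")


def geometry_match_rate(rows: list[dict[str, Optional[bool]]]) -> Optional[float]:
    """Share of graded components that matched replay.

    A component set to None (or missing) is treated as not applicable
    and is left out of both the numerator and the denominator.
    """
    graded = 0
    matched = 0
    for row in rows:
        for name in COMPONENTS:
            outcome = row.get(name)
            if outcome is None:
                continue
            graded += 1
            matched += int(outcome)
    return matched / graded if graded else None


def direction_match_rate(hits: int, total: int) -> float:
    """Plain hit rate: sessions where the premarket bucket matched the tape."""
    return hits / total


# One hypothetical symbol: void, chase and entry touch right, target missed.
example = [{"void": True, "chase": True, "entry_touch": True, "target": False}]
geometry = geometry_match_rate(example)
direction = direction_match_rate(6, 11)  # the 6-of-11 figure quoted for FOMC day
```

Keeping the two functions separate mirrors the point of the split: a weak direction rate cannot hide a strong geometry rate, or the reverse.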
Thursday's desk grade was a D for a reason we needed to see in public: Aaron approved conditional live mode Wednesday evening. The receipt never hit the CLI. Zero fills anyway — but the scorecard now catches approval without arming, which no direction percentage ever would.
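The approval-without-arming check is simple enough to sketch. The `DeskDay` record and `desk_findings` function below are hypothetical names, and the sketch returns findings rather than a letter because the mapping from findings to a grade is not spelled out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeskDay:
    """Hypothetical end-of-day facts a desk grade would read."""
    homework_loaded: bool
    plan_live_approved: bool  # the journal says PLAN LIVE was approved
    live_command_ran: bool    # bin/onyx live left a receipt
    fills: int


def desk_findings(day: DeskDay) -> list[str]:
    """List process gaps; an empty list means the desk did what the journal said."""
    findings = []
    if not day.homework_loaded:
        findings.append("homework not loaded")
    if day.plan_live_approved and not day.live_command_ran:
        findings.append("approval without arming")
    if not day.live_command_ran and day.fills > 0:
        findings.append("fills while unarmed")
    return findings


# Thursday as described above; homework_loaded=True is an assumption.
thursday = DeskDay(homework_loaded=True, plan_live_approved=True,
                   live_command_ran=False, fills=0)
thursday_findings = desk_findings(thursday)
```

Note that the zero-fill day still produces a finding, which is the behavior the grade was added for.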
2. Geometry-first Lab UI
ONYX Lab rebuilt again — strategic-first this time. The zone map leads: buy bands, sell targets, void lines, distance-to-level, filled tranche markers. Probability paths moved down the page. Watch-interest chips and deployment KPI clutter came out. Calibration, scenario, and scorecard panels sit below the operating console, not above it.
When trade-probability edge is negative across the book, the UI should not scream conviction it does not have. Negative-edge days now demote direction visuals and promote ladder geometry — exactly what Tuesday's FOMC-eve session needed.
3. Session score in the UI, every day
Yesterday and Overall panels poll live after close. The banner states the plan date explicitly so operators do not confuse today's rehearsal with yesterday's grade. Post-close is one command:
uv run python scripts/generate_missed_trade_ledger.py --date YYYY-MM-DD --with-lap-score
4. Midday calibration
Formal score still waits for close. Midday artifacts now generate at lunch on armed names — direction match, chase state, posture blocks — so 0 for 2 at 1:30 PM does not get flattened into a generous close bucket at 4:00.
5. IPO / low-sample symbols: geometry-only
SPCX carries fewer than twenty daily sessions in our local history. Markov on that sample produces flat, low-conviction probabilities that still render as a directional path. The scorer now suppresses direction match and Brier for geometry_only symbols and grades invalidation, liquidity, and level interaction instead.
On Thursday, SPCX direction was suppressed. Geometry flagged the invalidation breach the recovery hold was already managing. That is the right hierarchy for a name the desk treats as a quarterly position, not a scout.
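A minimal sketch of the suppression rule, under stated assumptions: the twenty-session cutoff is read from the "fewer than twenty" remark, direction is simplified to a binary up/not-up call (the Lab's own buckets are up, sideways, and down), and `score_symbol` is a hypothetical name. The Brier score itself is the standard mean squared error between forecast probability and the 0/1 outcome.

```python
from typing import Optional

MIN_SESSIONS = 20  # assumed cutoff, taken from the "fewer than twenty" remark


def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def score_symbol(sessions: int, up_probs: list[float], went_up: list[int],
                 geometry_rate: float) -> dict[str, Optional[float]]:
    """Suppress direction metrics for thin-history symbols; keep geometry."""
    if sessions < MIN_SESSIONS:
        return {"direction_match": None, "brier": None, "geometry": geometry_rate}
    # Binary simplification: a forecast of 0.5 or more counts as an "up" call.
    hits = sum((p >= 0.5) == bool(o) for p, o in zip(up_probs, went_up))
    return {
        "direction_match": hits / len(up_probs),
        "brier": brier_score(up_probs, went_up),
        "geometry": geometry_rate,
    }


# Illustrative inputs only.
thin = score_symbol(15, [0.4, 0.45], [1, 0], geometry_rate=1.0)
deep = score_symbol(250, [0.5, 0.5], [1, 0], geometry_rate=0.75)
```

Returning `None` instead of a number keeps a thin-history symbol from dragging the book-level direction average in either direction.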
6. Green / Yellow / Red posture autopilot
Fresh-risk authority needed a simpler surface than PLAN LIVE plus AUTO plus board grade plus broad gate plus chase state. We added desk posture: GREEN allows Lane C builds for symbols in the approved strategic set; YELLOW pauses entries but keeps exits; RED fail-closes all broker mutation.
Posture autopilot runs on the canonical gate stack every desk-watcher cycle. It never arms RED by itself. The dashboard shows the color, the reason, and the unlock path — not gateway jargon.
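The three colors reduce to a small permission function. This is a sketch of the rule as stated, not the gate stack itself: `Posture`, `Action`, and `allowed` are hypothetical names, the symbol set is only an example, and allowing exits under GREEN is an assumption (the text states it only for YELLOW).

```python
from enum import Enum


class Posture(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


class Action(Enum):
    ENTRY = "entry"
    EXIT = "exit"


def allowed(posture: Posture, action: Action, symbol: str,
            strategic_set: frozenset[str]) -> bool:
    """Decide whether a broker mutation may go out under the current posture."""
    if posture is Posture.RED:
        return False  # fail closed: no broker mutation of any kind
    if action is Action.EXIT:
        return True   # YELLOW keeps exits; GREEN is assumed to as well
    # Entries need GREEN and membership in the approved strategic set.
    return posture is Posture.GREEN and symbol in strategic_set


APPROVED = frozenset({"AAPL", "NVDA", "MSFT"})  # example set, not the live list

can_build = allowed(Posture.GREEN, Action.ENTRY, "AAPL", APPROVED)
can_exit_on_yellow = allowed(Posture.YELLOW, Action.EXIT, "AAPL", APPROVED)
```

Checking RED first means no later branch can accidentally let an order through, which is the point of fail-closed.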
The bigger move: strategic cutover
Scoring honestly exposed a deeper problem. We were grading a daily scout machine while the book was moving toward quarterly investment plans with tranche ladders, capital amendments, and always-on limit evaluation.
Tactical and strategic were sharing one authority path. That is how you get board WATCH-only rows with auto brackets armed anyway, which is what earned the plan-vs-execution alignment C on Notebook 021's scorecard.
So we split them.
Tactical (daily tiered plans, ECG ranking, fresh scouts) stays on the legacy path for rollback. Strategic (AAPL, NVDA, MSFT, SPCX pilot) now flows through approved strategic_plans/*.v1.json, a continuous SIP evaluator, and Lane C intent drain — no daily bin/onyx live for those symbols.
The always-on rule: once a strategic plan is approved, feed + evaluator + executor do not sleep. New entries pause when gates fail — regime, invalidation, liquidity — not because someone forgot the morning ritual. Exits above entry stay armed when a position exists.
data/target_plan.json remains recovery and target metadata for strategic names — watch-only, not fresh scout authority. Session authority carries an empty auto_bracket_symbols list under cutover. Material alerts and buy-target maps are context only; they do not wake the desk for fresh-risk scouts on strategic symbols.
Monday's operator ritual changed: verify feed, evaluator, posture color, per-symbol gates — not PLAN LIVE arming for the mega-cap book.
That is a different desk. It matches what the scorecard kept saying: the trader does not need a better fortune teller. The trader needs levels, gates, and authority that agree.
What still misleads
We are not done. The week surfaced gaps the new grades already track:
Premarket regime persistence. Tuesday and Wednesday both lagged intraday repair. FOMC-eve fragile became post-FOMC rip within twenty-four hours. The scorecard needs a regime flip line — premarket label vs close label — not just direction bucket.
Missed-runner accounting. Broad gate passed at Thursday's close. AMD and ARM ran without us. The geometry score credits chase-miss prediction; it does not yet separate a good skip from a missed runner where we should have amended the plan. That distinction belongs in the desk grade, not the Lab.
Roster hygiene. MU, AVGO, and BE joined the watchlist mid-week. Prices showed stale or missing until feed resync and daily-bar fallback shipped. New symbols should not enter the scored book until Lab readiness passes. We added the gate; the discipline is to follow it.
Direction as UI headline. Even at 55%, direction match is the wrong thing to stare at when geometry is 83% and every name has negative TP edge. The Lab now leads with zones. Old habits die slower than new panels.
What we actually learned
We asked for an honest FOMC score. We got 55% direction, flat book, confirm ladder correct — and a Thursday where geometry beat direction by nearly two to one on the session that tempted us to chase.
The lesson from Notebook 021 was not "make probabilities smarter." The lesson this week was to build the desk around the layer that kept passing:
- Geometry for levels, chase, void, and fill/no-fill.
- Desk process for authority, receipts, and good skips.
- Direction for calibration research — published, not hidden, but no longer leading the UI.
We rebuilt ONYX Lab around the zone map. We split the report card. We cut strategic names over to always-on plans with posture lights instead of daily live arming. We added a desk grade that catches approval without execution — because that gap is real even on a zero-fill day.
Predicting the afternoon is no longer the system's primary job. Its job is to make the written plan executable or honestly unreachable before the open, and to keep capital flat when the geometry says the entry is a chase.
We will keep publishing both numbers. Direction will stay noisy. We expect geometry and desk process to stay useful. That is the scoreboard we should have built first.
The lesson
When your calibration tool fails on direction but passes on geometry, do not only retrain the model. Reorder the cockpit.
Split the grade. Lead with levels. Separate daily scouts from quarterly builds. Grade the desk for doing what the journal says — including the days it correctly did nothing.
FOMC week did not prove our forecasts. It proved our guardrails. We rebuilt the week around that answer.