Drafts & Failed Studies — Milkman Trades

drafts & failed.

studies that didn’t make the core playbook

These pages live outside the core playbook for a reason. Every study here was reviewed against the same two filters used for the keepers: quality (sample size, edge magnitude, caveats acknowledged) and actionability (entry/exit clarity, real-time identifiability, fit for 0DTE/short-dated SPX/SPY).

Listing them transparently — rather than burying them — keeps the catalog honest. The point of a research site is to publish what worked and what didn’t.

Draft Studies — In Progress (11 studies)

Real signal but a weak frame. These need tweaks — sample-size deepening, reframed metrics, rebuilt code, or merging into adjacent studies — before an edge can be confirmed. Treat as research drafts, not playbook material.

SPX vs SPY Morning Divergence New →

Exploratory first cut. Median |PO_SPY − PO_SPX| at 09:30 is ~40 oscillator units (about two full Saty zones); fast cloud disagrees on 38% of days. Mechanism: SPY’s pre-market bars cushion its EMAs; SPX’s RTH-only EMAs absorb the gap all at once at 09:30. Across 4,527 RTH days (2008–2026), SPY’s reading is the leading signal — |gap| ≥ 1% cohorts produce a 0.54% full-day spread between symmetric sign(d_open) buckets, and the fast-cloud-mismatch cuts deliver a 0.42% spread. Needs VIX/regime stratification, FOMC/OpEx exclusion, and a 2010+ re-cut before it becomes tradable.

ATR Level Cascade New →

First-pass map. 33,153 RTH first-touch events across 25 years of SPY, tagged by outcome (continuation / retrace / last) and time-of-day bucket. Solid construction (3-min bars, no look-ahead) and a clear time-of-day shape: 76–78% continuation pre-10:00, collapsing to last-dominated post-15:00. Needs regime stratification (high/low ATR days, gap days, FOMC) and a paired entry/exit framework before it earns a tradable slot. Use the explorer as a path-handicapper for now, or the static Saty-style chart for the same-session probability map.

Bilbo Golden Gate Demoted from Core →

Look-ahead bias confirmed. The 1h Phase Oscillator is sampled from the not-yet-closed current hour bar — up to 60 min of forward leakage via merge_asof(direction="backward") against left-labeled HTF bars. Point-in-time-safe rerun shows the flagship 90.3% bear (PO Low+Falling) cohort is closer to ~85%, the bull 75.9% drops to ~71%, and the worst “Mid+Falling/Mid+Rising” buckets partly invert (50.9% → 63.0%, 54.5% → 69.6%). Needs full rebuild with bar-end PO joins before it goes back in the playbook. See audit-reruns/codex_gg_bug_review_2026-04-26.md & opus_gg_bug_review_2026-04-26.md.
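The leak described above can be reproduced on toy data. The sketch below is illustrative only: the bar times, the `po` values, and the column names are hypothetical, and it is not the study's actual join. It assumes hourly bars labeled by their open time, with the oscillator value only final at the bar's close.

```python
import pandas as pd

# Hypothetical 1h bars labeled by OPEN time (left-labeled).
# "po" stands in for an oscillator value that is only final at the bar's CLOSE.
htf = pd.DataFrame({
    "bar_open": pd.to_datetime(
        ["2024-01-02 09:30", "2024-01-02 10:30", "2024-01-02 11:30"]
    ),
    "po": [10.0, 20.0, 30.0],
})

# Hypothetical intraday signal timestamps.
signals = pd.DataFrame({
    "ts": pd.to_datetime(
        ["2024-01-02 10:45", "2024-01-02 11:30", "2024-01-02 12:15"]
    ),
})

# Leaky join: a 10:45 signal matches the bar opened at 10:30,
# whose value is not known until 11:30.
leaky = pd.merge_asof(
    signals, htf, left_on="ts", right_on="bar_open", direction="backward"
)

# Point-in-time-safe join: relabel each bar by its close time first,
# so a signal can only see bars that have already finished.
htf_safe = htf.assign(bar_close=htf["bar_open"] + pd.Timedelta(hours=1))
safe = pd.merge_asof(
    signals,
    htf_safe[["bar_close", "po"]],
    left_on="ts",
    right_on="bar_close",
    direction="backward",
)

print(leaky["po"].tolist())  # values taken from still-open bars
print(safe["po"].tolist())   # values taken from closed bars only
```

In this toy case every signal in the leaky join reads one bar ahead of what was knowable at the time, which is the same up-to-60-minute leak in miniature.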

Trigger Box — Credit Spread Win Rates →

Mislabel. The 93.6% figure is short-strike no-touch probability, not a realized P&L on credit spreads. Real spreads carry credit, gamma, and settlement risk that the page doesn’t model. Needs reframing to “no-touch probability” with explicit caveats.
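A rough sketch shows why a no-touch probability is not a P&L figure. It uses a deliberately crude two-outcome model: every no-touch keeps the full credit and every touch is booked as a maximum loss. The credit and spread width are hypothetical placeholders, not values from the study, and the model ignores commissions, early exits, and partial losses.

```python
def spread_expectancy(win_rate: float, credit: float, width: float) -> float:
    """Expected P&L per spread (in points) under a two-outcome model."""
    max_loss = width - credit  # loss if the spread finishes fully in the money
    return win_rate * credit - (1.0 - win_rate) * max_loss


def breakeven_win_rate(credit: float, width: float) -> float:
    """Win rate at which the two-outcome expectancy is exactly zero."""
    return (width - credit) / width


no_touch = 0.936             # figure quoted on the page
credit, width = 0.30, 5.00   # HYPOTHETICAL credit and spread width, in points

ev = spread_expectancy(no_touch, credit, width)
be = breakeven_win_rate(credit, width)
print(f"expectancy per spread: {ev:+.3f} pts, breakeven win rate: {be:.1%}")
```

Under these assumed inputs the breakeven win rate sits slightly above 93.6%, so the same headline number can correspond to a flat or losing spread depending on the credit collected. Treating every touch as a full loss is pessimistic, so this is a bound on the argument, not an estimate of real results.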

Bilbo Box Breakout (raw) →

Near-zero edge. 51% / +0.03R across 50,889 events. Raw signal isn’t tradable on its own. Either retire or repackage as the negative finding that motivates Bilbo Box × HTF PO Bracket Exits.

Bilbo Box × HTF PO Bracket Exits →

Sound frame, shaky entry. The bracket-exit framework is reasonable, but the underlying signal it brackets (raw Bilbo Box Breakout) sits in drafts for near-zero edge. Needs a stronger entry trigger before it’s tradable — or a rebuild that doesn’t depend on the raw box.

Bilbo Continuation →

Not reproducible. 78.6 → 200% extension reach rates can’t be regenerated from the current `backtest_gg_with_po.py`. Rebuild from current code, or fold the surviving 60m PO buckets into Bilbo Golden Gate.

4h PO Reversal Anatomy →

Stale code. Still under re-verification. Event log and velocity buckets came from a script no longer in the repo. Rebuild before promoting to a tradable study.

Daily 21 EMA Reversion →

Underpowered. 50 episodes is too thin to stand alone. Useful as context for swing setups, not a standalone tradable. Needs explicit “close-confirmed / next-session executable” framing.

Premarket ATH Morning Fade →

Too rare. Real edge per event but frequency is only 3.7% (post-rerun n=243). Standalone page doesn’t earn its slot — consolidate with Call-to-Put Reversal under a unified “morning fade” cluster.

Multiday Put Trigger Reversion →

Needs baseline. Reads as a thinner cousin of Call-to-Put Reversal. Needs an explicit weekly-trigger definition and a baseline comparison before it earns a slot beside the keepers.

Failed Studies — No Edge Found (3 studies)

These were tested and didn’t hold up. They should be ignored as playbook material. If you want to revisit any of them, expect to redo them from scratch with a different framing; the current versions don’t carry tradable edge.

TenAM Traffic PO Divergence →

Honest negative finding. Across n=36,829 PO divergences, regular signals underperform baseline. Hidden divergences carry only a marginal, time-window-conditional edge. Don’t trade off this page.

10m vs 60m PO →

Redundant + unreproducible. 10m column can’t be regenerated from current code, and the surviving conclusion (“60m more predictive”) is already covered inside Bilbo Golden Gate. Standalone page adds nothing.

4h PO × OpEx Pin Release →

Over-claiming. Headline cohort n=13. Either deepen the sample dramatically or retire the page. Current version sells more confidence than the data supports.
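To see how little n=13 can support, a Wilson score interval is one standard way to put error bars on a hit rate. The sketch below is a generic illustration: the win count of 10 is hypothetical (the page gives only the cohort size), and the larger cohort is invented purely for comparison.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    p = wins / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# HYPOTHETICAL win counts at the same ~77% hit rate, for two cohort sizes.
small = wilson_interval(wins=10, n=13)
large = wilson_interval(wins=1000, n=1300)

for label, (lo, hi) in (("n=13", small), ("n=1300", large)):
    print(f"{label}: {lo:.1%} to {hi:.1%} (width {hi - lo:.1%})")
```

With these assumed counts the n=13 interval is wide enough to include a coin flip, while the same hit rate at n=1300 is pinned within a few points.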
