Draft — 2026-04-23. Fresh publication. Methodology was codex-reviewed for look-ahead bias (break-watch starts at lock_bar+1), same-bar stop resolution, stale-box rollover, and gap-through entries. Sample: 50,889 breaks across 25 years of SPY data. Do not trade from these numbers without your own validation.

Study — Bilbo Box Breakout

When price breaks out of a compression box,
is the follow-through real edge?

The Bilbo Box is the range of the first five compression bars. Theory says when price breaks out, momentum carries. Across 50,889 breaks on SPY over 25 years and five timeframes, the median net return over the next ten bars is positive but small: +0.03R to +0.06R with immediate entry, where R is the stop distance to the opposite side of the box. The edge is real but small. The best entry is the one most traders skip: immediate, at the break. On the intraday frames, waiting costs you the edge.

01 — The Setup

The Saty Phase Oscillator flags compression when Bollinger Band width falls below 2×ATR — the coiled-spring zone. The Bilbo Box is the price range of the first five contiguous compression bars. After that, the box is locked. If compression ends before five bars, the box locks at whatever range has formed so far. Break-watch then begins on the next bar.

Step 01

Compression opens

First bar with compression=1 after a non-compression run. High and low of that bar seed the box.

Step 02

Box forms (up to 5 bars)

While compression stays on, box expands with each bar. Cap at 5 bars, or lock early if compression ends.

Step 03

Break detected

First bar after lock where price trades outside the box range. Gap-through opens use the opening price as entry.

Step 04

Stop at opposite end

Bull break stops at box_low, bear break stops at box_high. That stop distance is the R unit used for cross-timeframe comparison.

Path rules. Break-watch only begins after the box locks (lock_bar+1) — the lock bar itself is never used to detect breaks, avoiding same-bar look-ahead. Outside bars that pierce both boundaries count as stopped-out first. If a new compression starts before the old box breaks, the old box is discarded.
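The setup steps and path rules above can be sketched in a few lines of Python. This is a minimal illustration, not the code in backtest_bilbo_box.py: the `Bar` and `Box` types, the function names, and the choice to treat the last contributing bar as the lock bar are all assumptions made for the example. It also assumes the rollover check runs before the break check on any given bar, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

MAX_BOX_BARS = 5  # cap stated in the text


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    compression: bool  # hypothetical field: True when compression=1


@dataclass
class Box:
    start_idx: int
    lock_idx: int  # assumption: the last bar that contributed to the box
    high: float
    low: float
    n_bars: int


def build_box(bars: Sequence[Bar], start_idx: int) -> Box:
    """Seed the box at start_idx, expand while compression holds, cap at 5 bars."""
    high, low = bars[start_idx].high, bars[start_idx].low
    lock_idx = start_idx
    for i in range(start_idx + 1, min(start_idx + MAX_BOX_BARS, len(bars))):
        if not bars[i].compression:
            break  # compression ended early: lock at the range formed so far
        high, low = max(high, bars[i].high), min(low, bars[i].low)
        lock_idx = i
    return Box(start_idx, lock_idx, high, low, lock_idx - start_idx + 1)


def find_break(bars: Sequence[Bar], box: Box) -> Optional[Tuple[int, str]]:
    """Scan from lock_idx + 1 onward; the lock bar itself is never inspected."""
    for i in range(box.lock_idx + 1, len(bars)):
        bar = bars[i]
        # Stale-box rollover: a new compression run begins before any break.
        if bar.compression and not bars[i - 1].compression:
            return None
        up, down = bar.high > box.high, bar.low < box.low
        if up and down:
            # Outside bar: classify by close position; a close inside is ambiguous.
            if bar.close > box.high:
                return i, "bull"
            if bar.close < box.low:
                return i, "bear"
            return i, "ambiguous"
        if up:
            return i, "bull"
        if down:
            return i, "bear"
    return None
```

Calling `build_box` on the first compression bar and passing the result to `find_break` yields the break bar index and direction, or `None` when the box is discarded or never breaks.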

02 — Headline Numbers (Immediate Entry, 10-bar Window)

On every timeframe, immediate entry (entering at the box boundary the moment price touches it) produces a positive median net return in R-units. Win rates hover just above 50%: this is a small, consistent edge, not a coin-flipper's dream.

+0.03R: median net return (3m / 10m / 4h)

+0.04R: median net return (1h, see caveat)

+0.06R: median net return (1d, tentative)

~51%: Net > 0 rate across timeframes

Timeframe	N	MFE (pts)	MAE (pts)	Net (pts)	Net R	Net > 0	Stopped
3-minute	32,041	0.22	0.19	+0.01	0.03	50.7%	33.2%
10-minute	7,656	0.41	0.35	+0.02	0.03	51.0%	28.4%
1-hour*	3,399	0.88	0.94	+0.06	0.04	52.0%	33.6%
4-hour	936	1.77	1.65	+0.10	0.03	51.1%	31.3%
Daily*	279	3.11	2.73	+0.50	0.06	53.4%	38.4%

All columns are medians. *1-hour: a residual caveat in KNOWLEDGE.md notes that the hourly Phase Oscillator runs ~8–13% low on average vs TradingView (median |Δ| ≈ 6 PO points after the Apr 2026 wick-clip rebuild; pre-clip the gap was 20–45%). The compression signal is therefore directionally trustworthy but not tick-for-tick on this timeframe. *Daily: with n=279 the bucket is underpowered; treat it as suggestive, not conclusive.

03 — The Entry Variant Question

The theory says wait for a retest. The data says don't. Immediate entry beats retest on every timeframe, and beats close-outside on 3m, 10m, and 1h. On 4h and daily, close-outside becomes competitive (and slightly ahead on raw Net-R), but at the cost of fewer eligible setups: the trigger rate drops from 97–99% to 81–85%.

Entry variant	Trigger rate	3m Net-R	10m Net-R	1h Net-R	4h Net-R	1d Net-R
Immediate	97–99%	+0.03	+0.03	+0.04	+0.03	+0.06
Close outside the box	81–85%	−0.04	−0.02	0.00	+0.05	+0.09
Wait for retest	76–88%	−0.07	−0.08	−0.02	−0.01	0.00

Median Net-R over the 10-bar forward window. Lower trigger rates reflect the subset of breaks where the variant's conditions actually fire.

Why does waiting hurt? Retests and close-outside confirmations are conditional on cleaner-looking breaks. Intuitively that should help. But in the data, the breaks that cleanly close outside or obligingly retest have already shown you their best move by the time you're allowed in — the mean-reversion crowd is waiting at exactly the same levels. Entering the moment price crosses the box captures the full impulse.

04 — Which Box Durations Actually Work

The spec says five-bar boxes are canonical. The numbers disagree. On the intraday frames, short-lived compressions (1–4 bars) carry most of the edge, with one clear exception: 3-bar boxes on 4h, at −0.32. Fully formed five-bar boxes are flat on the intraday frames (−0.01 to +0.01). On daily this flips: 5-bar boxes have the best median Net-R (+0.16), but that rests on n=138 and may just be the lower-frequency market behaving differently.

Box bars	3m Net-R	10m Net-R	1h Net-R	4h Net-R	1d Net-R
1 bar (degenerate)	+0.11	+0.18	+0.15	+0.12	0.00
2 bars	+0.10	+0.07	+0.03	+0.08	+0.12
3 bars	+0.07	+0.10	+0.07	−0.32	n<20
4 bars	+0.02	+0.11	+0.14	+0.22	+0.05
5 bars (full box)	0.00	−0.01	+0.01	+0.01	+0.16

Median Net-R, immediate entry, 10-bar window. 1-bar boxes are the degenerate case where compression was only active on the start bar; these carry the highest stop-out rate (45–60%) but also the biggest median follow-through when the break holds.

05 — Direction Bias

Bull breaks outperform bear breaks on every timeframe, including 3m, where the gap is narrowest (+0.05R vs 0.00R). The gap widens on higher timeframes: bull breaks on daily carry a +0.22R median, while bear breaks on daily run −0.18R. Most of the daily "edge" is a directional edge, not a compression edge.

Timeframe	Bull N	Bull Net-R	Bear N	Bear Net-R
3-minute	16,629	+0.05	15,412	0.00
10-minute	3,977	+0.06	3,679	−0.01
1-hour	1,834	+0.09	1,565	−0.02
4-hour	514	+0.16	422	−0.15
Daily	161	+0.22	118	−0.18

The 25-year sample carries an upward drift. SPY returned roughly 7× over this window. Part of the bull-vs-bear asymmetry is that carry — compression breaks upward are riding the trend, compression breaks downward are fighting it. On intraday bars the drift is washed out by path; on daily bars it's the dominant force.

06 — Narrow vs Wide Boxes

Normalising box width by daily ATR, narrow boxes have the best median Net-R but the worst stop-out rate — the classic tight-stop tradeoff. Wide boxes rarely stop you out but give you a muddier edge because the box itself already consumed the move.

Narrow third

+0.07R

3m median Net-R. But 47% stop-out rate — many attempts, small individual wins.

Medium third

+0.02R

3m median. Stop-out 32%. The middle-of-the-road regime.

Wide third

+0.02R

3m median. Stop-out only 21% — wide boxes rarely get knocked out, but the follow-through is a smaller fraction of the stop distance.
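The narrow / medium / wide split can be illustrated as follows. The text does not say how the thirds are cut, so this sketch assumes sample terciles of box width divided by daily ATR, computed within one timeframe; the function name and inputs are hypothetical.

```python
from statistics import quantiles
from typing import List, Sequence


def width_terciles(box_widths: Sequence[float], daily_atrs: Sequence[float]) -> List[str]:
    """Label each box narrow / medium / wide by its width as a fraction of daily ATR."""
    ratios = [width / atr for width, atr in zip(box_widths, daily_atrs)]
    # quantiles(..., n=3) returns the two cut points that split the sample into thirds.
    lower_cut, upper_cut = quantiles(ratios, n=3)
    labels = []
    for ratio in ratios:
        if ratio <= lower_cut:
            labels.append("narrow")
        elif ratio <= upper_cut:
            labels.append("medium")
        else:
            labels.append("wide")
    return labels
```

Median Net-R and stop-out rate are then computed per label, which is how a tradeoff like the one above (higher median, more stop-outs for narrow boxes) becomes visible.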

07 — Practical Takeaways

Take the break, not the retest. On every timeframe tested, entering the moment price crosses the box boundary produced a better median Net-R than waiting for a retest, and on 3m, 10m, and 1h it also beat close-outside confirmation. Close-outside is slightly ahead only on 4h and daily (+0.05R and +0.09R vs +0.03R and +0.06R), and on fewer triggers. Retest has a negative median on the intraday frames (−0.07R on 3m, −0.08R on 10m).

Smaller boxes, bigger follow-through. Don't wait for the textbook five-bar formation — on the intraday frames, compressions that resolve within 1–4 bars carry more edge than fully-formed boxes. The compression signal's information value decays as the box ages.

Higher timeframes favour the bull side. On 4h and daily bars, bear breaks have a clearly negative median Net-R (−0.15R and −0.18R) while bull breaks remain positive. Selling a downside compression break on daily SPY has been a losing proposition over this sample.

Size stops at the opposite boundary. The opposite end of the box is the natural invalidation line, and R-multiples in this study use that distance. With a median MFE of roughly 0.7R and a stop-out rate of ~30% on the 10-bar window, realistic take-profit targets sit between 0.5R and 1.0R. Aiming beyond that is a regime trade, not a statistical edge.

What this study does not claim. The net-positive edge is small. At ~51% hit rates and ~0.03R median on the intraday timeframes, this is a texture, not an exploit. It becomes actionable only when stacked with independent filters (trend, time of day, volatility regime) or when the box itself is unusually narrow relative to ATR. The raw compression-break signal is a starting point, not a setup.

08 — Methodology Notes

Data: 25 years of SPY 1-minute data (Jan 2000 – Apr 2026), aggregated into 3m / 10m / 1h / 4h / 1d indicator tables. Pivot Ribbon EMAs validated to 0.000% error vs TradingView on 10m; ATR Levels to 0.00–0.07%; 10-minute Phase Oscillator to 0.5–3.5% after warmup. Hourly PO was re-validated post-clip against a fresh TradingView 60m export (4,035 overlapping bars, Apr 2025 – Apr 2026): median |Δ| of 6 PO points and a median relative gap of −8.5% on bars where |TV PO| ≥ 20 — materially better than the pre-clip 20–45% range, though tail bars can still diverge by ±50–70%.

Look-ahead safeguards: break-watch begins on lock_bar + 1 (never the lock bar itself). Outcome measurement reads bars entry_idx + 1 through entry_idx + N inclusive; same-bar excursions from the entry bar are ignored. Outside bars (both box boundaries pierced intrabar) are classified by close position and flagged; direction-ambiguous outside bars (close inside the box) are excluded from all variant stats.
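The outcome window described above can be sketched like this. It is an illustration under stated assumptions rather than the study's code: the function and key names are hypothetical, net is assumed to be the stop price if stopped and otherwise the close of the last bar in the window, and a bar that touches the stop is treated as stopped before any favourable excursion on that bar is counted.

```python
from typing import Dict, Sequence, Union


def measure_outcome(
    highs: Sequence[float],
    lows: Sequence[float],
    closes: Sequence[float],
    entry_idx: int,
    entry_price: float,
    stop_price: float,
    direction: str,  # "bull" or "bear"
    n_forward: int = 10,
) -> Dict[str, Union[float, bool]]:
    """Read bars entry_idx + 1 .. entry_idx + n_forward; the entry bar is ignored."""
    risk = abs(entry_price - stop_price)  # the R unit
    if risk == 0:
        raise ValueError("entry and stop coincide; R is undefined")
    sign = 1.0 if direction == "bull" else -1.0
    mfe = mae = 0.0
    stopped = False
    exit_price = entry_price
    last = min(entry_idx + n_forward, len(closes) - 1)
    for i in range(entry_idx + 1, last + 1):
        favourable = highs[i] if sign > 0 else lows[i]
        adverse = lows[i] if sign > 0 else highs[i]
        # Stop check first: same-bar conflicts resolve as stopped out.
        if sign * (adverse - stop_price) <= 0:
            stopped, mae, exit_price = True, risk, stop_price
            break
        mfe = max(mfe, sign * (favourable - entry_price))
        mae = max(mae, sign * (entry_price - adverse))
        exit_price = closes[i]
    net = sign * (exit_price - entry_price)
    return {"mfe": mfe, "mae": mae, "net": net, "net_r": net / risk, "stopped": stopped}
```

Dividing by `risk` is what makes results comparable across timeframes: a 0.01-point net on a 3-minute box and a 0.50-point net on a daily box both reduce to a fraction of their own stop distance.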

Fill model: immediate-entry fill price equals the box boundary (box_high for bull, box_low for bear) unless the bar gapped through at the open, in which case the bar's open price is used. Close-outside uses the confirming bar's close. Retest uses the relevant box boundary.
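As a compact restatement of the fill model, the three variants reduce to one small function. The function name, argument names, and variant strings are hypothetical; the rules themselves are taken from the paragraph above.

```python
def fill_price(
    variant: str,      # "immediate", "close_outside", or "retest"
    direction: str,    # "bull" or "bear"
    box_high: float,
    box_low: float,
    bar_open: float,   # open of the triggering bar
    bar_close: float,  # close of the confirming bar
) -> float:
    """Return the assumed entry price for one break under the stated fill rules."""
    boundary = box_high if direction == "bull" else box_low
    if variant == "immediate":
        # Gap-through: the bar opened beyond the boundary, so the open is the fill.
        gapped = bar_open > box_high if direction == "bull" else bar_open < box_low
        return bar_open if gapped else boundary
    if variant == "close_outside":
        return bar_close
    if variant == "retest":
        return boundary
    raise ValueError(f"unknown entry variant: {variant}")
```

Using the open on gap-through bars avoids crediting the strategy with a boundary fill that was never available.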

Sample guardrails: buckets with n<20 are suppressed. Buckets with 20≤n<50 are flagged in the full run log but not in these tables. Stale-box rollover: if a new compression starts before the current box breaks, the current box is discarded.
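The sample guardrails amount to a simple rule applied per bucket. A minimal sketch, assuming events arrive as (bucket key, Net-R) pairs; the function name, the flag strings, and the input shape are illustrative and not taken from the implementation.

```python
from statistics import median
from typing import Dict, Hashable, Iterable, List, Tuple


def summarise_buckets(
    events: Iterable[Tuple[Hashable, float]],
    suppress_below: int = 20,  # n < 20: bucket is suppressed
    flag_below: int = 50,      # 20 <= n < 50: reported but flagged
) -> Dict[Hashable, dict]:
    """Group Net-R values by bucket and apply the n-based guardrails."""
    groups: Dict[Hashable, List[float]] = {}
    for key, net_r in events:
        groups.setdefault(key, []).append(net_r)
    summary = {}
    for key, values in groups.items():
        n = len(values)
        if n < suppress_below:
            summary[key] = {"n": n, "median_net_r": None, "flag": "suppressed"}
        else:
            flag = "low_n" if n < flag_below else "ok"
            summary[key] = {"n": n, "median_net_r": median(values), "flag": flag}
    return summary
```

A suppressed bucket keeps its count but reports no median, which matches the "n<20" cell in the box-duration table.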

Full design doc: /root/spy/proposal_bilbo_box.md. Implementation: /root/spy/backtest_bilbo_box.py. Event-level CSV (50,889 events): /root/spy/analyst/bilbo_box_events.csv.
