The Bilbo Box is the range of the first five compression bars. Theory says when price breaks out, momentum carries. Across 50,889 breaks on SPY over 25 years and five timeframes, the median follow-through is about one-third of the box width in the break direction over the next ten bars. The edge is real but small. The best entry is the one most traders skip — immediate, at the break. Waiting costs you the edge.
The Saty Phase Oscillator flags compression when Bollinger Band width falls below 2×ATR — the coiled-spring zone. The Bilbo Box is the price range of the first five contiguous compression bars. After that, the box is locked. If compression ends before five bars, the box locks at whatever range has formed so far. Break-watch then begins on the next bar.
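The box-formation rule above can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the `Box` dataclass, the `build_boxes` name, and the convention that `lock_bar` is the last bar contributing to the range are all assumptions made for the example. The compression flag is taken as a precomputed input.

```python
from dataclasses import dataclass
from typing import List, Sequence

MAX_BOX_BARS = 5  # the box stops growing after five compression bars


@dataclass
class Box:
    start_bar: int  # index of the first compression bar
    lock_bar: int   # last bar that contributed to the range (assumed convention)
    high: float
    low: float
    n_bars: int     # compression bars inside the box, 1..5


def build_boxes(highs: Sequence[float], lows: Sequence[float],
                compression: Sequence[int]) -> List[Box]:
    """Build one box per compression run from its first (up to) five bars."""
    boxes: List[Box] = []
    i, n = 0, len(compression)
    while i < n:
        if not compression[i]:
            i += 1
            continue
        # First compression bar after a non-compression run seeds the box.
        start, hi, lo, count = i, highs[i], lows[i], 1
        i += 1
        # Extend while compression continues, up to five bars in total.
        while i < n and compression[i] and count < MAX_BOX_BARS:
            hi, lo = max(hi, highs[i]), min(lo, lows[i])
            count += 1
            i += 1
        boxes.append(Box(start, start + count - 1, hi, lo, count))
        # Later bars of the same run do not widen the box or start a new one.
        while i < n and compression[i]:
            i += 1
    return boxes
```

A run of two compression bars yields a two-bar box; a run of seven yields a box built from the first five bars only.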
The box starts on the first bar where compression=1 after a non-compression run. The high and low of that bar seed the box.
Path rules. Break-watch only begins after the box locks (lock_bar+1); the lock bar itself is never used to detect breaks, which avoids same-bar look-ahead. Outside bars that pierce both boundaries count as stopped-out first. If a new compression starts before the old box breaks, the old box is discarded.
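The path rules can be illustrated with a small break scanner. This is a sketch under stated assumptions: the function name `first_break`, the `next_box_start` parameter, and the use of strict inequalities (a touch of the boundary does not count as a break) are choices made for the example rather than details confirmed by the study.

```python
from typing import Optional, Sequence, Tuple


def first_break(highs: Sequence[float], lows: Sequence[float],
                box_high: float, box_low: float, lock_bar: int,
                next_box_start: Optional[int] = None
                ) -> Optional[Tuple[int, str]]:
    """Return (bar_index, kind) for the first break after the lock bar.

    kind is "bull", "bear", or "outside" (both boundaries pierced).
    Returns None if the box never breaks or is discarded first.
    """
    # Stale-box rollover: stop scanning once a new compression run starts.
    end = len(highs) if next_box_start is None else min(next_box_start, len(highs))
    # Scanning starts at lock_bar + 1, so the lock bar is never inspected.
    for i in range(lock_bar + 1, end):
        above = highs[i] > box_high  # strict comparison is an assumption
        below = lows[i] < box_low
        if above and below:
            return i, "outside"
        if above:
            return i, "bull"
        if below:
            return i, "bear"
    return None
```

Starting the loop at `lock_bar + 1` is what keeps the lock bar's own range out of break detection, even if that bar trades well beyond the box.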
For every timeframe, immediate (enter at the box boundary the moment price touches it) produces a positive median net return in R-units. The win rates hover just above 50% — this is a small, consistent edge, not a coin-flipper's dream.
| Timeframe | N | MFE (pts) | MAE (pts) | Net (pts) | Net R | Net > 0 | Stopped |
|---|---|---|---|---|---|---|---|
| 3-minute | 32,041 | 0.22 | 0.19 | +0.01 | 0.03 | 50.7% | 33.2% |
| 10-minute | 7,656 | 0.41 | 0.35 | +0.02 | 0.03 | 51.0% | 28.4% |
| 1-hour* | 3,399 | 0.88 | 0.94 | +0.06 | 0.04 | 52.0% | 33.6% |
| 4-hour | 936 | 1.77 | 1.65 | +0.10 | 0.03 | 51.1% | 31.3% |
| Daily* | 279 | 3.11 | 2.73 | +0.50 | 0.06 | 53.4% | 38.4% |
All values are column medians. *1-hour carries a residual KNOWLEDGE.md caveat: hourly Phase Oscillator runs ~8–13% low on average vs TradingView (median |Δ| ≈ 6 PO points after the Apr 2026 wick-clip rebuild; pre-clip the gap was 20–45%). The compression signal is therefore directionally trustworthy but not tick-for-tick on this timeframe. *Daily, with n=279, is underpowered; treat as suggestive, not conclusive.
The theory says wait for a retest. The data says don't. Immediate entry beats retest on every timeframe, and beats close-outside on the faster intraday frames. On 4h and daily, close-outside becomes competitive (and slightly ahead on raw Net-R), but at the cost of fewer eligible setups: the trigger rate drops from 97–99% to 81–85%.
| Entry variant | Trigger rate | 3m Net-R | 10m Net-R | 1h Net-R | 4h Net-R | 1d Net-R |
|---|---|---|---|---|---|---|
| Immediate | 97–99% | +0.03 | +0.03 | +0.04 | +0.03 | +0.06 |
| Close outside the box | 81–85% | −0.04 | −0.02 | 0.00 | +0.05 | +0.09 |
| Wait for retest | 76–88% | −0.07 | −0.08 | −0.02 | −0.01 | 0.00 |
Median Net-R over the 10-bar forward window. Lower trigger rates reflect the subset of breaks where the variant's conditions actually fire.
Why does waiting hurt? Retests and close-outside confirmations are conditional on cleaner-looking breaks. Intuitively that should help. But in the data, the breaks that cleanly close outside or obligingly retest have already shown you their best move by the time you're allowed in — the mean-reversion crowd is waiting at exactly the same levels. Entering the moment price crosses the box captures the full impulse.
The spec says five-bar boxes are canonical. The numbers disagree. Short-lived compressions (1–4 bars) carry the edge. Fully-formed five-bar boxes are flat or slightly negative on the intraday frames. On daily this flips — 5-bar boxes there have the best median Net-R (+0.16) — but that's on n=138 and may just be the lower-frequency market behaving differently.
| Box bars | 3m Net-R | 10m Net-R | 1h Net-R | 4h Net-R | 1d Net-R |
|---|---|---|---|---|---|
| 1 bar (degenerate) | +0.11 | +0.18 | +0.15 | +0.12 | 0.00 |
| 2 bars | +0.10 | +0.07 | +0.03 | +0.08 | +0.12 |
| 3 bars | +0.07 | +0.10 | +0.07 | −0.32 | n<20 |
| 4 bars | +0.02 | +0.11 | +0.14 | +0.22 | +0.05 |
| 5 bars (full box) | 0.00 | −0.01 | +0.01 | +0.01 | +0.16 |
Median Net-R, immediate entry, 10-bar window. 1-bar boxes are the degenerate case where compression was only active on the start bar; these carry the highest stop-out rate (45–60%) but also the biggest median follow-through when the break holds.
Bull breaks outperform bear breaks on every timeframe, with the smallest gap on 3m (+0.05R vs 0.00R). The gap widens on higher timeframes: bull breaks on daily carry a +0.22R median, while bear breaks on daily run −0.18R. Most of the daily "edge" is a directional edge, not a compression edge.
| Timeframe | Bull N | Bull Net-R | Bear N | Bear Net-R |
|---|---|---|---|---|
| 3-minute | 16,629 | +0.05 | 15,412 | 0.00 |
| 10-minute | 3,977 | +0.06 | 3,679 | −0.01 |
| 1-hour | 1,834 | +0.09 | 1,565 | −0.02 |
| 4-hour | 514 | +0.16 | 422 | −0.15 |
| Daily | 161 | +0.22 | 118 | −0.18 |
The 25-year sample carries an upward drift. SPY returned roughly 7× over this window. Part of the bull-vs-bear asymmetry is that carry — compression breaks upward are riding the trend, compression breaks downward are fighting it. On intraday bars the drift is washed out by path; on daily bars it's the dominant force.
Normalising box width by daily ATR, narrow boxes have the best median Net-R but the worst stop-out rate — the classic tight-stop tradeoff. Wide boxes rarely stop you out but give you a muddier edge because the box itself already consumed the move.
Take the break, not the retest. On every timeframe tested, entering the moment price crosses the box boundary produced a better median Net-R than waiting for a retest, and on 3m, 10m and 1h it also beat close-outside confirmation. On 4h and daily, close-outside was slightly ahead on Net-R but triggered on fewer breaks. Retest is worse than flipping a coin on the intraday frames (−0.07R on 3m, −0.08R on 10m).
Smaller boxes, bigger follow-through. Don't wait for the textbook five-bar formation — on the intraday frames, compressions that resolve within 1–4 bars carry more edge than fully-formed boxes. The compression signal's information value decays as the box ages.
Higher timeframes favour the bull side. On 4h and daily bars, bear breaks have a negative median Net-R (−0.15R and −0.18R) while bull breaks remain positive (+0.16R and +0.22R). Selling a downside compression break on daily SPY has been a losing proposition over this sample.
Size stops at the opposite boundary. The opposite end of the box is the natural invalidation line. R-multiples in this study use that distance. With an MFE of roughly 0.7R and a median stop-out rate of ~30% on the 10-bar window, realistic take-profit targets sit between 0.5R and 1.0R. Aiming beyond that is a regime trade, not a statistical edge.
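As a worked illustration of the sizing rule, the sketch below turns a box into entry, stop, and target prices. The function name `trade_levels` and the example prices are hypothetical, and the sketch assumes a fill exactly at the boundary (no gap through the box), so that 1R equals the box width.

```python
from typing import Dict


def trade_levels(box_high: float, box_low: float,
                 direction: str, target_r: float) -> Dict[str, float]:
    """Entry at the broken boundary, stop at the opposite one, target in R."""
    risk = box_high - box_low  # 1R = box width when filled at the boundary
    if risk <= 0:
        raise ValueError("box_high must be above box_low")
    if direction == "bull":
        entry, stop = box_high, box_low
        target = entry + target_r * risk
    elif direction == "bear":
        entry, stop = box_low, box_high
        target = entry - target_r * risk
    else:
        raise ValueError("direction must be 'bull' or 'bear'")
    return {"entry": entry, "stop": stop, "risk": risk, "target": target}


# Hypothetical box from 500.00 to 502.00, bull break, 0.5R target:
# entry 502.00, stop 500.00, risk 2.00, target 503.00.
print(trade_levels(502.0, 500.0, "bull", 0.5))
```

With the same box, a 1.0R target on the bull side would sit at 504.00, the upper end of the range suggested above.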
What this study does not claim. The net-positive edge is small. At ~51% hit rates and ~0.03R median on the intraday timeframes, this is a texture, not an exploit. It becomes actionable only when stacked with independent filters (trend, time of day, volatility regime) or when the box itself is unusually narrow relative to ATR. The raw compression-break signal is a starting point, not a setup.
Data: 25 years of SPY 1-minute data (Jan 2000 – Apr 2026), aggregated into 3m / 10m / 1h / 4h / 1d indicator tables. Pivot Ribbon EMAs validated to 0.000% error vs TradingView on 10m; ATR Levels to 0.00–0.07%; 10-minute Phase Oscillator to 0.5–3.5% after warmup. Hourly PO was re-validated post-clip against a fresh TradingView 60m export (4,035 overlapping bars, Apr 2025 – Apr 2026): median |Δ| of 6 PO points and a median relative gap of −8.5% on bars where |TV PO| ≥ 20 — materially better than the pre-clip 20–45% range, though tail bars can still diverge by ±50–70%.
Look-ahead safeguards: break-watch begins on lock_bar + 1 (never the lock bar itself). Outcome measurement reads bars entry_idx + 1 through entry_idx + N inclusive; same-bar excursions from the entry bar are ignored. Outside bars (both box boundaries pierced intrabar) are classified by close position and flagged; direction-ambiguous outside bars (close inside the box) are excluded from all variant stats.
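The outcome window can be sketched as follows. This is an illustration, not the study's code: `measure_outcome` and its return keys are made-up names, and net is taken simply as the final close of the window versus the entry price. How the study marks trades that hit the stop inside the window is not stated here, so the sketch only flags them.

```python
from typing import Dict, Optional, Sequence


def measure_outcome(highs: Sequence[float], lows: Sequence[float],
                    closes: Sequence[float], entry_idx: int,
                    entry_price: float, stop_price: float,
                    direction: str, n_bars: int = 10
                    ) -> Optional[Dict[str, object]]:
    """MFE, MAE and net in R over bars entry_idx+1 .. entry_idx+n_bars."""
    risk = abs(entry_price - stop_price)
    if risk == 0:
        raise ValueError("entry and stop must differ")
    last = min(entry_idx + n_bars, len(highs) - 1)
    window = range(entry_idx + 1, last + 1)  # the entry bar itself is skipped
    if len(window) == 0:
        return None  # no forward bars available
    hi = max(highs[i] for i in window)
    lo = min(lows[i] for i in window)
    final_close = closes[last]
    if direction == "bull":
        mfe, mae, net = hi - entry_price, entry_price - lo, final_close - entry_price
        stopped = lo <= stop_price
    elif direction == "bear":
        mfe, mae, net = entry_price - lo, hi - entry_price, entry_price - final_close
        stopped = hi >= stop_price
    else:
        raise ValueError("direction must be 'bull' or 'bear'")
    return {"mfe_r": mfe / risk, "mae_r": mae / risk,
            "net_r": net / risk, "stopped": stopped}
```

Because the window starts at `entry_idx + 1`, a large excursion on the entry bar has no effect on the measured MFE or MAE.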
Fill model: immediate-entry fill price equals the box boundary (box_high for bull, box_low for bear) unless the bar gapped through at the open, in which case the bar's open price is used. Close-outside uses the confirming bar's close. Retest uses the relevant box boundary.
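The immediate-entry fill rule reduces to a one-line comparison per direction. A minimal sketch, with `immediate_fill` as a hypothetical helper name:

```python
def immediate_fill(direction: str, bar_open: float,
                   box_high: float, box_low: float) -> float:
    """Fill at the box boundary, or at the open if the bar gapped through it."""
    if direction == "bull":
        # An open above box_high means the boundary price was never available.
        return max(bar_open, box_high)
    if direction == "bear":
        # An open below box_low means the fill is at the (worse) open.
        return min(bar_open, box_low)
    raise ValueError("direction must be 'bull' or 'bear'")
```

Using `max` and `min` means a gap can only worsen the fill relative to the boundary, never improve it.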
Sample guardrails: buckets with n<20 are suppressed. Buckets with 20≤n<50 are flagged in the full run log but not in these tables. Stale-box rollover: if a new compression starts before the current box breaks, the current box is discarded.
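The sample-size guardrails could be applied per bucket roughly like this. The function name `summarize_bucket` and the flag labels are assumptions for illustration; only the n<20 and 20≤n<50 thresholds come from the text.

```python
from statistics import median
from typing import Dict, Sequence


def summarize_bucket(net_r_values: Sequence[float],
                     min_n: int = 20, flag_below: int = 50) -> Dict[str, object]:
    """Median Net-R for one bucket, suppressed or flagged by sample size."""
    n = len(net_r_values)
    if n < min_n:
        # Too few events: report the count but no statistic.
        return {"n": n, "median_net_r": None, "flag": "suppressed"}
    flag = "small_sample" if n < flag_below else None
    return {"n": n, "median_net_r": median(net_r_values), "flag": flag}
```

A bucket with 19 events returns no median at all, while one with 30 events returns a median tagged `small_sample`.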
Full design doc: /root/spy/proposal_bilbo_box.md. Implementation: /root/spy/backtest_bilbo_box.py. Event-level CSV (50,889 events): /root/spy/analyst/bilbo_box_events.csv.