Branch Prediction
rvsim implements six pluggable branch predictors with shared infrastructure. The predictor is consulted during Fetch1 to steer the instruction stream speculatively.
Shared Infrastructure
All predictors share these components:
Branch Target Buffer (BTB)
Set-associative cache (default: 4096 entries, 4-way) that maps branch PCs to their target addresses. Used for indirect jumps where the target isn't encoded in the instruction.
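Below is a minimal Python sketch of a 4096-entry, 4-way BTB. The class name, the indexing scheme (drop PC bit 0, then take the slot modulo the set count), and LRU replacement are assumptions for the example. It does not describe rvsim's code.

```python
class BTB:
    """Set-associative branch target buffer with LRU replacement (assumed policy)."""

    def __init__(self, entries=4096, ways=4):
        assert entries % ways == 0
        self.ways = ways
        self.num_sets = entries // ways
        # Each set is a list of (tag, target) pairs, least recently used first.
        self.sets = [[] for _ in range(self.num_sets)]

    def _index_tag(self, pc):
        # RISC-V instructions are at least 2-byte aligned, so bit 0 carries no information.
        slot = pc >> 1
        return slot % self.num_sets, slot // self.num_sets

    def lookup(self, pc):
        """Return the predicted target, or None on a BTB miss."""
        idx, tag = self._index_tag(pc)
        ways = self.sets[idx]
        for i, (t, target) in enumerate(ways):
            if t == tag:
                ways.append(ways.pop(i))  # move the hit entry to the MRU position
                return target
        return None

    def update(self, pc, target):
        """Install or refresh the target for pc once the jump resolves."""
        idx, tag = self._index_tag(pc)
        ways = self.sets[idx]
        for i, (t, _) in enumerate(ways):
            if t == tag:
                ways.pop(i)  # drop the stale copy; re-added at MRU below
                break
        else:
            if len(ways) == self.ways:
                ways.pop(0)  # set is full: evict the LRU entry
        ways.append((tag, target))
```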
Return Address Stack (RAS)
Circular buffer (default: 32 entries) for call/return prediction.
Per RISC-V spec Table 2.1, both x1 (ra) and x5 (t0) are recognized as link registers:
| Instruction | rd is link? | rs1 is link? | Action |
|---|---|---|---|
| jal rd, offset | Yes | n/a | Push return address onto RAS |
| jal rd, offset | No | n/a | No RAS action (plain jump) |
| jalr rd, rs1, offset | No | Yes | Pop from RAS (return) |
| jalr rd, rs1, offset | Yes | Yes, rd ≠ rs1 | Pop then push (coroutine swap) |
| jalr rd, rs1, offset | Yes | Yes, rd = rs1 | Push (call through link register) |
| jalr rd, rs1, offset | Yes | No | Push (indirect call) |
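The link-register table can be written as a small classification function. The Python helper below is hypothetical, and `ras_action` is not an rvsim name. The table leaves out one case: a jalr where neither rd nor rs1 is a link register. The helper returns "none" for it, which matches the RISC-V spec.

```python
LINK_REGS = {1, 5}  # x1 (ra) and x5 (t0)


def ras_action(opcode, rd, rs1=None):
    """Map a jal/jalr and its register numbers to a RAS action string."""
    rd_link = rd in LINK_REGS
    if opcode == "jal":
        return "push" if rd_link else "none"
    if opcode != "jalr":
        raise ValueError(f"not a jump instruction: {opcode}")
    rs1_link = rs1 in LINK_REGS
    if rs1_link and not rd_link:
        return "pop"  # function return
    if rs1_link and rd_link:
        # Same register: call through the link register; different: coroutine swap.
        return "push" if rd == rs1 else "pop_then_push"
    if rd_link:
        return "push"  # indirect call
    return "none"  # plain indirect jump
```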
The RAS supports speculative recovery: on a branch misprediction, the RAS pointer is restored from the per-instruction snapshot.
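A circular RAS with pointer-only recovery can be sketched as follows. The sketch assumes the per-instruction snapshot is just the top-of-stack index, and the class and method names are hypothetical. Under this scheme a wrong-path pop is undone exactly. An entry that a wrong-path push overwrote is not repaired.

```python
class ReturnAddressStack:
    """Fixed-size circular RAS; overflow silently overwrites the oldest entry."""

    def __init__(self, size=32):
        self.size = size
        self.entries = [0] * size
        self.top = 0  # index of the next free slot, wrapping around

    def push(self, addr):
        self.entries[self.top] = addr
        self.top = (self.top + 1) % self.size

    def pop(self):
        self.top = (self.top - 1) % self.size
        return self.entries[self.top]

    def snapshot(self):
        # Taken when the instruction is fetched and carried along with it.
        return self.top

    def restore(self, snap):
        # Misprediction recovery: roll back the pointer only.
        self.top = snap
```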
Global History Register (GHR)
Arbitrary-length bit vector recording the direction (taken/not-taken) of recent branches. The GHR is speculatively updated during Fetch1 and repaired on misprediction from per-instruction snapshots.
The GHR has no fixed maximum length: it is sized to the longest history needed by the selected predictor (e.g., TAGE's geometric history lengths can exceed 700 bits).
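Python's arbitrary-precision int works well as a model of an arbitrary-length GHR. The sketch below makes two assumptions. First, bit 0 holds the most recent outcome. Second, each instruction's snapshot is the history as it stood before that instruction's own speculative update. On a misprediction, repair rewinds to the snapshot and shifts in the resolved direction. `GlobalHistory` and its methods are hypothetical names.

```python
class GlobalHistory:
    """Speculative GHR stored in a Python int; bit 0 is the newest branch."""

    def __init__(self, length):
        self.length = length  # longest history any predictor component needs
        self.mask = (1 << length) - 1
        self.bits = 0

    def speculate(self, predicted_taken):
        """Shift in a predicted direction at fetch and return the pre-update snapshot."""
        snap = self.bits
        self.bits = ((self.bits << 1) | int(predicted_taken)) & self.mask
        return snap

    def repair(self, snap, actual_taken):
        """Discard wrong-path history and insert the branch's real outcome."""
        self.bits = ((snap << 1) | int(actual_taken)) & self.mask

    def recent(self, n):
        """Return the n most recent outcomes as an int."""
        return self.bits & ((1 << n) - 1)
```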
Predictors
Static
Always predicts not-taken. Useful as a baseline for measuring how much a predictor contributes.
GShare
XOR of the branch PC and the global history register indexes into a table of 2-bit saturating counters. Simple and effective for workloads with strong global correlation.
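A minimal GShare sketch follows. It assumes a table of 2^index_bits counters (`index_bits` is a hypothetical parameter), drops PC bit 0 before hashing, and initializes counters to weakly not-taken. The `ghr` argument should be the history snapshotted at prediction time, so that `predict` and `update` touch the same counter.

```python
class GShare:
    """PC XOR global history indexes a table of 2-bit saturating counters."""

    def __init__(self, index_bits=12):
        self.mask = (1 << index_bits) - 1
        # Counter values 0..3; values >= 2 predict taken. Start weakly not-taken.
        self.counters = [1] * (1 << index_bits)

    def _index(self, pc, ghr):
        return ((pc >> 1) ^ ghr) & self.mask

    def predict(self, pc, ghr):
        return self.counters[self._index(pc, ghr)] >= 2

    def update(self, pc, ghr, taken):
        i = self._index(pc, ghr)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)
```

The same branch PC maps to different counters under different histories. That is how GShare captures global correlation, and it is also the source of its aliasing.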
Tournament
Hybrid predictor made of two two-level adaptive predictors plus a chooser:
- Global predictor — 2-bit counters indexed by global history
- Local predictor — per-PC local history table feeding a second table of 2-bit counters
- Meta-predictor (chooser) — selects between global and local predictions based on which has been more accurate recently
Configurable parameters: global_size_bits, local_hist_bits, local_pred_bits.
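The three components can be sketched in Python as below. The sketch reuses the documented parameter names, but how it interprets them is an assumption:

- `global_size_bits` is log2 of the global and chooser table sizes.
- `local_hist_bits` is the width of each local history.
- `local_pred_bits` is log2 of the local counter table size.

`local_table_bits` (log2 of the number of local history entries) is a hypothetical extra parameter. The chooser is indexed by global history and trained only when exactly one component was correct, as in McFarling-style combining predictors.

```python
def _bump(c, up, hi=3):
    """Saturating 2-bit counter step."""
    return min(c + 1, hi) if up else max(c - 1, 0)


class Tournament:
    def __init__(self, global_size_bits=12, local_hist_bits=10,
                 local_pred_bits=10, local_table_bits=10):
        self.gmask = (1 << global_size_bits) - 1
        self.hmask = (1 << local_hist_bits) - 1
        self.pmask = (1 << local_pred_bits) - 1
        self.lmask = (1 << local_table_bits) - 1
        self.global_ctr = [1] * (1 << global_size_bits)
        # Chooser: >= 2 selects the global prediction, < 2 the local one.
        self.choice_ctr = [1] * (1 << global_size_bits)
        self.local_hist = [0] * (1 << local_table_bits)  # per-PC histories
        self.local_ctr = [1] * (1 << local_pred_bits)    # indexed by local history

    def _lookup(self, pc, ghr):
        g = ghr & self.gmask
        l = self.local_hist[(pc >> 1) & self.lmask] & self.pmask
        return g, l

    def predict(self, pc, ghr):
        g, l = self._lookup(pc, ghr)
        global_taken = self.global_ctr[g] >= 2
        local_taken = self.local_ctr[l] >= 2
        return global_taken if self.choice_ctr[g] >= 2 else local_taken

    def update(self, pc, ghr, taken):
        g, l = self._lookup(pc, ghr)
        global_ok = (self.global_ctr[g] >= 2) == taken
        local_ok = (self.local_ctr[l] >= 2) == taken
        if global_ok != local_ok:
            # Exactly one component was right: move the chooser toward it.
            self.choice_ctr[g] = _bump(self.choice_ctr[g], global_ok)
        self.global_ctr[g] = _bump(self.global_ctr[g], taken)
        self.local_ctr[l] = _bump(self.local_ctr[l], taken)
        slot = (pc >> 1) & self.lmask
        self.local_hist[slot] = ((self.local_hist[slot] << 1) | int(taken)) & self.hmask
```

Consider a branch that alternates taken/not-taken while the global history stays constant. The local component learns the pattern within a few iterations, and the chooser settles on it.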
Perceptron
Neural branch predictor. Each entry in the table is a vector of integer weights, one per GHR bit. The sign of the dot product of the weight vector and the recent branch history determines the prediction. Weights are trained on mispredictions using a threshold-based update rule.
Configurable parameters: history_length, table_bits.
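The following Python sketch follows the standard Jiménez and Lin perceptron formulation. It uses a bias weight, history bits mapped to +1 (taken) and -1 (not-taken), the training threshold floor(1.93 * h + 14), and 8-bit saturating weights. It is an assumption that rvsim uses the same bias weight, threshold, and weight width, and `weight_bits` is a hypothetical parameter. The standard rule also trains on correct predictions whose output magnitude is at or below the threshold, not only on mispredictions.

```python
class Perceptron:
    def __init__(self, history_length=16, table_bits=8, weight_bits=8):
        self.h = history_length
        self.mask = (1 << table_bits) - 1
        # weights[i][0] is the bias; weights[i][1..h] pair with history bits 0..h-1.
        self.weights = [[0] * (history_length + 1) for _ in range(1 << table_bits)]
        self.theta = int(1.93 * history_length + 14)  # Jiménez and Lin's threshold
        self.wmax = (1 << (weight_bits - 1)) - 1
        self.wmin = -(1 << (weight_bits - 1))

    def _inputs(self, ghr):
        # History bit j (0 = most recent) becomes +1 if taken, -1 if not.
        return [1 if (ghr >> j) & 1 else -1 for j in range(self.h)]

    def output(self, pc, ghr):
        w = self.weights[(pc >> 1) & self.mask]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self._inputs(ghr)))

    def predict(self, pc, ghr):
        return self.output(pc, ghr) >= 0

    def update(self, pc, ghr, taken):
        y = self.output(pc, ghr)
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not confident enough.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[(pc >> 1) & self.mask]
            w[0] = max(self.wmin, min(self.wmax, w[0] + t))
            for j, xj in enumerate(self._inputs(ghr), start=1):
                w[j] = max(self.wmin, min(self.wmax, w[j] + t * xj))
```

A branch whose outcome simply repeats the most recent history bit is linearly separable. The weight paired with that bit grows until every output clears the threshold, after which training stops.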
TAGE (Tagged Geometric History Length)
Uses multiple tagged tables with geometrically increasing history lengths:
- Base predictor — simple bimodal table (always consulted)
- Tagged tables — each table uses a different history length (default: 5, 11, 22, 44, 89, 178, 356, 712 for 8 banks). Entries are tagged with a hash of the PC and history to reduce aliasing.
- Longest match wins — the prediction comes from the table with the longest matching history
- Loop predictor — detects counted loops and predicts the loop exit iteration
- USE_ALT_ON_NA — meta-counter that learns whether newly allocated (weak) provider entries should be trusted or whether the alternate (second-longest match) prediction is better. When the provider entry's counter is weak (0 or -1) and the meta-counter is non-negative, the alternate prediction is used instead.
- Useful counter reset — periodically resets the "useful" counters to allow new entries to replace stale ones
Configurable parameters: num_banks, table_size, loop_table_size, reset_interval, history_lengths, tag_widths.
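The Python sketch below covers the longest-match and USE_ALT_ON_NA rules described above. It assumes tag matching has already been done and the hits are ordered from shortest to longest history. The counter ranges (signed direction counters, a 4-bit meta-counter in [-8, 7]) are assumptions, as are the names `tage_select` and `update_use_alt_on_na`. Indexing, allocation, and the loop predictor are omitted.

```python
from dataclasses import dataclass


@dataclass
class TaggedEntry:
    tag: int
    ctr: int         # signed direction counter; >= 0 predicts taken
    useful: int = 0  # "useful" counter, periodically reset (unused in this sketch)


def tage_select(base_taken, hits, use_alt_on_na):
    """Return (final_taken, provider_taken, alt_taken).

    hits: tag-matching entries ordered shortest to longest history, so
    hits[-1] is the provider and hits[-2] (if any) is the alternate.
    """
    if not hits:
        return base_taken, base_taken, base_taken  # bimodal base provides
    provider = hits[-1]
    provider_taken = provider.ctr >= 0
    # Alternate: second-longest match, falling back to the bimodal base.
    alt_taken = hits[-2].ctr >= 0 if len(hits) >= 2 else base_taken
    if provider.ctr in (0, -1) and use_alt_on_na >= 0:
        return alt_taken, provider_taken, alt_taken  # weak provider: trust alt
    return provider_taken, provider_taken, alt_taken


def update_use_alt_on_na(meta, provider_ctr, provider_taken, alt_taken, taken,
                         lo=-8, hi=7):
    """Train the meta-counter only when a weak provider and the alternate disagree."""
    if provider_ctr in (0, -1) and provider_taken != alt_taken:
        if alt_taken == taken:
            return min(meta + 1, hi)
        return max(meta - 1, lo)
    return meta
```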
SC-L-TAGE (Statistical Corrector + Loop + TAGE)
The most accurate of rvsim's six predictors. It combines four sub-predictors, following Seznec's Championship Branch Prediction (CBP) winning designs:
- TAGE — same tagged geometric history as the standalone TAGE predictor (default: 8 banks)
- Loop Predictor — detects counted loops and overrides TAGE when a loop iteration count is learned
- Statistical Corrector (SC) — a set of tables of small signed counters, each indexed with a different history length, that learns to correct systematic TAGE errors. The SC sum is initialized with a centered confidence value derived from the TAGE prediction, (2 * |ctr| + 1) * direction, where ctr is the TAGE counter behind the prediction and direction is +1 for taken and -1 for not-taken. When the sign of the total SC sum disagrees with TAGE and its magnitude exceeds a threshold, the SC prediction overrides TAGE.
- ITTAGE (Indirect Target TAGE) — predicts indirect branch targets (computed jumps, virtual dispatch) using the same geometric history structure as TAGE but storing target addresses instead of direction counters
USE_ALT_ON_NA is also applied within SC-L-TAGE's TAGE component, ensuring the SC receives the effective TAGE prediction (after alt-pred override) rather than the raw provider prediction.
Configurable parameters: all TAGE parameters plus sc_num_tables, sc_table_size, sc_history_lengths, sc_counter_bits, ittage_num_banks, ittage_table_size, ittage_history_lengths, ittage_tag_widths, ittage_reset_interval.
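The SC override rule fits in a few lines of Python. In this sketch the SC counters are summed exactly as read from each table, and the threshold is passed in as a plain argument. How rvsim indexes the SC tables, whether it centers the SC counters, and how it adapts the threshold are not modeled. `sc_final_prediction` is a hypothetical name.

```python
def sc_final_prediction(tage_taken, tage_ctr, sc_counters, threshold):
    """Combine the effective TAGE prediction with statistical-corrector counters.

    tage_taken: TAGE direction after the USE_ALT_ON_NA override.
    tage_ctr: signed TAGE counter behind that prediction.
    sc_counters: signed counters read from each SC table for this branch.
    Returns (final_taken, sc_sum).
    """
    direction = 1 if tage_taken else -1
    total = (2 * abs(tage_ctr) + 1) * direction  # centered TAGE confidence
    total += sum(sc_counters)
    sc_taken = total >= 0
    if sc_taken != tage_taken and abs(total) > threshold:
        return sc_taken, total  # confident disagreement: SC overrides TAGE
    return tage_taken, total
```

A weak TAGE counter contributes only ±1 to the sum, so the SC can overturn it easily. A saturated counter contributes a large centered value, so overturning it requires strong, consistent evidence from the SC tables.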
Predictor Comparison
Here's a representative comparison on the included benchmarks (width=1, default caches):
| Predictor | Accuracy (aggregate) | IPC (aggregate) | Speedup vs Static |
|---|---|---|---|
| Static | 34.4% | 0.49 | 1.00× |
| GShare | 60.6% | 0.55 | 1.08× |
| Perceptron | 67.9% | 0.58 | 1.11× |
| Tournament | 70.9% | 0.59 | 1.20× |
| TAGE | 73.2% | 0.58 | 1.21× |
| SC-L-TAGE | 84.1% | 0.66 | 1.29× |
SC-L-TAGE provides the highest accuracy by combining TAGE with statistical correction and loop prediction. On qsort, SC-L-TAGE achieves 82.5% accuracy and 0.67 IPC versus standalone TAGE's 71.2% and 0.58 IPC — a 15.8% IPC improvement. Run scripts/analysis/branch_predict.py to regenerate numbers for your workloads.