Configuration¶
Every aspect of the simulated machine is configurable at runtime through the Config class. Config takes a flat set of keyword parameters (no nested configuration objects); caches, predictors, and backends are described with builder-style type classes.
Basic Usage¶
from rvsim import Config, Cache, Backend, BranchPredictor, MemDepPredictor

config = Config(
    width=4,
    backend=Backend.OutOfOrder(rob_size=128),
    branch_predictor=BranchPredictor.TAGE(),
    l1d=Cache("32KB", ways=8, latency=1, mshr_count=8),
    l2=Cache("256KB", ways=8, latency=10),
)
Use replace() to derive new configs from a base:
base = Config(width=4, branch_predictor=BranchPredictor.TAGE())
narrow = base.replace(width=2)
wide = base.replace(width=8)
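The copy-with-overrides pattern behind replace() can be sketched with a frozen dataclass. MiniConfig below is a hypothetical stand-in for illustration only, not the rvsim Config class, and it assumes (as the example above suggests) that replace() returns a new object and leaves the base unchanged.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class MiniConfig:
    """Hypothetical stand-in for Config; NOT the rvsim class."""
    width: int = 4
    btb_size: int = 4096

    def replace(self, **overrides):
        # Build a new instance with the given fields overridden; self is untouched.
        return dataclasses.replace(self, **overrides)


base = MiniConfig(width=4)

# Derive one config per pipeline width from the same base.
sweep = {w: base.replace(width=w) for w in (1, 2, 4, 8)}

for w, cfg in sweep.items():
    print(w, cfg)
```

Because every derived config starts from the same base, a sweep changes only the parameter under study and inherits everything else.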
Pipeline¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| width | int | 4 | Fetch/decode/rename/retire width (instructions per cycle) |
| backend | Backend.* | OutOfOrder() | Pipeline backend: Backend.InOrder() or Backend.OutOfOrder(...) |
| branch_predictor | BranchPredictor.* | TAGE() | Branch predictor type |
| btb_size | int | 4096 | Branch target buffer entries |
| btb_ways | int | 4 | BTB associativity |
| ras_size | int | 32 | Return address stack depth |
Backend: Out-of-Order¶
Backend.OutOfOrder(
    rob_size=128,           # Reorder buffer entries
    issue_queue_size=32,    # Issue queue entries (CAM wakeup/select)
    store_buffer_size=32,   # Store buffer entries
    load_queue_size=32,     # Load queue entries (memory ordering)
    load_ports=2,           # Load ports per cycle
    store_ports=1,          # Store ports per cycle
    prf_gpr_size=256,       # Physical GPR file size
    prf_fpr_size=128,       # Physical FPR file size
    fu_config=Fu([...]),    # Functional unit pool (see below)
)
Backend: In-Order¶
Backend.InOrder() takes no parameters: the in-order backend uses a fixed scoreboard-based pipeline. Pipeline width is controlled by the top-level width parameter.
Functional Units (O3 only)¶
Configure the functional unit pool for the out-of-order backend:
from rvsim import Fu

fu = Fu([
    Fu.IntAlu(count=4, latency=1),      # Integer ALU: add, sub, logic, shift
    Fu.IntMul(count=1, latency=3),      # Integer multiplier
    Fu.IntDiv(count=1, latency=35),     # Integer divider (non-pipelined)
    Fu.FpAdd(count=2, latency=4),       # FP add/sub/compare/convert
    Fu.FpMul(count=2, latency=5),       # FP multiply
    Fu.FpFma(count=2, latency=5),       # FP fused multiply-add
    Fu.FpDivSqrt(count=1, latency=21),  # FP divide/sqrt (non-pipelined)
    Fu.Branch(count=2, latency=1),      # Branch/jump resolution
    Fu.Mem(count=2, latency=1),         # Load/store address calculation
])
Omitting an FU type means the backend has zero units of that type, so make sure the pool includes every type your workload exercises.
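One way to catch a missing unit type before a long run is to compare the pool against the operation classes the workload needs. The sketch below uses a plain dict as a hypothetical stand-in for the Fu pool (it does not inspect real Fu objects), and the set of required types is something you would supply yourself.

```python
# Hypothetical stand-in for an Fu pool: unit type name -> unit count.
pool = {
    "IntAlu": 4,
    "IntMul": 1,
    "Branch": 2,
    "Mem": 2,
}


def missing_units(pool, required):
    """Return the required unit types that have zero units in the pool."""
    return sorted(name for name in required if pool.get(name, 0) == 0)


# Example: an integer workload that also performs division.
required = {"IntAlu", "IntMul", "IntDiv", "Branch", "Mem"}
print(missing_units(pool, required))  # ['IntDiv']
```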
Branch Predictor¶
BranchPredictor.Static()        # Always predict not-taken
BranchPredictor.GShare()        # Global history XOR PC
BranchPredictor.Tournament(     # Two-level adaptive
    global_size_bits=12,
    local_hist_bits=10,
    local_pred_bits=10,
)
BranchPredictor.Perceptron(     # Neural predictor
    history_length=32,
    table_bits=10,
)
BranchPredictor.TAGE(           # Tagged geometric history length
    num_banks=4,
    table_size=2048,
    loop_table_size=256,
    reset_interval=2000,
    history_lengths=[5, 15, 44, 130],
    tag_widths=[9, 9, 10, 10],
)
BranchPredictor.ScLTage(        # SC-L-TAGE + ITTAGE (highest accuracy)
    # TAGE parameters
    num_banks=8,
    table_size=2048,
    loop_table_size=256,
    reset_interval=256_000,
    history_lengths=[5, 15, 44, 130, 380, 1024, 2048, 4096],
    tag_widths=[9, 9, 10, 10, 11, 11, 12, 12],
    # Statistical corrector
    sc_num_tables=6,
    sc_table_size=512,
    sc_counter_bits=3,
    # Indirect target TAGE
    ittage_num_banks=8,
    ittage_table_size=256,
    ittage_reset_interval=256_000,
)
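To make the "global history XOR PC" description of GShare concrete, here is a minimal textbook gshare predictor with 2-bit saturating counters. It is a sketch of the general technique, not rvsim's implementation; the class name, the index_bits parameter, and the initial counter value are assumptions made for illustration.

```python
class GShare:
    """Textbook gshare: 2-bit counters indexed by (PC XOR global history)."""

    def __init__(self, index_bits=12):
        self.mask = (1 << index_bits) - 1
        self.history = 0                         # recent outcomes, newest in bit 0
        self.counters = [1] * (1 << index_bits)  # start weakly not-taken

    def _index(self, pc):
        # Bit 0 of a RISC-V instruction address is always zero, so drop it.
        return ((pc >> 1) ^ self.history) & self.mask

    def predict(self, pc):
        """Predict taken when the counter is in one of its two upper states."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train the counter with the real outcome, then shift the history."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


# Train on a loop branch that is always taken.
bp = GShare(index_bits=12)
pc = 0x8000_0040
for _ in range(50):
    bp.update(pc, True)
print(bp.predict(pc))
```

Folding the history into the index lets one static branch use different counters for different recent outcome patterns, which is what allows the predictor to learn correlated behavior.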
Memory Dependence Prediction¶
Controls how loads decide whether they can bypass unresolved older stores.
MemDepPredictor.Blind()       # Conservative: loads wait for all older stores (default)
MemDepPredictor.StoreSet(     # Store-set predictor (Chrysos & Emer 1998)
    ssit_size=2048,           # Store Set ID Table entries
    lfst_size=256,            # Last Fetched Store Table entries
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| mem_dep_predictor | MemDepPredictor.* | Blind() | Memory dependence predictor type |
| ssit_size | int | 2048 | SSIT entries (StoreSet only); maps PC → store set ID |
| lfst_size | int | 256 | LFST entries (StoreSet only); maps store set ID → last dispatched store |
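The two tables cooperate roughly as follows. This is a simplified sketch of the store-set idea (it omits store-to-store ordering and the set-merging rules of the original paper) and is not rvsim's implementation; the class, method names, and PC hashing are hypothetical.

```python
class StoreSetPredictor:
    """Simplified store sets: SSIT (PC -> set ID) plus LFST (set ID -> store)."""

    def __init__(self, ssit_size=2048, lfst_size=256):
        self.ssit = [None] * ssit_size  # None = PC has no store set yet
        self.lfst = [None] * lfst_size  # None = no in-flight store for the set
        self.next_set = 0

    def _slot(self, pc):
        return (pc >> 1) % len(self.ssit)

    def dispatch_store(self, pc, store_tag):
        """Record this store as the youngest in-flight store of its set."""
        ssid = self.ssit[self._slot(pc)]
        if ssid is not None:
            self.lfst[ssid] = store_tag

    def dispatch_load(self, pc):
        """Return the tag of the store the load should wait for, or None."""
        ssid = self.ssit[self._slot(pc)]
        return None if ssid is None else self.lfst[ssid]

    def store_executed(self, pc, store_tag):
        """Release waiting loads once the store has executed."""
        ssid = self.ssit[self._slot(pc)]
        if ssid is not None and self.lfst[ssid] == store_tag:
            self.lfst[ssid] = None

    def train_violation(self, load_pc, store_pc):
        """After a memory-order violation, place load and store in one set."""
        ld, st = self._slot(load_pc), self._slot(store_pc)
        ssid = self.ssit[st] if self.ssit[st] is not None else self.ssit[ld]
        if ssid is None:
            ssid = self.next_set
            self.next_set = (self.next_set + 1) % len(self.lfst)
        self.ssit[ld] = self.ssit[st] = ssid


pred = StoreSetPredictor()
before = pred.dispatch_load(0x1000)          # untrained: load may bypass
pred.train_violation(load_pc=0x1000, store_pc=0x0FF0)
pred.dispatch_store(0x0FF0, store_tag=7)
during = pred.dispatch_load(0x1000)          # must wait for store 7
pred.store_executed(0x0FF0, store_tag=7)
after = pred.dispatch_load(0x1000)           # store done: free to issue
print(before, during, after)                 # None 7 None
```

Compared with Blind(), a predictor of this kind only delays loads that have actually conflicted with a store in the past.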
Caches¶
Each cache level is configured independently:
Cache(
    size="32KB",                      # Size: "4KB", "32KB", "1MB", etc.
    line="64B",                       # Line size (default: 64B)
    ways=8,                           # Associativity
    latency=1,                        # Hit latency in cycles
    mshr_count=8,                     # MSHRs for non-blocking operation (0 = blocking)
    policy=ReplacementPolicy.LRU(),   # Eviction policy
    prefetcher=Prefetcher.Stride(),   # Hardware prefetcher
)
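Size, line size, and associativity together fix the number of sets and how an address splits into offset and index bits. The helper below works that arithmetic out. parse_size and geometry are hypothetical names, not part of rvsim, and the sketch assumes binary units (1KB = 1024 bytes) and power-of-two sizes.

```python
def parse_size(text):
    """Parse strings such as "64B", "32KB", "1MB" into a byte count."""
    # Longer suffixes are checked first so "KB" is not mistaken for "B".
    units = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "B": 1}
    for suffix, scale in units.items():
        if text.upper().endswith(suffix):
            return int(text[: -len(suffix)]) * scale
    return int(text)


def geometry(size, line="64B", ways=8):
    """Derive set count and address bit split for a set-associative cache."""
    size_b, line_b = parse_size(size), parse_size(line)
    sets = size_b // (line_b * ways)
    return {
        "sets": sets,
        "offset_bits": line_b.bit_length() - 1,  # log2 for powers of two
        "index_bits": sets.bit_length() - 1,
    }


print(geometry("32KB", ways=8))   # {'sets': 64, 'offset_bits': 6, 'index_bits': 6}
print(geometry("256KB", ways=8))  # {'sets': 512, 'offset_bits': 6, 'index_bits': 9}
```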
| Parameter | Type | Default | Description |
|---|---|---|---|
| l1i | Cache | 32KB/4-way/1cy | L1 instruction cache |
| l1d | Cache | 32KB/4-way/1cy | L1 data cache |
| l2 | Cache | 256KB/8-way/10cy | L2 unified cache |
| l3 | Cache or None | None | L3 cache (disabled by default) |
| inclusion_policy | Cache.* | Cache.NINE() | L1-L2 inclusion policy |
| wcb_entries | int | 0 | Write-combining buffer entries |
MSHRs matter
With mshr_count=0 (the default), the L1D cache is blocking: every miss stalls the pipeline until the line arrives. Set mshr_count to 8 or higher for realistic non-blocking behavior, where the O3 backend can keep executing other instructions while cache fills are outstanding.
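A back-of-the-envelope model shows why this matters. The function below is a deliberately idealized sketch, not how rvsim accounts for stalls: it assumes a burst of independent misses with a fixed latency, ignores bandwidth limits, and lets up to mshr_count misses overlap perfectly.

```python
import math


def miss_burst_cycles(misses, miss_latency, mshr_count):
    """Idealized cycles needed to service a burst of independent misses.

    mshr_count == 0 models a blocking cache: one miss at a time.
    Otherwise up to mshr_count misses are outstanding simultaneously.
    """
    in_flight = max(1, mshr_count)
    return math.ceil(misses / in_flight) * miss_latency


print(miss_burst_cycles(16, 100, mshr_count=0))  # 1600
print(miss_burst_cycles(16, 100, mshr_count=8))  # 200
```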
Replacement Policies¶
ReplacementPolicy.LRU() # Least recently used (default)
ReplacementPolicy.PLRU() # Pseudo-LRU (tree-based)
ReplacementPolicy.FIFO() # First in, first out
ReplacementPolicy.Random() # Random eviction
ReplacementPolicy.MRU() # Most recently used
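The difference between LRU and FIFO is only whether a hit refreshes a line's position in the eviction order. The sketch below models a single cache set to show this; CacheSet and run are hypothetical names, and the code illustrates the policies in general rather than rvsim's internals.

```python
from collections import OrderedDict


class CacheSet:
    """One cache set. promote_on_hit=True gives LRU, False gives FIFO."""

    def __init__(self, ways, promote_on_hit=True):
        self.ways = ways
        self.promote_on_hit = promote_on_hit
        self.lines = OrderedDict()  # tag -> None, next victim first

    def access(self, tag):
        """Return True on a hit; on a miss, fill the line (evicting if full)."""
        if tag in self.lines:
            if self.promote_on_hit:
                self.lines.move_to_end(tag)  # now the most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the line at the front
        self.lines[tag] = None
        return False


def run(trace, ways, promote_on_hit):
    cache_set = CacheSet(ways, promote_on_hit)
    return [cache_set.access(tag) for tag in trace]


trace = ["A", "B", "A", "C", "A"]
print(run(trace, ways=2, promote_on_hit=True))   # LRU:  [False, False, True, False, True]
print(run(trace, ways=2, promote_on_hit=False))  # FIFO: [False, False, True, False, False]
```

In the trace above, LRU keeps A because it was just reused, while FIFO evicts it as the oldest fill.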
Prefetchers¶
Prefetcher.Off() # Disabled (default)
Prefetcher.NextLine(degree=1) # Prefetch next line on access
Prefetcher.Stride(degree=1, table_size=64) # PC-indexed stride detection
Prefetcher.Stream(degree=1) # Sequential stream detection
Prefetcher.Tagged(degree=1) # Prefetch-on-prefetch
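As an illustration of PC-indexed stride detection, the sketch below prefetches once the same non-zero stride has been seen twice in a row for a given load PC. It is a generic simplification (no tags, no confidence counters), not rvsim's prefetcher; the class and its table layout are hypothetical, though degree and table_size mirror the parameters above.

```python
class StridePrefetcher:
    """PC-indexed stride detector that issues `degree` prefetches per trigger."""

    def __init__(self, degree=1, table_size=64):
        self.degree = degree
        self.table_size = table_size
        self.table = {}  # slot -> (last address, last stride)

    def access(self, pc, addr):
        """Observe a load and return the list of addresses to prefetch."""
        slot = (pc >> 1) % self.table_size
        entry = self.table.get(slot)
        stride = 0
        prefetches = []
        if entry is not None:
            last_addr, last_stride = entry
            stride = addr - last_addr
            # Trigger only when the stride repeats, to filter out noise.
            if stride != 0 and stride == last_stride:
                prefetches = [addr + stride * k for k in range(1, self.degree + 1)]
        self.table[slot] = (addr, stride)
        return prefetches


pf = StridePrefetcher(degree=2)
results = [pf.access(0x400, addr) for addr in (0x1000, 0x1040, 0x1080)]
print([[hex(a) for a in r] for r in results])  # [[], [], ['0x10c0', '0x1100']]
```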
Inclusion Policies¶
Cache.NINE() # No inclusion, non-exclusive (default)
Cache.Inclusive() # L2 eviction back-invalidates matching L1 lines
Cache.Exclusive() # L1 eviction swaps line into L2
Memory¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| ram_size | str or int | "256MB" | Main memory size |
| memory_controller | MemoryController.* | Simple() | Memory controller type |
| tlb_size | int | 32 | iTLB and dTLB entries (fully associative) |
| l2_tlb_size | int | 512 | Shared L2 TLB entries |
| l2_tlb_ways | int | 4 | L2 TLB associativity |
| l2_tlb_latency | int | 4 | L2 TLB hit latency in cycles |
Memory Controller¶
MemoryController.Simple()     # Fixed latency (default)
MemoryController.DRAM(        # Row-buffer aware timing
    t_cas=14,                 # Column access strobe latency
    t_ras=14,                 # Row access strobe latency
    t_pre=14,                 # Precharge latency
    row_miss_latency=120,     # Full row-miss penalty
)
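"Row-buffer aware" means that an access to the currently open DRAM row is much cheaper than one that has to open a different row. The toy model below shows the effect with a single bank. It is an assumption-laden sketch, not rvsim's controller: row_bytes is a hypothetical parameter, a row hit is charged t_cas, anything else is charged row_miss_latency, and t_ras and t_pre are left out because the source does not say how they are combined.

```python
class RowBufferModel:
    """Toy single-bank DRAM with an open-row policy."""

    def __init__(self, t_cas=14, row_miss_latency=120, row_bytes=2048):
        self.t_cas = t_cas
        self.row_miss_latency = row_miss_latency
        self.row_bytes = row_bytes  # hypothetical row size, for illustration
        self.open_row = None

    def access(self, addr):
        """Return the latency in cycles for an access to `addr`."""
        row = addr // self.row_bytes
        if row == self.open_row:
            return self.t_cas           # row hit: column access only
        self.open_row = row             # open the new row
        return self.row_miss_latency    # row miss: full penalty


dram = RowBufferModel()
latencies = [dram.access(addr) for addr in (0, 64, 4096, 4160)]
print(latencies)  # [120, 14, 120, 14]
```

Sequential accesses mostly hit the open row, so streaming workloads see far lower average latency under this kind of model than pointer-chasing ones.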
System¶
These parameters control the SoC memory map and device configuration. You normally don't need to change them.
| Parameter | Type | Default | Description |
|---|---|---|---|
| ram_base | int | 0x8000_0000 | RAM base address |
| uart_base | int | 0x1000_0000 | UART base address |
| disk_base | int | 0x9000_0000 | VirtIO disk base address |
| clint_base | int | 0x0200_0000 | CLINT base address |
| syscon_base | int | 0x0010_0000 | SYSCON base address |
| kernel_offset | int | 0x0020_0000 | Kernel load offset from ram_base |
| bus_width | int | 8 | Bus width in bytes |
| bus_latency | int | 4 | Bus transaction latency in cycles |
| clint_divider | int | 10 | Timer tick divider (mtime increments every N cycles) |
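A few derived values follow directly from these defaults. The arithmetic below is a sketch under stated assumptions: "256MB" is taken to mean 256 × 1024 × 1024 bytes, mtime is assumed to start at zero, and mtime_at is a hypothetical helper rather than an rvsim function.

```python
RAM_BASE = 0x8000_0000
KERNEL_OFFSET = 0x0020_0000
DISK_BASE = 0x9000_0000
CLINT_DIVIDER = 10
RAM_SIZE = 256 * 1024 * 1024  # default "256MB", assuming binary megabytes

# Address at which the kernel image is loaded.
kernel_addr = RAM_BASE + KERNEL_OFFSET

# First address past the end of RAM (exclusive upper bound).
ram_end = RAM_BASE + RAM_SIZE


def mtime_at(cycle, divider=CLINT_DIVIDER):
    """mtime value after `cycle` core cycles, one tick per `divider` cycles."""
    return cycle // divider


print(hex(kernel_addr))     # 0x80200000
print(hex(ram_end))         # 0x90000000
print(mtime_at(1_000_000))  # 100000
```

With these defaults, RAM ends exactly at disk_base, so a larger ram_size may require moving disk_base as well. That is an inference from the default values, not documented behavior.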
General¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| trace | bool | False | Enable per-instruction commit logging |
| initial_sp | int or None | None | Initial stack pointer (auto-configured if None) |
| uart_quiet | bool | False | Suppress UART output (useful for sweeps) |
| uart_to_stderr | bool | False | Route UART output to stderr instead of stdout |
Example Configurations¶
Minimal embedded core¶
Config(
    width=1,
    backend=Backend.InOrder(),
    branch_predictor=BranchPredictor.Static(),
    l1d=Cache("4KB", ways=1, latency=1),
    l1i=Cache("4KB", ways=1, latency=1),
    l2=None,
)
High-performance O3 core¶
Config(
    width=4,
    backend=Backend.OutOfOrder(
        rob_size=128,
        issue_queue_size=48,
        load_queue_size=32,
        store_buffer_size=32,
        prf_gpr_size=256,
        prf_fpr_size=128,
        fu_config=Fu([
            Fu.IntAlu(count=4, latency=1),
            Fu.IntMul(count=1, latency=3),
            Fu.IntDiv(count=1, latency=35),
            Fu.FpAdd(count=2, latency=4),
            Fu.FpMul(count=2, latency=5),
            Fu.FpFma(count=2, latency=5),
            Fu.FpDivSqrt(count=1, latency=21),
            Fu.Branch(count=2, latency=1),
            Fu.Mem(count=2, latency=1),
        ]),
    ),
    branch_predictor=BranchPredictor.ScLTage(),
    mem_dep_predictor=MemDepPredictor.StoreSet(),
    l1d=Cache("32KB", ways=8, latency=1, mshr_count=8,
              prefetcher=Prefetcher.Stride(degree=2, table_size=128)),
    l1i=Cache("32KB", ways=8, latency=1,
              prefetcher=Prefetcher.NextLine(degree=2)),
    l2=Cache("256KB", ways=8, latency=10, mshr_count=16),
    l3=Cache("4MB", ways=16, latency=30, mshr_count=32),
    memory_controller=MemoryController.DRAM(t_cas=14, row_miss_latency=120),
)
Linux-capable system¶
See Linux Boot for a complete config that boots Linux.