Skip to content

Configuration

Every aspect of the simulated machine is runtime-configurable through the Config class. Parameters are flat (no nested objects) and use builder-style type classes for caches, predictors, and backends.

Basic Usage

from rvsim import Config, Cache, Backend, BranchPredictor, MemDepPredictor

config = Config(
    width=4,
    backend=Backend.OutOfOrder(rob_size=128),
    branch_predictor=BranchPredictor.TAGE(),
    l1d=Cache("32KB", ways=8, latency=1, mshr_count=8),
    l2=Cache("256KB", ways=8, latency=10),
)

Use replace() to derive new configs from a base:

base = Config(width=4, branch_predictor=BranchPredictor.TAGE())
narrow = base.replace(width=2)
wide = base.replace(width=8)

Pipeline

Parameter Type Default Description
width int 4 Fetch/decode/rename/retire width (instructions per cycle)
backend Backend.* OutOfOrder() Pipeline backend: Backend.InOrder() or Backend.OutOfOrder(...)
branch_predictor BranchPredictor.* TAGE() Branch predictor type
btb_size int 4096 Branch target buffer entries
btb_ways int 4 BTB associativity
ras_size int 32 Return address stack depth

Backend: Out-of-Order

Backend.OutOfOrder(
    rob_size=128,            # Reorder buffer entries
    issue_queue_size=32,     # Issue queue entries (CAM wakeup/select)
    store_buffer_size=32,    # Store buffer entries
    load_queue_size=32,      # Load queue entries (memory ordering)
    load_ports=2,            # Load ports per cycle
    store_ports=1,           # Store ports per cycle
    prf_gpr_size=256,        # Physical GPR file size
    prf_fpr_size=128,        # Physical FPR file size
    fu_config=Fu([...]),     # Functional unit pool (see below)
)

Backend: In-Order

Backend.InOrder()

No parameters — the in-order backend uses a fixed scoreboard-based pipeline. Pipeline width is controlled by the top-level width parameter.

Functional Units (O3 only)

Configure the functional unit pool for the out-of-order backend:

from rvsim import Fu

fu = Fu([
    Fu.IntAlu(count=4, latency=1),       # Integer ALU: add, sub, logic, shift
    Fu.IntMul(count=1, latency=3),       # Integer multiplier
    Fu.IntDiv(count=1, latency=35),      # Integer divider (non-pipelined)
    Fu.FpAdd(count=2, latency=4),        # FP add/sub/compare/convert
    Fu.FpMul(count=2, latency=5),        # FP multiply
    Fu.FpFma(count=2, latency=5),        # FP fused multiply-add
    Fu.FpDivSqrt(count=1, latency=21),   # FP divide/sqrt (non-pipelined)
    Fu.Branch(count=2, latency=1),       # Branch/jump resolution
    Fu.Mem(count=2, latency=1),          # Load/store address calculation
])

Omitting a FU type means the backend has zero units of that type. Make sure to include every type your workload exercises.


Branch Predictor

BranchPredictor.Static()          # Always predict not-taken
BranchPredictor.GShare()          # Global history XOR PC
BranchPredictor.Tournament(       # Two-level adaptive
    global_size_bits=12,
    local_hist_bits=10,
    local_pred_bits=10,
)
BranchPredictor.Perceptron(       # Neural predictor
    history_length=32,
    table_bits=10,
)
BranchPredictor.TAGE(             # Tagged geometric history length
    num_banks=4,
    table_size=2048,
    loop_table_size=256,
    reset_interval=2000,
    history_lengths=[5, 15, 44, 130],
    tag_widths=[9, 9, 10, 10],
)
BranchPredictor.ScLTage(          # SC-L-TAGE + ITTAGE (highest accuracy)
    # TAGE parameters
    num_banks=8,
    table_size=2048,
    loop_table_size=256,
    reset_interval=256_000,
    history_lengths=[5, 15, 44, 130, 380, 1024, 2048, 4096],
    tag_widths=[9, 9, 10, 10, 11, 11, 12, 12],
    # Statistical corrector
    sc_num_tables=6,
    sc_table_size=512,
    sc_counter_bits=3,
    # Indirect target TAGE
    ittage_num_banks=8,
    ittage_table_size=256,
    ittage_reset_interval=256_000,
)

Memory Dependence Prediction

Controls how loads decide whether they can bypass unresolved older stores.

MemDepPredictor.Blind()           # Conservative: loads wait for all older stores (default)
MemDepPredictor.StoreSet(         # Store-set predictor (Chrysos & Emer 1998)
    ssit_size=2048,               # Store Set ID Table entries
    lfst_size=256,                # Last Fetched Store Table entries
)
Parameter Type Default Description
mem_dep_predictor MemDepPredictor.* Blind() Memory dependence predictor type
ssit_size int 2048 SSIT entries (StoreSet only) — maps PC → store set ID
lfst_size int 256 LFST entries (StoreSet only) — maps store set ID → last dispatched store

Caches

Each cache level is configured independently:

Cache(
    size="32KB",          # Size: "4KB", "32KB", "1MB", etc.
    line="64B",           # Line size (default: 64B)
    ways=8,               # Associativity
    latency=1,            # Hit latency in cycles
    mshr_count=8,         # MSHRs for non-blocking operation (0 = blocking)
    policy=ReplacementPolicy.LRU(),       # Eviction policy
    prefetcher=Prefetcher.Stride(),       # Hardware prefetcher
)
Parameter Type Default Description
l1i Cache 32KB/4-way/1cy L1 instruction cache
l1d Cache 32KB/4-way/1cy L1 data cache
l2 Cache 256KB/8-way/10cy L2 unified cache
l3 Cache or None None L3 cache (disabled by default)
inclusion_policy Cache.* Cache.NINE() L1-L2 inclusion policy
wcb_entries int 0 Write-combining buffer entries

MSHRs matter

With mshr_count=0 (the default), the L1D cache is blocking — every miss stalls the pipeline until the line arrives. Set mshr_count=8 or higher for realistic non-blocking behavior where the O3 backend can execute other instructions while waiting for cache fills.

Replacement Policies

ReplacementPolicy.LRU()      # Least recently used (default)
ReplacementPolicy.PLRU()     # Pseudo-LRU (tree-based)
ReplacementPolicy.FIFO()     # First in, first out
ReplacementPolicy.Random()   # Random eviction
ReplacementPolicy.MRU()      # Most recently used

Prefetchers

Prefetcher.Off()                              # Disabled (default)
Prefetcher.NextLine(degree=1)                 # Prefetch next line on access
Prefetcher.Stride(degree=1, table_size=64)    # PC-indexed stride detection
Prefetcher.Stream(degree=1)                   # Sequential stream detection
Prefetcher.Tagged(degree=1)                   # Prefetch-on-prefetch

Inclusion Policies

Cache.NINE()        # No inclusion, non-exclusive (default)
Cache.Inclusive()    # L2 eviction back-invalidates matching L1 lines
Cache.Exclusive()   # L1 eviction swaps line into L2

Memory

Parameter Type Default Description
ram_size str or int "256MB" Main memory size
memory_controller MemoryController.* Simple() Memory controller type
tlb_size int 32 iTLB and dTLB entries (fully associative)
l2_tlb_size int 512 Shared L2 TLB entries
l2_tlb_ways int 4 L2 TLB associativity
l2_tlb_latency int 4 L2 TLB hit latency in cycles

Memory Controller

MemoryController.Simple()     # Fixed latency (default)
MemoryController.DRAM(        # Row-buffer aware timing
    t_cas=14,                 # Column access strobe latency
    t_ras=14,                 # Row access strobe latency
    t_pre=14,                 # Precharge latency
    row_miss_latency=120,     # Full row-miss penalty
)

System

These parameters control the SoC memory map and device configuration. You normally don't need to change them.

Parameter Type Default Description
ram_base int 0x8000_0000 RAM base address
uart_base int 0x1000_0000 UART base address
disk_base int 0x9000_0000 VirtIO disk base address
clint_base int 0x0200_0000 CLINT base address
syscon_base int 0x0010_0000 SYSCON base address
kernel_offset int 0x0020_0000 Kernel load offset from ram_base
bus_width int 8 Bus width in bytes
bus_latency int 4 Bus transaction latency in cycles
clint_divider int 10 Timer tick divider (mtime increments every N cycles)

General

Parameter Type Default Description
trace bool False Enable per-instruction commit logging
initial_sp int or None None Initial stack pointer (auto-configured if None)
uart_quiet bool False Suppress UART output (useful for sweeps)
uart_to_stderr bool False Route UART output to stderr instead of stdout

Example Configurations

Minimal embedded core

Config(
    width=1,
    backend=Backend.InOrder(),
    branch_predictor=BranchPredictor.Static(),
    l1d=Cache("4KB", ways=1, latency=1),
    l1i=Cache("4KB", ways=1, latency=1),
    l2=None,
)

High-performance O3 core

Config(
    width=4,
    backend=Backend.OutOfOrder(
        rob_size=128,
        issue_queue_size=48,
        load_queue_size=32,
        store_buffer_size=32,
        prf_gpr_size=256,
        prf_fpr_size=128,
        fu_config=Fu([
            Fu.IntAlu(count=4, latency=1),
            Fu.IntMul(count=1, latency=3),
            Fu.IntDiv(count=1, latency=35),
            Fu.FpAdd(count=2, latency=4),
            Fu.FpMul(count=2, latency=5),
            Fu.FpFma(count=2, latency=5),
            Fu.FpDivSqrt(count=1, latency=21),
            Fu.Branch(count=2, latency=1),
            Fu.Mem(count=2, latency=1),
        ]),
    ),
    branch_predictor=BranchPredictor.ScLTage(),
    mem_dep_predictor=MemDepPredictor.StoreSet(),
    l1d=Cache("32KB", ways=8, latency=1, mshr_count=8,
              prefetcher=Prefetcher.Stride(degree=2, table_size=128)),
    l1i=Cache("32KB", ways=8, latency=1,
              prefetcher=Prefetcher.NextLine(degree=2)),
    l2=Cache("256KB", ways=8, latency=10, mshr_count=16),
    l3=Cache("4MB", ways=16, latency=30, mshr_count=32),
    memory_controller=MemoryController.DRAM(t_cas=14, row_miss_latency=120),
)

Linux-capable system

See Linux Boot for a complete config that boots Linux.