Filling Stateful Benchmark Fixtures

The fill-stateful command produces BlockchainEngineStatefulFixture JSON for benchmark tests by driving block construction on a live EL client via testing_buildBlockV1 against a pre-loaded network snapshot. The fixtures are replayed by benchmarkoor against the same snapshot on any EL client. This replaces the gas-benchmarks MITMProxy approach.

When to use fill-stateful

Use the standard fill for t8n-based fixture generation. Use fill-stateful to run benchmarks in a stateful environment (e.g., perfnet, Kurtosis, or other snapshots) to observe how state size affects performance. Any test can be run using this command, but some benchmarks – like the ones under tests/benchmark/stateful/ – only produce meaningful results in such environments.

fill-stateful does not manage datadirs — it expects the target client to already be running with the snapshot mounted. Snapshot management (overlayfs / ZFS / copy) is benchmarkoor's responsibility on the replay side.

Prerequisites

The target client must expose:

  • testing (testing_buildBlockV1) — block construction with explicit transaction ordering.
  • engine (engine_newPayloadVX, engine_forkchoiceUpdatedVX).
  • eth, debug — chain queries and debug_setHead for between-test rewind.
  • web3 (optional) — web3_clientVersion is recorded into the fixture's _info.filling-transition-tool for traceability.
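
A quick pre-flight check can confirm that these namespaces are enabled before starting a fill. The sketch below is illustrative and not part of fill-stateful: it assumes the client answers geth's rpc_modules method (which returns a map of enabled namespace to version), and the helper name missing_namespaces is hypothetical.

```python
"""Pre-flight sketch: report which required RPC namespaces a client lacks."""

# Namespaces from the list above; web3 is optional, so it is not checked.
REQUIRED_NAMESPACES = ("testing", "engine", "eth", "debug")


def missing_namespaces(rpc_modules_result: dict) -> list:
    """Return the required namespaces absent from an rpc_modules-style result.

    `rpc_modules_result` is assumed to look like {"eth": "1.0", "debug": "1.0"}.
    """
    return [ns for ns in REQUIRED_NAMESPACES if ns not in rpc_modules_result]


if __name__ == "__main__":
    # Assumed response shape; a real one would come from the running client.
    sample = {"eth": "1.0", "debug": "1.0", "web3": "1.0", "engine": "1.0"}
    print(missing_namespaces(sample))
```

Clients usually serve engine on the authenticated port, so its absence from the HTTP listing does not necessarily mean it is disabled there.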

The production-ready filler is ethpandaops/geth:master.

End-to-end flow

1. Bootstrap a snapshot

For local development, a kurtosis enclave produces a synthetic snapshot:

# /tmp/fillst-kurtosis-args.yaml
participants:
  - el_type: geth
    el_image: ethpandaops/geth:master
    el_extra_params:
      - "--http.api=admin,debug,eth,miner,net,txpool,web3,testing,engine"
      - "--miner.gaslimit=1000000000000"
    cl_type: lodestar
network_params:
  preset: minimal
  genesis_delay: 30
  fulu_fork_epoch: 0
  gas_limit: 1000000000000
ethereum_genesis_generator_params:
  image: "ethpandaops/ethereum-genesis-generator:5.3.5"

kurtosis run --enclave fillst /path/to/ethereum-package \
    --args-file /tmp/fillst-kurtosis-args.yaml

Let the chain produce a few blocks, stop the CL and then the EL cleanly, and extract the datadir and genesis bundle:

docker stop -t 30 vc-1-geth-lodestar--... cl-1-lodestar-geth--...
docker stop -t 60 el-1-geth-lodestar--...
docker cp el-1-geth-lodestar--...:/data/geth/execution-data /tmp/multi-snap/geth/
rm -f /tmp/multi-snap/geth/execution-data/geth/{LOCK,nodes/LOCK,chaindata/LOCK}
kurtosis files download fillst el_cl_genesis_data /tmp/multi-snap/genesis
kurtosis files download fillst jwt_file /tmp/fillst-out/jwt

For production benchmarking, use a perfnet / bloatnet snapshot instead.

2. Start a standalone client on a copy of the snapshot

Keep the original snapshot pristine so benchmarkoor can reuse it on the replay side:

cp -a /tmp/multi-snap/geth/execution-data /tmp/multi-snap/geth-fillcopy

docker run -d --name geth-fillcopy \
  -p 18545:8545 -p 18551:8551 \
  -v /tmp/multi-snap/geth-fillcopy:/datadir \
  -v /tmp/multi-snap/genesis:/genesis:ro \
  -v /tmp/fillst-out/jwt:/jwt:ro \
  ethpandaops/geth:master \
  --datadir=/datadir --override.genesis=/genesis/genesis.json \
  --http --http.addr=0.0.0.0 --http.port=8545 \
  --http.api=admin,debug,eth,miner,net,txpool,web3,testing,engine \
  --authrpc.port=8551 --authrpc.addr=0.0.0.0 \
  --authrpc.jwtsecret=/jwt/jwtsecret \
  --syncmode=full --gcmode=archive \
  --miner.gaslimit=1000000000000 \
  --nodiscover --maxpeers=0

3. Fill

uv run fill-stateful \
    --clean \
    --rpc-endpoint=http://127.0.0.1:18545 \
    --engine-endpoint=http://127.0.0.1:18551 \
    --engine-jwt-secret-file=/tmp/fillst-out/jwt/jwtsecret \
    --fork=Osaka \
    --output=/tmp/fillst-out/fixtures \
    --snapshot-block=0x<32-byte-hash> \
    --gas-benchmark-values=10,30 \
    tests/benchmark/stateful/bloatnet/test_transient_storage.py

4. Replay

Point benchmarkoor's datadirs.geth.source_dir at the pristine snapshot (/tmp/multi-snap/geth/execution-data) — never at the fillcopy — and tests.source.eest_fixtures.local_fixtures_dir at the fill output. See the benchmarkoor docs for the full config shape.

CLI options

Required:

  • --engine-jwt-secret-file PATH — JWT secret for engine API auth.
  • --fork NAME — fork to fill against, e.g. Osaka.
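
For context on what the JWT secret is used for: engine API requests carry an HS256-signed JWT with an iat (issued-at) claim, keyed by the hex-encoded secret in that file. The sketch below shows how such a token could be built with the standard library alone. It is not fill-stateful's own code, and the helper name engine_jwt is hypothetical.

```python
"""Sketch: build an HS256 JWT for engine API auth from a hex-encoded secret."""
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def engine_jwt(secret_hex: str, issued_at: int | None = None) -> str:
    """Return a compact JWT signed with the given hex secret."""
    secret = bytes.fromhex(secret_hex.strip().removeprefix("0x"))
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"iat": int(time.time()) if issued_at is None else issued_at}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        secret, signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

The token would be sent as an `Authorization: Bearer <token>` header on each engine request. Clients reject tokens whose iat is too far from their own clock, so a fresh token per request is the simplest policy.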

Optional:

  • --rpc-endpoint URL — default http://localhost:8545.
  • --engine-endpoint URL — derived from --rpc-endpoint with port 8551.
  • --chain-id INT — auto-detected from the client.
  • --snapshot-block HASH_OR_NUMBER — anchor to a specific block; accepts a 32-byte hash (recommended) or an integer block number (hex 0x... or decimal). Defaults to the client's latest, recorded by hash.
  • --rpc-seed-key 0x<64hex> — pin the seed account for reproducible fills. When omitted, a random key is generated and funded via CL withdrawal each session.
  • --address-stubs PATH — JSON map of label → on-chain address (and optional pkey). Required by stub-dependent tests; see Stub-dependent tests below.
  • --max-gas-per-test INT — overrides the fork's transaction_gas_limit_cap() (EIP-7825).
  • --gas-benchmark-values 10,30,... — gas budgets in millions to parametrize against.
  • --default-{gas-price,max-fee-per-gas,max-priority-fee-per-gas,max-fee-per-blob-gas} — pin per-session fees; defaults bump live-query values by 1.5×.
  • --output PATH — default ./fixtures.
  • --clean — wipe the output dir before filling.
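
The default for --engine-endpoint can be pictured as a port swap on the RPC URL. The following is a minimal sketch of that derivation under the assumption that only the port changes; the function name derive_engine_endpoint is hypothetical and the real implementation may differ.

```python
"""Sketch: derive an engine endpoint from an RPC endpoint by swapping the port."""
from urllib.parse import urlsplit, urlunsplit


def derive_engine_endpoint(rpc_endpoint: str, engine_port: int = 8551) -> str:
    """Keep scheme, host and path of `rpc_endpoint`; replace the port."""
    parts = urlsplit(rpc_endpoint)
    host = parts.hostname or "localhost"
    if ":" in host:
        # IPv6 literals must be bracketed inside a URL authority.
        host = f"[{host}]"
    netloc = f"{host}:{engine_port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(derive_engine_endpoint("http://localhost:8545"))
```

With the Docker port mapping from step 2 (18551:8551), a derived default on port 8551 would not reach the container from the host, which is why the fill command in step 3 passes --engine-endpoint explicitly.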

Output layout

<output>/
└── blockchain_tests_stateful_engine/
    ├── pre_run/
    │   └── <start_block_hash>.json # session bootstrap (factory deploy + seed funding)
    └── for_<fork>_at_<gas>M/
        └── <test_path>/
            └── <test>.json         # per-test setup + execution payloads

Each pre_run/<start_block_hash>.json (a StatefulPreRunFixture) is replayed once per benchmarkoor run. Per-test fixtures (BlockchainEngineStatefulFixture) reference their setup file by hash: a fixture with startBlockHash = 0xabc... is preceded by pre_run/0xabc....json. Each per-test fixture carries snapshotBlockNumber/Hash, startBlockNumber/Hash, setupEngineNewPayloads, engineNewPayloads, plus a benchmarkGasUsed field and the EL build in _info.filling-transition-tool. The hash-based filename leaves room for multiple pre-run files (e.g. different setup variants off one snapshot) without coordinating names.
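
The hash-based link between a per-test fixture and its pre-run file can be sketched as a path lookup. The example assumes a fixture has already been loaded into a dict that exposes startBlockHash as a top-level key; the helper names are hypothetical, and whether a per-test JSON file holds one fixture or several is not covered here.

```python
"""Sketch: resolve the pre-run file(s) that a set of fixtures depends on."""
from pathlib import Path

SUITE_DIR = "blockchain_tests_stateful_engine"


def pre_run_path(output_dir, fixture: dict) -> Path:
    """Return the pre_run/<startBlockHash>.json path for one fixture."""
    return (
        Path(output_dir)
        / SUITE_DIR
        / "pre_run"
        / f"{fixture['startBlockHash']}.json"
    )


def required_pre_runs(output_dir, fixtures: list) -> list:
    """Return each distinct pre-run path once, in first-seen order."""
    seen = {}
    for fixture in fixtures:
        seen.setdefault(pre_run_path(output_dir, fixture), None)
    return list(seen)
```

Deduplicating by path mirrors the "replayed once per benchmarkoor run" rule: several fixtures sharing a startBlockHash need only one pre-run replay.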

Snapshot anchoring

--snapshot-block accepts a hash on purpose. Anchoring to latest works against a quiescent client, but on a live client a reorg between session start and fixture write would silently re-anchor the fixture to a different block. Anchoring by hash rules this out, because a hash identifies exactly one block.
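
To make the two accepted forms concrete, here is a small sketch of how a HASH_OR_NUMBER value could be classified. It assumes that a 0x prefix followed by exactly 64 hex digits means a block hash and that anything else is a block number; the function name is hypothetical and fill-stateful's own parsing may differ.

```python
"""Sketch: classify a --snapshot-block value as a block hash or a number."""
from __future__ import annotations

import re

# 0x followed by exactly 64 hex digits, i.e. 32 bytes.
_HASH_RE = re.compile(r"0x[0-9a-fA-F]{64}")


def parse_snapshot_block(value: str) -> tuple[str, str | int]:
    """Return ("hash", normalized_hash) or ("number", block_number)."""
    value = value.strip()
    if _HASH_RE.fullmatch(value):
        return ("hash", value.lower())
    if value.lower().startswith("0x"):
        return ("number", int(value, 16))
    return ("number", int(value, 10))
```

A value that is neither a valid hash nor a valid integer raises ValueError from int(), which fits a CLI that should fail early on a malformed anchor.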

State pollution across fills

Re-running fill-stateful against the same datadir progresses the chain past previous fills. Always start from a fresh copy of the snapshot.

Single-worker

fill-stateful forces -n 0 — pytest-xdist is not used; the chain advances sequentially.

Stub-dependent tests

Some stateful tests (e.g. test_single_opcode.py, test_multi_opcode.py) target on-chain accounts the snapshot already contains. They reach them two ways:

  • @pytest.mark.stub_parametrize("name", "prefix_") — parametrize values pulled from --address-stubs matching prefix_.
  • pre.deploy_contract(stub="<label>", ...) — direct runtime lookup.

Without a matching --address-stubs entry, both paths fail loudly: the marker path with FAILED ... MISSING_STUBS carrying the missing prefix; the runtime path with ValueError("Stub '<label>' not found..."). Stock pytest's silent skip on empty parametrize is overridden — running a bloatnet test with no stubs is a misconfiguration, not a valid outcome.
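
The prefix matching behind the marker path can be illustrated with a small lookup that fails loudly when nothing matches. This is only a sketch: it assumes the stubs file has been loaded into a flat label-to-address dict (the optional pkey is ignored), and the function name and error text are hypothetical rather than the plugin's own.

```python
"""Sketch: select address stubs by label prefix, failing when none match."""


def stubs_with_prefix(stubs: dict, prefix: str) -> dict:
    """Return the label -> address entries whose label starts with `prefix`."""
    matches = {
        label: address
        for label, address in stubs.items()
        if label.startswith(prefix)
    }
    if not matches:
        # Mirror the "fail loudly" policy instead of silently skipping.
        raise ValueError(f"no address stubs match prefix {prefix!r}")
    return matches
```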

Stubs must point at addresses already on the live client; fill-stateful validates each at session start. The kurtosis devnet recipe above does not include them — use a bloatnet / perfnet snapshot (or a custom snapshot generator) for these tests.

Architecture

fill-stateful reuses fill's standard spec loop and swaps the backend. Two backends now exist behind a common protocol; the rest of fill is unchanged.

                 BlockchainTest.generate_block_data
                              │
                ┌─────────────┴──────────────┐
                ▼                            ▼
       TransitionTool                  ClientBackend
       (t8n CLI / server)         (testing_buildBlockV1 on a live EL)
            ▼                            ▼
       make_fixture /               make_stateful_fixture
       make_hive_fixture                  ▼
            ▼                  BlockchainEngineStatefulFixture
       BlockchainFixture /                + StatefulPreRunFixture
       BlockchainEngineFixture

Both backends satisfy FillerBackend (client_clis/filler_backend.py). ClientBackend.evaluate(...) returns a TransitionToolOutput with an EnginePayloadMetadata attached (GetPayloadResponse + engine API versions); the spec receives it as TestingBuildBlock(BuiltBlock) and forwards the payload verbatim — no header rebuild, no side-channel capture.

Plugins and shared code

| Module | Role |
| --- | --- |
| cli/pytest_commands/fill_stateful.py | CLI entry: the fill-stateful Click command. |
| plugins/fill_stateful/fill_stateful.py | Session pre-run + t8n/session_t8n overrides; CLI options. |
| plugins/shared/live_client_flags.py | Live-client flags + fee fixtures factored out of execute/execute.py. |
| plugins/execute/pre_alloc.py | Reused Alloc; pending_transactions() drains the queue without sending. |
| plugins/execute/rpc/chain_builder_eth_rpc.py | fund_via_withdrawals + build_block_with_transactions; both return EnginePayloadMetadata. |
| client_clis/client_backend.py | ClientBackend: builds via testing_buildBlockV1, advances via engine_newPayload + forkchoiceUpdated (SYNCING-retry). |
| specs/blockchain.py | make_stateful_fixture, payload_metadata_to_fixture, Block.phase, _split_blocks_by_phase, TestingBuildBlock. |

Flags introduced

| Flag | Where | Purpose |
| --- | --- | --- |
| --snapshot-block | fill-stateful | Anchor by 32-byte hash (reorg-safe) or block number; defaults to latest. |
| --rpc-seed-key | fill-stateful | Pin the seed EOA; otherwise generated + funded via CL withdrawal. |
| --default-gas-price, --default-max-fee-per-gas, --default-max-priority-fee-per-gas, --default-max-fee-per-blob-gas | shared/live_client_flags | Pin per-session fees; defaults bump a one-shot live query by 1.5×. |
| --max-gas-per-test, --max-tx-per-batch, --transaction-gas-limit, ... | shared/live_client_flags | Generic live-client knobs reused across commands. |

Fill flow

  1. Session pre-run (_session_pre_run, autouse session fixture):
    1. Resolve --snapshot-block → record (number, hash) on the backend.
    2. Fund the seed EOA via CL withdrawal (ChainBuilderEthRPC.fund_via_withdrawals).
    3. Deploy the deterministic factory if missing (contracts.deploy_deterministic_factory_contract).
    4. Capture latest as the start_block anchor on the backend.
    5. Write pre_run/<start_block_hash>.json (a StatefulPreRunFixture).
  2. Per-test fill (make_stateful_fixture):
    1. Materialise pre.fund_eoa / pre.deploy_contract queue into a synthetic setup block prepended to self.blocks.
    2. _split_blocks_by_phase splits any mixed-phase blocks (e.g. EIP-7702 SETUP + benchmark TEST).
    3. For each block, ClientBackend.evaluate builds + finalises it; payload partitioned by Block.phase into setupEngineNewPayloads vs engineNewPayloads.
    4. Write <test>.json (a BlockchainEngineStatefulFixture).
  3. Per-test reset (_reset_chain_between_tests): debug_setHead(start_block.number), re-fetch latest, abort if hash drifted.
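
The per-test reset in step 3 can be sketched as two RPC calls and a hash comparison. The example takes the RPC transport as an injected callable so it stays self-contained. The names reset_chain and rpc_call are hypothetical, and passing the block number to debug_setHead as a hex string follows geth's convention, which is an assumption about the target client.

```python
"""Sketch: rewind the chain to the start block and verify the head hash."""
from typing import Any, Callable

# rpc_call(method, params) -> result; any JSON-RPC client can be adapted to it.
RpcCall = Callable[[str, list], Any]


def reset_chain(rpc_call: RpcCall, start_number: int, start_hash: str) -> None:
    """Rewind to `start_number`, then abort if the head hash has drifted."""
    rpc_call("debug_setHead", [hex(start_number)])
    latest = rpc_call("eth_getBlockByNumber", ["latest", False])
    if latest["hash"].lower() != start_hash.lower():
        raise RuntimeError(
            f"head drifted after rewind: expected {start_hash}, "
            f"got {latest['hash']}"
        )
```

Comparing hashes rather than numbers is what catches a rewind that lands on a different block at the same height.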

Fixture types

| Type | One per | Carries |
| --- | --- | --- |
| StatefulPreRunFixture | session | snapshot/start anchors, engineNewPayloads (the withdrawal + factory deploy blocks). |
| BlockchainEngineStatefulFixture | test | snapshot/start anchors, setupEngineNewPayloads, engineNewPayloads, benchmarkGasUsed, _info.filling-transition-tool (EL build string). |
| FixtureEngineNewPayload | block | params (engine_newPayloadVX args), newPayloadVersion, forkchoiceUpdatedVersion, phase (SETUP/EXECUTION/None). |

What's captured in each payload

The client's GetPayloadResponse is recorded verbatim. Rebuilding from FixtureHeader would diverge on client-picked fields (e.g. gas_limit) and break engine_newPayload hash validation on replay.

Replay (benchmarkoor)

pristine snapshot ───copy──▶ datadir ───▶ geth ───▶ benchmarkoor
                                                       │
                                                       ├── replay pre_run/<startBlockHash>.json
                                                       │   for each fixture's startBlockHash
                                                       │   (1 client, 1 time per hash per run)
                                                       │
                                                       └── for each test fixture:
                                                            ├── replay setupEngineNewPayloads
                                                            ├── replay engineNewPayloads (timed)
                                                            ├── debug_setHead → start_block
                                                            └── re-fetch latest, verify hash

Sanity check: each fixture's benchmarkGasUsed (recorded at fill) should equal benchmarkoor's measured gas_used_total for that test.
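
The sanity check can be automated with a short comparison. The sketch below assumes fixtures and measurements are both keyed by test name, and that benchmarkGasUsed may be serialized either as an integer or as a 0x-prefixed hex string. The function name and the input shapes are hypothetical, not benchmarkoor's actual report format.

```python
"""Sketch: compare fill-time benchmarkGasUsed with replay-time gas totals."""


def _as_int(value) -> int:
    """Accept an int, a decimal string, or a 0x-prefixed hex string."""
    return int(value, 0) if isinstance(value, str) else int(value)


def gas_mismatches(fixtures: dict, measured: dict) -> list:
    """Return names of tests whose measured gas differs from the fixture.

    `fixtures` maps test name -> fixture dict with a benchmarkGasUsed field;
    `measured` maps test name -> gas_used_total from the replay run.
    A test with no measurement is reported as a mismatch.
    """
    return [
        name
        for name, fixture in fixtures.items()
        if measured.get(name) != _as_int(fixture["benchmarkGasUsed"])
    ]
```

An empty result means every recorded benchmarkGasUsed was reproduced on replay.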