ethereum.merkle_patricia_trie

Modified Merkle Patricia Trie (MPT), the data structure that commits to large amounts of state with a single hash.

The MPT is a 16-ary radix trie augmented with cryptographic hashing, so each unique mapping of keys to values has a deterministic root hash. Block headers commit to the state, transactions, and receipts through these roots, allowing nodes to verify any individual entry against the header without storing the entire trie.

The trie has three kinds of internal node:

  • A LeafNode terminates a path and stores a value.

  • An ExtensionNode compresses a sequence of nibbles shared by every key passing through it, avoiding chains of single-child branches.

  • A BranchNode has up to sixteen children, one per nibble value, plus an optional value for a key that terminates exactly at this node.

Keys are processed as nibbles (half-bytes, each 0 to 15 inclusive) by bytes_to_nibble_list, and stored within nodes in a compressed hex-prefix encoding produced by nibble_list_to_compact.

Some tries are secured, meaning their keys are hashed with keccak256 before insertion. Hashing distributes keys uniformly so adversarial choices cannot create a deeply unbalanced trie. The state and storage tries are secured; the transaction and receipt tries are not.

The mapping of keys to values is exposed through Trie; the root function reduces a trie to its 32-byte commitment.

EMPTY_TRIE_ROOT

77
EMPTY_TRIE_ROOT = Hash32(
78
    hex_to_bytes(
79
        "56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421"
80
    )
81
)

LeafNode

Terminal node in the trie, holding a value at the end of a key path.

94
@final
95
@slotted_freezable
96
@dataclass
class LeafNode:

rest_of_key

Nibbles of the key not yet consumed by ancestor nodes.

102
    rest_of_key: Bytes

value

Value stored at this key, in its encoded form.

107
    value: Extended

ExtensionNode

Internal node that compresses a run of nibbles shared by every key passing through it.

Without this optimization, deep tries of similar keys would otherwise require long chains of single-child BranchNodes.

113
@final
114
@slotted_freezable
115
@dataclass
class ExtensionNode:

key_segment

Sequence of nibbles shared by all keys descending through this node.

127
    key_segment: Bytes

subnode

Encoded child node reached after consuming key_segment.

132
    subnode: Extended

BranchSubnodes

140
BranchSubnodes = Tuple[
141
    Extended,
142
    Extended,
143
    Extended,
144
    Extended,
145
    Extended,
146
    Extended,
147
    Extended,
148
    Extended,
149
    Extended,
150
    Extended,
151
    Extended,
152
    Extended,
153
    Extended,
154
    Extended,
155
    Extended,
156
    Extended,
157
]

BranchNode

Internal node with up to sixteen children, one per nibble value, plus an optional value for a key that terminates exactly at this node.

168
@final
169
@slotted_freezable
170
@dataclass
class BranchNode:

subnodes

Encoded children, indexed by the next nibble of the key.

177
    subnodes: BranchSubnodes

value

Value for a key that terminates at this node, or b"" if no such key exists.

182
    value: Extended

InternalNode

189
InternalNode = LeafNode | ExtensionNode | BranchNode

K

195
K = TypeVar("K", bound=Bytes)

V

196
V = TypeVar("V", bound=Extended | None)

encode_account

Encode an Account for inclusion in the state trie.

The Account dataclass holds nonce, balance, and code hash, but not the account's storage. The corresponding storage_root (the root of the account's own storage trie) is therefore passed in separately.

def encode_account(raw_account_data: Account, ​​storage_root: Bytes) -> Bytes:
200
    <snip>
209
    return rlp.encode(
210
        (
211
            raw_account_data.nonce,
212
            raw_account_data.balance,
213
            storage_root,
214
            raw_account_data.code_hash,
215
        )
216
    )

encode_internal_node

Encode an InternalNode into its RLP form.

When the resulting serialization is at least 32 bytes, keccak256 is applied so that parents store a fixed-size hash. Shorter encodings are returned unhashed and embedded directly into the parent, since storing a hash would only waste space.

None represents the absence of a node and is encoded as b"".

def encode_internal_node(node: Optional[InternalNode]) -> Extended:
220
    <snip>
233
    unencoded: Extended
234
    if node is None:
235
        unencoded = b""
236
    elif isinstance(node, LeafNode):
237
        unencoded = (
238
            nibble_list_to_compact(node.rest_of_key, True),
239
            node.value,
240
        )
241
    elif isinstance(node, ExtensionNode):
242
        unencoded = (
243
            nibble_list_to_compact(node.key_segment, False),
244
            node.subnode,
245
        )
246
    elif isinstance(node, BranchNode):
247
        unencoded = list(node.subnodes) + [node.value]
248
    else:
249
        raise AssertionError(f"Invalid internal node type {type(node)}!")
250
251
    encoded = rlp.encode(unencoded)
252
    if len(encoded) < 32:
253
        return unencoded
254
    else:
255
        return keccak256(encoded)

encode_node

Encode a value for storage in the trie.

Account values require a storage_root and are encoded with encode_account; raw Bytes are returned unchanged; everything else is RLP-encoded.

def encode_node(node: Extended, ​​storage_root: Bytes | None) -> Bytes:
259
    <snip>
269
    from ethereum.state import Account
270
271
    if isinstance(node, Account):
272
        assert storage_root is not None
273
        return encode_account(node, storage_root)
274
    elif isinstance(node, Bytes):
275
        return node
276
    else:
277
        return rlp.encode(node)

Trie

Key-value mapping with a single root hash that uniquely identifies its contents.

A trie may be secured, meaning keys are hashed with keccak256 before insertion to prevent adversarial keys from producing a deeply unbalanced trie. The state and storage tries are secured; the transaction and receipt tries are not.

A trie has a default value representing key absence: storing default at a key is equivalent to omitting the key, since the underlying MPT represents missing keys by leaving them out entirely.

280
@final
281
@dataclass
class Trie:

secured

When True, keys are hashed with keccak256 before being inserted into the underlying MPT.

300
    secured: bool

default

Value treated as key absence — storing this value through trie_set removes the key, and trie_get returns it for missing keys.

308
    default: V

_data

320
    _data: Dict[K, V] = field(default_factory=dict)

copy_trie

Create a copy of trie.

Since only frozen objects may be stored in a trie, the contents are shared between the original and the copy.

def copy_trie(trie: Trie[K, V]) -> Trie[K, V]:
324
    <snip>
330
    return Trie(trie.secured, trie.default, copy.copy(trie._data))

trie_set

Insert or update key in trie with the given value.

Storing the trie's default value deletes the key, since a trie represents default by omitting the key entirely.

def trie_set(trie: Trie[K, V], ​​key: K, ​​value: V) -> None:
334
    <snip>
342
    if value == trie.default:
343
        if key in trie._data:
344
            del trie._data[key]
345
    else:
346
        trie._data[key] = value

trie_get

Look up key in trie, returning the trie's default value if absent.

def trie_get(trie: Trie[K, V], ​​key: K) -> V:
350
    <snip>
355
    return trie._data.get(key, trie.default)

common_prefix_length

Find the length of the longest common prefix of two sequences.

def common_prefix_length(a: Sequence, ​​b: Sequence) -> int:
359
    <snip>
362
    for i in range(len(a)):
363
        if i >= len(b) or a[i] != b[i]:
364
            return i
365
    return len(a)

nibble_list_to_compact

Compress a list of nibbles into bytes using hex-prefix encoding.

Nibbles are packed two-per-byte, preceded by a single flag nibble at the high position of the first byte:

+---+---+---------+--------+
| _ | _ | is_leaf | parity |
+---+---+---------+--------+
  3   2      1        0

The lowest bit of the flag nibble encodes the parity of the nibble-list length (0 for even, 1 for odd). The next bit distinguishes leaf nodes from extension nodes. The two highest bits are unused.

When the length is odd the first nibble of x is packed into the same byte as the flag, leaving an even number of remaining nibbles to pair off.

def nibble_list_to_compact(x: Bytes, ​​is_leaf: bool) -> Bytes:
369
    <snip>
389
    compact = bytearray()
390
391
    if len(x) % 2 == 0:  # ie even length
392
        compact.append(16 * (2 * is_leaf))
393
        for i in range(0, len(x), 2):
394
            compact.append(16 * x[i] + x[i + 1])
395
    else:
396
        compact.append(16 * ((2 * is_leaf) + 1) + x[0])
397
        for i in range(1, len(x), 2):
398
            compact.append(16 * x[i] + x[i + 1])
399
400
    return Bytes(compact)

bytes_to_nibble_list

Split each input byte into its high and low nibble, producing a sequence of bytes each holding a value from 0 to 15 inclusive.

def bytes_to_nibble_list(bytes_: Bytes) -> Bytes:
404
    <snip>
408
    nibble_list = bytearray(2 * len(bytes_))
409
    for byte_index, byte in enumerate(bytes_):
410
        nibble_list[byte_index * 2] = (byte & 0xF0) >> 4
411
        nibble_list[byte_index * 2 + 1] = byte & 0x0F
412
    return Bytes(nibble_list)

_prepare_trie

Convert a Trie into the nibble-keyed mapping consumed by patricialize.

Each value is encoded with encode_node; if the value is an Account, get_storage_root must be supplied to provide its storage root. Keys are hashed with keccak256 when the trie is secured, then expanded into nibble form via bytes_to_nibble_list.

def _prepare_trie(trie: Trie[K, V], ​​get_storage_root: Optional[Callable[[Address], Root]]) -> Mapping[Bytes, Bytes]:
419
    <snip>
435
    from ethereum.state import Account, Address
436
437
    mapped: MutableMapping[Bytes, Bytes] = {}
438
439
    for preimage, value in trie._data.items():
440
        if isinstance(value, Account):
441
            assert get_storage_root is not None
442
            address = Address(preimage)
443
            encoded_value = encode_node(value, get_storage_root(address))
444
        elif value is None:
445
            raise AssertionError("cannot encode `None`")
446
        else:
447
            encoded_value = encode_node(value)
448
        if encoded_value == b"":
449
            raise AssertionError
450
        key: Bytes
451
        if trie.secured:
452
            # "secure" tries hash keys once before construction
453
            key = keccak256(preimage)
454
        else:
455
            key = preimage
456
        mapped[bytes_to_nibble_list(key)] = encoded_value
457
458
    return mapped

root

Compute the root hash of trie.

The trie is patricialized into a tree of InternalNodes; the root node is RLP-encoded and hashed with keccak256 to produce the fixed-size Hash32 used in block headers.

get_storage_root is required when encoding Account values, so it must be supplied when computing the state root.

def root(trie: Trie[K, V], ​​get_storage_root: Optional[Callable[[Address], Root]]) -> Root:
465
    <snip>
480
    from ethereum.state import Root
481
482
    obj = _prepare_trie(trie, get_storage_root)
483
484
    root_node = encode_internal_node(patricialize(obj, Uint(0)))
485
    if len(rlp.encode(root_node)) < 32:
486
        return keccak256(rlp.encode(root_node))
487
    else:
488
        assert isinstance(root_node, Bytes)
489
        return Root(root_node)

patricialize

Recursively build the trie for obj, starting level nibbles into the keys.

The structure of the returned subtree is determined by inspecting the keys present at the current level:

  1. With no keys, the subtree is empty and None is returned.

  2. With a single key, a LeafNode holds the remaining nibbles and the value.

  3. When every key shares a non-empty prefix at this level, an ExtensionNode consumes the prefix and the algorithm recurses for the rest.

  4. Otherwise the keys are partitioned by their next nibble into up to sixteen groups, each recursively patricialized, and combined into a BranchNode.

def patricialize(obj: Mapping[Bytes, Bytes], ​​level: Uint) -> Optional[InternalNode]:
495
    <snip>
516
    if len(obj) == 0:
517
        return None
518
519
    arbitrary_key = next(iter(obj))
520
521
    # if leaf node
522
    if len(obj) == 1:
523
        leaf = LeafNode(arbitrary_key[level:], obj[arbitrary_key])
524
        return leaf
525
526
    # prepare for extension node check by finding max j such that all keys in
527
    # obj have the same key[i:j]
528
    substring = arbitrary_key[level:]
529
    prefix_length = len(substring)
530
    for key in obj:
531
        prefix_length = min(
532
            prefix_length, common_prefix_length(substring, key[level:])
533
        )
534
535
        # finished searching, found another key at the current level
536
        if prefix_length == 0:
537
            break
538
539
    # if extension node
540
    if prefix_length > 0:
541
        prefix = arbitrary_key[int(level) : int(level) + prefix_length]
542
        return ExtensionNode(
543
            prefix,
544
            encode_internal_node(
545
                patricialize(obj, level + Uint(prefix_length))
546
            ),
547
        )
548
549
    branches: List[MutableMapping[Bytes, Bytes]] = []
550
    for _ in range(16):
551
        branches.append({})
552
    value = b""
553
    for key in obj:
554
        if len(key) == level:
555
            value = obj[key]
556
        else:
557
            branches[key[level]][key] = obj[key]
558
559
    subnodes = tuple(
560
        encode_internal_node(patricialize(branches[k], level + Uint(1)))
561
        for k in range(16)
562
    )
563
    return BranchNode(
564
        cast(BranchSubnodes, assert_type(subnodes, Tuple[Extended, ...])),
565
        value,
566
    )