ethereum.merkle_patricia_trie
Modified Merkle Patricia Trie (MPT), the data structure that commits to large amounts of state with a single hash.
The MPT is a 16-ary radix trie augmented with cryptographic hashing, so each unique mapping of keys to values has a deterministic root hash. Block headers commit to the state, transactions, and receipts through these roots, allowing nodes to verify any individual entry against the header without storing the entire trie.
The trie has three kinds of internal node:
A
LeafNodeterminates a path and stores a value.An
ExtensionNodecompresses a sequence of nibbles shared by every key passing through it, avoiding chains of single-child branches.A
BranchNodehas up to sixteen children, one per nibble value, plus an optional value for a key that terminates exactly at this node.
Keys are processed as nibbles (half-bytes, each 0 to 15 inclusive)
by bytes_to_nibble_list, and stored within nodes in a compressed
hex-prefix encoding produced by nibble_list_to_compact.
Some tries are secured, meaning their keys are hashed with keccak256
before insertion. Hashing distributes keys uniformly so adversarial choices
cannot create a deeply unbalanced trie. The state and storage tries are
secured; the transaction and receipt tries are not.
The mapping of keys to values is exposed through Trie; the root
function reduces a trie to its 32-byte commitment.
EMPTY_TRIE_ROOT¶
| 77 | EMPTY_TRIE_ROOT = Hash32( |
|---|---|
| 78 | hex_to_bytes( |
| 79 | "56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421" |
| 80 | ) |
| 81 | ) |
LeafNode ¶
Terminal node in the trie, holding a value at the end of a key path.
| 94 | @final |
|---|
| 95 | @slotted_freezable |
|---|
| 96 | @dataclass |
|---|
class LeafNode:
rest_of_key¶
Nibbles of the key not yet consumed by ancestor nodes.
| 102 | rest_of_key: Bytes |
|---|
value¶
Value stored at this key, in its encoded form.
| 107 | value: Extended |
|---|
ExtensionNode ¶
Internal node that compresses a run of nibbles shared by every key passing through it.
Without this optimization, deep tries of similar keys would otherwise
require long chains of single-child BranchNodes.
| 113 | @final |
|---|
| 114 | @slotted_freezable |
|---|
| 115 | @dataclass |
|---|
class ExtensionNode:
key_segment¶
Sequence of nibbles shared by all keys descending through this node.
| 127 | key_segment: Bytes |
|---|
subnode¶
Encoded child node reached after consuming key_segment.
| 132 | subnode: Extended |
|---|
BranchSubnodes¶
| 140 | BranchSubnodes = Tuple[ |
|---|---|
| 141 | Extended, |
| 142 | Extended, |
| 143 | Extended, |
| 144 | Extended, |
| 145 | Extended, |
| 146 | Extended, |
| 147 | Extended, |
| 148 | Extended, |
| 149 | Extended, |
| 150 | Extended, |
| 151 | Extended, |
| 152 | Extended, |
| 153 | Extended, |
| 154 | Extended, |
| 155 | Extended, |
| 156 | Extended, |
| 157 | ] |
BranchNode ¶
Internal node with up to sixteen children, one per nibble value, plus an optional value for a key that terminates exactly at this node.
| 168 | @final |
|---|
| 169 | @slotted_freezable |
|---|
| 170 | @dataclass |
|---|
class BranchNode:
subnodes¶
Encoded children, indexed by the next nibble of the key.
| 177 | subnodes: BranchSubnodes |
|---|
value¶
Value for a key that terminates at this node, or b"" if no such key
exists.
| 182 | value: Extended |
|---|
InternalNode¶
| 189 | InternalNode = LeafNode | ExtensionNode | BranchNode |
|---|
K¶
| 195 | K = TypeVar("K", bound=Bytes) |
|---|
V¶
| 196 | V = TypeVar("V", bound=Extended | None) |
|---|
encode_account ¶
encode_internal_node ¶
Encode an InternalNode into its RLP form.
When the resulting serialization is at least 32 bytes, keccak256 is
applied so that parents store a fixed-size hash. Shorter encodings are
returned unhashed and embedded directly into the parent, since storing a
hash would only waste space.
None represents the absence of a node and is encoded as b"".
def encode_internal_node(node: Optional[InternalNode]) -> Extended:
| 220 | <snip> |
|---|---|
| 233 | unencoded: Extended |
| 234 | if node is None: |
| 235 | unencoded = b"" |
| 236 | elif isinstance(node, LeafNode): |
| 237 | unencoded = ( |
| 238 | nibble_list_to_compact(node.rest_of_key, True), |
| 239 | node.value, |
| 240 | ) |
| 241 | elif isinstance(node, ExtensionNode): |
| 242 | unencoded = ( |
| 243 | nibble_list_to_compact(node.key_segment, False), |
| 244 | node.subnode, |
| 245 | ) |
| 246 | elif isinstance(node, BranchNode): |
| 247 | unencoded = list(node.subnodes) + [node.value] |
| 248 | else: |
| 249 | raise AssertionError(f"Invalid internal node type {type(node)}!") |
| 250 | |
| 251 | encoded = rlp.encode(unencoded) |
| 252 | if len(encoded) < 32: |
| 253 | return unencoded |
| 254 | else: |
| 255 | return keccak256(encoded) |
encode_node ¶
Encode a value for storage in the trie.
Account values require a storage_root and are encoded with
encode_account; raw Bytes are returned unchanged; everything else
is RLP-encoded.
def encode_node(node: Extended, storage_root: Bytes | None) -> Bytes:
| 259 | <snip> |
|---|---|
| 269 | from ethereum.state import Account |
| 270 | |
| 271 | if isinstance(node, Account): |
| 272 | assert storage_root is not None |
| 273 | return encode_account(node, storage_root) |
| 274 | elif isinstance(node, Bytes): |
| 275 | return node |
| 276 | else: |
| 277 | return rlp.encode(node) |
Trie ¶
Key-value mapping with a single root hash that uniquely identifies its contents.
A trie may be secured, meaning keys are hashed with keccak256
before insertion to prevent adversarial keys from producing a deeply
unbalanced trie. The state and storage tries are secured; the transaction
and receipt tries are not.
A trie has a default value representing key absence: storing
default at a key is equivalent to omitting the key, since the
underlying MPT represents missing keys by leaving them out entirely.
| 280 | @final |
|---|
| 281 | @dataclass |
|---|
class Trie:
secured¶
When True, keys are hashed with keccak256 before being inserted
into the underlying MPT.
| 300 | secured: bool |
|---|
default¶
| 308 | default: V |
|---|
_data¶
| 320 | _data: Dict[K, V] = field(default_factory=dict) |
|---|
copy_trie ¶
Create a copy of trie.
Since only frozen objects may be stored in a trie, the contents are shared between the original and the copy.
trie_set ¶
Insert or update key in trie with the given value.
Storing the trie's default value deletes the key, since a trie
represents default by omitting the key entirely.
trie_get ¶
Look up key in trie, returning the trie's default value if absent.
common_prefix_length ¶
Find the length of the longest common prefix of two sequences.
nibble_list_to_compact ¶
Compress a list of nibbles into bytes using hex-prefix encoding.
Nibbles are packed two-per-byte, preceded by a single flag nibble at the high position of the first byte:
+---+---+---------+--------+
| _ | _ | is_leaf | parity |
+---+---+---------+--------+
3 2 1 0
The lowest bit of the flag nibble encodes the parity of the nibble-list
length (0 for even, 1 for odd). The next bit distinguishes leaf nodes
from extension nodes. The two highest bits are unused.
When the length is odd the first nibble of x is packed into the same
byte as the flag, leaving an even number of remaining nibbles to pair off.
def nibble_list_to_compact(x: Bytes, is_leaf: bool) -> Bytes:
| 369 | <snip> |
|---|---|
| 389 | compact = bytearray() |
| 390 | |
| 391 | if len(x) % 2 == 0: # ie even length |
| 392 | compact.append(16 * (2 * is_leaf)) |
| 393 | for i in range(0, len(x), 2): |
| 394 | compact.append(16 * x[i] + x[i + 1]) |
| 395 | else: |
| 396 | compact.append(16 * ((2 * is_leaf) + 1) + x[0]) |
| 397 | for i in range(1, len(x), 2): |
| 398 | compact.append(16 * x[i] + x[i + 1]) |
| 399 | |
| 400 | return Bytes(compact) |
bytes_to_nibble_list ¶
Split each input byte into its high and low nibble, producing a sequence
of bytes each holding a value from 0 to 15 inclusive.
_prepare_trie ¶
Convert a Trie into the nibble-keyed mapping consumed by
patricialize.
Each value is encoded with encode_node; if the value is an
Account, get_storage_root must be supplied to provide its storage
root. Keys are hashed with keccak256 when the trie is secured, then
expanded into nibble form via bytes_to_nibble_list.
def _prepare_trie(trie: Trie[K, V], get_storage_root: Optional[Callable[[Address], Root]]) -> Mapping[Bytes, Bytes]:
| 419 | <snip> |
|---|---|
| 435 | from ethereum.state import Account, Address |
| 436 | |
| 437 | mapped: MutableMapping[Bytes, Bytes] = {} |
| 438 | |
| 439 | for preimage, value in trie._data.items(): |
| 440 | if isinstance(value, Account): |
| 441 | assert get_storage_root is not None |
| 442 | address = Address(preimage) |
| 443 | encoded_value = encode_node(value, get_storage_root(address)) |
| 444 | elif value is None: |
| 445 | raise AssertionError("cannot encode `None`") |
| 446 | else: |
| 447 | encoded_value = encode_node(value) |
| 448 | if encoded_value == b"": |
| 449 | raise AssertionError |
| 450 | key: Bytes |
| 451 | if trie.secured: |
| 452 | # "secure" tries hash keys once before construction |
| 453 | key = keccak256(preimage) |
| 454 | else: |
| 455 | key = preimage |
| 456 | mapped[bytes_to_nibble_list(key)] = encoded_value |
| 457 | |
| 458 | return mapped |
root ¶
Compute the root hash of trie.
The trie is patricialized into a tree of InternalNodes; the root node
is RLP-encoded and hashed with keccak256 to produce the fixed-size
Hash32 used in block headers.
get_storage_root is required when encoding Account values, so it
must be supplied when computing the state root.
def root(trie: Trie[K, V], get_storage_root: Optional[Callable[[Address], Root]]) -> Root:
| 465 | <snip> |
|---|---|
| 480 | from ethereum.state import Root |
| 481 | |
| 482 | obj = _prepare_trie(trie, get_storage_root) |
| 483 | |
| 484 | root_node = encode_internal_node(patricialize(obj, Uint(0))) |
| 485 | if len(rlp.encode(root_node)) < 32: |
| 486 | return keccak256(rlp.encode(root_node)) |
| 487 | else: |
| 488 | assert isinstance(root_node, Bytes) |
| 489 | return Root(root_node) |
patricialize ¶
Recursively build the trie for obj, starting level nibbles into the
keys.
The structure of the returned subtree is determined by inspecting the
keys present at the current level:
With no keys, the subtree is empty and
Noneis returned.With a single key, a
LeafNodeholds the remaining nibbles and the value.When every key shares a non-empty prefix at this level, an
ExtensionNodeconsumes the prefix and the algorithm recurses for the rest.Otherwise the keys are partitioned by their next nibble into up to sixteen groups, each recursively patricialized, and combined into a
BranchNode.
def patricialize(obj: Mapping[Bytes, Bytes], level: Uint) -> Optional[InternalNode]:
| 495 | <snip> |
|---|---|
| 516 | if len(obj) == 0: |
| 517 | return None |
| 518 | |
| 519 | arbitrary_key = next(iter(obj)) |
| 520 | |
| 521 | # if leaf node |
| 522 | if len(obj) == 1: |
| 523 | leaf = LeafNode(arbitrary_key[level:], obj[arbitrary_key]) |
| 524 | return leaf |
| 525 | |
| 526 | # prepare for extension node check by finding max j such that all keys in |
| 527 | # obj have the same key[i:j] |
| 528 | substring = arbitrary_key[level:] |
| 529 | prefix_length = len(substring) |
| 530 | for key in obj: |
| 531 | prefix_length = min( |
| 532 | prefix_length, common_prefix_length(substring, key[level:]) |
| 533 | ) |
| 534 | |
| 535 | # finished searching, found another key at the current level |
| 536 | if prefix_length == 0: |
| 537 | break |
| 538 | |
| 539 | # if extension node |
| 540 | if prefix_length > 0: |
| 541 | prefix = arbitrary_key[int(level) : int(level) + prefix_length] |
| 542 | return ExtensionNode( |
| 543 | prefix, |
| 544 | encode_internal_node( |
| 545 | patricialize(obj, level + Uint(prefix_length)) |
| 546 | ), |
| 547 | ) |
| 548 | |
| 549 | branches: List[MutableMapping[Bytes, Bytes]] = [] |
| 550 | for _ in range(16): |
| 551 | branches.append({}) |
| 552 | value = b"" |
| 553 | for key in obj: |
| 554 | if len(key) == level: |
| 555 | value = obj[key] |
| 556 | else: |
| 557 | branches[key[level]][key] = obj[key] |
| 558 | |
| 559 | subnodes = tuple( |
| 560 | encode_internal_node(patricialize(branches[k], level + Uint(1))) |
| 561 | for k in range(16) |
| 562 | ) |
| 563 | return BranchNode( |
| 564 | cast(BranchSubnodes, assert_type(subnodes, Tuple[Extended, ...])), |
| 565 | value, |
| 566 | ) |