Discovered by Ashwin and his team. This causes a bug in ipld-eth-server:

If a contract deployed at some address addr, with a non-empty storage trie s, is destroyed at some block n, and a contract (same or different) is then redeployed at addr (using the CREATE2 opcode) at some block m, the query currently used in RetrieveStorageAtByAddressAndStorageSlotAndBlockHash, when performed at some block m+x (where x >= 0), will behave as if the new contract inherited s as its initial state.
In particular, if the specific storage leaf key being queried doesn't exist in the redeployed contract's storage trie at m+x, but it did exist in s, the query will incorrectly return the value held at that key in s instead of an empty value.
Fixes
During statediffing
This could be fixed on the indexing end by adjusting the statediffing code to iterate s and insert a "removed" node-type entry into eth.storage_cids at block n for every node in the trie when generating the statediff between n-1 and n. This would be done here. Having to iterate the entire storage trie is potentially very expensive, and this wouldn't fix the data that has already been processed into Postgres.
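For concreteness, here is a hedged sketch of the rows such a diff would add for a contract destroyed at block n whose storage trie s still held two nodes. All values are placeholders, and the column list is limited to the columns referenced elsewhere in this issue; a real insert would also need to populate the table's remaining columns.

```sql
-- Sketch only: one "removed" (node_type = 3) row per node still present in s,
-- keyed to block n's header and to addr's state leaf path. All values are placeholders.
INSERT INTO eth.storage_cids (header_id, state_path, storage_path, node_type)
VALUES
    ('<block n hash>', '<state path of addr''s leaf>', '<storage path 1>', 3),
    ('<block n hash>', '<state path of addr''s leaf>', '<storage path 2>', 3);
```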
Adapt the query
Alternatively, the query used here could be adjusted to account for this issue.
Create a stored function (e.g. was_state_leaf_ever_removed) that checks whether a node has ever been removed for a specific state_leaf_key, returning the block height(s) it was removed at if so.
Create another stored function (e.g. retrieve_storage_by_address_and_storage_slot_and_blockHash) that first calls was_state_leaf_ever_removed to check if the storage key was ever removed. If it wasn't, it proceeds with the current access pattern. If it was, it finds the height at which it was (most recently) removed and limits its query to eth.storage_cids records above this block height.
This would add additional overhead to the query. It also only fixes this symptom, not the underlying issue; the issue will produce a similar symptom when, for example, trying to create a new storage snapshot from a previous snapshot plus all the interim diffs, and we would need an adapted query for that as well.
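A minimal sketch of what the first helper might look like, assuming the schema used elsewhere in this issue (eth.state_cids joins eth.header_cids on header_id = block_hash, and node_type = 3 marks a removed node); the parameter name and return type are illustrative:

```sql
-- Sketch only: returns every block height at which the state leaf for the given
-- key was marked as removed (node_type = 3). No rows means it was never removed.
CREATE OR REPLACE FUNCTION was_state_leaf_ever_removed(v_state_leaf_key VARCHAR)
RETURNS SETOF BIGINT AS
$$
    SELECT header_cids.block_number
    FROM eth.state_cids
        INNER JOIN eth.header_cids ON (state_cids.header_id = header_cids.block_hash)
    WHERE state_cids.state_leaf_key = v_state_leaf_key
      AND state_cids.node_type = 3
    ORDER BY header_cids.block_number DESC;
$$ LANGUAGE SQL;
```

The second function, retrieve_storage_by_address_and_storage_slot_and_blockHash, would then take the maximum height returned (if any) and add a block_number > &lt;that height&gt; predicate to the existing access pattern.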
Post-processing of data in Postgres
This issue could also be fixed in post-processing; the algorithm for this would look like:
Query the database to find "Removed" node-type eth.state_cids records that correspond to a contract destruction
That gets you all removed leaf nodes and the block heights they were removed at, but that includes EOAs and contracts.
To figure out whether the removed leaves contained contracts, you need to find the latest non-removed-type entry for those state_leaf_keys and check whether it contains a non-"zero" storage root (the "zero" value of the root is keccak256(rlp([]byte{}))).
Find the latest eth.storage_cids record at every storage_path in the trie for those contracts
```sql
SELECT DISTINCT ON (storage_path) storage_cids.*
FROM eth.storage_cids
    INNER JOIN eth.state_cids ON (storage_cids.state_path = state_cids.state_path)
        AND (storage_cids.header_id = state_cids.header_id)
    INNER JOIN eth.header_cids ON (state_cids.header_id = header_cids.block_hash)
WHERE state_cids.state_leaf_key = $1
    AND storage_cids.node_type != 3 -- can ignore nodes whose latest state is already "removed"
    AND block_number < $2
ORDER BY storage_path, block_number DESC;
```
$1 = state_leaf_key for the removed contract, found in step 1
$2 = block_number at which the contract was destroyed, found in step 1
At the block height at which the corresponding contract was destroyed, create new eth.storage_cids records with node_type = 3 for all of the records found in the above query.
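A hedged sketch of what that backfill could look like, reusing the step-2 query as a subselect. The column list is limited to the columns referenced in this issue, so a real migration would also need to populate the table's remaining columns; the parameter numbering follows the convention above, with a third parameter added for the destruction block's hash:

```sql
-- Sketch only: $1 = state_leaf_key of the destroyed contract (step 1),
--              $2 = block_number at which it was destroyed (step 1),
--              $3 = block_hash (header_id) of block $2.
-- Writes a node_type = 3 ("removed") entry at the destruction height for every
-- storage_path that was still live below it.
INSERT INTO eth.storage_cids (header_id, state_path, storage_path, node_type)
SELECT $3, latest.state_path, latest.storage_path, 3
FROM (
    SELECT DISTINCT ON (storage_path) storage_cids.*
    FROM eth.storage_cids
        INNER JOIN eth.state_cids ON (storage_cids.state_path = state_cids.state_path)
            AND (storage_cids.header_id = state_cids.header_id)
        INNER JOIN eth.header_cids ON (state_cids.header_id = header_cids.block_hash)
    WHERE state_cids.state_leaf_key = $1
      AND storage_cids.node_type != 3
      AND header_cids.block_number < $2
    ORDER BY storage_path, header_cids.block_number DESC
) AS latest;
```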
Proposal
Make the updates described in 1) (during statediffing) to fix newly indexed data. Use the approach described in 3) (post-processing) to fix the data already in Postgres.
These contract destructions aren't very frequent, so the performance impact on the head-tracking process should be minimal in aggregate, but it could introduce significant latency in processing the specific statediffs where large contract destructions do occur. Due to this infrequency, the historical processing required would be sparse, and with the index on node_type it should be fast to find the (relatively few) contracts we need to fix the issue for.
On Mar 10, 2022, i-norden changed the title from "Storage nodes are not being marked as removed when the contract, as a whole, is destroyed." to "Storage nodes are not being marked as removed when the contract is destroyed."
Ashwin and his team have identified that the problem described here extends to the storage trie. To alleviate this, we need to introduce a new "moved" node type to represent, in our Postgres DB, leaf nodes that have been moved to a new path versus leaf nodes that have been removed altogether.
Closing this: we no longer need to post-process v4 since we are moving to v5, and the missing integration test scenario is no longer relevant in v5. Because we don't index intermediate nodes there, there is no need to distinguish between "moved" and "removed" nodes, although we do still need better testing in place for when accounts or slots are removed, e.g. cerc-io/tx-spammer#18.