[Keynotes] Possible futures of the Ethereum protocol, part 5: The Purge

Alireza Mortazavi
3 min read · Oct 31, 2024

This article addresses Ethereum’s ongoing challenge of reducing “bloat” and complexity in its blockchain to ensure long-term scalability, sustainability, and usability. Two major contributors to this problem are:

  1. Historical Data Storage: Every full Ethereum node stores an ever-growing history of transactions, accounts, and blocks. Disk requirements rise over time and currently stand at around 1.1 TB for a full node.
  2. Protocol Complexity: The addition of new features without removing outdated ones leads to a more complicated codebase.

To manage these issues, Ethereum developers propose “The Purge,” which includes:

  • Reducing client storage: This would minimize the data each node has to store permanently, allowing nodes to offload historical data after a period.
  • Simplifying the protocol: By removing unnecessary features, Ethereum could reduce code complexity and maintenance requirements.

Key Solutions and Proposals

  1. History Expiry: Ethereum already limits how long some data is kept (e.g., consensus blocks are stored for only ~6 months). EIP-4444 proposes that nodes store historical data for only one year. Older history would move to a distributed storage network in which each node stores only a fraction of the data. This is similar to how torrent networks work: each piece of history stays widely accessible without every node retaining all of it.
  2. Distributed Storage Network: Ethereum’s Portal network or even standard torrent technology could serve to store older data in a distributed manner, with erasure codes enhancing data resilience.
  3. Trade-offs: Storing less history simplifies node operation, but if storing history becomes optional, Ethereum risks weakening its role as a permanent record. Two main approaches are proposed: (1) requiring validators to store historical data, which keeps it robustly available, or (2) making history storage voluntary by default, which reduces storage demands but makes ancient data harder to access.
  4. Interaction with the Ethereum Roadmap: Limiting historical storage would make nodes faster to set up and easier to operate, opening possibilities for lightweight nodes on devices like smartwatches. Furthermore, it allows node software to discard outdated code, reducing complexity further.
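
The "each node stores only a fraction" idea from items 1 and 2 can be sketched in a few lines. This is a minimal illustration, not the Portal network's actual routing scheme and not anything specified by EIP-4444: the function names, the notion of numbered history chunks, and the `one_in` parameter are all assumptions made for this example.

```python
import hashlib

# Assumption: a flat one-year retention window, as in the summary above.
RETENTION_SECONDS = 365 * 24 * 60 * 60


def is_expired(block_timestamp: int, now: int) -> bool:
    """True once a block is older than the retention window."""
    return now - block_timestamp > RETENTION_SECONDS


def stores_chunk(node_id: bytes, chunk_index: int, one_in: int) -> bool:
    """Deterministically pick roughly 1/one_in of all chunks for this node.

    Hypothetical rule: hash (node_id, chunk_index) and keep the chunk when
    the hash lands in one residue class. Real networks use other schemes.
    """
    digest = hashlib.sha256(node_id + chunk_index.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") % one_in == 0


def holders(chunk_index: int, node_ids: list, one_in: int) -> list:
    """All nodes in node_ids that would retain the given expired chunk."""
    return [n for n in node_ids if stores_chunk(n, chunk_index, one_in)]
```

Under this rule, 200 nodes that each keep about a tenth of the expired chunks would leave each chunk with about 20 holders on average, which is the redundancy property the torrent comparison relies on.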

Overall Impact

The Purge aims to maintain Ethereum’s decentralization and security while minimizing data bloat, enhancing scalability, and simplifying node maintenance. With these changes, Ethereum could support more efficient, decentralized applications that don’t rely on central intermediaries, preserving blockchain permanence while reducing operational burdens.

State Expiry

State Expiry Problem: Ethereum’s state grows by roughly 50 GB per year. Unlike historical data, which can be pruned, state is a permanent storage requirement for clients.
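
To make the growth figure concrete, here is a simple linear projection. The ~50 GB per year rate comes from the text; the constant-rate assumption and the function names are this example's own, and the starting size used in the usage note is a placeholder, not a measurement.

```python
STATE_GROWTH_GB_PER_YEAR = 50  # rate quoted above


def projected_state_gb(current_gb: float, years: float,
                       growth_gb_per_year: float = STATE_GROWTH_GB_PER_YEAR) -> float:
    """Linear projection: assumes the growth rate stays constant."""
    return current_gb + growth_gb_per_year * years


def years_until(current_gb: float, limit_gb: float,
                growth_gb_per_year: float = STATE_GROWTH_GB_PER_YEAR) -> float:
    """Years until the state reaches limit_gb under the same assumption."""
    if growth_gb_per_year <= 0:
        raise ValueError("growth rate must be positive")
    return max(0.0, (limit_gb - current_gb) / growth_gb_per_year)
```

For example, with a placeholder starting size of 300 GB, `projected_state_gb(300, 10)` gives 800 GB after ten years, and `years_until(300, 1000)` gives 14 years to reach 1 TB.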

Challenges:

  • Efficiency: Minimize extra computation for expiry.
  • User-Friendliness: Ensure users retain access to assets even after long absences.
  • Developer-Friendliness: Maintain familiar programming models.

Proposed Solutions:

  • Partial State Expiry: Divide state into chunks, storing only recently accessed data. Expired data is replaced with a stub, requiring proof to resurrect.
  • Address-Period-Based Expiry: Start a new state tree each period. Accessing state held in an older tree requires a proof, after which the data can be saved in the newer tree.
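
The stub-and-resurrect mechanism described for partial state expiry can be sketched as a toy key-value store. This is a simplified illustration under stated assumptions: a plain SHA-256 hash of the value stands in for the stub, and supplying the original value stands in for the proof, whereas an actual design would rely on tree proofs. The class and method names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Live:
    value: bytes
    last_access: int  # period in which the entry was last read or written


@dataclass
class Stub:
    commitment: bytes  # hash of the expired value (assumed stub format)


class ExpiringState:
    def __init__(self, max_idle_periods: int) -> None:
        self.max_idle_periods = max_idle_periods
        self.entries: Dict[bytes, Union[Live, Stub]] = {}

    def write(self, key: bytes, value: bytes, period: int) -> None:
        if isinstance(self.entries.get(key), Stub):
            raise KeyError("entry is expired; resurrect it first")
        self.entries[key] = Live(value, period)

    def read(self, key: bytes, period: int) -> bytes:
        entry = self.entries[key]
        if isinstance(entry, Stub):
            raise KeyError("entry is expired; resurrect it first")
        entry.last_access = period  # a read counts as an access
        return entry.value

    def expire(self, period: int) -> None:
        """Replace entries idle for more than max_idle_periods with stubs."""
        for key, entry in list(self.entries.items()):
            if isinstance(entry, Live) and period - entry.last_access > self.max_idle_periods:
                self.entries[key] = Stub(hashlib.sha256(entry.value).digest())

    def resurrect(self, key: bytes, claimed_value: bytes, period: int) -> None:
        """Restore an expired entry if the supplied value matches its stub."""
        entry = self.entries[key]
        if not isinstance(entry, Stub):
            raise ValueError("entry is not expired")
        if hashlib.sha256(claimed_value).digest() != entry.commitment:
            raise ValueError("proof does not match the stub")
        self.entries[key] = Live(claimed_value, period)
```

The point of the sketch is the shape of the trade-off: the node keeps only a fixed-size stub for idle data, and the burden of holding the full value shifts to whoever later wants to use it.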

Address Space Considerations

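  • Expansion: Introduce a new 32-byte address format. The main concern is backward compatibility, since existing contracts assume 20-byte addresses.
  • Contraction: Restrict the existing address space instead of enlarging it. This introduces collision risks that affect counterfactual addresses.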
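
As a rough illustration of what a 32-byte address might carry, the sketch below packs a version, an address period, and a hash into 32 bytes. The field widths (1, 2, and 29 bytes) are hypothetical and chosen only for this example, not taken from any specification. The last function states the generic birthday bound, which is why shrinking the hash portion of an address matters for collision resistance.

```python
ADDRESS_LEN = 32
# Hypothetical layout, chosen only for illustration:
# 1 byte version | 2 bytes address period | 29 bytes hash
VERSION_LEN = 1
PERIOD_LEN = 2
HASH_LEN = ADDRESS_LEN - VERSION_LEN - PERIOD_LEN


def pack_address(version: int, period: int, hash_bytes: bytes) -> bytes:
    """Concatenate the three fields into a 32-byte address."""
    if len(hash_bytes) != HASH_LEN:
        raise ValueError(f"hash must be {HASH_LEN} bytes")
    return (version.to_bytes(VERSION_LEN, "big")
            + period.to_bytes(PERIOD_LEN, "big")
            + hash_bytes)


def unpack_address(address: bytes):
    """Split a 32-byte address back into (version, period, hash)."""
    if len(address) != ADDRESS_LEN:
        raise ValueError(f"address must be {ADDRESS_LEN} bytes")
    version = int.from_bytes(address[:VERSION_LEN], "big")
    period = int.from_bytes(address[VERSION_LEN:VERSION_LEN + PERIOD_LEN], "big")
    return version, period, address[VERSION_LEN + PERIOD_LEN:]


def collision_work_bits(hash_bits: int) -> float:
    """Birthday bound: finding a collision takes about 2**(hash_bits / 2) hashes."""
    return hash_bits / 2
```

For today's 160-bit addresses, `collision_work_bits(160)` is 80. Any scheme that leaves fewer bits for the hash lowers that figure, while a wider format leaves room to raise it.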

Future Paths

  1. Stateless Clients: Leave state storage to a specialized class of users, so that ordinary nodes no longer carry it.
  2. Partial State Expiry: Accept a slower, but still nonzero, rate of state growth.
  3. State Expiry with Address Expansion/Contraction: Adopt full expiry and resolve the backward-compatibility and security issues it raises.

Simplification Efforts

Efforts are also underway to reduce protocol complexity, such as removing rarely needed features like the SELFDESTRUCT opcode and transitioning data formats from RLP to SSZ.
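
For a sense of what the RLP-to-SSZ transition changes at the byte level, compare how each format encodes an unsigned integer. This sketch covers only small integers (RLP payloads of at most 55 bytes) and the SSZ `uint64` type; the function names are this example's own rather than any library's API.

```python
def rlp_encode_uint(n: int) -> bytes:
    """RLP: minimal big-endian bytes, with a length prefix when needed."""
    if n < 0:
        raise ValueError("RLP integers are unsigned")
    payload = n.to_bytes((n.bit_length() + 7) // 8, "big")  # empty for 0
    if len(payload) == 1 and payload[0] < 0x80:
        return payload  # a single small byte encodes as itself
    if len(payload) <= 55:
        return bytes([0x80 + len(payload)]) + payload
    raise ValueError("payloads over 55 bytes are omitted from this sketch")


def ssz_encode_uint64(n: int) -> bytes:
    """SSZ: fixed-width little-endian; always 8 bytes for uint64."""
    return n.to_bytes(8, "little")


if __name__ == "__main__":
    for value in (0, 127, 128, 1024):
        print(value, rlp_encode_uint(value).hex(), ssz_encode_uint64(value).hex())
```

RLP output length depends on the value, so a decoder has to inspect prefixes, while the SSZ encoding of a `uint64` is always 8 bytes. Fixed-size layouts like this are one reason SSZ is considered simpler to parse and to build Merkle proofs over.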

You can read the full article on Vitalik Buterin's website.
