- Scaling Ethereum is an unprecedented challenge; Plasma is a Layer 2 scaling framework that aims to increase transaction throughput
- This series is a technical overview of Plasma technology: what it is, how it works, and the current state of its technical research
This piece is the first entry in a multi-part series explaining Plasma, by Daniel Goldman.
In case you haven’t heard, scaling cryptocurrency is hard.
In August of 2017, Vitalik Buterin and Joseph Poon released the Plasma whitepaper, unleashing into the world a new, promising approach to increase crypto transaction throughput and deliver us from blockchain-congestion evil. Seemingly overnight, Plasma became the most hyped-up Layer 2 scaling framework in the Ethereum ecosystem, the hype bringing with it the dizzying chaos we’ve now come to expect from the crypto space—bold promises, ambitious research, and a plethora of variants/proposals/counter-proposals/optimizations so multitudinous it’s basically become a running joke.
Which is all exciting and good, but unfortunately, Plasma’s fast-paced evolution, along with its technical complexity, has made it all but impossible for those not directly involved with the research and development to get a handle on things. Peer in from the outside, and you’ll likely be left with more questions than concrete answers: What capabilities can we realistically expect Plasma chains to have? What obstacles do we still face before seeing Plasma effectively operating in the wild? What are its trade-offs? How does it work?
And first and foremost: just what the hell is it?
If those questions pique your interest, you’ve come to the right place! This series aims to provide an overview of Plasma technology — what it is, how it works, and what the current state of the technical research can tell us. Part one will cover:
- the theoretical underpinnings of Plasma as a Layer 2 technology
- the inner workings of the first concrete Plasma specification, ‘Minimum Viable Plasma’ (MVP)
- Plasma Cash, the variant specification that’s attracted much of the Plasma R&D since ‘MVP’
Obligatory disclaimer: while no serious technical background should be necessary to follow along, some basic understanding of Ethereum and smart contracts will be assumed. Plan accordingly.
Background: Layer 2
The family of protocols we call “Plasma” represents a subset of Layer 2 solutions to the blockchain scalability problem, the “problem” here essentially being the limited transaction capacity that an open, unpermissioned blockchain can handle. Layer 2 seeks to circumvent this bottleneck by allowing for (some) transactions to be considered finalized without them ever having to touch the blockchain itself.
If you’d like, you can think of a Layer 2 transaction as a check whose account’s funds you can verify directly, without necessarily having to actually deposit it into your bank account. This check could then effectively be treated as paper currency and directly handed off to another party as payment, provided that the next party also gains the ability to verify the account’s funds for themselves (rough analogy; please, no nitpicking yet).
The general pattern of Layer 2 systems is: initially, some capital is locked up on the blockchain’s base layer (we’ll assume Ethereum from here on). Next, some parties (and not necessarily the same parties who made the deposit) can then transact off-chain with this capital via an overlay system, while only interacting with the mainchain occasionally (if ever). At any given point, the proper owner of any capital has assurance in their ability to withdraw all funds they own back onto Layer 1.
The defining property that distinguishes Layer 2 (as we’re using the term) from other off-chain payment systems is that despite avoiding constant base-layer interaction, Layer 2 transactions still preserve all of the decentralized, trustless security guarantees that we expect from Layer 1. By securing your private keys and running the requisite software, you can guarantee custody of your own funds, regardless of the actions or inactions of any counterparties — “counterparties” here being other individuals, institutions, consensus mechanisms, or really anything else that’s outside of your control, save for the mainchain itself. Even in a nightmarish, conspiratorial, Truman Show-esque scenario where all other users of the system are secretly colluding to try and steal your money, they’ll fail.
What Makes Plasma Plasma
In the past few years of Layer 2 R&D, a taxonomy has slowly but surely emerged that lets us neatly partition Layer 2 mechanisms into one of two categories: “Plasma” or “channels” (as in “state channels” or “payment channels”). While not everyone uses these terms precisely this way — and ultimately, neither crypto nor language itself has a high council to officially settle these definitions for us — we’ll hereby assume broad enough definitions of “Plasma” and “channels” such that the two encompass the totality of all possible Layer 2 systems.
One way to delineate these two categories is by the minimum on-chain transactions they require: for a channel transaction to be considered finalized, no interaction with the mainchain is strictly necessary; for a Plasma transaction, one interaction with the mainchain is strictly necessary (broadcast by the Plasma operator, not the users, as we’ll see). The reason Plasma still qualifies as a Layer 2 scaling approach (despite requiring regular on-chain transactions) is that each Layer 1 transaction can effectively finalize many transactions in one fell swoop; you can imagine that a bundle of Layer 2 transactions is compressed down into one. Still, this itself seems to be an ipso facto plus-side to channels; no on-chain block confirmations required means (virtually) instant finality, and less on-chain interaction is generally a good thing.
On the flip side, a channel requires full consent of all of its participants for any channel-wide state update, which means that having a single channel with many parties gets highly impractical. Transacting with parties with whom you don’t share a channel requires “relaying” transactions through your channel-partners, limiting your financial activity to those with whom you can find these liquidity paths through the channel network’s graph. In Plasma, however, only a transaction’s sender needs to give consent, and no liquidity lock-ups / restrictions are required for all parties involved to enter, exit, and freely transact with each other (the question of “who needs to give consent for state updates,” is, it turns out, an equivalent way to delineate the “channels” vs. “Plasma” dichotomy).
So at face value, one could say that channels are the appropriate mechanism for applications that benefit from instant finality and where a small, relatively fixed set of participants can be expected to interact, whereas Plasma is most useful in cases where many parties are involved and high transaction throughput is paramount, with immediate finality being less important.
As we’ll see in later installments, there are constructions which utilize both channel and Plasma mechanisms and try to capture as much of the best of both worlds as possible, so we may ultimately not need to compromise as much as it may seem. But let’s not get ahead of ourselves — before we start getting too fancy, we first need to grasp how Plasma actually works.
Minimum Viable Plasma
While the original whitepaper introduced the general notions of Plasma, it was also broad and wildly ambitious (and long, frankly); some of the ideas it floats (a tree of nested Plasma chains, for example) are still outside the scope of current Plasma research and may ultimately not even be possible.
Thus, the first big step towards getting actual working code — and, arguably, the starting point for the way we currently think about Plasma — was a spec known as Minimum Viable Plasma (MVP). As the name suggests, the goal here is to filter out all fancy features and distill things down to the simplest possible working implementation. “Working” here means it must simply satisfy the fundamental requirements of a Layer 2 Plasma system as defined above, and with only minimal functionality — namely, A to B payments of some fungible asset (we’ll assume Ether from here on, but it works just the same with any ERC20-compliant token).
For the time being (and only for the time being!), we’ll ignore any other downsides that emerge, even if said downsides deeply suck. And indeed, MVP does qualify as a functional Plasma solution! Although the downsides… well, you decide.
As we saw earlier, the key property of Plasma is that many transactions are compressed down and finalized with only one transaction landing on the mainchain. In MVP, the “compression” is done via a Merkle tree; transactions are grouped together and Merklized down into a root, which is all that needs to be put on Layer 1. The transactions themselves follow a Bitcoin-esque UTXO model; i.e., they spend from inputs of which the sender proves ownership, and create new outputs encumbered with the public address of their new owners. Which is to say: a Plasma chain is itself a blockchain! Using the requisite off-chain Plasma block data and the on-chain Merkle root data, users can verify ownership of what’s rightfully their own, relying on the smart contract on the Ethereum chain to enforce the rules and settle all disputes.
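To make the “compression” step a bit more concrete, here is a minimal Python sketch of Merklizing a batch of transactions down to one root. The SHA-256 hash, the byte-string stand-ins for transactions, and the rule of duplicating the last node on odd-sized levels are all assumptions made for illustration; they are not taken from the MVP spec.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Hash helper (SHA-256 is an illustrative choice, not the spec's)."""
    return hashlib.sha256(data).digest()

def merkle_root(txs: list[bytes]) -> bytes:
    """Fold a list of encoded transactions down to a single 32-byte root."""
    if not txs:
        return h(b"")
    level = [h(tx) for tx in txs]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed padding rule for odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical transactions; only the root would be posted on Layer 1.
block = [b"alice->bob:5", b"carol->dave:2", b"erin->frank:1"]
root = merkle_root(block)
print(len(block), "transactions ->", len(root), "byte root:", root.hex())
```

However many transactions go into the block, the on-chain footprint stays at one fixed-size root, and changing any single transaction changes that root.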
We’ll refer to the entity that is responsible for the procedure described above — i.e., Merklizing the transactions, broadcasting the root, and sharing the data with users — the Plasma “block producer,” essentially — as the Plasma Operator. It’s worth noting that the Plasma mechanism itself is completely agnostic as to what form this Operator takes; it could be a single, “centralized” entity, a federated sidechain, a proof of stake based block attestation system, etc. The fundamental goal of the Plasma construction is for all fund management to be non-custodial, and if we can get Plasma to work, this factor should remain the same regardless of who’s producing blocks for us. The more “decentralized” mechanisms might well offer other benefits of the sort we associate with distributed, peer to peer systems, i.e., censorship resistance, fault tolerance, etc. — but the non-custodial-ness remains the same. Thus, for simplicity’s sake, we’ll assume the Operator is simply a single entity, letting us reason more explicitly about the Plasma mechanism itself.
With that, we can start to go through the life-cycle of a typical Plasma transaction, and examine how things are handled in different possible scenarios.
First, Alice deposits Ether into our Plasma chain by sending an on-chain Ether transaction to the contract, which the Operator includes in a Plasma block; this Ether initially belongs to Alice (obviously) in the form of a UTXO. Alice, as usual, wants to pay Bob (note that Bob does not necessarily need to have made any on-chain deposits himself yet, or ever). To do this, she creates a transaction that spends her UTXO and creates a new one for Bob, and sends this transaction to the Plasma Operator. The Operator takes this transaction along with a bundle of other (possibly unrelated) transactions, groups them together into one Plasma block, “Merklizes” them down to their Merkle root, and sends this root, and only this root, onto the main chain.
The operator then sends this Plasma block to all users (including Alice and Bob). Upon receiving the latest block, Alice and Bob validate it on their end; this validation entails ensuring that the transactions themselves are valid and that the block corresponds to the on-chain Merkle root. If all of this checks out, Alice, Bob, and all of the other users can then go happily on with their lives.
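The validation each user performs on a new block can be sketched roughly as follows. This is a simplified illustration, not the MVP reference client: the JSON encoding, the `"blknum:txindex:outindex"` UTXO identifiers, and the `signer` field (standing in for a real signature check) are all hypothetical.

```python
import hashlib
import json

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def encode(tx: dict) -> bytes:
    # Hypothetical canonical encoding; a real chain would use a fixed binary format.
    return json.dumps(tx, sort_keys=True).encode()

def merkle_root(leaves: list[bytes]) -> bytes:
    level = [h(leaf) for leaf in leaves] or [h(b"")]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def validate_block(blknum: int, txs: list[dict], onchain_root: bytes, utxos: dict) -> dict:
    """Return the UTXO set after applying the block, or raise ValueError."""
    # 1) The block data must correspond to the root notarized on-chain.
    if merkle_root([encode(tx) for tx in txs]) != onchain_root:
        raise ValueError("block data does not match the on-chain root")
    # 2) Every transaction must itself be valid.
    new_utxos = dict(utxos)
    for n, tx in enumerate(txs):
        spent = new_utxos.pop(tx["input"], None)
        if spent is None:
            raise ValueError(f"tx {n} spends a missing or already-spent UTXO")
        owner, amount = spent
        if tx["signer"] != owner:  # stand-in for a real signature check
            raise ValueError(f"tx {n} is not authorized by the UTXO's owner")
        if sum(value for _, value in tx["outputs"]) != amount:
            raise ValueError(f"tx {n} does not conserve value")
        for k, (recipient, value) in enumerate(tx["outputs"]):
            new_utxos[f"{blknum}:{n}:{k}"] = (recipient, value)
    return new_utxos

# Alice holds a 5 Ether UTXO from her deposit and pays all of it to Bob.
utxos = {"1:0:0": ("alice", 5)}
block2 = [{"input": "1:0:0", "signer": "alice", "outputs": [["bob", 5]]}]
root2 = merkle_root([encode(tx) for tx in block2])  # what the Operator posts on-chain
utxos_after = validate_block(2, block2, root2, utxos)
print(utxos_after)
```

Note that the check needs the whole block and the whole UTXO set, which is exactly the burden on users that comes up again later in this piece.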
Later on, Alice — feeling she’s had herself enough of this crazy Plasma business, say — decides she wants to withdraw her funds back onto the Ethereum chain. She initiates this “withdrawal request” via an on-chain transaction (n.b: this withdrawal request does not require the Operator’s permission.) In her transaction, she includes the Plasma chain’s UTXO she would like to withdraw, along with the Plasma block number it belongs to and the Merkle path proving inclusion. Now, before she has access to her funds, she must wait for the “dispute period” (one week, let’s just say) to pass. During this period, other users can challenge her exit if they detect foul play.
Which brings us to:
The Happy Case: Everyone Behaves
Once Alice initiates her exit, other users skim through their copy of the Plasma chain to check and confirm that yes, indeed, the UTXO Alice is trying to exit with does in fact still belong to her. They also verify that all of the blocks are valid in every other way (though presumably they’ve already done this). Users can now rest assured that Alice is only departing with money that’s rightfully hers, and other users’ funds are safe. Life can go on.
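For the curious, the withdrawal request from this walkthrough (a transaction, its Plasma block number, and a Merkle path) can be sketched like so. The `PlasmaContract` class is a hypothetical Python stand-in for the on-chain contract, and the hashing and padding rules are illustrative assumptions.

```python
import hashlib

DISPUTE_PERIOD = 7 * 24 * 60 * 60  # one week in seconds, matching the text's example

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """All levels of the tree, from hashed leaves up to the single root."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2 == 1:
            cur = cur + [cur[-1]]  # assumed padding rule for odd levels
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def merkle_proof(leaves, index):
    """Sibling hashes on the path from leaf `index` up to the root."""
    proof = []
    for level in build_levels(leaves)[:-1]:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        proof.append(level[index ^ 1])
        index //= 2
    return proof

def verify_proof(leaf, index, proof, root):
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

class PlasmaContract:
    """Hypothetical model of the on-chain contract; only roots live here."""
    def __init__(self):
        self.roots = {}   # Plasma block number -> Merkle root
        self.exits = []

    def submit_root(self, blknum, root):
        self.roots[blknum] = root

    def start_exit(self, tx, blknum, index, proof, now):
        # No Operator permission needed: the proof alone opens the exit.
        if not verify_proof(tx, index, proof, self.roots[blknum]):
            raise ValueError("Merkle proof does not match the stored root")
        request = {"tx": tx, "blknum": blknum, "claimable_at": now + DISPUTE_PERIOD}
        self.exits.append(request)
        return request

block = [b"deposit->alice:5", b"carol->dave:2", b"erin->frank:1"]
contract = PlasmaContract()
contract.submit_root(1, build_levels(block)[-1][0])
request = contract.start_exit(block[0], 1, 0, merkle_proof(block, 0), now=0)
print("funds claimable after", request["claimable_at"], "seconds")
```

The contract never sees the full block; the Merkle path is enough to tie Alice’s transaction to a root it already stores.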
The Unhappy Case: Evil Alice
Now let’s create an alternate ending to the previous scenario: Alice’s exit attempt is “proper enough” to be initially accepted by the smart contract — which is to say, it’s a valid transaction, with a Merkle proof that does indeed correspond with an old Merkle root — but it’s actually a double spend; i.e., she tries to exit with the same UTXO she sent Bob earlier. Alas, Alice.
But no matter! Bob (or really any other user, but let’s assume Bob) has one week to take action; he’ll check Alice’s UTXO against his copy of the Plasma chain and notice that it’s a double spend. To prove maleficence, he submits a “fraud proof” in the form of the old transaction in which Alice previously spent the UTXO in question, along with a Merkle proof of its inclusion in a Plasma block. Bob has given cryptographic proof that Alice already spent this money; she’s been caught in the act, and her attempt to withdraw this money is cancelled.
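Bob’s challenge boils down to two checks, sketched below under some loud assumptions: transactions are JSON-encoded dicts, UTXOs are named by hypothetical `"blknum:txindex:outindex"` strings, and signature verification (which a real contract would also perform on the spending transaction) is left out.

```python
import hashlib
import json

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def encode(tx: dict) -> bytes:
    # Hypothetical canonical encoding, for illustration only.
    return json.dumps(tx, sort_keys=True).encode()

def verify_proof(leaf, index, proof, root):
    """Recompute the root from a leaf and its sibling path."""
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

def challenge_exit(exit_utxo, spend_tx, index, proof, root):
    """True if the challenge succeeds and the exit must be cancelled."""
    spends_exiting_utxo = spend_tx["input"] == exit_utxo
    included_in_block = verify_proof(encode(spend_tx), index, proof, root)
    # A real contract would also check the owner's signature on spend_tx.
    return spends_exiting_utxo and included_in_block

# Plasma block 2 holds two transactions; the first is Alice paying Bob.
spend = {"input": "1:0:0", "signer": "alice", "outputs": [["bob", 5]]}
other = {"input": "1:1:0", "signer": "carol", "outputs": [["dave", 2]]}
root2 = h(h(encode(spend)) + h(encode(other)))  # root of a two-leaf tree
proof = [h(encode(other))]                      # sibling of leaf 0

# Alice tries to exit with "1:0:0"; Bob answers with her earlier spend.
cancelled = challenge_exit("1:0:0", spend, 0, proof, root2)
print("exit cancelled:", cancelled)
```

The same function returns False if the transaction Bob presents spends some other UTXO, or if it was never included under the posted root.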
A Note on “Punishments”
At this point, we may want some way to further punish Alice for her attempted crime, creating a credible threat of greater loss for her and ideally disincentivizing this sort of behavior from taking place to begin with. These punishment mechanisms are typical in channel constructions; in Poon-Dryja payment channels, for example (the payment channel construction currently used in Bitcoin’s Lightning Network), being caught trying to cash out outdated transactions results in your counterparty collecting all of your channel’s funds. While this sort of punishment isn’t strictly necessary in either Plasma or channels, it’s arguably a stronger requirement in Plasma; without some notion of punishments, Alice could simply repeatedly attempt improper withdrawals, forcing Bob (or someone else) to spend gas fees with every response. Ironically, however, it’s also less self-evident how such punishments could be administered in Plasma; to slash some of Alice’s funds in the Plasma chain, we’d need to establish which funds are hers, which itself would require its own claim/dispute window mechanism, sending us down a recursive, challenges-all-the-way-down rabbit hole.
Thus, Plasma constructions typically require Alice to post an “exit bond” as she attempts her withdrawal. In essence, she says, “I would like to take out 5 Ether, and here’s 1 Ether which you can take from me if my exit proves to be fraudulent.” We’re free to set up the contractual terms — i.e., the required size of the bond, as well as the response case of a violation (give the bond to the successful challenger as bounty, slash it into oblivion, cover only the challenger’s gas costs etc.) — and make them as lenient/Draconian as we please.
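As a rough sketch of the bookkeeping, here is one possible set of terms: the bond goes to the successful challenger as a bounty, and is returned along with the funds if the exit goes unchallenged. The `BondedExits` class and its method names are hypothetical, and the 1 Ether bond is just the figure from the example above.

```python
from dataclasses import dataclass

REQUIRED_BOND = 1  # Ether; the example figure from the text

@dataclass
class Exit:
    owner: str
    amount: int
    bond: int
    status: str = "pending"  # pending -> cancelled | finalized

class BondedExits:
    """Hypothetical accounting model, not a real contract interface."""
    def __init__(self):
        self.payouts = {}  # who has been paid what on Layer 1

    def _pay(self, who, amount):
        self.payouts[who] = self.payouts.get(who, 0) + amount

    def start_exit(self, owner, amount, bond):
        if bond < REQUIRED_BOND:
            raise ValueError("exit bond too small")
        return Exit(owner, amount, bond)

    def challenge(self, exit_, challenger, fraud_proven):
        # Chosen policy: a successful challenger collects the whole bond.
        if exit_.status == "pending" and fraud_proven:
            exit_.status = "cancelled"
            self._pay(challenger, exit_.bond)

    def finalize(self, exit_):
        # Called after the dispute period; a cancelled exit pays nothing.
        if exit_.status == "pending":
            exit_.status = "finalized"
            self._pay(exit_.owner, exit_.amount + exit_.bond)

contract = BondedExits()
honest = contract.start_exit("carol", 5, 1)
contract.finalize(honest)                            # Carol gets 5 + her bond back
fraud = contract.start_exit("alice", 5, 1)
contract.challenge(fraud, "bob", fraud_proven=True)  # Bob collects Alice's bond
contract.finalize(fraud)                             # no effect: already cancelled
print(contract.payouts)
```

Swapping the bounty for a burn, or for a gas refund only, is a change to the `challenge` method alone.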
The Miserable Case: Evil Operator
So far things have gone relatively smoothly, largely because we’ve made the hugely simplifying assumption that the Operator has all of our best interests at heart. Now it’s time to think the unthinkable: what if the Operator is an out-and-out liar and a thief?
Say, for example, Alice and Bob are going about their business, when they one day find that the Operator has sent them a block (with its Merkle root notarized on-chain) that includes a blatantly invalid transaction, one that spends, say, 90% of all of the Ether available in the Plasma chain; recall that the Operator is entirely in charge of producing blocks, and thus can, in theory, include in them whatever he wants. Worse yet, say that next, the Operator requests an on-chain withdrawal using this invalid transaction.
Unlike Alice’s attempted fraud in the Unhappy Case, which was an attempt at double spending, this transaction spends from a “UTXO” that shouldn’t even exist. Therefore, the method we used previously for challenging and cancelling this withdrawal isn’t available to us.
It’s tempting to think that we can construct a different sort of fraud proof to handle this different flavor of misbehavior, i.e., “What if we demand that the Operator prove the existence of the input UTXO? And, err… if that came from ‘out of nowhere’ too, we demand proof of that one? And then, umm…”
But let me stop you right there. Fraud proof in this case would indeed be complex — and perhaps even impossible — but even exploring this route is pointless and futile, as it ultimately bumps up against a larger issue. Since we’re assuming an evil operator, we might as well go all out — not only does the operator create an invalid transaction and attempt an exit with it, but while doing so, he never even shares the invalid Plasma block with his users. Alice, Bob, and the rest now see a giant, suspicious exit being attempted, but are completely in the dark as to any of the details.
Before we explain the solution, it’s worth pausing to reflect on this point: since our goal is to avoid inheriting any counterparty risk, we cannot assume that the Operator (or anyone else) will be at all reliable in providing any data. And as it turns out, any attempt to somehow force data availability (or, likewise, to prove data unavailability), without simply introducing some new trust assumption, is itself hopeless. This has been dubbed the “speaker/listener fault equivalence” dilemma; in short, if we expect Carol to send Derrick some data, and we find ourselves in a situation in which Derrick is claiming “I never received the data!” and Carol is claiming, “Yes he did, he’s just hiding it now!!”, there’s no objective way for us to determine which of them is lying. Thus, our Plasma threat-model has to include the possibility of the Operator suddenly, without warning, going completely silent.
So now what? How do we ensure that our Operator doesn’t make off with funds that aren’t rightfully his? In this dire circumstance, we resort to MVP’s nuclear option: given that we can’t directly prevent the Operator’s exit from taking place, we instead allow everyone else to exit first. The smart contract enforces an exit queue which ensures that earlier UTXOs will be given priority; thus, as long as everyone (that’s right, literally everyone) using the Plasma chain withdraws what’s rightfully theirs before the Operator’s massive withdrawal is complete, all of the Ether for the Operator to steal will be drained, and thus his mendacious attempt to claim more than his share will ultimately be for naught.
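The effect of the exit queue can be sketched with a priority queue keyed on UTXO age. This is a deliberately stripped-down model: it assumes priority is simply (block number, transaction index), and it ignores dispute periods, bonds, and any finer priority rules a real contract would have. The balances are made up for illustration.

```python
import heapq

def process_exits(pool, exits):
    """Pay out exits oldest-UTXO-first until the contract's pool runs dry.

    exits: list of (blknum, txindex, owner, amount) tuples.
    """
    queue = list(exits)
    heapq.heapify(queue)  # tuples sort by blknum, then txindex
    payouts = {}
    while queue:
        blknum, txindex, owner, amount = heapq.heappop(queue)
        paid = min(amount, pool)  # cannot pay out more than is left
        pool -= paid
        payouts[owner] = payouts.get(owner, 0) + paid
    return payouts

# The contract holds 100 Ether: 60 belongs to Alice, 40 to Bob.
# The Operator conjures a bogus 90 Ether UTXO in block 10 and exits first.
exits = [
    (10, 0, "operator", 90),  # submitted first, but youngest UTXO
    (3, 0, "alice", 60),
    (7, 2, "bob", 40),
]
payouts = process_exits(100, exits)
print(payouts)
```

Because the bogus UTXO is necessarily younger than every honest one, it lands at the back of the queue no matter when the Operator submits it, and finds the pot empty.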
Theoretically speaking, this works. Users maintain custody of their funds regardless of any other party’s actions; that was our goal. But let’s be frank: the ability for a malicious Operator (or, perhaps more likely, a hacked/compromised Operator) to force all users to stampede onto the mainchain before time runs out, using no more than a single transaction, is far from ideal. In fact, in the truly worst-case scenario, if the mainchain blocks are congested for long enough, users may be unable to exit in time, and thus could, in fact, literally lose their funds. The number of transactions required for this mass exit could theoretically be reduced with more advanced exit strategies (batching many transactions into one via signature aggregation, for example), but this is still an unsolved research problem, and even a working solution would require coordination and cooperation among the users, and thus would still be sub-optimal.
But there’s another (arguably) even bigger downside to the MVP construction that I’ve been delicately tiptoeing around; even in the happier cases, we require all users to fully verify the entire Plasma chain for themselves. This puts us in a usability Catch-22; Plasma is useful to the extent that it can extend Ethereum’s transaction throughput capacity, but the higher the transactions-per-second throughput the Plasma chain offers, the heavier the burden (in terms of bandwidth and memory) it puts on the users’ client-side software. Recall that a key benefit to the Plasma construction is the ease with which it can support many users; restricting it to only those who can run a heavy-duty application seriously undercuts its value proposition.
Surely we can do better… right?
Towards Doing Better
Returning to the forced mass-exit vulnerability: viewed a certain way, the source of the “problem” stems from the fungibility of the asset in question. Say Bob is “owed” 5 Ether; since Ether is a fungible asset, it makes no sense to talk about “which particular 5 Ether” he’s owed. Thus, in our Miserable Case, when the Operator exits with more than his fair share, there’s no coherent answer to the question, “exactly whose Ether is he stealing?” Ether is Ether, so he’s drawing from the collective Plasma Ether Pot, which is, in a sense, the reason all users need to then exit with their funds.
So then what if there was some way of specifying the owners of each particular “piece” of Ether? Instead of the total Ether balance in the Plasma contract being represented as one (large) number, imagine that the balance is the sum total of all of the denominations of a big pile of indivisible “Ether bills.” Might this solve our problems? (And might this introduce new ones?)
Enter Plasma Cash!
Plasma Cash is a variant on the Plasma construction which has become the foundation of much of the research in the Plasma community since. It takes a similar “minimum viable” approach as we saw in MVP, but begins with a new restriction: all of the assets on a Plasma Cash chain are non-fungible tokens. One of these NFTs (we’ll call them “coins”) could represent anything: fixed denominations of Ether or an ERC-20 token, a bundle of ERC-20 tokens, a litter of cryptokitties, rights to become the Operator of the Plasma chain for 100 blocks (disclaimer: haven’t thought through the implications of this one, do not try at home), etc. The only requirement is that it can be represented as an ERC-721 asset — which essentially just means it is something unique that can’t be split or merged.
Plasma Cash ditches the Bitcoin-esque UTXO transaction model of MVP; with non-fungible coins, the notion of creating new transaction outputs no longer applies. Instead, each coin is accounted for in each Plasma block: presence of a coin indicates that the coin changed owners in that Plasma block (i.e., Alice sent it over to Bob); absence indicates that it still has the same owner as it did in the previous block. Thus, the full history of a coin can be described by its absence or presence in each Plasma block, from the current block all the way back to the block in which it was first deposited.
As we’ll soon show, the full history of a coin is enough information for its holder to secure ownership, and nicely enough, under this model, this history need not include all of the data in each Plasma block: In order for Bob to prove the presence of his coin in a given block, he only needs the transaction’s Merkle path. However, to prove that the coin was not transferred in a given block, Bob requires the ability to prove absence of data, a feature not supported by the Merkle Trees we know and love.
Thus, to enable this “proof of absence” capability, Plasma Cash uses a souped-up Merkle Tree construct known as a Sparse Merkle Tree. SMTs are Merkle trees with an additional, special feature: the leaves of the tree (the coins, in our case) are each given a unique identification number which determines where in the tree they reside. Essentially, a sorted ordering is imposed on them; each coin can only reside in its allotted “slot.” What this means is that if a coin is absent, we know where it would be if it were present, and thus, we are able to prove its absence with a Merkle branch that shows that its slot is “empty” (i.e., equal to some null value — zero, “undefined,” whatever).
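Here is a small Python sketch of the idea, assuming a toy tree with 2^8 slots, SHA-256 hashing, and an all-zero value for empty slots (none of which come from the Plasma Cash spec). The same verification routine handles both proofs: presence checks the slot against a transaction hash, absence checks it against the empty value.

```python
import hashlib

DEPTH = 8             # toy tree with 2**8 = 256 coin slots (assumption)
EMPTY = b"\x00" * 32  # assumed null value for an empty slot

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# DEFAULTS[k] is the root of a completely empty subtree of height k.
DEFAULTS = [EMPTY]
for _ in range(DEPTH):
    DEFAULTS.append(h(DEFAULTS[-1] + DEFAULTS[-1]))

def node(leaves, height, index):
    """Hash of the subtree covering slots [index << height, (index + 1) << height)."""
    if height == 0:
        return leaves.get(index, EMPTY)
    lo, hi = index << height, (index + 1) << height
    if not any(lo <= slot < hi for slot in leaves):
        return DEFAULTS[height]  # empty subtree: no need to recurse
    return h(node(leaves, height - 1, 2 * index) + node(leaves, height - 1, 2 * index + 1))

def smt_root(leaves):
    return node(leaves, DEPTH, 0)

def smt_proof(leaves, slot):
    """Sibling hashes from the slot's leaf up to the root."""
    return [node(leaves, height, (slot >> height) ^ 1) for height in range(DEPTH)]

def smt_verify(root, slot, leaf, proof):
    cur = leaf
    for height, sibling in enumerate(proof):
        if (slot >> height) & 1 == 0:
            cur = h(cur + sibling)
        else:
            cur = h(sibling + cur)
    return cur == root

# A Plasma block in which coins 5 and 200 changed hands (hypothetical data).
leaves = {5: h(b"coin 5: alice -> bob"), 200: h(b"coin 200: carol -> dave")}
root = smt_root(leaves)

present = smt_verify(root, 5, leaves[5], smt_proof(leaves, 5))  # coin 5 moved
absent = smt_verify(root, 9, EMPTY, smt_proof(leaves, 9))       # coin 9 did not
print(present, absent)
```

Since each coin has exactly one slot, an empty slot under the posted root is proof that the coin was not transferred in that block.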
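So now that we can prove both absence and presence of a coin in each block, we can track a coin’s full history. In the example we’ll use from here on, it looks something like this: the coin lands with Alice in block 501, Alice sends it to Bill in block 502, it sits untouched in block 503, Bill sends it to Steve in block 504, and it sits untouched again in block 505.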
And so on. The key takeaway being that the data requirement for proving this history consists of only one bite-sized Merkle proof per block, as opposed to MVP’s full Plasma blocks requirement. Plasma light client secured!
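In code, walking such a history might look like the sketch below. It assumes the per-block Merkle proofs have already been checked, so each entry just records “absent” or “transferred from X to Y”. The specific history is an illustrative assumption consistent with the example discussed in this piece: block 501 is treated as Alice’s deposit, with transfers in blocks 502 and 504.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Entry:
    block: int
    # (sender, recipient) if the coin is present in this block,
    # None if its slot was proven empty.
    transfer: Optional[Tuple[str, str]]

def current_owner(deposit_block, depositor, history):
    """Replay a coin's history, insisting on one proof per block."""
    owner, expected = depositor, deposit_block + 1
    for entry in history:
        if entry.block != expected:
            raise ValueError(f"history is missing a proof for block {expected}")
        if entry.transfer is not None:
            sender, recipient = entry.transfer
            if sender != owner:
                raise ValueError(f"block {entry.block}: coin spent by a non-owner")
            owner = recipient
        expected += 1
    return owner

history = [
    Entry(502, ("alice", "bill")),  # proof of presence
    Entry(503, None),               # proof of absence
    Entry(504, ("bill", "steve")),  # proof of presence
    Entry(505, None),               # proof of absence
]
print(current_owner(501, "alice", history))
```

A gap in the block numbers is treated as a hard failure, since a missing block is exactly where an undisclosed transfer could be hiding.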
Let’s now explore the claim that this sequence of Merkle proofs is enough for the coin’s current owner to secure their funds. In other words, as long as Steve has the full history of his coin (in the form described above), he has objective assurance that:
1) If/when he tries to withdraw, he will have the appropriate response to any challenges.
2) If/when anyone else tries to withdraw his coin, he’ll be able to challenge and successfully overturn the withdrawal.
In the example above, Steve, by block 505, is the coin’s proper owner. The first important point to underscore here is that in order for him to have considered his payment finalized (in his case, the payment he received from Bill in block 504), he first had to receive and verify the coin’s full history (i.e., validated the Merkle proof in every preceding block). Then, and only then, could he truly claim ownership.
Let’s look at the sorts of things that can happen and how Steve can respond (some of this should feel familiar from MVP):
Block 504 is where Steve’s ownership of the coin is established, so there are basically two ways to try to steal from him: someone can claim ownership of the coin in a prior block or in a subsequent block.
So say, for example, Alice tries to claim the coin by broadcasting her “proof of presence” of the coin in block 501; in doing so, Alice is implicitly attesting that the transfer in block 501 is the latest valid coin transaction, i.e., that she hasn’t spent the coin since. Steve responds essentially the same way Bob did in the MVP Unhappy Case; he shows Alice’s transfer to Bill in block 502 (note that Steve can’t forge this proof, since the transaction includes Alice’s signature.) Alice is caught in the act, and her exit is cancelled. Again, since Steve has the coin’s full history, he knows he has a response ready at-hand for any such fraudulent exit attempt.
Now suppose Alice tries to exit her coin in block 506, i.e., after the block that establishes Steve’s ownership. Note that for her Merkle proof of inclusion to even get accepted by the smart contract (i.e., for Alice to even get this far), the Operator will have had to have maliciously and fraudulently included an invalid payment in Block 506; but as usual, we’re assuming the Operator can be arbitrarily malicious.
In this case, Steve responds by providing his transaction in block 504; here, he’s showing that he owned the coin in that block and is implicitly claiming that he hasn’t spent the coin since. Now the onus is on Alice to provide a transaction in which Steve sent the coin to another party, but Steve knows she won’t be able to do so, since he knows that he didn’t in fact spend the coin (and that he, of course, has been protecting his private keys with his dear life). So again, Alice’s exit is canceled.
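The two scenarios above can be condensed into a toy resolution function. This is a simplification under stated assumptions, not the full Plasma Cash exit game: the `sender` field stands in for a verified signature, Merkle proofs are assumed to have been checked already, and the `"deposit"` and `"forged"` senders are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Tx:
    block: int
    sender: str     # stands in for a verified signature (assumption)
    recipient: str

def exit_survives(exit_tx: Tx, challenge_tx: Tx, response_tx: Optional[Tx] = None) -> bool:
    """Return True if the exit withstands the challenge."""
    if challenge_tx.block > exit_tx.block:
        # Challenge with a later spend: fatal if the exiter signed it.
        return challenge_tx.sender != exit_tx.recipient
    # Challenge with an earlier transaction: the exiter must show that the
    # earlier owner went on to spend the coin before the exiting transaction.
    return (
        response_tx is not None
        and response_tx.sender == challenge_tx.recipient
        and challenge_tx.block < response_tx.block <= exit_tx.block
    )

to_alice = Tx(501, "deposit", "alice")
alice_to_bill = Tx(502, "alice", "bill")
bill_to_steve = Tx(504, "bill", "steve")
bogus = Tx(506, "forged", "alice")  # the Operator's invalid payment

# Alice exits from block 501; Steve shows her spend in block 502.
print(exit_survives(to_alice, alice_to_bill))
# Alice exits from block 506; Steve shows block 504 and she has no answer.
print(exit_survives(bogus, bill_to_steve))
# Steve exits from block 504; challenged with block 502, he answers with 504.
print(exit_survives(bill_to_steve, alice_to_bill, bill_to_steve))
```

The first two calls come out False (exit cancelled) and the third True, which is the point: whoever holds the full history always has the winning move.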
While there are some trickier edge cases hidden underneath the examples above (which you’re welcome to work through as a take-home assignment) these exit games capture the gist of how Plasma Cash enforces its rules.
But wait, there’s more! Let’s revisit what we were forced to call the Miserable Case back in MVP; the Operator stops providing block data, and then proceeds to attempt an invalid withdrawal. Well, this time, in Plasma Cash, the withdrawal consists of a single coin. Thus, only the actual owner of that particular coin (and this person knows who they are) needs to take action; with the non-fungibility restriction, there’s no more risk of the withdrawn funds “spilling over” into any other user’s slice of the on-chain-assets pie. This brings the forced withdrawal risk to parity with that of a channel hub; if the Plasma chain is inoperable, users will (presumably) want to exit eventually, but there’s no longer any panic mode, time-sensitive, forced mass-exit fiasco.
So let’s take a moment to realize our gains: we’ve upgraded our MVP Plasma construction such that now users can run light clients and the ugly mass exit vulnerability is essentially eliminated.
Does that mean we’ve found the Plasma silver bullet we’re after?
Not quite. First off, although we’ve drastically reduced the amount of data each user needs to handle, it’s actually still quite a bit; some quick napkin math shows that as a coin stays in the Plasma chain for a significant amount of time, one Merkle Proof per coin per Plasma block starts to get pretty hefty; plus, remember that this full history needs to be transferred to the recipient each time a payment is made.
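One version of that napkin math, with every number an assumption chosen purely for illustration: 32-byte hashes, a Sparse Merkle Tree of depth 64, and one Plasma block every 15 seconds. It also counts full, uncompressed proofs, so treat the result as a rough upper bound rather than a measurement; proofs in a mostly empty tree can be compressed considerably.

```python
HASH_BYTES = 32              # assumed hash size
TREE_DEPTH = 64              # assumed Sparse Merkle Tree depth
BLOCK_INTERVAL_SECONDS = 15  # assumed Plasma block time

def history_bytes(days_held: int) -> int:
    """Uncompressed proof data for one coin held for `days_held` days."""
    blocks = days_held * 24 * 60 * 60 // BLOCK_INTERVAL_SECONDS
    bytes_per_proof = TREE_DEPTH * HASH_BYTES  # one sibling hash per tree level
    return blocks * bytes_per_proof

for days in (1, 30, 365):
    print(f"{days:>3} days: {history_bytes(days) / 1e6:,.1f} MB per coin")
```

Under these assumptions a single coin picks up roughly 11.8 MB of proofs per day, and that is per coin, handed over in full with every payment.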
But moreover, let’s not forget the initial restriction we imposed to get started with Plasma Cash — non-fungibility. This may well give us enough functionality for certain applications — NFT marketplaces, say — but ultimately, we want Plasma assets to feel and act like money. And being able to make payments in whatever denomination you please is pretty fundamental to money; without it, a payment network on a Plasma Cash chain may start to feel like trying to split the check at a cash-only restaurant when nobody has any singles.
In Part 2, we’ll continue the journey and discuss the approaches that build on top of Plasma Cash for minimizing the history data-bloat problem, and, most of all, rescuing payment fungibility, while still retaining Plasma Cash’s upsides.
Daniel Goldman is a software engineer, technical consultant, and independent writer. He is based in Brooklyn, New York.
Thanks to Georgios Konstantopoulos for his valuable feedback.