How does Narwhal prevent two validators from including identical transactions in their blocks?

I already asked this question on Stack Overflow, but I think it might be quicker if I ask here as well.
I am studying how Sui works by reading the Narwhal and Bullshark papers. However, I can’t understand how validators interpret their DAG as a list of blocks. In BFT protocols like Tendermint, a leader is the only validator that proposes a block at each round, so other validators can agree on that block and store it in their storage without worrying that it will have duplicate transactions in the block. On the other hand, for a DAG like Narwhal, they say about a normal blockchain mempool:

A transaction submitted to one validator is gossiped to all others. This leads to fine-grained double transmission: most transactions are shared first by the mempool, and then the miner/leader creates a block that re-shares them.

So to prevent this, Narwhal reduces the need for double transmission when leaders propose blocks by broadcasting blocks instead of transactions and letting the leader propose a hash of a block, relying on the mempool layer to provide its integrity-protected content.

Here, I am confused like hell. It makes sense if we have one leader, so when a leader is chosen, we can just choose the leader block at that round and discard other blocks in the same round. However, I think they don’t do that. For example, in Bullshark, the leader block is an anchor, and they also commit other blocks besides the anchor using some deterministic rule.

My questions are:

  1. If all the blocks in the same round get ordered with some deterministic rule, how can they ensure that there are no duplicate transactions in different blocks in that round? Because malicious actors can just send duplicate transactions to many validators, and each validator shares blocks of transactions instead of transactions themselves. If these blocks get committed, even though they can ignore the duplicate transactions with some rules, the space in those blocks will be wasted.
  2. Can anyone give an example of the deterministic rule that is used to order blocks between two anchors?
  3. Does this mean that the system achieves higher throughput when more validators join the consensus because there are more blocks at each round that are proposed?

I think you are confusing blocks and transactions here.

What Narwhal tries to do is make things go super fast. It does this in two ways (simplified): 1) compression and 2) not caring about errors.

Compression - you don’t communicate full blocks, just hashes, which is way more efficient.
Spamming/creating errors - you just go brrr fast, create blocks, and discard blocks that have overlapping transactions.

Does that make sense? Basically everyone just spams compressed blocks, checks for errors, and thus the amount of time spent on waiting or gossiping transactions is minimized.
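To make the compression point concrete, here is a minimal sketch (illustrative Rust, not the actual Narwhal code): the full block payload is transmitted once and stored locally, and everything that refers to it afterwards - votes, certificates, parent links in later blocks - carries only its digest. All type and function names here are assumptions for the example; a real system would use a cryptographic hash rather than `DefaultHasher`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Illustrative types, not the real Narwhal structs.
type Digest = u64; // a real system would use a 32-byte cryptographic hash

#[derive(Hash)]
struct Block {
    round: u64,
    author: usize,
    parents: Vec<Digest>,       // previous blocks referenced only by hash
    transactions: Vec<Vec<u8>>, // the bulky payload, sent in full only once
}

fn digest_of(block: &Block) -> Digest {
    let mut h = DefaultHasher::new();
    block.hash(&mut h);
    h.finish()
}

// Local store: the full payload lives here after one transmission;
// votes, certificates, and later blocks only carry the small digest.
struct Store {
    blocks: HashMap<Digest, Block>,
}

impl Store {
    fn insert(&mut self, block: Block) -> Digest {
        let d = digest_of(&block);
        self.blocks.insert(d, block);
        d // this small value is what gets gossiped and what a leader proposes
    }
}
```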


Here, I am confused like hell. It makes sense if we have one leader, so when a leader is chosen, we can just choose the leader block at that round and discard other blocks in the same round. However, I think they don’t do that. For example, in Bullshark, the leader block is an anchor, and they also commit other blocks besides the anchor using some deterministic rule.

Hey, thanks for asking - this is George from Mysten. This is a tricky bit of how Bullshark is used which we do not discuss much.

So transactions are disseminated in Narwhal blocks, which link to past blocks. A leader is nominated every 2 rounds in a predefined way, and Bullshark is just a rule about when the leader block is committed or skipped (in case the leader is dead or malicious). So at the end of all this we do indeed have a sequence of committed leader blocks.
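For intuition, here is a minimal sketch of a predefined leader schedule, assuming a simple round-robin over the validator set (the real schedule can differ, e.g. it may be stake-weighted or derived from a seed):

```rust
/// Sketch: one leader every 2 rounds, chosen round-robin over the validator set.
/// Bullshark then decides, per leader, whether that leader block is committed
/// (e.g. enough blocks in the next round link to it) or skipped.
fn leader_index(round: u64, num_validators: usize) -> Option<usize> {
    if round % 2 != 0 {
        return None; // in this sketch only even rounds carry a leader/anchor
    }
    Some(((round / 2) as usize) % num_validators)
}
```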

So I understand your question as being: how do we go from these leader blocks (which are at best one every 2 rounds, and contain 1/100 of the transactions even for that round, assuming 100 validators) to actually sequencing the vast majority of transactions submitted by any validator?

Ok, so note that blocks link to previous blocks by containing their hashes. So when we commit a leader block we commit all blocks in its history that are not already committed.

To be concrete, let’s look at the diagram you shared. Consider that Bullshark tells us to commit first A1 and then A3. When A3 is committed we can commit: V1 rounds 1-4, V2 rounds 2-4, V3 rounds 1-5 and V4 round 1, since they are all linked from A3. Note that A1 (V2 round 1) is already committed so we leave it alone.
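In code, the “commit everything in the anchor’s history that is not already committed” step could look roughly like this sketch (illustrative names and a toy digest type; a real implementation would also order the collected blocks deterministically, e.g. by round and author, before appending them to the committed sequence):

```rust
use std::collections::{HashMap, HashSet};

type Digest = u64; // toy stand-in for a cryptographic block digest

struct Block {
    parents: Vec<Digest>, // digests of blocks from earlier rounds
}

/// Collect every block reachable from `anchor` that has not been committed yet.
fn commit_history(
    anchor: Digest,
    dag: &HashMap<Digest, Block>,
    committed: &mut HashSet<Digest>,
) -> Vec<Digest> {
    let mut to_commit = Vec::new();
    let mut stack = vec![anchor];
    while let Some(d) = stack.pop() {
        if committed.contains(&d) {
            continue; // already committed under an earlier anchor (e.g. A1)
        }
        if let Some(block) = dag.get(&d) {
            committed.insert(d);
            to_commit.push(d);
            stack.extend(block.parents.iter().copied()); // walk the causal history
        }
    }
    to_commit
}
```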

  1. If all the blocks in the same round get ordered with some deterministic rule, how can they ensure that there are no duplicate transactions in different blocks in that round? Because malicious actors can just send duplicate transactions to many validators, and each validator shares blocks of transactions instead of transactions themselves. If these blocks get committed, even though they can ignore the duplicate transactions with some rules, the space in those blocks will be wasted.

Ok, the only parties that send blocks are validators. Due to the operation of Sui many validators can indeed receive a transaction certificate to be sequenced. What we do to avoid duplicates is that each validator takes turns sending the transaction to the consensus according to the transaction ID, only if they have to seen it already. So a transaction X will first be submitted by say V1 then after some time V2, then after some time V3, etc. Each transaction has its own pseudorandom order of validators.
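As a rough illustration of the “take turns according to the transaction ID” idea, here is a minimal sketch (all names and the delay scheme are assumptions for the example, not the actual Sui code): the transaction’s ID determines a per-transaction rotation of validators, and each validator waits a delay proportional to its position in that rotation, cancelling the submission if it sees the transaction appear in consensus first.

```rust
use std::time::Duration;

/// Sketch: how long this validator waits before submitting transaction `tx_id`.
/// The validator at position 0 in the per-transaction rotation submits
/// immediately, position 1 after one base delay, and so on. Each validator
/// cancels its pending submission if the transaction shows up in consensus
/// output before its timer fires, so usually only the first one submits.
fn my_submission_delay(
    tx_id: u64,
    my_index: usize,
    num_validators: usize,
    base_delay: Duration,
) -> Duration {
    // Starting point of the rotation, derived pseudorandomly from the ID.
    let start = (tx_id as usize) % num_validators;
    // This validator's position in that rotation.
    let position = (my_index + num_validators - start) % num_validators;
    base_delay * position as u32
}
```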

In case a validator starts spamming transactions, we can detect that, but right now I am not overly concerned by this. Sending very large blocks compared to others is as much a DoS against the malicious validator as against the others, and their blocks become less likely to be included in the DAG if they overdo it.

  2. Can anyone give an example of the deterministic rule that is used to order blocks between two anchors?

To be as concrete as possible the magic happens here:

In a nutshell, Sui takes all transactions from all committed blocks, removes duplicates and transactions already processed, orders them by gas price (highest first), and that is the sequence! This is deterministic for a stable sort and leads to the same sequence for all validators.
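A minimal sketch of that rule as described above (illustrative types, not the actual Sui code): flatten all committed blocks into one list, drop duplicates and anything already processed, then stable-sort by gas price descending.

```rust
use std::collections::HashSet;

struct Tx {
    id: u64,        // transaction digest
    gas_price: u64, // price offered by the sender
    // payload omitted in this sketch
}

/// Deduplicate and order the transactions of the newly committed blocks.
/// Rust's `sort_by` is stable, so transactions with equal gas price keep
/// their commit order, and every validator derives the same sequence.
fn sequence(committed_blocks: Vec<Vec<Tx>>, already_processed: &HashSet<u64>) -> Vec<Tx> {
    let mut seen = HashSet::new();
    let mut txs: Vec<Tx> = committed_blocks
        .into_iter()
        .flatten()
        .filter(|tx| !already_processed.contains(&tx.id) && seen.insert(tx.id))
        .collect();
    txs.sort_by(|a, b| b.gas_price.cmp(&a.gas_price)); // highest gas price first
    txs
}
```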

  3. Does this mean that the system achieves higher throughput when more validators join the consensus because there are more blocks at each round that are proposed?

All blocks need to be sent to all validators. So if there are fewer validators the blocks can be larger, and if there are more validators they have to be smaller. But ultimately the sum of the sizes of all blocks from all validators in a round has to fit in the network capacity of each validator - so in the extreme, adding more validators does not help. Increasing the network capacity of each validator is necessary.
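A back-of-envelope sketch of that capacity argument (illustrative only, with made-up parameter names):

```rust
/// Every block must reach every validator, so per round each validator
/// downloads roughly `num_validators * block_size` bytes. With a fixed
/// per-validator bandwidth budget per round, the maximum block size shrinks
/// as validators are added.
fn max_block_size(bandwidth_per_round: u64, num_validators: u64) -> u64 {
    bandwidth_per_round / num_validators
}

/// Total transaction bytes sequenced per round stays roughly equal to the
/// per-validator bandwidth, independent of how many validators there are.
fn total_payload_per_round(bandwidth_per_round: u64, num_validators: u64) -> u64 {
    max_block_size(bandwidth_per_round, num_validators) * num_validators
}
```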

Hope the above helps!


I have searched many resources but could not find the answers. Thank you, George, for the answers! :grinning: This helps clarify my questions.


Hi georged,

Thank you for the answer! It helped me clarify a lot. But I’m still confused about Q1’s answer (handling duplicates).

What we do to avoid duplicates is that each validator takes turns sending the transaction to the consensus according to the transaction ID, only if they have to seen it already. So a transaction X will first be submitted by say V1 then after some time V2, then after some time V3, etc.

Do identical transactions have the same transaction IDs?

  1. If yes, why do the validators send it again only when they have seen it already? How do the validators take turns sending the transaction to the consensus according to the transaction ID?
  2. If no, then the transaction should be sent only if they haven’t seen it before, or it would never be sent. And how are duplicates removed according to the transaction ID?

I’m getting confused about how it actually works to order the transactions or deduplicate them.

My apologies, I made a typo in the text you quote above, and I think it confuses matters. It should read:

takes turns sending the transaction to the consensus according to the transaction ID, only if they have not seen it already.

So validators take turns trying to send a transaction until they see it appear in the consensus, then they stop. Most of the time this results in the first validator for this transaction sending it, and others just receiving it - and no one else sending it.

The same transaction has the same transaction ID indeed. It is derived by hashing it (with some care around signature malleability).
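For intuition only, here is a sketch of how an ID like that could be derived: hash just the transaction content, not the signature, so that re-signing the same transaction cannot change its ID (this illustrates the malleability concern; the actual Sui scheme uses a cryptographic hash over the canonical serialization and its details may differ).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative content-only view of a transaction: the signature is kept out
/// of the hashed fields, so identical transactions always map to the same ID.
#[derive(Hash)]
struct TxContent {
    sender: [u8; 32],
    payload: Vec<u8>,
    gas_price: u64,
}

/// Sketch only: a real implementation would use a cryptographic hash
/// (not `DefaultHasher`) over the canonical serialization.
fn transaction_id(content: &TxContent) -> u64 {
    let mut h = DefaultHasher::new();
    content.hash(&mut h);
    h.finish()
}
```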
