Hyperledger Fabric - Detection of Modification to StateDB in Peer

This is a question about whether the StateDB is fixed automatically after tampering.
We're wondering whether manipulation of the StateDB (CouchDB) can be detected and fixed automatically by the peer.
The following document states that there is a state reconciliation process in the peer that synchronizes world state:
https://hyperledger-fabric.readthedocs.io/ja/latest/gossip.html#gossip-messaging
In addition to the automatic forwarding of received messages, a state
reconciliation process synchronizes world state across peers on each
channel. Each peer continually pulls blocks from other peers on the
channel, in order to repair its own state if discrepancies are
identified.
But when we test it as follows:
Step 4: modify value of key a in StateDB
Step 10: Wait for 15 minutes
Step 11: Create a new block
Step 12: Check value of a through chaincode and confirm it in StateDB directly
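For reference, the direct StateDB check in step 12 can be done with something like the minimal Go sketch below, assuming CouchDB is exposed on localhost:5984 and the state database follows the usual <channel>_<chaincode> naming; the database name "mychannel_mycc" is illustrative, not taken from our setup.

package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Fetch the stored document for key "a" straight from CouchDB, bypassing the peer.
	// Host, port and database name are assumptions for illustration only.
	resp, err := http.Get("http://localhost:5984/mychannel_mycc/a")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	// Compare this raw document with the value returned by the chaincode query.
	fmt.Println(string(body))
}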
The tampered value is not fixed automatically by the peer.
Can you help clarify the meaning of "state reconciliation process" in the above document, and whether the peer will fix tampering to the StateDB?
Thank you.

In my view, the gossip protocol syncs up legitimate data among peers, not tampered data. What is legitimate data? Data whose computed hash at any point in time matches the originally computed hash, which will not be the case for the tampered data, so I would not expect the gossip protocol to sync such 'dirty data'. That would defeat the purpose of blockchain as a technology altogether, and hence this is the wrong test to perform, in my view.
Now, what is the gossip protocol, then? Refer to https://hyperledger-fabric.readthedocs.io/ja/latest/gossip.html#gossip-protocol:
"Peers affected by delays, network partitions, or other causes
resulting in missed blocks will eventually be synced up to the current
ledger state by contacting peers in possession of these missing
blocks."
So, in cases where the peer should have committed a block to the ledger but missed it for reasons like the ones above, gossip is only a fallback strategy for Fabric to reconcile the ledger among the peers.
Now, in your test case I see you are using a query. A query does not go via the orderer to all the peers; it just goes to one peer and returns the value. You need to do a 'getStringState' as part of a transaction, so that endorsement runs, and that is when I would expect the endorsement to fail, citing the mismatch between the values for the same key among the peers.
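To illustrate the endorsement point, here is a minimal Go chaincode sketch using the fabric-contract-api-go library; the contract and function names are illustrative, and GetState here plays the role of the getStringState call mentioned above. Because the read value ends up in the write set, an endorser whose StateDB holds a tampered value will return a different proposal response than its peers, and an endorsement policy requiring several endorsers cannot then be satisfied.

package main

import (
	"fmt"

	"github.com/hyperledger/fabric-contract-api-go/contractapi"
)

// SimpleContract is an illustrative contract with a single read-then-write function.
type SimpleContract struct {
	contractapi.Contract
}

// CopyValue reads src inside a submitted transaction and writes its value to dst.
// If one endorser's StateDB disagrees on src, its write set (and thus its proposal
// response) differs from the other endorsers', so the endorsements do not match.
func (c *SimpleContract) CopyValue(ctx contractapi.TransactionContextInterface, src string, dst string) error {
	value, err := ctx.GetStub().GetState(src)
	if err != nil {
		return fmt.Errorf("failed to read %s: %v", src, err)
	}
	if value == nil {
		return fmt.Errorf("key %s does not exist", src)
	}
	return ctx.GetStub().PutState(dst, value)
}

func main() {
	cc, err := contractapi.NewChaincode(&SimpleContract{})
	if err != nil {
		panic(err)
	}
	if err := cc.Start(); err != nil {
		panic(err)
	}
}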

# Gossip state transfer related configuration
state:
    # indicates whenever state transfer is enabled or not
    # default value is true, i.e. state transfer is active
    # and takes care to sync up missing blocks allowing
    # lagging peer to catch up to speed with rest network
    enabled: false
.......................................................................................................
# (note: the reconcile* options below live under gossip.pvtData in core.yaml, not under gossip.state)
# the process of reconciliation is done in an endless loop, while in each iteration reconciler tries to
# pull from the other peers the most recent missing blocks with a maximum batch size limitation.
# reconcileBatchSize determines the maximum batch size of missing private data that will be reconciled in a
# single iteration.
reconcileBatchSize: 10
# reconcileSleepInterval determines the time reconciler sleeps from end of an iteration until the beginning
# of the next reconciliation iteration.
reconcileSleepInterval: 1m
# reconciliationEnabled is a flag that indicates whether private data reconciliation is enable or not.
reconciliationEnabled: true
Link: https://github.com/hyperledger/fabric/blob/master/sampleconfig/core.yaml
In addition to the automatic forwarding of received messages, a state reconciliation process synchronizes world state across peers on each channel. Each peer continually pulls blocks from other peers on the channel, in order to repair its own state if discrepancies are identified. Because fixed connectivity is not required to maintain gossip-based data dissemination, the process reliably provides data consistency and integrity to the shared ledger, including tolerance for node crashes.

Related

What factors influence the ordering of transactions in a block in HLF 1.4.4?

In the documentation for the transaction flow of Hyperledger Fabric it is mentioned that:
"The ordering service does not need to inspect the entire content of a transaction in order to perform its operation, it simply receives transactions from all channels in the network, orders them chronologically by channel, and creates blocks of transactions per channel."
I have a couple of questions here:
What does "chronological ordering" mean? Does it mean that the transactions for a channel are ordered by the time they are received at the ordering service node (leader)?
If two client applications submit an update transaction for the same key on the ledger at almost the same time [let us call them tx1 (updating key x to value p) and tx2 (updating key x to value q)], all the endorsing peers will simulate the update transaction proposals and return the write sets in the transaction proposal responses. When the clients send these endorsed transactions to the ordering service nodes, in which order will these update transactions be placed in a block?
The order of transactions in the block can be tx1,tx2 or tx2,tx1. Assuming the update transaction has only a write set and no read set, the transactions are valid in either order. What will be the final value of the key on the ledger [p or q]?
I am trying to understand whether the final value of x is deterministic, and what factors would influence it.
What does "chronological ordering" mean?. Does it mean that the transactions for a channel are ordered depending on the time they are received at the Ordering service node (Leader)?
In general, the orderer makes no guarantees about the order in which messages will be delivered just that messages will be delivered in the same order to all peer nodes.
In practice, the following generally holds true for the current orderer implementations:
Solo - messages should be delivered in the order in which they were received
Kafka - messages should be delivered in the order in which they were received by each orderer node and generally even in the order they are received across multiple ordering nodes.
This holds true for the latest Fabric version as well.
If two client applications submit an update transaction for the same key on the ledger at almost the same time [let us call them tx1 (updating key x to value p) and tx2 (updating key x to value q)], all the endorsing peers will simulate the update transaction proposals and return the write sets in the transaction proposal responses. When the clients send these endorsed transactions to the ordering service nodes, in which order will these update transactions be placed in a block?
When you submit a transaction, the peer generates a read and write set. This read/write set is then used when the transaction is committed to the ledger. It contains the names of the variables to be read/written and their versions at the time they were read. If, between set creation and committing, a different transaction was committed and changed the version of a variable, the original transaction will be rejected at commit time because the version recorded at read time is no longer the current version.
To address this, you will have to design data and transaction structures that avoid concurrently editing the same key; otherwise you might get an MVCC_READ_CONFLICT error.
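One common way to avoid this, sketched below with the Go contract API (the contract, function and key names are illustrative), follows a pattern similar to the Fabric high-throughput sample: record each change under its own composite key instead of rewriting a single aggregate key, so concurrent transactions never read and write the same key, and compute the aggregate by iterating the composite keys at read time.

package main

import (
	"fmt"
	"strconv"

	"github.com/hyperledger/fabric-contract-api-go/contractapi"
)

// LedgerContract is an illustrative contract for the delta-per-transaction pattern.
type LedgerContract struct {
	contractapi.Contract
}

// RecordDelta stores one change for "account" under its own composite key, so two
// concurrent transactions touch different keys and no MVCC_READ_CONFLICT occurs.
func (c *LedgerContract) RecordDelta(ctx contractapi.TransactionContextInterface, account string, delta int, txRef string) error {
	key, err := ctx.GetStub().CreateCompositeKey("delta", []string{account, txRef})
	if err != nil {
		return err
	}
	return ctx.GetStub().PutState(key, []byte(strconv.Itoa(delta)))
}

// Balance folds all recorded deltas for "account" into a single value at read time.
func (c *LedgerContract) Balance(ctx contractapi.TransactionContextInterface, account string) (int, error) {
	iter, err := ctx.GetStub().GetStateByPartialCompositeKey("delta", []string{account})
	if err != nil {
		return 0, err
	}
	defer iter.Close()

	total := 0
	for iter.HasNext() {
		kv, err := iter.Next()
		if err != nil {
			return 0, err
		}
		d, err := strconv.Atoi(string(kv.Value))
		if err != nil {
			return 0, fmt.Errorf("bad delta %q: %v", kv.Value, err)
		}
		total += d
	}
	return total, nil
}

func main() {
	cc, err := contractapi.NewChaincode(&LedgerContract{})
	if err != nil {
		panic(err)
	}
	if err := cc.Start(); err != nil {
		panic(err)
	}
}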

What is the value of the snapshot process in the Raft consensus of Fabric?

I'm digging into the new consensus mechanism of Hyperledger Fabric 1.4. The concept of Raft and the way it works is quite clear in the documentation.
But there is a concept called snapshotting, which is described as follows:
While it’s possible to keep all logs indefinitely, in order to save disk space, Raft uses a process called “snapshotting”
I've read through documents about Raft too (e.g. this one), and I understand why Raft needs snapshots: they let the leaders and followers store less data (just a snapshot and some newer log entries). But in the case of Fabric, the example about snapshotting describes the way a lagging orderer gets the blocks it missed:
For example, let’s say lagging replica R1 was just reconnected to the network. Its latest block is 100. Leader L is at block 196, and is configured to snapshot at amount of data that in this case represents 20 blocks. R1 would therefore receive block 180 from L and then make a Deliver request for blocks 101 to 180. Blocks 180 to 196 would then be replicated to R1 through the normal Raft protocol
What confuses me is that the orderers store the full ledger anyway, so why do they still need snapshots, and how does snapshotting save disk space here?
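To make the quoted example concrete, here is a tiny Go sketch of the catch-up arithmetic described in the documentation; it is purely illustrative (it assumes snapshots are taken at multiples of the configured interval) and is not orderer code.

package main

import "fmt"

// catchUpPlan reproduces the arithmetic of the documentation example: given the lagging
// replica's height, the leader's height and the snapshot interval (in blocks), report
// which snapshot block is sent, which blocks are fetched via Deliver and which blocks
// arrive through normal Raft replication.
func catchUpPlan(lagging, leader, snapshotInterval uint64) {
	snapshotBlock := (leader / snapshotInterval) * snapshotInterval // latest snapshot at the leader
	fmt.Printf("snapshot block sent to the replica: %d\n", snapshotBlock)
	fmt.Printf("Deliver request for blocks %d to %d\n", lagging+1, snapshotBlock)
	fmt.Printf("blocks %d to %d replicated via Raft\n", snapshotBlock, leader)
}

func main() {
	// The documentation example: R1 at block 100, leader L at block 196, snapshots every 20 blocks.
	catchUpPlan(100, 196, 20)
	// Prints:
	// snapshot block sent to the replica: 180
	// Deliver request for blocks 101 to 180
	// blocks 180 to 196 replicated via Raft
}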

Where is the 'real' ledger, and how is it maintained?

In a Hyperledger Fabric network, the ledgers held by all peers (endorsing peers and committing peers) are replicated ledgers.
This seems to imply there is a unique 'real/original/genuine' ledger per channel.
I'd like to ask the following:
Is there a real ledger? If so, where is it (or where is it defined?) and who owns it?
Those replicated ledgers are updated by each peer after VSCC and MVCC validation. Then who updates the 'real' ledger?
Does 'World State' refer only to the 'real' ledger?
I'd really appreciate it if you could answer my questions.
Please tell me if any of these questions need clarification. Thank you!
I don't understand what exactly you mean by a 'real' ledger. There is one and only one ledger per channel, replicated across all participants of that channel. When I say participants, I mean all peers (both endorsing and committing) of an organization's MSP belonging to a given channel.
The state DB (a.k.a. world state) refers to the database that maintains the current value of a given key. Let me give you an example. You know that a blockchain is a linked list on steroids (with added security, immutability, etc.). Say you have a key A with value 100 in Block 1. You transact in the following manner:
Block 2 -- A := A-10
Block 15 -- A := A-12
.
.
.
Block 10,000 -- A := A-3
So, after Block 10,000, if you need the current value of key A, you would have to calculate it starting from Block 1. To manage this efficiently, the Fabric folks implemented a state database that updates the value of a key after every transaction. Its sole responsibility is to improve efficiency. If your state gets corrupted, Fabric will automatically rebuild it from Block 0.
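To see why the state database improves efficiency, here is a toy sketch in plain Go (nothing Fabric-specific; the numbers mirror the example above): without a state database the current value of A must be recomputed by replaying every block's writes from the beginning, whereas a key-value "world state" is kept up to date as each write is applied, so a read becomes a single lookup.

package main

import "fmt"

// write is one key update carried in a block, e.g. "A := A-10".
type write struct {
	key   string
	delta int
}

func main() {
	// Toy chain: key A is set to 100 in Block 1 and decremented in later blocks.
	blocks := [][]write{
		{{"A", +100}}, // Block 1: A = 100
		{{"A", -10}},  // Block 2
		{{"A", -12}},  // Block 15 (the gaps in block numbers don't matter here)
		{{"A", -3}},   // Block 10,000
	}

	// Without a state DB: replay every block from the beginning to learn the current value.
	replayed := 0
	for _, blk := range blocks {
		for _, w := range blk {
			if w.key == "A" {
				replayed += w.delta
			}
		}
	}

	// With a state DB: the value is maintained incrementally as each block commits.
	worldState := map[string]int{}
	for _, blk := range blocks {
		for _, w := range blk {
			worldState[w.key] += w.delta
		}
	}

	fmt.Println("value of A by replaying the chain:", replayed)      // 75
	fmt.Println("value of A from the world state:", worldState["A"]) // 75
}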

Can 2 transactions of the same block update the same state key?

I believe the answer is no but I'd like confirmation.
With Fabric, endorsers simulate the transaction against the latest state and prepare the proposal response, adding the read and write sets of keys.
At the commit phase, the peer receives a block from the ordering service, and a write set is only applied if the keys in its read set have not been updated in the meantime (versioning check).
So, within the same block, the same key cannot be updated by two different transactions.
If that is the case, aggregating values and maintaining a balance on-chain might be problematic for high-frequency transaction use cases. Such operations should be left to an off-chain application layer.
So, within the same block, the same key cannot be updated by two different transactions.
The above is correct. Hyperledger Fabric uses an MVCC-like model in order to prevent collisions (or "double spend"). You'll want to wait for the previous state change transaction to commit before attempting to update the state again.
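As a client-side sketch in Go (submitUpdate is a hypothetical placeholder for whatever SDK call you actually use to submit and await commit, not a real Fabric API), the usual approach is simply to resubmit after the conflicting transaction has committed:

package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// submitUpdate stands in for your SDK's submit-and-wait-for-commit call; it is a
// placeholder that always fails here so the retry path is visible.
func submitUpdate(key, value string) error {
	return errors.New("MVCC_READ_CONFLICT")
}

// updateWithRetry resubmits when the commit fails with an MVCC read conflict, i.e. when
// another transaction updated the same key between endorsement and commit.
func updateWithRetry(key, value string, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = submitUpdate(key, value)
		if err == nil || !strings.Contains(err.Error(), "MVCC_READ_CONFLICT") {
			return err
		}
		// Back off so the conflicting transaction has committed before the key is re-read.
		time.Sleep(time.Duration(i+1) * 500 * time.Millisecond)
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	if err := updateWithRetry("x", "p", 3); err != nil {
		fmt.Println(err)
	}
}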

Circular Distributed Hash Table overlay P2P network

I think I'm missing something here or confusing terms perhaps.
What happens to the key:value pairs stored at a peer in the overlay DHT when that peer leaves the p2p network? Are they moved to the appropriate new nearest successor? Is there a standard mechanism for this, if that is the case?
My understanding is that the successor and predecessor information of adjacent peers has to be modified, as expected, when a peer leaves; however, I can't seem to find information on what happens to the actual data stored at that peer. How is the data kept complete in the DHT as peer churn occurs?
Thank you.
This usually is not part of the abstract routing algorithm that's at the core of a DHT but implementation-specific behavior instead.
Usually you will want to store the data on multiple nodes neighboring the target key; that way you get some redundancy to handle failures.
To keep the data alive, you can either have the originating node republish it at regular intervals or have the storage nodes replicate it among each other. The latter causes a bit less traffic if done properly, but is more complicated to implement.
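A minimal toy sketch of those two ideas in Go (not modelled on any particular DHT implementation): pick the k nodes closest to the key by XOR distance and store the value on all of them, and have the originating node republish on a timer so the entry migrates to new neighbors as membership changes.

package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
	"sort"
	"time"
)

// id64 hashes a string into a 64-bit identifier in the DHT's key space (toy scheme).
func id64(s string) uint64 {
	sum := sha1.Sum([]byte(s))
	return binary.BigEndian.Uint64(sum[:8])
}

// kClosest returns the k node IDs with the smallest XOR distance to the key's ID.
func kClosest(nodes []uint64, key uint64, k int) []uint64 {
	sorted := append([]uint64(nil), nodes...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i]^key < sorted[j]^key
	})
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k]
}

func main() {
	nodes := []uint64{id64("node-a"), id64("node-b"), id64("node-c"), id64("node-d")}
	key := id64("some-key")

	// Store the value on the k closest nodes, so one node leaving does not lose the entry.
	fmt.Printf("store replicas on nodes %x\n", kClosest(nodes, key, 3))

	// Republish from the originating node at a regular interval; each time, the k-closest
	// set is recomputed against the current membership, which handles churn.
	ticker := time.NewTicker(50 * time.Millisecond)
	defer ticker.Stop()
	for i := 0; i < 3; i++ {
		<-ticker.C
		fmt.Printf("republish to nodes %x\n", kClosest(nodes, key, 3))
	}
}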
