I read https://hyperledger-fabric.readthedocs.io/en/release-2.0/private-data/private-data.html, which says:
"A hash of that data, which is endorsed, ordered, and written to the ledgers of every peer on the channel. The hash serves as evidence of the transaction and is used for state validation and can be used for audit purposes."
However, I think the signatures on the transaction are enough evidence that the contract was agreed upon.
Why should a hash of the data be shared among all the peers?
Private transactions are not stored in blocks on the chain the way public transactions are. All the peers joined to the channel share the same channel chain, so if a private transaction were stored normally in the chain, every peer (even those from organizations the transaction is not destined for) could read the private transaction's parameters (and reconstruct the other organizations' private state). To avoid this, a hash is stored in the block in its place, so that the organizations which share the private data can still check its integrity.
EDIT:
Let's see: if you read https://hyperledger-fabric.readthedocs.io/en/release-2.0/private-data/private-data.html#transaction-flow-with-private-data carefully, you'll see that at no time is the private data itself signed (neither the read set, nor the write set, nor the input, nor the output). In step 3 a signature is produced only over its hash (embedded in the transaction), nothing else. The private data (the data, not the transaction) is simply distributed via the gossip protocol and stored temporarily in the transient data store (step 2), to be committed in step 5. The only evidence of the private data is the hash, which is embedded in the transaction and signed. The gossip protocol has its own security mechanisms, but it does not produce evidence or guarantee transaction order.
Now take the case where a malicious organization later unilaterally alters its private state in order to obtain some kind of benefit. That hash in the chain would be the only evidence left to resolve the dispute with the other organizations. There is no other evidence of the agreed valid value (and the execution order) than that hash; no plaintext private data was ever signed by the peer.
That's the way it is in Fabric, and it makes sense. Keep in mind that it is necessary to guarantee both the integrity of the data and the order in which transactions are executed, so at some point the orderer (step 4) has to be involved to determine that order (in most cases the order of the transactions does affect the result) without disclosing the real data.
My feeling is: if only a signature were shared, how would the other peers validate what the signature is for? Or, vice versa, how would they validate that it was actually signed by the claimed owner? So we share the hash of the transaction plus a signature over that hash. That signature can be verified using the owner's public certificate. And because only the hash is shared, the data remains private while the ledger (blocks chained to one another) stays identical across all peers.
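For illustration, here is a minimal Go sketch (not Fabric's actual validation code) of that idea: only the hash is recorded on every peer's ledger, and a signature over it can be checked against the signer's certificate. The function name and the choice of ECDSA with SHA-256 are assumptions for the example.

package example

import (
    "crypto/ecdsa"
    "crypto/sha256"
    "crypto/x509"
    "encoding/pem"
    "errors"
)

// verifyHashSignature checks that sig is a valid signature by the owner of
// certPEM over the SHA-256 hash of privateData, i.e. over the same digest
// that would be recorded on the channel ledger.
func verifyHashSignature(privateData, sig, certPEM []byte) error {
    block, _ := pem.Decode(certPEM)
    if block == nil {
        return errors.New("no PEM certificate found")
    }
    cert, err := x509.ParseCertificate(block.Bytes)
    if err != nil {
        return err
    }
    pub, ok := cert.PublicKey.(*ecdsa.PublicKey)
    if !ok {
        return errors.New("certificate does not contain an ECDSA public key")
    }
    digest := sha256.Sum256(privateData)
    if !ecdsa.VerifyASN1(pub, digest[:], sig) {
        return errors.New("signature does not match the recorded hash")
    }
    return nil
}

Without the hash on the ledger there would be nothing channel-wide to compare the signature (or a later claim about the data) against.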
Related
Since chaincode has to be deterministic, is there any way to get the same random number on all endorsing nodes, to implement something like a lottery? In Ethereum you can do something like this:
function random() private view returns (uint) {
    return uint(keccak256(abi.encodePacked(block.difficulty, now, players)));
}
by using the block number, timestamp, block difficulty, gas, etc. But you don't have that in Hyperledger Fabric.
You're quite correct that the deterministic nature of chaincode does cause issues with random numbers.
I'd look into doing this in one of two ways:
1. Pass in the random number (or at least a seed) as part of the transaction request. You might want to send it in transient data so that it isn't recorded on the ledger (see the sketch below).
2. Pre-store the random numbers, i.e. generate a large table of random numbers and put them into the ledger in some form of setup transaction, perhaps even into private data collections. Then work through them with a counter each time you need a number.
You can protect access to the set of generated numbers with various kinds of access control.
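As a rough sketch of the first option, assuming the client places an 8-byte seed in the transient map under a key named "seed" (the key name and encoding are arbitrary choices for this example, not a Fabric convention):

package example

import (
    "encoding/binary"
    "errors"
    "math/rand"

    "github.com/hyperledger/fabric-chaincode-go/shim"
)

// seedFromTransient builds a deterministic PRNG from a seed supplied by the client.
// The seed travels in the transient field, so it is not recorded on the ledger,
// yet every endorser simulating the proposal derives the same sequence.
func seedFromTransient(stub shim.ChaincodeStubInterface) (*rand.Rand, error) {
    transient, err := stub.GetTransient()
    if err != nil {
        return nil, err
    }
    raw, ok := transient["seed"]
    if !ok || len(raw) < 8 {
        return nil, errors.New("transient field 'seed' (8 bytes) is required")
    }
    seed := int64(binary.BigEndian.Uint64(raw[:8]))
    return rand.New(rand.NewSource(seed)), nil
}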
We can use the timestamp passed by the proposer of the transaction as the seed.
Usage: stub.GetTxTimestamp()
// GetTxTimestamp returns the timestamp when the transaction was created. This
// is taken from the transaction ChannelHeader, therefore it will indicate the
// client's timestamp and will have the same value across all endorsers.
GetTxTimestamp() (*timestamp.Timestamp, error)
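A minimal Go sketch of using that timestamp to seed a deterministic PRNG (the helper name is mine). One caveat: the proposer controls the timestamp, so this is predictable and not suitable when the client must not be able to bias the outcome, e.g. a high-stakes lottery.

package example

import (
    "math/rand"

    "github.com/hyperledger/fabric-chaincode-go/shim"
)

// randFromTxTimestamp derives a PRNG from the client-supplied transaction
// timestamp, which is identical on every endorser for the same proposal.
func randFromTxTimestamp(stub shim.ChaincodeStubInterface) (*rand.Rand, error) {
    ts, err := stub.GetTxTimestamp()
    if err != nil {
        return nil, err
    }
    // Combine seconds and nanoseconds so two transactions within the same
    // second still get different seeds.
    seed := ts.GetSeconds()*1_000_000_000 + int64(ts.GetNanos())
    return rand.New(rand.NewSource(seed)), nil
}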
Chaincode that executes range or rich JSON queries and updates data in
a single transaction is not supported, as the query results cannot be
validated on the peers that don’t have access to the private data, or
on peers that are missing the private data that they have access to.
If a chaincode invocation both queries and updates private data, the
proposal request will return an error. If your application can
tolerate result set changes between chaincode execution and
validation/commit time, then you could call one chaincode function to
perform the query, and then call a second chaincode function to make
the updates. Note that calls to GetPrivateData() to retrieve
individual keys can be made in the same transaction as
PutPrivateData() calls, since all peers can validate key reads based
on the hashed key version.
(Quoted from the Hyperledger Fabric private data documentation.)
I came across this small paragraph about a limitation of querying private data in Fabric. I am fairly new to the private data concept.
What I understood is as follows:
A chaincode invocation that performs a range or rich JSON query AND an update on private data causes the proposal to return an error.
It's better to call one chaincode function to perform the query, and then call a second chaincode function to make the updates.
Normally, GetPrivateData() calls to retrieve individual keys can be made in the same transaction as PutPrivateData() calls, since all peers can validate key reads based on the hashed key version.
Is my understanding correct?
If yes then why is it so for private data?
If no then please give me wisdom.
So, first of all: rich queries are never re-executed upon commit anyway, be it private data or not.
Now, as for range queries: remember that range queries rely on the assumption that keys have an alphabetical order between them.
However, a transaction needs to pass MVCC checks regardless of whether the peer has the private data or not. If the peer doesn't have the private data, it only sees hashes (not the real key names), and hashes are not sorted alphabetically, hence it cannot validate that the range query simulation is not stale.
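To make the distinction concrete, here is a hedged Go sketch (the collection name and key layout are illustrative, not from the question). The first function does a single-key read-modify-write on private data, which validates fine in one transaction; the second only runs a range query and leaves any updates to a separate transaction, as the quoted documentation recommends.

package example

import (
    "fmt"

    "github.com/hyperledger/fabric-chaincode-go/shim"
)

const collection = "collectionMarbles" // illustrative collection name

// incrementPrivateCounter reads and rewrites a single private key in one
// transaction; all peers can validate the read via the hashed key version.
func incrementPrivateCounter(stub shim.ChaincodeStubInterface, key string) error {
    val, err := stub.GetPrivateData(collection, key)
    if err != nil {
        return err
    }
    count := 0
    if val != nil {
        fmt.Sscanf(string(val), "%d", &count)
    }
    return stub.PutPrivateData(collection, key, []byte(fmt.Sprintf("%d", count+1)))
}

// listPrivateKeys only queries; updates based on its result belong in a
// second chaincode function invoked as a separate transaction.
func listPrivateKeys(stub shim.ChaincodeStubInterface, start, end string) ([]string, error) {
    iter, err := stub.GetPrivateDataByRange(collection, start, end)
    if err != nil {
        return nil, err
    }
    defer iter.Close()
    var keys []string
    for iter.HasNext() {
        kv, err := iter.Next()
        if err != nil {
            return nil, err
        }
        keys = append(keys, kv.Key)
    }
    return keys, nil
}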
Hyperledger Fabric provides built-in support for storing off-chain data with the help of private data collections. For this we need to specify a collection config that contains the various collection names along with the participants that have access to the data in those collections.
There is a setting called "BlockToLive" with which we can specify for how many blocks the peers should store the private data they have access to. The peers automatically purge the private data after the ledger block height reaches the corresponding threshold.
We have a requirement where we need to use private data collections, but the data should be removed (automatically or manually) after exactly 30 days. Is there any way to achieve this?
1. timeToLive: Is there any implementation of a timeToLive or similar configuration, with which the peers would automatically purge the data after the specified duration?
2. If there is no automatic way at present, how can the data in a private collection be removed manually? Is there any way to remove the data in private collections directly, using external scripts/code? We don't want to create chaincode methods that are invoked as transactions to delete private data, since even the deletion of private data would need to be endorsed, sent to the orderer, and added to the ledger. How can the private data be removed directly?
First, everything that you put on the blockchain is permanent and supposed to be decentralized, so having unilateral control over when to delete the private data goes against the virtue of decentralization and you should avoid it (this answers point 2). Endorsers endorse every change or transaction (including the BlockToLive), so it does not make sense to deviate from the agreed period.
Second, time in distributed systems is subjective and it is impossible to have a global clock ⏰ (say, 30 days for one node can be 29.99 days for another, or 29.80 days for yet another). Hence time is measured in blocks, which is objective for all nodes. So it is recommended that you use BlockToLive. It can be difficult at first, but you can calculate backwards.
Say you have a block size of 10 (number of transactions in a block) and expect around 100 transactions per day; then you can set BlockToLive = 300. (Of course, this is a ballpark number; see the quick calculation below.)
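A quick back-of-the-envelope check of that ballpark figure, using the assumed numbers above:

package main

import "fmt"

func main() {
    txPerDay := 100  // expected transactions per day (assumption from the answer)
    txPerBlock := 10 // approximate transactions per block (assumption from the answer)
    days := 30       // desired retention period

    blocksPerDay := txPerDay / txPerBlock // 10
    fmt.Println(blocksPerDay * days)      // BlockToLive ≈ 300
}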
Finally, if you still want to delete private data at will, I would recommend manual off-chain storage mechanisms.
Just a general question: if I'm building a blockchain for a business, I want to store 3 years of transactions, but anything older than that I won't need and don't want actively in the working database. Is there a way to back up and purge a blockchain, or delete items older than some time frame? I'm more interested in the event logic than the forever-memory aspect.
I'm not aware of any blockchain technology capable of this yet, but Hyperledger Fabric in particular is planning to support data archiving (checkpointing). Simply put, the participants need to agree on a block height so that older blocks can be discarded. The block at that height then becomes the source of trust, similar to the original genesis block. A snapshot capturing the current state also needs to be taken and agreed upon.
From a serviceability point of view it's slightly more complicated, e.g. you may have nodes that are down while snapshotting, etc.
If you just want to purge the data after a while, Fabric Private Data has an option which could satisfy your desire.
blockToLive Represents how long the data should live on the private database in
terms of blocks. The data will live for this specified number of
blocks on the private database and after that it will get purged,
making this data obsolete from the network so that it cannot be
queried from chaincode, and cannot be made available to requesting
peers
You can read more in the Fabric private data documentation.
Personally, I don't think there is a way to remove a block from the chain. It would break the immutability property of the blockchain.
There are two concepts which help you achieve your goals.
The first one has already been mentioned: private data. Private data gives you the possibility to 'label' data with a time to live. Only the private data hashes are stored on the chain (so the transaction can still be verified), while the data itself is stored in so-called SideDBs and gets fully purged (except for the hashes on the chain, of course). This is essentially the basis for using Fabric without workarounds and achieving GDPR compliance.
The other concept, which has not been mentioned yet and is very helpful for this part of the question:
Is there a way to backup and purge a blockchain or delete items older than some time frame?
Every peer only stores the 'current state' of the ledger in its StateDB. The current state could be described as the data that is labelled 'active' and probably soon to be used again. You can think of the StateDB as being like a cache. Data enters this cache when a key is created or updated (invoked). To remove a key from the cache you can use 'DelState'. The key is then labelled 'deleted' and is no longer in the cache, BUT it is still on the ledger, and you can retrieve the history and data of that key.
Conclusion: for 'real' deletion of data you have to use the concept of private data, and for managing data in your StateDB (think of the 'cache' analogy) you can simply use the built-in functions (see the sketch below).
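A small Go sketch of that 'cache' analogy (the function and key names are illustrative): DelState only removes the key from the state database, while the key's past values remain retrievable from the chain.

package example

import (
    "github.com/hyperledger/fabric-chaincode-go/shim"
)

// retireAsset removes the key from the state DB (the "cache");
// its history stays on the ledger because blocks are never rewritten.
func retireAsset(stub shim.ChaincodeStubInterface, key string) error {
    return stub.DelState(key)
}

// assetHistory walks the key's history, which still includes all values
// written before the DelState (and a final entry marked as a delete).
func assetHistory(stub shim.ChaincodeStubInterface, key string) ([]string, error) {
    iter, err := stub.GetHistoryForKey(key)
    if err != nil {
        return nil, err
    }
    defer iter.Close()
    var versions []string
    for iter.HasNext() {
        mod, err := iter.Next()
        if err != nil {
            return nil, err
        }
        versions = append(versions, string(mod.Value))
    }
    return versions, nil
}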
Given that Hyperledger Fabric's chaincode needs to be deterministic because it is executed on all validating peers (Are blocks mined in Hyperledger Fabric?), how would one get a unique ID so I can 'InsertRow' with a unique value?
For example, if I execute my code to append a new record to the table, I'd need a unique key. If I generated a GUID on validating peer 1 (vp1), it would be a different key than the GUID generated on validating peer 2 (vp2). The same goes for using milliseconds as a key.
Is there a way I can get a deterministic unique ID in chaincode, from within the chaincode rather than passing it in from the client?
I would be inclined to implement this as a monotonically increasing sequence variable stored via PutState alongside your table. In other words, initialize something like PutState("nextsequence", 0) in your Init() function, and then read-modify-write (RMW) it any time you need a new ID. The RMW mutation will be implicitly coupled to your row insert, and should be deterministic across all instances.
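A rough Go sketch of that approach, assuming the counter lives under the key "nextsequence" and is stored as a decimal string (both choices are illustrative):

package example

import (
    "strconv"

    "github.com/hyperledger/fabric-chaincode-go/shim"
)

// nextSequence returns a fresh ID and writes back the incremented counter.
// The read and write become part of this transaction's read/write set, so
// every peer validates (or rejects) the same value.
func nextSequence(stub shim.ChaincodeStubInterface) (uint64, error) {
    raw, err := stub.GetState("nextsequence")
    if err != nil {
        return 0, err
    }
    var n uint64
    if raw != nil {
        if n, err = strconv.ParseUint(string(raw), 10, 64); err != nil {
            return 0, err
        }
    }
    if err := stub.PutState("nextsequence", []byte(strconv.FormatUint(n+1, 10))); err != nil {
        return 0, err
    }
    return n, nil
}

One caveat: two in-flight transactions that both bump the counter will produce conflicting read sets, so one of them is invalidated at commit time and has to be retried.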
If the ID should be generated by the chaincode, then a monotonically increasing counter is a good solution. If the ID should be chosen by the transaction-generating app, you should enforce the rule that IDs can't be reused, and encourage the protocol the transaction generator uses to select collision-resistant IDs.