How to implement a distributed DB on Hyperledger Fabric (GDPR) - hyperledger-fabric

We are building a solution and modeling a network using Fabric and Composer.
To avoid storing any personal data on the blockchain (GDPR compliance), we would like to hash or map the personal data so that a GUID or hash is stored in the ledger instead (anonymized data).
Does Hyperledger provide a solution for this kind of issue (for example, a distributed DB that sits alongside the ledger peers)?
Or does this need to be implemented outside the Hyperledger network topology?

Prior to Fabric v1.1, you would need to provide the database yourself and then just write the hashes to the blockchain as normal transactions. There are people who do this today for database records as well as for documents (store the document outside and just write the hash and metadata to the blockchain).
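As a rough illustration of the "database outside, hash on the chain" approach, here is a minimal Python sketch. Everything in it is hypothetical: OffChainStore is a dict standing in for whatever database you provide yourself, the ledger list stands in for Fabric transactions, and the per-record salt is my own assumption (added so that low-entropy personal data cannot simply be guessed from its hash).

```python
import hashlib
import os
import uuid


class OffChainStore:
    """Hypothetical off-chain database; a dict stands in for a real DB."""

    def __init__(self):
        self._records = {}

    def put(self, personal_data: bytes) -> dict:
        """Store the data off-chain and return the entry to write on-chain."""
        guid = str(uuid.uuid4())
        salt = os.urandom(16)  # assumption: salt stays off-chain with the data
        self._records[guid] = (salt, personal_data)
        digest = hashlib.sha256(salt + personal_data).hexdigest()
        return {"guid": guid, "sha256": digest}

    def verify(self, entry: dict) -> bool:
        """Check that the off-chain record still matches the on-chain hash."""
        record = self._records.get(entry["guid"])
        if record is None:
            return False
        salt, data = record
        return hashlib.sha256(salt + data).hexdigest() == entry["sha256"]

    def erase(self, guid: str) -> None:
        """Delete the personal data; the on-chain entry is left untouched."""
        self._records.pop(guid, None)


ledger = []  # stand-in for transactions written to the blockchain
store = OffChainStore()
entry = store.put(b"name=Alice;email=alice@example.com")
ledger.append(entry)  # only the GUID and the hash go on the ledger
```

After store.erase(entry["guid"]) the ledger entry is still there, but it no longer resolves to any personal data.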
In Fabric v1.1, there is an experimental feature known as "private data". With this feature, the actual state is kept local to the peers in a private state database and is not included in the blockchain itself. The ledger contains only hashes of the key and value.
There are new chaincode APIs (GetPrivateData/PutPrivateData) which do this automatically for you. You can then either delete the data manually or call DelPrivateData in chaincode to delete the actual records (the hash will stay on the channel ledger).
This feature is experimental in v1.1, so you will need to build the peer from source with -tags experimental.
Since this feature is experimental, it is not currently supported in Composer.
We will be hardening the feature as part of the 1.2 release, which is under development.

Related

Why is a blockchain structure used in Hyperledger Fabric?

I have trouble understanding why Hyperledger Fabric (HLF) uses a blockchain structure.
I know that in Bitcoin the blockchain structure provides strong security thanks to the PoW algorithm and the longest-chain rule, but what are the advantages of using a blockchain structure in HLF?
It seems to me that, instead of the blockchain structure, Hyperledger Fabric could keep a single transaction history log and the network would work the same way. I bet I'm wrong, but I haven't been able to find an explanation yet.
I would be grateful for clarification on this.
I think a lot of the questions you have in mind come from the overlapping definitions of DLT and blockchain.
DLT:
A DLT is simply a decentralized database that is managed by various participants. There is no central authority that acts as an arbitrator or monitor. As a distributed log of records, there is greater transparency – making fraud and manipulation more difficult – and it is more complicated to hack the system.
All of this may sound familiar, because the same is written about the features of blockchain.
Blockchain:
A blockchain is simply a DLT with a specific set of features. It is also a shared database (a log of records), but in this case the records are grouped into blocks that, as the name indicates, form a chain. Each block is sealed with a cryptographic hash, and the next block begins with that same hash, a kind of wax seal. That is how it can be verified that the recorded information has not been manipulated.
DLT platforms that are not blockchains provide immutability too; it is the way Hyperledger Fabric provides this property that makes it a blockchain framework.
Every blockchain framework, be it Ethereum, Bitcoin, or another, stores transaction information in blocks, where each block is linked to its predecessor by a hash, which is what provides immutability.
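To make the "each block is linked to its predecessor by a hash" idea concrete, here is a small Python sketch. It is only a toy model: the block layout (number, prev_hash, transactions) and the function names are my own assumptions, not the actual block format of Fabric or any other platform.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's full content (deterministic JSON encoding)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, transactions: list) -> None:
    """Add a block that records the hash of the block before it."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({
        "number": len(chain),
        "prev_hash": prev,
        "transactions": transactions,
    })


def verify_chain(chain: list) -> bool:
    """Check that every block still points at the hash of its predecessor."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True


chain = []
append_block(chain, ["alice pays bob 5"])
append_block(chain, ["bob pays carol 2"])
append_block(chain, ["carol pays dave 1"])
```

Changing a transaction in an earlier block changes that block's hash, so the prev_hash stored in the following block no longer matches and verify_chain returns False.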
Corda is very similar to Hyperledger Fabric, but it is said to be both a blockchain and not a blockchain. Architecturally the two are much alike, with one key difference that makes Hyperledger Fabric a blockchain framework and Corda a DLT.
I'll try to answer your question by explaining why Corda is not a blockchain.
Why is Corda both a blockchain and not a blockchain?
A Transaction in Corda is cryptographically linked (chained) to the transactions it depends on. Just like Bitcoin, but the range of concepts that can be expressed is far wider.
Transactions in Corda are processed by having each participant in the transaction execute the same code deterministically to verify the proposed updates to the ledger. Just like Ethereum, but the languages you can use are high-level and productive, like Java, rather than obscure ones like Solidity.
Transactions in Corda are shared only with those who have a need to know. Just like channels in Fabric but designed in from day one and fully integrated into the programming model.
Transactions in Corda are confirmed through a process of consensus forming using one of a range of algorithms, including Byzantine Fault Tolerant algorithms. Just like any other blockchain, but with the unique features that a Corda network can support multiple different consensus pools using different algorithms.
So, for all intents and purposes, Corda is a blockchain. And yet… there’s also an utterly critical difference.
Unlike the platforms mentioned above, Corda does not periodically batch up transactions needing confirmation — into a block — and confirm them in one go. Instead, Corda confirms each transaction in real-time. No need to wait for a bunch of other transactions to come along. No need to wait for a “block interval”. Each transaction is confirmed as we go.
Now, coming to your question of why Hyperledger Fabric (HLF) uses a blockchain structure: the answer is simply that its designers chose to.
References:
https://www.bbva.com/en/difference-dlt-blockchain/
https://cointelegraph.com/news/what-is-the-difference-between-blockchain-and-dlt
https://www.corda.net/blog/corda-top-ten-facts-7-both-a-blockchain-and-not-a-blockchain/
Hyperledger Fabric uses a blockchain structure to keep the record immutable. By using Hyperledger Fabric, you get an immutable record of all transactions that is hard to tamper with.
Suppose you buy a valuable asset and are its current owner. It is then very hard for others to tamper with that record or change your ownership without your permission, because Hyperledger Fabric uses a blockchain structure to keep the record immutable.

Hyperledger Fabric private data collection to distribute large files

We are currently researching Hyperledger Fabric, and from the documentation we know that a private data collection can be set up among a subset of organizations. Each of these organizations would have a private state DB (a.k.a. side DB), and as I understand it, the side DB is just like a normal state DB, which typically uses CouchDB.
One of our main requirements is that we have to distribute files (e.g. PDFs) among a subset of the peers. Each file has to be disseminated to and stored at the relevant peers, so centralized storage such as AWS S3 or other cloud or server storage is not acceptable. As the files may be large, the physical copies must be stored and disseminated off-chain. The transaction block may only store the hashes of these documents.
My idea is to make use of the private data collection and the side DB. The physical files can be stored in the side DB (maybe as a base64 string?) and distributed via the gossip protocol (a P2P protocol), which is a feature of Hyperledger Fabric. The hash of the document, along with other transaction details, can be stored in a block as usual. As these are all native features of Hyperledger Fabric, I expect the transfer of the files via gossip and the creation of the corresponding block to be in sync.
My questions are:
Is this way feasible to achieve the requirement? (Distribution of the files to different peers while creating a new block) I kinda feel like it is hacky.
Is this a good way / practice to achieve what we want? I have been doing research but I cannot find any implementation similar to this.
Most of the tutorials I found online assume that the files can be stored in a single centralized store such as a cloud service or a server, while our requirement demands that the files be distributed as well. Is the idea described above acceptable and feasible? We are very new to blockchain and any advice is appreciated!
Is this way feasible to achieve the requirement? (Distribution of the files to different peers while creating a new block) I kinda feel like it is hacky.
In the private data workflow, the orderer bundles the private data transaction, which contains only a hash used to verify the data, into a new block. So you don't need a workaround for this, since private data provides it by default. The data itself is distributed between authorized peers via the gossip data dissemination protocol.
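To show what "only a hash in the block, data via gossip" means for a file, here is a Python sketch of the check a receiving peer could make. It is a simplification under my own assumptions: prepare_file and accept_file are hypothetical helpers, the base64 encoding follows the idea from the question, and the hash is taken over the raw file bytes. Fabric does the equivalent verification internally, so none of this is code you would write for the peer itself.

```python
import base64
import hashlib


def prepare_file(content: bytes) -> tuple:
    """Return (payload for the private store, hash to record in the block)."""
    private_payload = base64.b64encode(content).decode("ascii")
    on_chain_hash = hashlib.sha256(content).hexdigest()
    return private_payload, on_chain_hash


def accept_file(private_payload: str, on_chain_hash: str) -> bytes:
    """Decode a received payload and reject it if it does not match the block."""
    content = base64.b64decode(private_payload)
    if hashlib.sha256(content).hexdigest() != on_chain_hash:
        raise ValueError("payload does not match the hash recorded in the block")
    return content


payload, digest = prepare_file(b"%PDF-1.7 sample document")
received = accept_file(payload, digest)
```

Keep in mind that base64 inflates the stored data by roughly a third, which matters when the files are large.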
Is this a good way / practice to achieve what we want? I have been doing research but I cannot find any implementation similar to this.
Yes and no, sorry to say. It depends on the size and number of your files. Fabric is capable of really high throughput. I would test it and see whether it meets your requirements.
The other approach would be a workaround using IPFS (a P2P file system). You can read more about that approach here.
And here is an article discussing storing 'larger files' on chain. Maybe it gives some constructive insights as well. But keep in mind that this is an older article.
Check out IBM Blockchain Document Store; it is an implementation of storing any document (PDF or otherwise) both on and off chain. It has been done.
While the implementation isn't publicly available, there is extensive documentation on its usage, and you can probably glean some information from it.

How to retrieve the database in hyperledger fabric?

I have a question about Hyperledger Fabric: where is the ledger database saved? And how can the ledger data be restored if we lose the device?
Thanks in advance.
CouchDB or LevelDB is used as the state store, which holds only the latest data and not the entire ledger. So, although you could retrieve the latest data from them, I'm afraid you can't use them to recover the entire ledger, including history.
As far as I can see, the best way to restore the data would be to abandon the peer, create a new one, and let it synchronize from other peers.
To make that possible, you must have at least two peers in advance; once a peer goes down, create a new one and join it to the network. The new peer will then receive the data from the healthy peer.
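The difference between the state store and the full ledger can be sketched in a few lines of Python. This is a toy model under my own assumptions (a block is a dict with a list of key/value writes, and rebuild_state is a hypothetical helper), but it shows why a new peer that receives the blocks from a healthy peer can recover both the latest state and the history, while the state alone cannot give the history back.

```python
def rebuild_state(blocks):
    """Replay blocks in order to derive the latest state and per-key history."""
    state = {}    # what a state store such as CouchDB/LevelDB would hold
    history = {}  # only recoverable from the full chain of blocks
    for block in blocks:
        for key, value in block["writes"]:
            state[key] = value
            history.setdefault(key, []).append(value)
    return state, history


# Blocks as a new peer might receive them from a healthy peer (toy data).
healthy_peer_blocks = [
    {"number": 0, "writes": [("asset1", "owner=alice")]},
    {"number": 1, "writes": [("asset2", "owner=bob")]},
    {"number": 2, "writes": [("asset1", "owner=carol")]},
]

state, history = rebuild_state(healthy_peer_blocks)
```

Here state only knows that asset1 currently belongs to carol; the earlier owner appears only in history, which was rebuilt from the blocks.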

What integrates best with Hyperledger Fabric?

I'm currently trying to figure out a couple of things:
1) Which blockchain databases can be integrated with Hyperledger Fabric (such as IPDB, BigchainDB, or CouchDB)?
2) Which distributed file systems can be integrated with Hyperledger Fabric (such as IPFS, Storj, or Swarm)?
Can someone share their views? If there is anything better, please share.
There are no such limitations. Many people use integration patterns with various other technologies, for example:
Clients can write data to a distributed file system or database and then put a link and/or hash of the data on the Fabric blockchain as permanent evidence of the data.
Clients can listen for block events on a Fabric peer, and as blocks are committed to Fabric, write the data to another distributed file system or database.
The possibilities are endless.
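The second pattern above (listen for block events, then write the data elsewhere) might look roughly like this in Python. ToyPeer is a hypothetical stand-in I made up for illustration; a real client would subscribe through a Fabric SDK, and the external_db dict stands in for whichever database or file system you integrate with.

```python
from typing import Callable, Dict, List


class ToyPeer:
    """Hypothetical stand-in for a Fabric peer that emits block events."""

    def __init__(self) -> None:
        self.blocks: List[dict] = []
        self._listeners: List[Callable[[dict], None]] = []

    def on_block(self, listener: Callable[[dict], None]) -> None:
        """Register a callback to run for every committed block."""
        self._listeners.append(listener)

    def commit(self, transactions: List[str]) -> None:
        """Commit a block, then notify all registered listeners."""
        block = {"number": len(self.blocks), "transactions": transactions}
        self.blocks.append(block)
        for listener in self._listeners:
            listener(block)


external_db: Dict[int, List[str]] = {}  # stand-in for another database


def mirror_block(block: dict) -> None:
    """Copy each committed block's transactions into the external store."""
    external_db[block["number"]] = list(block["transactions"])


peer = ToyPeer()
peer.on_block(mirror_block)
peer.commit(["create asset1"])
peer.commit(["transfer asset1 to bob"])
```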

Hyperledger Fabric v0.6: modify data in RocksDB or database

The blockchain concept does not support modification: data written to the ledger cannot be changed. I want to test changing a data value stored in the ledger. I have looked for ways to change the data, but I couldn't find an exact one. I know that in Hyperledger Fabric v0.6, data is stored in RocksDB.
I would really appreciate it if someone could help me figure this out, because I also want to confirm that the ledger really cannot be modified.
An auditor will also be involved in checking data changes, and I still cannot get a clear answer on what an auditor is.
How do I configure an auditor in Fabric v0.6?
If you have access to a peer or hack into a peer, you can tamper with the data. The power of blockchain is not that the data on a single peer is unmodifiable, it's that modification can be detected, since the hash chain and signatures would no longer be correct if the data was tampered with. This peer would not be able to change other peers, or convince other peers of the accuracy of the modified data. The integrity of the overall blockchain would remain.
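Here is a Python sketch of that detection idea, covering only the hash chain and not the signatures. It is a toy model with made-up names (chain_digests, first_mismatch) and made-up records: the digests held by the other peers act as the reference, and the tampered peer's records are re-hashed and compared against them.

```python
import hashlib
from typing import List, Optional


def chain_digests(records: List[str]) -> List[str]:
    """Running hash chain: each digest covers the record and the digest before it."""
    digests = []
    previous = b""
    for record in records:
        previous = hashlib.sha256(previous + record.encode("utf-8")).digest()
        digests.append(previous.hex())
    return digests


def first_mismatch(local_records: List[str],
                   trusted_digests: List[str]) -> Optional[int]:
    """Index of the first record that disagrees with the other peers, or None."""
    for index, digest in enumerate(chain_digests(local_records)):
        if digest != trusted_digests[index]:
            return index
    return None


honest_records = ["asset1 owner=alice", "asset1 owner=bob", "asset2 owner=carol"]
trusted_digests = chain_digests(honest_records)  # what the other peers hold

# Someone with access to one peer rewrites a record on that peer only.
tampered_records = list(honest_records)
tampered_records[1] = "asset1 owner=mallory"
```

Because every digest depends on the one before it, the change at index 1 also alters every later digest on the tampered peer, while the digests on the other peers are unaffected.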
