CouchDB can be used as the state database for Hyperledger Fabric. From the documentation and my own experience, I understand that CouchDB holds only the world state data and not the blockchain data. With CouchDB we can also use things like design documents, views, pagination, indexes etc. CouchDB also needs to run in a separate container, and we map the CouchDB container port to the peer.
My question is: if we have terabytes of data in the world state, how does this affect scaling in CouchDB? Can a single container handle this data without taking a performance hit? The CouchDB documentation says that when the system grows large enough, the sensible thing is to add more nodes to the database. How do we achieve this in Hyperledger Fabric?
I think you might get your answer in this discussion.
I'm currently looking to securely replicate hundreds of GB of data across a few hundred hosts. I was looking at a Hyperledger Fabric private blockchain because of its use of TLS and a peer-to-peer gossip protocol for data transmission, plus of course the security of the blockchain itself.
Is it reasonable for me to consider using a blockchain as a way to securely do data replication? I have not seen this in any blockchain use case, but from what I've read it seems reasonable, even though everything I've read seems to indicate that storing data in the blockchain is a bad idea. Usually the arguments are that it costs too much and that the data has to be replicated across all the peers in the system. Cost isn't a concern in this case because it's a private blockchain, and for my use case the data replication (if it can be done efficiently) is exactly what I'm looking for.
I could use IPFS, Swift, S3, etc. to store the data, but that would add operational burden, especially if Hyperledger Fabric can do the job on its own.
Also, if I use Hyperledger Fabric private data collections, how much control over purging do I have? For my use cases, I can't just purge the oldest data, as in some cases older data needs to be preserved for a long time and in some cases newer data can be purged fairly quickly.
On the subject of data replication:
TL;DR; Not a blockchain solution
Here's my thinking behind that.
Storing large amounts of data on chain isn't a good idea, as you've mentioned. Yes, there's the replication of the data across peers (a side effect that you actually want in this case). But there's also the signing, validation etc. that needs to take place across all that data, so the processing cost would make it inefficient.
Definition of 'securely': you don't say what quality of service would constitute 'secure'. For example:
Access Control for users to access the data?
Assurance that the data has been replicated and is on disk at remote locations without corruption?
Encryption of data to protect it in transit and at rest.
Blockchain, and I'm thinking Hyperledger Fabric here, would offer you the assurance. Encryption, though, is not something you get automatically: you'd need to add that (for instance by enabling TLS for data in transit). As for access control, the primitives are there, but you are required to implement and use them.
I would tend to think that the use of blockchain in this scenario would be to provide the audit trail of how the data was replicated between hosts, with some other protocol doing the actual replication.
On the subject of private data collection purging:
Currently this is implemented by purging data once a configured number of blocks has passed, e.g. purge after 42 blocks. But we're working on a feature to allow 'purge-on-demand' based on a call from the chaincode.
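As a rough sketch of how block-based purging relates to the retention question: purge timing is set per collection through the blockToLive property of the collection definition, and a value of 0 means the private data is never purged. One option is therefore to split data with different retention needs across different collections. The property names below follow Fabric's collection configuration JSON, but the collection names, MSP IDs, policy and peer counts are made up for illustration.

```python
import json

def collection(name, block_to_live):
    """Build one private data collection definition.

    Property names follow Fabric's collection config JSON; the policy,
    peer counts and collection names are illustrative assumptions.
    """
    return {
        "name": name,
        "policy": "OR('Org1MSP.member', 'Org2MSP.member')",
        "requiredPeerCount": 0,
        "maxPeerCount": 3,
        "blockToLive": block_to_live,  # 0 means: never purge
        "memberOnlyRead": True,
    }

collections = [
    collection("longLivedRecords", 0),    # kept indefinitely
    collection("shortLivedRecords", 42),  # purged after 42 blocks
]

config_json = json.dumps(collections, indent=2)
print(config_json)
```

This only gives per-collection control, not per-record control, so it fits best when the retention class of a piece of data is known at write time.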
We are currently researching Hyperledger Fabric, and from the documentation we know that a private data collection can be set up among a subset of organizations. There would be a private state DB (a.k.a. side DB) at each of these organizations and, as I understand it, the side DB is just like a normal state DB, which typically uses CouchDB.
One of our main requirements is that we have to distribute files (e.g. PDFs) among a subset of the peers. Each file has to be disseminated to and stored at the related peers, so centralized storage such as AWS S3 or other cloud / server storage is not acceptable. As the files may be large, the physical copies must be stored and disseminated off-chain. The transaction block may only store the hash of these documents.
My idea is that we could make use of the private data collection and the side DB. The physical files can be stored in the side DB (maybe in the form of a Base64 string?) and can be distributed via the gossip protocol (a P2P protocol), which is a feature of Hyperledger Fabric. The hash of the document, along with other transaction details, can be stored in a block as usual. As these are all native features of Hyperledger Fabric, I expect the transfer of the files via the gossip protocol and the creation of the corresponding block to be in sync.
My question is:
Is this way feasible to achieve the requirement? (Distribution of the files to different peers while creating a new block) I kinda feel like it is hacky.
Is this a good way / practice to achieve what we want? I have been doing research but I cannot find any implementation similar to this.
Most of the tutorials I found online assume that the files can be stored in a single centralized storage such as the cloud or some sort of server, while our requirement demands that the files be distributed as well. Is my idea described above acceptable and feasible? We are very new to blockchain and any advice is appreciated!
Is this way feasible to achieve the requirement? (Distribution of the files to different peers while creating a new block) I kinda feel like it is hacky.
The workflow of private data distribution is that the orderer bundles the private data transaction, which contains only a hash for verifying the data, into a new block. So you don't need a workaround for this, since private data provides it by default. The data itself is distributed between authorized peers via the gossip data dissemination protocol.
Is this a good way / practice to achieve what we want? I have been doing research but I cannot find any implementation similar to this.
Yes and no, sorry to say. It depends on your file sizes and on the number of files. Fabric is capable of providing really high throughput. I would test things out and see if it meets my requirements.
The other approach would be a workaround using IPFS (a P2P file system). You can read more about that approach here.
And here is an article discussing storing 'larger files' on chain. Maybe it gives some constructive insights as well. But keep in mind that this is an older article.
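Before testing on a real network, a quick back-of-the-envelope calculation can help. If the files were stored as Base64 strings, as suggested in the question, each one would grow by roughly a third, because Base64 encodes every 3 bytes as 4 characters. A minimal sketch (the 5 MB file size is an arbitrary example, not a Fabric limit):

```python
import base64

def base64_size(raw_bytes: int) -> int:
    """Length in characters of the padded Base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)

# Sanity check of the formula against the real encoder.
sample = bytes(1000)  # 1000 zero bytes standing in for a file
assert len(base64.b64encode(sample)) == base64_size(len(sample))

pdf_bytes = 5 * 1024 * 1024  # hypothetical 5 MB PDF
ratio = base64_size(pdf_bytes) / pdf_bytes
print(f"{base64_size(pdf_bytes)} characters encoded, {ratio:.2f}x the original size")
```

That inflated payload is what every authorized peer would have to receive and store, so multiplying it by the expected number of files gives a first feel for whether the approach is realistic.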
Check out IBM Blockchain Document Store; it is an implementation of storing any document (PDF or otherwise) both on and off chain. It has been done.
And while the implementation isn't publicly available, there is extensive documentation on its usage, so you can probably glean some information from it.
The title practically says everything.
In the near future I'm going to implement a real-time tracking system, possibly using a blockchain, and for certain reasons Hyperledger Fabric seems to be the chosen technology. After the information is recorded, it should be accessible on a map in a web application.
So the question is: if we save the GPS location of a truck to the blockchain every one or two minutes, will it hurt the general performance of the blockchain in the near future? (millions and millions of records)
In the end I have to decide whether I should save this information in the blockchain or, knowing that it would cause some serious issues, leave that information out of it and use a hybrid system with a classic database for that and a blockchain for other functionality that won't cause performance issues.
Thanks.
There is no storage limitation on the Fabric ledger other than disk space. The current value of a key (say, the latest position of the truck) can be read via a Fabric query, since current values are saved in the world state for quick retrieval. There is also a mechanism to look up the history of a key quickly via the historyDB that Fabric maintains.
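To make the distinction concrete, here is a toy in-memory model, not Fabric code: every write overwrites the key's current value in a "world state" map and is also appended to a per-key history. The class, its method names and the truck key are invented for this illustration; in real chaincode the corresponding stub calls are PutState, GetState and GetHistoryForKey.

```python
from collections import defaultdict

class ToyLedger:
    """Toy stand-in for a peer's world state and per-key history."""

    def __init__(self):
        self.world_state = {}             # key -> latest value only
        self.history = defaultdict(list)  # key -> every value ever written

    def put(self, key, value):
        self.world_state[key] = value
        self.history[key].append(value)

    def get(self, key):
        return self.world_state[key]

    def get_history(self, key):
        return list(self.history[key])

ledger = ToyLedger()
# One position per minute for a hypothetical truck "TRUCK-1".
for minute in range(3):
    ledger.put("TRUCK-1", {"minute": minute, "lat": 40.0 + minute * 0.001, "lon": -3.7})

latest = ledger.get("TRUCK-1")
trail = ledger.get_history("TRUCK-1")
print(latest["minute"], len(trail), len(ledger.world_state))  # prints: 2 3 1
```

In this model the world state stays at one entry per truck no matter how many positions are recorded; what grows with every write is the history, which on a real peer corresponds to the blocks kept on disk.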
I'm just wondering what the limitations of Hyperledger Fabric are in terms of how much data can be stored on each of the peers.
Following on from this question, I'm wondering what your options are for managing large amounts of data on a Hyperledger network, i.e. with decentralised networks etc.
I'm struggling to find good resources on this so it would be great if somebody could fill me in or point me to some good resources on the topic!
Your question is too broad. However, I will cut down a few things and try to answer this as concisely as possible.
The amount of data that can be stored is a direct function of the peer's capacity. However, storing large amounts of data on the ledger is not recommended for any ledger, because scaling becomes an issue. You can opt for off-chain storage such as IPFS for bulky data and keep proof of it on the ledger as a hash. Next, you can try segregating data into channels and have peers join a channel on an as-needed basis.
Moreover, have your data properly indexed (in CouchDB), and consider the replication and scalability aspects of CouchDB itself.
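As a sketch of what "properly indexed" can look like: with CouchDB as the state database, a JSON index definition can be packaged with the chaincode (under META-INF/statedb/couchdb/indexes), and a rich query can name that index explicitly. The structure below follows CouchDB's index and Mango query format; the field names (docType, truckId), the index name and the design document name are made-up examples.

```python
import json

# Index definition, as it would be saved to a .json file in the chaincode package.
index_definition = {
    "index": {"fields": ["docType", "truckId"]},
    "ddoc": "indexTruckDoc",  # hypothetical design document name
    "name": "indexTruck",     # hypothetical index name
    "type": "json",
}

def build_query(truck_id):
    """Build a Mango query string that selects on the indexed fields."""
    return json.dumps({
        "selector": {"docType": "position", "truckId": truck_id},
        "use_index": ["_design/" + index_definition["ddoc"], index_definition["name"]],
    })

print(json.dumps(index_definition, indent=2))
print(build_query("TRUCK-1"))
```

The point of the design is that the selector only filters on fields covered by the index, so CouchDB does not have to scan every document in the state database.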
I hope I was able to touch the tip of the iceberg. For more details, you can reach me offline for further discussions.
I'm currently trying to figure out a couple of things:
1) What blockchain databases can be integrated with Hyperledger Fabric (such as IPDB, BigchainDB or CouchDB)?
2) What distributed file systems can be integrated with Hyperledger Fabric (such as IPFS, Storj, Swarm)?
Can someone share their views? If there is anything better, please share.
There are no such limitations; many people use integration patterns with various other technologies, for example:
Clients can write data to a distributed file system or database and then put a link and/or hash of the data on the Fabric blockchain as permanent evidence of the data.
Clients can listen for block events on a Fabric peer, and as blocks are committed to Fabric, write the data to another distributed file system or database.
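The first pattern can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 as the hash and using plain dictionaries as stand-ins for the off-chain store and for the ledger; the function names, the document ID and the location string are hypothetical.

```python
import hashlib

off_chain_store = {}  # stand-in for IPFS, S3, a database, ...
ledger = {}           # stand-in for key/value state recorded on Fabric

def store_document(doc_id, content: bytes) -> str:
    """Save the content off-chain and record its link and hash 'on-chain'."""
    digest = hashlib.sha256(content).hexdigest()
    location = f"store://{doc_id}"
    off_chain_store[location] = content
    ledger[doc_id] = {"location": location, "sha256": digest}
    return digest

def verify_document(doc_id) -> bool:
    """Re-hash the off-chain content and compare it with the recorded hash."""
    record = ledger[doc_id]
    content = off_chain_store[record["location"]]
    return hashlib.sha256(content).hexdigest() == record["sha256"]

store_document("contract-1", b"%PDF-1.7 example bytes")
print(verify_document("contract-1"))  # True

# Any change to the off-chain copy is detected against the recorded hash.
off_chain_store["store://contract-1"] = b"tampered"
print(verify_document("contract-1"))  # False
```

The ledger entry stays small regardless of document size, which is why this pattern is usually preferred over putting the bulky data itself on chain.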
The possibilities are endless.