I'm currently looking to securely replicate hundreds of GB of data across a few hundred hosts. I was looking at a Hyperledger Fabric private blockchain because of its use of TLS and a peer-to-peer gossip protocol for data transmission, plus of course the security of the blockchain itself.
Is it reasonable for me to be considering using blockchain as a way to securely do data replication? I have not seen this in any blockchain use case, but from what I've read it seems reasonable, even though everything I've read seems to indicate that storing data in the blockchain is a bad idea. Usually the arguments are that it costs too much and that the data has to be replicated across all the peers in the system. Cost isn't a concern in this case because it's a private blockchain, and for my use case the data replication (if it can be done efficiently) is what I'm looking for.
I could use IPFS, Swift, S3, etc. to store the data, but that would add operational burden, especially if Hyperledger Fabric can do the job on its own.
Also, if I use Hyperledger Fabric private data collections, how much control over purging do I have? For my use case, I can't just purge the oldest data: in some cases older data needs to be preserved for a long time, and in some cases newer data can be purged fairly quickly.
On the subject of data replication:
TL;DR: Not a blockchain solution.
Here's my thinking behind that.
Storing large amounts of data on the chain isn't a good idea, as you've mentioned. Yes, there's the replication of the data across all peers (a side effect that you actually need in this case). But there's also the signing and validation that needs to take place across all that data, so the processing cost would make it inefficient.
Definition of 'securely': you don't say what quality of service would constitute 'secure'. For example:
Access Control for users to access the data?
Assurance that the data has been replicated and is on disk at remote locations without corruption?
Encryption of data to protect it in transit and at rest?
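Blockchain, and I'm thinking Hyperledger Fabric here, would offer you the assurance. But encryption is not something you get automatically: TLS between nodes has to be enabled and configured, and the ledger data itself is not encrypted at rest, so you'd need to add that. As for access control, the primitives are there, but you are required to implement and use them.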
I would tend to think that the role of blockchain in this scenario would be to provide the audit trail of how the data was replicated between hosts, with the replication itself done by some other protocol.
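To make the audit-trail idea concrete, here is a minimal sketch in Python. It assumes the bulk data moves between hosts over some other protocol and that only a small record (digest, size, source, destination, timestamp) would be written to the ledger. The function names and record fields are hypothetical and are not part of Fabric.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def make_audit_record(data: bytes, source: str, destination: str, timestamp: str) -> str:
    """Build the small JSON record that would be submitted to the ledger.

    Only the digest and metadata go on-chain; the payload itself does not.
    The field names are illustrative assumptions.
    """
    record = {
        "digest": sha256_hex(data),
        "size": len(data),
        "source": source,
        "destination": destination,
        "timestamp": timestamp,
    }
    return json.dumps(record, sort_keys=True)


def verify_replica(received: bytes, record_json: str) -> bool:
    """Check a received copy against the size and digest in the audit record."""
    record = json.loads(record_json)
    return len(received) == record["size"] and sha256_hex(received) == record["digest"]


payload = b"example payload replicated out-of-band"
record = make_audit_record(payload, "host-a", "host-b", "2024-01-01T00:00:00Z")
print(verify_replica(payload, record))         # True: intact copy
print(verify_replica(payload + b"x", record))  # False: corrupted copy
```

The destination host can run the same check after each transfer, which covers the "replicated without corruption" assurance while keeping the large payloads off the chain.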
On the subject of private data collection purging:
Currently this is implemented by purging data once the peer reaches a certain block height, e.g. purge after 42 blocks. But we're working on a feature to allow 'purge-on-demand' based on a call from the chaincode.
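Because the block-based purge is configured per collection (the `blockToLive` property of a collection definition), one way to get different retention periods is to put data with different lifetimes into different collections. The sketch below generates such a collection configuration as JSON from Python. The collection names, MSP IDs, and peer counts are placeholder assumptions.

```python
import json


def collection(name: str, block_to_live: int) -> dict:
    """Build one private data collection definition.

    block_to_live is the number of blocks after which the private data is
    purged from the peers; 0 means the data is never purged.
    The org MSP IDs and peer counts below are placeholder assumptions.
    """
    return {
        "name": name,
        "policy": "OR('Org1MSP.member', 'Org2MSP.member')",
        "requiredPeerCount": 0,
        "maxPeerCount": 3,
        "blockToLive": block_to_live,
        "memberOnlyRead": True,
    }


collections_config = [
    collection("longLivedData", 0),    # keep indefinitely
    collection("shortLivedData", 42),  # purge after 42 blocks
]

print(json.dumps(collections_config, indent=2))
```

This only gives coarse, per-collection control. Purging an individual record at an arbitrary time is what the purge-on-demand feature mentioned above is meant to address.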
Related
I'm using Hyperledger Fabric and need to store images, but I don't know the best way to store them: on or off the blockchain? Both have their pros and cons. Should I sacrifice the security of the images or the performance of the network?
Blockchains are not suitable for storing large data. There are drawbacks, such as the replication rate, and limitations, such as the block size.
So I recommend using an off-chain storage system. A common approach is using IPFS, a P2P distributed storage system. IPFS provides availability, higher performance and integrity (as files are referenced by their hash). The IPFS hash can be saved suitably on the blockchain. IPFS is available as a public P2P network, but you can also deploy your own private IPFS network.
If using IPFS, I also recommend using IPFS-Cluster over IPFS, to manage persistence and replication.
https://ipfs.io/
https://cluster.ipfs.io/
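The pattern of keeping the file off-chain and its hash on-chain can be sketched as follows. This is a toy in-memory model, not an IPFS client: `OffChainStore` and `ledger` are hypothetical stand-ins, and real IPFS addresses are content identifiers (CIDs) rather than bare SHA-256 hex strings.

```python
import hashlib


class OffChainStore:
    """In-memory stand-in for a content-addressed store such as IPFS."""

    def __init__(self):
        self._blobs = {}  # address -> raw bytes

    def put(self, data: bytes) -> str:
        """Store data and return its address, derived from its content."""
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        """Fetch data and verify that it still matches its address."""
        data = self._blobs[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = OffChainStore()
ledger = {}  # stand-in for on-chain state: key -> content hash

image = b"\x89PNG...image bytes..."
ledger["image-001"] = store.put(image)

# A reader looks up the hash on the ledger, then fetches and verifies the file.
print(store.get(ledger["image-001"]) == image)
```

Because the address is derived from the content, any modification of the stored file is detected when it is read back against the hash recorded on the ledger.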
If you need encryption (maybe you don't), you should implement it outside IPFS (in your clients). How you implement it is up to you and your use case. As @GraphicalDot suggests in their comment, you can encrypt your file with AES and store the key on the blockchain (encrypted in turn via ECIES or ECDH) if you need user-level encryption (although this has its drawbacks if your users enroll new keys). In any case, Fabric itself only provides privacy at the organization level (not at the user level).
I think the answer also comes down to the business case of why you want to store images on chain. If you are trying to establish provenance, you can do that with metadata stored on chain and the images off chain, etc. If you want to make sure the images are stored for immutability, then maybe, but if you want to be able to delete them or modify them down the road, then maybe not.
We are currently researching Hyperledger Fabric, and from the documentation we know that a private data collection can be set up among some subset of organizations. There would be a private state DB (a.k.a. side DB) at each of these organizations and, per my understanding, the side DB is just like a normal state DB, which typically uses CouchDB.
One of our main requirements is that we have to distribute files (e.g. PDFs) among some subset of the peers. Each file has to be disseminated to and stored at the related peers, so centralized storage like AWS S3 or other cloud/server storage is not acceptable. As the files may be large, the physical copies must be stored and disseminated off-chain. The transaction block may only store the hash of these documents.
My idea is that we may make use of the private data collection and the side DB. The physical files can be stored in the side DB (maybe in the form of a base64 string?) and can be distributed via the gossip protocol (a P2P protocol), which is a feature of Hyperledger Fabric. The hash of the document, along with other transaction details, can be stored in a block as usual. As these are all native features of Hyperledger Fabric, I expect the transfer of the files via the gossip protocol and the creation of the corresponding block to be in sync.
My question is:
Is this way feasible to achieve the requirement? (Distribution of the files to different peers while creating a new block) I kinda feel like it is hacky.
Is this a good way / practice to achieve what we want? I have been doing research but I cannot find any implementation similar to this.
Most of the tutorials I found online assume that the files can be stored in a single centralized store, like the cloud or some sort of server, while our requirement demands distribution of the files as well. Is my idea described above acceptable and feasible? We are very new to blockchain and any advice is appreciated!
Is this way feasible to achieve the requirement? (Distribution of the files to different peers while creating a new block) I kinda feel like it is hacky.
The workflow of private data distribution is that the orderer bundles the private data transaction, containing only a hash used to verify the data, into a new block. So you don't need a workaround for this, since private data provides it by default. The data itself gets distributed between authorized peers via the gossip data dissemination protocol.
Is this a good way / practice to achieve what we want? I have been doing research but I cannot find any implementation similar to this.
Yes and no, sorry to say. It depends on the size and number of your files. Fabric is capable of providing really high throughput. I would test things out and see if it meets my requirements.
The other approach would be to use a workaround based on IPFS (a P2P file system). You can read more about that approach here.
And here is an article discussing storing 'larger files' on chain. Maybe it gives some constructive insights as well. But keep in mind that this is an older article.
Check out IBM Blockchain Document Store. It is an implementation of storing any document (PDF or otherwise) both on and off chain. It has been done.
And while the implementation isn't publicly available, there is extensive documentation on its usage, and you can probably glean some information from it.
I have been reading a lot of articles on blockchain, and almost every one has a different understanding of what a blockchain is.
Is there any definition of blockchain that is accepted by some community?
In a few articles I read:
Blockchains are Decentralised while DLTs are not.
All blockchains are DLTs, but not all DLTs are blockchains. It added: "if transactions get stored in blocks then it is a blockchain, else it is not".
From the above statements:
Is decentralisation a must for a blockchain?
Or is it just an immutable distributed database, which can be centralised or decentralised?
I would refer you to the first blockchain application, Bitcoin, which describes the "chain of blocks" in the original Bitcoin Whitepaper.
Blockchain
The term “blockchain” is often overused and can have different meanings in different contexts. Blockchain technology has 3 major components that together really make it an innovation. Strictly speaking, a blockchain is just a data structure similar to a linked list. Blocks of data reference their previous block by including its digital fingerprint, or hash, in their own block of data. If a previous block is modified, then all the following hashes will be different, and it is easy to detect that the data has been tampered with. Even more importantly, this establishes the order in which events took place; in the case of Bitcoin, these events are transactions. The final piece is a consensus mechanism that allows participants on a publicly distributed network to all agree on a chain of blocks.
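The data-structure part of this description can be illustrated with a minimal sketch. The block layout (`index`, `data`, `prev_hash`) and the function names are assumptions made for illustration; real blockchains use richer block headers.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash the block's contents, which include the previous block's hash."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, data: str) -> None:
    """Append a block that references the hash of the current last block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "prev_hash": prev})


def is_valid(chain: list) -> bool:
    """Check that every block references the hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
for tx in ["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"]:
    append_block(chain, tx)

print(is_valid(chain))                   # True: untouched chain
chain[0]["data"] = "alice pays bob 500"  # tamper with an early block
print(is_valid(chain))                   # False: block 1 no longer matches
```

Note that this only detects tampering. Nothing here stops someone from recomputing every later hash after a change, which is the gap the consensus mechanism is meant to close.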
Consensus
A consensus mechanism extends the blockchain data structure by providing rules (agreed to by network participants) that enforce how blocks are accepted by the network as a whole. For example, with the proof-of-work consensus, there is an agreed-upon amount of work that must be done before a block is accepted as valid (its hash must meet a maximum value threshold). The lower the threshold, the more work must have been done (on average) to calculate the block hash. Providing a valid block hash becomes proof of work. This can make it much more difficult to modify past blocks, as the same amount of work must be done in order for the network to accept it as valid, thus distributed consensus can be achieved. This is why “blockchain technology” was invented, to achieve distributed consensus without relying on a third party. “Blockchain technology” is not really that interesting without the proof-of-work component and so it depends on what your definition of “blockchain technology” happens to be.
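As a rough illustration of the threshold rule, the sketch below searches for a nonce that makes a block's hash fall below a target value. It is a toy: the target is far easier than anything used in practice, the block is a plain string, and the names `mine`, `verify`, and `TARGET` are hypothetical (Bitcoin itself hashes a binary block header with double SHA-256).

```python
import hashlib


def mine(block_data: str, target: int) -> tuple:
    """Search for a nonce so that the block hash, read as an integer, is below target."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}|{nonce}".encode()).hexdigest()
        if int(digest, 16) < target:
            return nonce, digest
        nonce += 1


def verify(block_data: str, nonce: int, target: int) -> bool:
    """Verification takes a single hash, however long mining took."""
    digest = hashlib.sha256(f"{block_data}|{nonce}".encode()).hexdigest()
    return int(digest, 16) < target


# Toy difficulty: the top 16 bits of the 256-bit hash must be zero,
# which takes about 2**16 attempts on average.
TARGET = 1 << (256 - 16)

nonce, digest = mine("block 1: alice pays bob 5", TARGET)
print(nonce, digest)
print(verify("block 1: alice pays bob 5", nonce, TARGET))
```

Halving `TARGET` doubles the expected number of attempts, which is the sense in which a lower threshold means more work. Changing the block data invalidates the nonce, so modifying a past block means redoing this search.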
In conclusion, by this definition of blockchain, it doesn't make sense to use blockchain in a centralized environment (it is for distributed consensus).
One of the best writings on centralization and decentralization is an article by Ethereum creator Vitalik Buterin.
https://medium.com/@VitalikButerin/the-meaning-of-decentralization-a0c92b76a274
In this article he explains how even the most decentralized systems are also centralized to some extent.
Let's first make clear what decentralization is.
Decentralisation is the process by which the activities of an organization, particularly those regarding planning and decision making, are distributed or delegated away from a central, authoritative location or group. Concepts of decentralization have been applied to group dynamics and management science in private businesses and organizations, political science, law and public administration, economics, money and technology. -- Wikipedia
So basically, there's no central, authoritative person or group or whatever to make decisions or plans in a decentralized system. According to this, I personally would prefer to separate it into two essential levels of a blockchain system.
In terms of the blockchain system (software and hardware) itself, it is decentralized, because it is based on a distributed computer system and it has a consensus algorithm (PoW, PoS, DAG, etc.) to make sure each node/party in the system can reach consistency.
In terms of governance, it can be decentralized or centralized. A typical centralized blockchain system is a private blockchain, since it is governed/managed by a single group or organization. A public blockchain, on the other hand, is decentralized.
Blockchain is a distributed ledger of information. The main concept was taken from the paper on digital time-stamping by Stuart Haber. The idea behind blockchain is to remove the centralized system so that decision power is taken away from a single entity. If you want to make it centralised, you are basically trading off the basic properties of blockchain.
One of the key features of a blockchain is consensus, where other nodes verify the transactions. If you make the blockchain centralised, the whole work is done by one node, and we can only be assured of its correctness if that node is trustworthy.
A blockchain is a distributed, decentralized, public ledger.
The title practically says everything.
In the near future I'm going to implement a real-time tracking system, possibly using a blockchain, and for certain reasons Hyperledger Fabric seems to be the chosen technology. After the information is recorded, it should be accessible on a map in a web application.
So the question is: if we save the GPS location of a truck to the blockchain every one or two minutes, will it hurt the general performance of the blockchain in the near future (millions and millions of records)?
In the end I have to decide whether I should save this information in the blockchain or, knowing that it would cause some serious issues, leave that information out of it and use a hybrid system with a classic database for that and a blockchain for other functionality that won't cause performance issues.
Thanks.
There is no storage limitation on the Fabric ledger other than disk space. The current value of a key (say, the latest position of the truck) is saved in the world state for quick retrieval and can be read via a Fabric query. There is also a mechanism to look up the history of a key quickly via the history DB that Fabric maintains.
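The difference between the current value of a key and its history can be pictured with a toy model. This is not the Fabric API: `ToyLedger` and its methods are hypothetical and only mirror the idea that the world state holds one latest value per key while the history keeps growing.

```python
class ToyLedger:
    """Toy model of a key's current value versus its history.

    This is not the Fabric API; it only mirrors the idea that the world
    state holds the latest value per key while the history grows.
    """

    def __init__(self):
        self.world_state = {}  # key -> latest value
        self.history = {}      # key -> list of all values, oldest first

    def put(self, key, value):
        """Record a new value: overwrite the current state, append to history."""
        self.world_state[key] = value
        self.history.setdefault(key, []).append(value)

    def get(self, key):
        """Return the latest value for a key."""
        return self.world_state[key]

    def get_history(self, key):
        """Return every value ever recorded for a key, oldest first."""
        return list(self.history.get(key, []))


ledger = ToyLedger()
for minute, position in enumerate([(41.38, 2.17), (41.39, 2.18), (41.40, 2.19)]):
    ledger.put("truck-42", {"minute": minute, "lat": position[0], "lon": position[1]})

print(ledger.get("truck-42"))               # latest position only
print(len(ledger.get_history("truck-42")))  # 3: every recorded position
```

At one update per minute, a single truck adds 60 * 24 * 365 = 525,600 history entries per year, while its entry in the world state stays a single value. Reading the latest position therefore stays cheap, and the growth shows up as disk usage.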
I'm just wondering what the limitations of Hyperledger Fabric are in terms of how much data can be stored on each of the peers.
Following on from this question, I'm wondering what your options are for managing large amounts of data on a Hyperledger network, e.g. with decentralised networks.
I'm struggling to find good resources on this so it would be great if somebody could fill me in or point me to some good resources on the topic!
Your question is too broad. However, I will cut down a few things and try to answer this as concisely as possible.
The amount of data that can be stored is a direct function of the peer's capacity. However, storing large amounts of data on the ledger is not recommended for any ledger, because scaling is an issue. You can opt for off-chain storage like IPFS to store bulky data and keep the proof of it as a hash on the ledger. Next, you can try segregating data into channels and have peers join a channel on a need basis.
Moreover, have your data properly indexed (in CouchDB) and consider the replication and scalability aspects of CouchDB itself.
I hope I was able to touch the tip of the iceberg. For more details, you can reach me offline for further discussions.