P2P network design

I'm attempting to design a P2P network in which all peers share the same data and can make changes to it. Let's leave aside the consensus portion (i.e., assume that only one node makes changes to the data at a time).
How would I make sure that all peers are connected to other peers in a fault-tolerant way? The only mechanism I can think of is for each peer to request more peers from another peer. But how do I make sure that connections are distributed as evenly as possible, so that one peer isn't overloaded with TCP connections while another barely has any? And how can I prevent the network from splitting into two separate groups?

Something like BitTorrent's canonical peer priority (github link), which computes a preference order by hashing the identifiers of both endpoints together, should let nodes converge on a pseudo-random layout of the overlay while avoiding any node being "left out". Hashing both identities together yields an ordering that is different from the perspective of each peer yet globally agreed upon, which produces a randomized layout. As the number of edges per node increases, the probability of the network partitioning rapidly approaches zero.
You can also cap the number of connections each peer accepts; once a peer is saturated, others are forced to look elsewhere.
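As a concrete illustration, here is a minimal TypeScript sketch of the idea. It is not BitTorrent's exact algorithm (BEP 40 ranks peers by a CRC32-C over canonicalized IP/port pairs); this sketch hashes opaque peer IDs with SHA-256 instead, which preserves the property that matters: the priority of an edge is symmetric, so both endpoints compute the same value for it.

    import { createHash } from 'node:crypto';

    // Symmetric priority: sort the two IDs so priority(a, b) === priority(b, a),
    // then hash them together. A lower value means a more preferred connection.
    function priority(a: string, b: string): bigint {
      const [lo, hi] = [a, b].sort();
      const digest = createHash('sha256').update(`${lo}:${hi}`).digest();
      return digest.readBigUInt64BE(0); // the first 8 bytes are enough to rank by
    }

    // Each peer keeps the k best-ranked candidates as its neighbor set.
    function chooseNeighbors(self: string, candidates: string[], k: number): string[] {
      return candidates
        .filter((c) => c !== self)
        .map((c) => ({ c, p: priority(self, c) }))
        .sort((x, y) => (x.p < y.p ? -1 : x.p > y.p ? 1 : 0))
        .slice(0, k)
        .map((x) => x.c);
    }

Because both endpoints compute the same rank for their shared edge, peers agree on which connections are worth keeping when a connection limit forces a choice, which helps the overlay settle into the globally agreed pseudo-random layout instead of churning.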

Related

Implementing peer discovery in libp2p

Is peer discovery in libp2p (e.g. peers telling each other about peers they know about, and managing lists of connected nodes) in Rust controlled entirely at the level of a NetworkBehaviour?
It looks like one option is to use Kademlia, which (in the Rust implementation) does this by defining a NetworkBehaviour.
Is it correct that if you don't want to use Kademlia for peer discovery, you implement discovery as part of your own NetworkBehaviour?
I'm trying to avoid a situation where I start implementing code to do this, only to find that libp2p is already doing it for me under the covers.
You have several alternatives, but in every case you have to implement a behaviour (or a combination of behaviours) to discover peers:
mDNS
It allows peers to discover each other without any configuration when they are on the same local network. It is by far the simplest discovery mode, but it is limited to local networks.
This is the example.
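The question is about Rust, but for a feel of how discovery composes, here is a hedged TypeScript sketch using js-libp2p (assuming roughly v1.x; option names vary between versions), where mDNS discovery is likewise a module you opt into rather than something the core runs for you:

    // Sketch only: js-libp2p analogue of adding an mDNS discovery behaviour.
    import { createLibp2p } from 'libp2p';
    import { tcp } from '@libp2p/tcp';
    import { noise } from '@chainsafe/libp2p-noise';
    import { yamux } from '@chainsafe/libp2p-yamux';
    import { mdns } from '@libp2p/mdns';

    const node = await createLibp2p({
      transports: [tcp()],
      connectionEncryption: [noise()],
      streamMuxers: [yamux()],
      peerDiscovery: [mdns()], // discovery is composed in, like a NetworkBehaviour
    });

    // Discovered peers surface as events; dialing them is your application's job.
    node.addEventListener('peer:discovery', (evt) => {
      console.log('discovered', evt.detail.id.toString());
    });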
Rendezvous
Its goal is to provide a lightweight mechanism for generalized peer discovery. As the name indicates, it requires nodes that act as rendezvous points. The protocol implementation examples show this best.
Kademlia
This is the best option for a network with many nodes, where a portion of those nodes may offer limited connectivity. It's simpler than it seems, but at the time we could not find practical examples, so we learned through trial and error.
Some of my colleagues are preparing a tutorial series, to be published soon, sharing our experience with libp2p in Rust.

Simulate Hyperledger Fabric network with 5000 users

I am new to Hyperledger Fabric.
I read its documentation and followed the test network provided on their website; the test-network supplies a bunch of terminal commands to add a third organization and its peer. I like that everything is ready to run in a terminal, but the problem is the high level of abstraction over many details.
Goal:
I would like to simulate a permissioned blockchain network with 5000 users. Each user should be able to broadcast a transaction to the channel every 15 seconds. The orderers should package these transactions every 15 seconds and let the connected users verify new blocks.
Questions:
Should I create a new peer for each user?
Or can I use a single peer and let each user use the app?
I could not find a single tutorial on adding more peers dynamically.
Reading the documentation, I think I should give each user their own peer and app to broadcast transactions. However, creating 5000 peers one by one would be very time-consuming.
I know these questions may sound naive, given that other options, like building my simulation on socket.io or gRPC, would be less painful at the moment. I don't want to avoid reading the HLF docs, but the high level of abstraction and the learning time make me wonder whether I should just use those other options for my simulation. As Linus Torvalds put it:
Talk is cheap, show me the code!
In the HLF case, I don't want the ready-made terminal commands; I want to really understand and modify the source code of the peers.
Thank you for any recommendation or direction.
You need 5000 users (as registered in the CA), not 5000 peers. A single peer should be enough (although a few more peers can be useful to distribute the endorsements and improve performance).
So, you should:
Register 5000 users in your Fabric-CA
Enroll their cryptographic material from the Fabric-CA
Run the 5000 clients (the peer command, a Fabric SDK based application, or whatever).
Fabric CA related stuff: https://hyperledger-fabric-ca.readthedocs.io/en/latest/deployguide/use_CA.html.
Obviously, you should prepare some kind of script to do this rather than doing it manually; a sketch follows below.
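As a sketch of such a script with the Node.js SDKs (fabric-ca-client and fabric-network): the CA URL, admin credentials, MSP ID, and affiliation below are fabric-samples defaults used as placeholders, so adapt them to your network.

    import FabricCAServices from 'fabric-ca-client';
    import { Wallets, X509Identity } from 'fabric-network';

    const CA_URL = 'https://localhost:7054'; // placeholder CA endpoint
    const MSP_ID = 'Org1MSP';                // placeholder MSP

    async function main(count: number): Promise<void> {
      const ca = new FabricCAServices(CA_URL);
      const wallet = await Wallets.newFileSystemWallet('./wallet');

      // 1. Enroll the CA admin once; it acts as registrar for everyone else.
      const admin = await ca.enroll({ enrollmentID: 'admin', enrollmentSecret: 'adminpw' });
      const adminIdentity: X509Identity = {
        credentials: { certificate: admin.certificate, privateKey: admin.key.toBytes() },
        mspId: MSP_ID,
        type: 'X.509',
      };
      await wallet.put('admin', adminIdentity);
      const provider = wallet.getProviderRegistry().getProvider(adminIdentity.type);
      const adminUser = await provider.getUserContext(adminIdentity, 'admin');

      // 2. Register and enroll each user, storing its credentials in the wallet.
      for (let i = 0; i < count; i++) {
        const id = `user${i}`;
        const secret = await ca.register(
          { enrollmentID: id, role: 'client', affiliation: 'org1.department1' },
          adminUser,
        );
        const enrollment = await ca.enroll({ enrollmentID: id, enrollmentSecret: secret });
        await wallet.put(id, {
          credentials: { certificate: enrollment.certificate, privateKey: enrollment.key.toBytes() },
          mspId: MSP_ID,
          type: 'X.509',
        } as X509Identity);
      }
    }

    main(5000).catch(console.error);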
If your purpose is only testing, maybe you can use cryptogen instead of Fabric-CA.
It seems that you are trying to perform some kind of performance test. Hyperledger Caliper is designed for exactly this kind of test. You could configure Caliper with 5000 workers (although I'm not sure whether it can be configured for less than 1 TPS to reproduce your one-transaction-per-15-seconds rate).
As for the orderers, you can configure your ordering service with a batch timeout of 15 seconds, but take into account that your 5000 transactions every 15 seconds may reach the batch size limit before the timeout, in which case the block is cut earlier.
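In configtx.yaml terms, the trade-off lives in the orderer's batch settings. A sketch (field names follow the standard sample configuration; the values are illustrative, not recommendations):

    Orderer:
      # A block is cut when EITHER the timeout expires OR a size limit is
      # reached, whichever happens first.
      BatchTimeout: 15s
      BatchSize:
        MaxMessageCount: 5000    # ~5000 tx per 15 s would hit this cap often
        AbsoluteMaxBytes: 99 MB
        PreferredMaxBytes: 2 MB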

Hyperledger-fabric use cases

I'm currently looking to securely replicate hundreds of Gbs of data across a few hundred hosts. I was looking at hyperledger-fabric private blockchain because of its use of TLS and peer to peer gossip protocol for data transmission, plus of course the security of the blockchain itself.
Is it reasonable to consider a blockchain as a way to securely replicate data? I have not seen this in any blockchain use case, but from what I've read it seems reasonable, even though everything I've read also indicates that storing data in a blockchain is a bad idea. The usual arguments are that it costs too much and that the data has to be replicated across all the peers in the system. Cost isn't a concern here because it's a private blockchain, and for my use case the replication (if it can be done efficiently) is exactly what I'm looking for.
I could use IPFS, Swift, S3, etc. to store the data, but that would add operational burden, especially if hyperledger-fabric can do the job on its own.
Also, if I use Hyperledger private data collections, how much control do I have over purging? For my use cases I can't just purge the oldest data: in some cases older data needs to be preserved for a long time, while in other cases newer data can be purged fairly quickly.
On the subject of data replication:
TL;DR: not a blockchain solution.
Here's my thinking behind that.
Storing large amounts of data in the chain isn't a good idea, as you've mentioned. Yes, the data gets replicated across peers (and in this case that side effect is exactly what you want), but there is also the signing and validation that needs to take place across all that data, so in processing terms it would be inefficient.
Definition of "securely": you don't say what quality of service would constitute "secure". For example:
Access control for users accessing the data?
Assurance that the data has been replicated and is on disk at remote locations without corruption?
Encryption of data to protect it in transit and at rest?
Blockchain, and I'm thinking of Hyperledger Fabric here, would offer you the assurance. But there's no encryption in transit out of the box; you'd need to add that. And for access control, the primitives are there, but they require you to implement and use them.
I would tend to see the role of a blockchain in this scenario as providing the audit trail of how the data was replicated between hosts, with the replication itself done by some other protocol.
On the subject of private data collection purging:
Currently this is implemented by purging data when the peer reaches a certain block height, i.e. purge after 42 blocks. But we are working on a feature to allow purge-on-demand, triggered by a call from the chaincode.
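Under the current mechanism, one way to handle the asker's mixed retention needs is to define several collections with different blockToLive values and write each piece of data to the collection that matches its retention class. A sketch of a collections_config.json (the policy and peer counts are placeholders; blockToLive of 0 means never purge, while 100 means purge roughly 100 blocks after the write):

    [
      {
        "name": "longRetention",
        "policy": "OR('Org1MSP.member','Org2MSP.member')",
        "requiredPeerCount": 1,
        "maxPeerCount": 3,
        "blockToLive": 0,
        "memberOnlyRead": true
      },
      {
        "name": "shortRetention",
        "policy": "OR('Org1MSP.member','Org2MSP.member')",
        "requiredPeerCount": 1,
        "maxPeerCount": 3,
        "blockToLive": 100,
        "memberOnlyRead": true
      }
    ]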

Node.js built-in distributed cache

I'm looking for a distributed cache for Node.js. It should be:
built-in (works inside the Node process)
distributed and replicated between any number of servers (1 or 2 should be enough for consensus)
consistency is not critical
automatic peer discovery is preferable (e.g. by UDP)
there should be one leader at any given time (who periodically updates the cache)
I tried to find something built on LevelDB (memdown), but all such projects have been unmaintained for several years. Maybe they are simply stable, but I'm not sure. It is also possible to take a discovery and consensus algorithm and implement the rest manually, but I can't find a JS implementation of Raft etc. that works with 1+ peers.
Any recommendations are welcome.
I've chosen http://cote.js.org/, a mesh-network library. It is not a complete solution, just discovery and a pub/sub service, plus a leader election algorithm that I wrote manually.
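For illustration, a TypeScript sketch of that setup: cote for UDP-based discovery and pub/sub, plus a hand-rolled "lowest ID wins" election. This is my own reconstruction, not the answerer's actual election code, and the interval and timeout values are arbitrary.

    const cote = require('cote'); // cote ships without bundled type declarations

    const nodeId = `${Date.now()}-${process.pid}`;
    const publisher = new cote.Publisher({ name: 'cache publisher' });
    const subscriber = new cote.Subscriber({ name: 'cache subscriber' });

    const peersSeen = new Map<string, number>(); // peer id -> last heartbeat (ms)
    const localCache = new Map<string, unknown>();

    // Every node announces itself; cote's discovery delivers it to all peers.
    setInterval(() => publisher.publish('heartbeat', { id: nodeId }), 1000);
    subscriber.on('heartbeat', (msg: { id: string }) => peersSeen.set(msg.id, Date.now()));

    // Deterministic election: among peers heard from recently, the lowest ID leads.
    function iAmLeader(): boolean {
      const cutoff = Date.now() - 3000;
      const alive = [...peersSeen].filter(([, t]) => t > cutoff).map(([id]) => id);
      alive.push(nodeId);
      return alive.sort()[0] === nodeId;
    }

    // Only the leader refreshes the cache; everyone applies the broadcast update.
    setInterval(() => {
      if (iAmLeader()) {
        publisher.publish('cache-update', { key: 'settings', value: { refreshedAt: Date.now() } });
      }
    }, 5000);

    subscriber.on('cache-update', (msg: { key: string; value: unknown }) => {
      localCache.set(msg.key, msg.value);
    });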

Hyperledger participation without hosting a peer node

We are looking to implement a hyperledger-fabric solution and I'm stumped by this fundamental question. How is a hyperledger solution architected if not all participants are able/willing to host a peer node?
Our users are divided into two groups: payers and providers. Most of our providers are willing to host a peer node and have the IT infrastructure required to do so. Many of our payers are not, or do not.
From the perspective of a payer participant, how can I trust the system if I'm not running a peer and don't have my own copy of the ledger? What options might we have in setting up a Hyperledger environment that allows such participants to take part?
Apologies if I have missed the point or some documentation that describes this scenario; links would be most welcome.
The easiest "trust assumption" is for groups which don't run peers to trust a specific member who is running one. For submitting transactions, it really does not matter whether you run a node or not; you would likely care about the endorsement policy in effect, to make sure there is not one all-powerful member with a peer, but other than that you submit to multiple peers for endorsement anyway. For querying data, you might, as mentioned, have affinity for and trust in one particular member, or you might select a random or majority set of peers and do a "strong read". A query is still an invoke, so you can actually query multiple peers in the same call.
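To make the "strong read" concrete, here is a small illustrative helper. It is not a Fabric SDK API: queryPeer is a hypothetical stand-in for however your client queries one specific peer, and the helper accepts a value only when a majority of the queried peers returned it.

    // Query the same chaincode function on several peers and require a majority.
    async function strongRead(
      peers: string[],
      queryPeer: (peer: string) => Promise<string>,
    ): Promise<string> {
      const results = await Promise.all(peers.map(queryPeer));
      const counts = new Map<string, number>();
      for (const r of results) counts.set(r, (counts.get(r) ?? 0) + 1);
      const [best, votes] = [...counts.entries()].sort((a, b) => b[1] - a[1])[0];
      if (votes <= peers.length / 2) throw new Error('no majority among queried peers');
      return best;
    }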
