I have a 3-org, 6-peer network with the Node SDK and 5 Raft orderers. Raft itself is working fine: I tried killing leaders and elections take place. The SDK is also working well and can invoke transactions. But the problem bothering me is that while starting the network, ordering defaults to the first orderer, say orderer1.example.com, and if I kill this first orderer the network fails: invocation of transactions fails even while Raft elects a new leader. When I try to invoke a transaction it shows "connection failed", "cannot connect to all addresses" and "service unavailable".
In the TypeScript section of the SDK I see a way of passing in the orderers, and by writing a loop there to pass in all orderers the above problem is solved.
Is there any way to resolve this in the JS implementation?
Hey #Anantha Padmanabhan
This has nothing to do with the ordering system; Raft is behaving exactly as a distributed consensus algorithm should.
In your case, with 5 orderers present, you killed one; if that one was the leader, the remaining 4 start a leader election and your network stays stable. No worries there.
The problem is on the SDK side, in the connection profile.
For example:

"channels": {
  "samchannel": {
    "orderers": [
      "sam-orderer1",
      "sam-orderer2",
      "sam-orderer3",
      "sam-orderer4",
      "sam-orderer5"
    ],
    ...
If you remove the sam-orderer1 orderer, your SDK still tries to send the transaction to sam-orderer1, since it is at index 0 of the array.
Test: remove an orderer other than sam-orderer1, for example sam-orderer3, and now try to invoke a transaction; it will still work.
Run this test and update me on the result.
This is coming from the SDK side: as soon as it detects that an orderer is down, it stops execution instead of redirecting to another available orderer. I think the only way around it is, instead of letting the SDK resolve the orderers automatically from the connection profile, to take over that step yourself and pass in only an available orderer; the available orderers can be obtained via the discovery service.
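One way to get this in the JS/Node SDK is to let service discovery pick the orderers for you. Below is a minimal sketch using fabric-network v2.x; the identity "appUser", the wallet path, the connection profile path, and the channel/chaincode/function names are all illustrative assumptions:

// discovery-invoke.ts - minimal sketch (fabric-network v2.x assumed;
// identity, paths and names below are illustrative).
import { Gateway, Wallets } from 'fabric-network';
import * as fs from 'fs';

async function invokeWithDiscovery(): Promise<void> {
  const ccp = JSON.parse(fs.readFileSync('./ccp.json', 'utf8'));
  const wallet = await Wallets.newFileSystemWallet('./wallet');

  const gateway = new Gateway();
  // With discovery enabled, the SDK asks the peers for the current
  // set of live orderers instead of trusting the static profile.
  await gateway.connect(ccp, {
    wallet,
    identity: 'appUser',
    discovery: { enabled: true, asLocalhost: false },
  });

  try {
    const network = await gateway.getNetwork('samchannel');
    const contract = network.getContract('mychaincode');
    // submitTransaction selects an available orderer for us.
    await contract.submitTransaction('createAsset', 'asset1', '100');
  } finally {
    gateway.disconnect();
  }
}

invokeWithDiscovery().catch(console.error);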
Related
In Hyperledger Fabric, what is the expected behavior of a peer when all orderer nodes are down?
Should the peer go down too, stop serving requests from clients, or continue to serve query requests?
In our test, after the orderers are stopped, the peer keeps writing "failed to create connection to orderer" to its log. Yet when we query a key by calling chaincode, the value is still returned.
Can you help clarify whether this is expected behavior? Thank you.
I am working on a distributed Hyperledger Fabric network, and I would recommend the Raft ordering service: https://hyperledger-fabric.readthedocs.io/en/release-2.2/orderer/ordering_service.html#ordering-service-implementations.
I have solved this in such a way that, in my case, I have three orderers that run independently in different environments.
If I crash all of these orderers, the peer containers of the other participants in the network continue to run, but, as you said, they cannot make any transactions.
If one of my orderers crashes, that is not so bad thanks to Raft consensus; the containers keep running. If another one fails, no transactions can be made. In that case I let the peers continue and check whether the orderers are available again.
The behaviour you described I would put down to the fact that the peer reads the value from its own ledger; it doesn't need an orderer for that. https://hyperledger.github.io/fabric-chaincode-node/master/api/fabric-shim.ChaincodeStub.html#getState
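For illustration, this is roughly what such a read looks like inside a chaincode. A minimal sketch using fabric-contract-api (which wraps fabric-shim's ChaincodeStub); the contract name and error text are made up for the example:

// asset-contract.ts - illustrative sketch (fabric-contract-api 2.x).
import { Context, Contract } from 'fabric-contract-api';

export class AssetContract extends Contract {
  // A pure query: getState reads from the peer's local state
  // database, so no orderer is involved in serving it.
  public async readAsset(ctx: Context, key: string): Promise<string> {
    const data = await ctx.stub.getState(key);
    if (!data || data.length === 0) {
      throw new Error(`asset ${key} does not exist`);
    }
    return Buffer.from(data).toString('utf8');
  }
}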
Have a read of this: https://github.com/hyperledger/fabric/blob/master/docs/source/peers/peers.md. It is the best documentation of how the system works that I've found, and there's more in the docs directory of the repo for orderers, etc.
My understanding is: the peers are there to sign (endorse) transaction proposals. The orderer exists to order, validate, package and distribute transactions to peers. The peers can also spread their knowledge of validated transactions via the gossip protocol.
If all orderers go down, transactions will not be validated/packaged/distributed, so the blockchain will be out of action until the orderers are restored.
"When we query a key by calling chaincode the value is returned."
Peers will still remain up and ready to sign/endorse transaction proposals, and querying the blockchain held at the peers will still work. Chaincodes are hosted by the peers. Orderers do not host chaincode.
Also see here https://github.com/hyperledger/fabric/blob/master/docs/source/orderer/ordering_service.md#ordering-service-implementations for the various modes the orderer can be run in: Raft mode, Kafka ordering, Solo ordering.
I think the currently observed behaviour is expected, and in my view it is just fine.
Let's check the purpose of the orderer:
1. Order the transactions.
2. Cut the block and distribute it amongst the orgs when the criteria are met (minimum transaction count/size, or a timeout).
This also means the orderer is needed when your Fabric network is processing transactions that intend to write data into the ledger, isn't it? A query is not a transaction that writes into the ledger, so it doesn't need the orderer; for a query, the peer picks the data up from its local state database.
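To make the distinction concrete, here is a sketch using the fabric-network SDK; it assumes a gateway that is already connected (as in the discovery example earlier), and the channel/chaincode/function names are illustrative:

import { Gateway } from 'fabric-network';

// Sketch only: `gateway` is assumed to be connected already.
async function demo(gateway: Gateway): Promise<void> {
  const network = await gateway.getNetwork('samchannel');
  const contract = network.getContract('mychaincode');

  // Query: evaluated on a peer and served from its local state
  // database - this keeps working even while all orderers are down.
  const value = await contract.evaluateTransaction('readAsset', 'asset1');
  console.log('query result:', value.toString());

  // Write: has to be ordered into a block, so this fails when no
  // orderer is reachable.
  await contract.submitTransaction('updateAsset', 'asset1', '200');
}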
So I think what could be done is to send out an alert to production support when your application detects an orderer node being down (with some health check?). While you work on bringing the orderer network back up, the application displays a diminished-capacity/limited-operations message, and the system can still serve the search queries.
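As a sketch of such a health check: Fabric's operations service (v1.4+) exposes a /healthz endpoint on orderers and peers. Assuming it is reachable on port 8443 without TLS (both are deployment-specific choices), a minimal probe could look like this:

// orderer-healthcheck.ts - minimal sketch; host and port are
// illustrative and depend on your operations-service configuration.
import * as http from 'http';

function checkOrderer(host: string, port: number): Promise<boolean> {
  return new Promise((resolve) => {
    const req = http.get({ host, port, path: '/healthz', timeout: 3000 },
      (res) => {
        res.resume(); // drain the response
        resolve(res.statusCode === 200);
      });
    req.on('error', () => resolve(false));
    req.on('timeout', () => { req.destroy(); resolve(false); });
  });
}

// Example: alert production support when an orderer stops responding.
checkOrderer('orderer1.example.com', 8443).then((up) => {
  if (!up) console.warn('orderer1 down - switching to limited mode');
});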
From my view, it's just fantastic. But it's finally up to you. Cheers!!
Consider the following situation:
I am running fabric-samples/first-network with HLF in Raft mode.
I use the CLI container to fetch the latest block for mychannel and edit the OrdererAddresses section, removing 4 orderers from it, namely orderer2.example.com, orderer3.example.com, orderer4.example.com and orderer5.example.com.
I assumed this would disturb the Raft protocol, because orderers are meant to communicate with one another by looking at the endpoints in the OrdererAddresses section.
Now, the issue is that, despite the above, Raft keeps working fine. I waited for 10 minutes, assuming Raft would break after the EvictionSuspicion timeout since the leader could no longer communicate with the other orderers, but this did not happen: I am still able to read blocks from mychannel, and I am able to submit new transactions (invoke operations) to the chaincode on that channel.
This means the OrdererAddresses are not looked at for this communication. Please correct me if I am wrong. Given this, I need to know:
What is the exact functionality of the OrdererAddresses section in Raft channels?
I learnt that Raft orderers communicate with one another using the host and port properties of the Consenters section for the purpose of consensus messages, while the endpoints present in the OrdererAddresses section are used for replication of blocks. This can be verified here, as answered by Yacov M.
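For reference, the two sections sit side by side in configtx.yaml. A trimmed sketch, with hostnames and certificate paths purely illustrative:

Orderer:
  # The channel's OrdererAddresses come from this list; these are the
  # endpoints used for delivering/replicating blocks.
  Addresses:
    - orderer1.example.com:7050
    - orderer2.example.com:7050
  OrdererType: etcdraft
  EtcdRaft:
    # Raft consensus messages flow between the host:port pairs
    # declared here, independently of the Addresses list above.
    Consenters:
      - Host: orderer1.example.com
        Port: 7050
        ClientTLSCert: path/to/orderer1/tls/server.crt
        ServerTLSCert: path/to/orderer1/tls/server.crt
      - Host: orderer2.example.com
        Port: 7050
        ClientTLSCert: path/to/orderer2/tls/server.crt
        ServerTLSCert: path/to/orderer2/tls/server.crt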
I was trying to migrate my Hyperledger Fabric network (running a Raft ordering service) from one host to another.
In this process, I made sure that TLS communication was respected, which means I made the required changes in the system channel before the migration. I used the backup and the genesis block (of the old ordering service) to restore the network on the target host. One new thing I found was that when the orderer nodes started on the new host, it took 10 minutes for them to sync blocks and start the Raft election.
The question is: is this a default time configured in the orderer code-base, or is it some other functionality?
NOTE: I know that when an existing orderer node is added to an application channel, it takes 5 minutes by default for that orderer to detect the change. Is the above situation something similar, or a different capability?
The complete logs of the orderer node (the one that was started first on the new host) can be found here.
Eviction suspicion is a mechanism which triggers after a default timeout of 10 minutes.
I have the following Fabric network topology: two orgs with two peers and two orderers per organisation (along with the required Kafka/ZooKeeper nodes).
Q: How do I set up the Node fabric-client to protect my app against the failure of a single orderer?
The documentation says that I can add multiple orderers to the list using channel.addOrderer(orderer), but it also says that
"SDK uses only first orderer from the list"
so my understanding is that a failure of the first orderer in the list will prevent the processing of subsequent transactions. Am I right?
You are correct, although you can easily rectify this situation. If you get a failure from sendTransaction that is related to that orderer node being down (e.g. SERVICE_UNAVAILABLE), you can use the removeOrderer method to remove the orderer and then call sendTransaction again (it will now use whichever orderer is next in the list). You can also use addOrderer to add the removed orderer back to the end of the list.
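A minimal sketch of that rotate-and-retry loop against the legacy fabric-client (v1.x) API; the channel and transaction request are assumed to be prepared already:

import Client = require('fabric-client');

async function sendWithFailover(
  channel: Client.Channel,
  request: Client.TransactionRequest,
): Promise<Client.BroadcastResponse> {
  const attempts = channel.getOrderers().length;
  let lastErr: Error = new Error('no orderers configured on channel');
  for (let i = 0; i < attempts; i++) {
    try {
      // sendTransaction talks to the orderer at the head of the list.
      const resp = await channel.sendTransaction(request);
      if (resp.status === 'SUCCESS') return resp;
      lastErr = new Error(resp.status); // e.g. SERVICE_UNAVAILABLE
    } catch (err) {
      lastErr = err as Error; // connection-level failure: node is down
    }
    // Rotate the failed orderer to the back of the list and retry
    // with the next one (removeOrderer + addOrderer, as above).
    const failed = channel.getOrderers()[0];
    channel.removeOrderer(failed);
    channel.addOrderer(failed);
  }
  throw lastErr;
}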
Version v1.2.0 of the Node SDK already includes this feature: the channel may have multiple orderers, and the sendTransaction API tries the first one, then the next, and so on until it succeeds in sending the transaction.
According to the docs, a leading peer node communicates with orderer nodes on behalf of a member. As I understood it, this means the two functions of the orderers, broadcast and deliver, should be invoked by a leading peer. However, some parts of the docs seem to say that a submitting client (i.e. an application using the SDK) can invoke the orderers' broadcast directly, while at the same time a submitting client should have at least one peer participating in the blockchain network.
So, does a submitting client invoke the broadcast function of the orderers through a leading peer, or can it invoke broadcast directly without passing through any peer node?
A submitting client is indeed able to invoke the broadcast API of the ordering service directly. One of the most obvious examples is channel creation, where the client has to submit a configuration transaction (using broadcast) to the ordering service and fetch the genesis block (using deliver). Of course, you need to make sure the client has valid permissions to do it.
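As a sketch of that flow with the legacy fabric-client (v1.x) API: broadcast via createChannel, then deliver via getGenesisBlock. It assumes an admin user context is already set on the client; the artifact path and channel name are illustrative:

import Client = require('fabric-client');
import * as fs from 'fs';

async function createChannel(client: Client, ordererUrl: string): Promise<void> {
  const orderer = client.newOrderer(ordererUrl);

  // Extract and sign the channel config produced by configtxgen.
  const envelope = fs.readFileSync('./channel-artifacts/mychannel.tx');
  const config = client.extractChannelConfig(envelope);
  const signature = client.signChannelConfig(config);

  // broadcast: submit the configuration transaction to the orderer.
  const result = await client.createChannel({
    name: 'mychannel',
    orderer,
    config,
    signatures: [signature],
    txId: client.newTransactionID(),
  });
  if (result.status !== 'SUCCESS') throw new Error(result.status);

  // deliver: fetch the channel's genesis block back from the orderer.
  const channel = client.newChannel('mychannel');
  channel.addOrderer(orderer);
  const genesisBlock = await channel.getGenesisBlock({
    txId: client.newTransactionID(),
  });
  console.log('genesis block number:', genesisBlock.header.number);
}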