Hyperledger Fabric Orderer - TLS Handshake Bad Certificate Issue

I'm developing an insurance application that uses a Hyperledger Fabric network. My current problem is that my orderer nodes do not stay online for more than about 10 seconds before they crash. Inspecting the logs, there are many error messages suggesting that TLS certificates do not agree. The error messages do not specify exactly which certificates are at fault, but further up the logs there is an error saying it could not match an expected certificate with the certificate it found instead (shown in the screenshot).

While this error was also vague, I deduced that it was comparing the orderer's public key certificate with the certificate stored inside the locally held genesis block. Upon inspecting the genesis block on the orderer node, it is indeed a completely different certificate. I have noticed that even after destroying the whole network in Docker and rebuilding it, the certificate inside the genesis block stored on the orderer nodes always remains exactly the same.
In terms of my network layout, I have 2 organizations. One for my orderer(s) and one for my peers. I also have 4 CA Servers (A CA and TLS CA Server for both the orderer organization and peer organization).
I have 3 peer nodes and 3 orderer nodes.
Included below are a pastebin of the logs from the orderer1 node before it crashes, and the GitHub repo:
orderer1 logs: https://pastebin.com/AYcpAKHn
repo: https://github.com/Ewan99/InsurApp/tree/remake
NOTE: When running the app, first run ./destroy.sh, then ./build.sh
"iaorderer" is the org for my orderers
"iapeer" is the org for my peers
I've tried renaming certificates in case they were being overwritten by each other on creation.
I tried reducing down from 3 orderers to 1 to see if it made any difference.
Of course, in going from 3 orderers to 1 I changed from Raft to solo, and I still encountered the same problems.

As per david_k's comment suggestion:
"Looks like you are using persistent volumes in Docker, so do you prune these volumes when you start again from scratch? If you don't, then you can pick up data that doesn't match the newly created crypto material."
I had to prune my old Docker volumes, as they were conflicting and providing newly built containers with older certificates.
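For example, a teardown along these lines clears the stale volumes (a sketch only; it assumes the network was brought up with docker-compose and that destroy.sh does not already remove the named volumes):

```sh
# Stop the network and delete its named volumes so the next build.sh run
# uses freshly generated certificates and a freshly generated genesis block
docker-compose down --volumes --remove-orphans
# Optionally remove any remaining unused volumes as well
docker volume prune -f
```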

I looked at the docker-compose.yaml file and there is something there that by all accounts should cause a failure. Each peer definition uses the same port, e.g.:
CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer1.iapeer.com:7051
CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer2.iapeer.com:7051
CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer3.iapeer.com:7051
To my mind, this cannot possibly work while running on a single server, but perhaps I am missing something.
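For comparison, fabric-samples gives each peer its own listen port and a matching external endpoint. A trimmed docker-compose sketch (the hostnames and port numbers here are illustrative, not taken from the repo):

```yaml
peer1.iapeer.com:
  environment:
    - CORE_PEER_LISTENADDRESS=0.0.0.0:7051
    - CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer1.iapeer.com:7051
  ports:
    - "7051:7051"

peer2.iapeer.com:
  environment:
    - CORE_PEER_LISTENADDRESS=0.0.0.0:8051
    - CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer2.iapeer.com:8051
  ports:
    - "8051:8051"

peer3.iapeer.com:
  environment:
    - CORE_PEER_LISTENADDRESS=0.0.0.0:9051
    - CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer3.iapeer.com:9051
  ports:
    - "9051:9051"
```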

Related

Expected behavior of peer in case of all orderers down

In Hyperledger Fabric, what is the expected behavior of a peer when all orderer nodes are down?
Should the peer also go down, stop serving requests from clients, or continue to serve query requests?
In our test, after the orderers are stopped, the peer keeps writing "failed to create connection to orderer" log messages. When we query a key by calling chaincode, the value is returned.
Can you help clarify whether this is expected behavior? Thank you.
I am working on a distributed Hyperledger Fabric network. I would recommend the Raft ordering service: https://hyperledger-fabric.readthedocs.io/en/release-2.2/orderer/ordering_service.html#ordering-service-implementations.
I have solved this in such a way that, in my case, I have three orderers running independently in different environments.
If I crash all of these orderers, the peer containers continue to run on the other participants of the network. As you said, they cannot process any transactions.
If one of my orderers crashes it is not so bad, thanks to the Raft consensus; the containers keep running. If another one fails, no transactions can be made. In this case I let the peers continue and check whether the orderers are available again.
The behaviour you described I would put down to the fact that the peer reads the value from its own ledger; it doesn't need an orderer for that. https://hyperledger.github.io/fabric-chaincode-node/master/api/fabric-shim.ChaincodeStub.html#getState
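As a small illustration of that point, a minimal fabric-shim query handler (a sketch with illustrative names, not anyone's actual chaincode) only calls getState against the peer's local state database, so no orderer is contacted:

```javascript
const shim = require('fabric-shim');

class QueryExample {
  async Init(stub) {
    return shim.success();
  }

  async Invoke(stub) {
    const { fcn, params } = stub.getFunctionAndParameters();
    if (fcn === 'query') {
      // getState reads from the peer's local state database; the orderer is
      // only needed when a write must be ordered into a block
      const value = await stub.getState(params[0]);
      return shim.success(value);
    }
    return shim.error(Buffer.from(`unknown function ${fcn}`));
  }
}

shim.start(new QueryExample());
```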
Have a read of this: https://github.com/hyperledger/fabric/blob/master/docs/source/peers/peers.md It is the best documentation I've found for how the system works, and there is more in the docs directory of the repo for orderers, etc.
My understanding is: The peers are there to sign (endorse) transaction proposals. The orderer exists to order, validate, package and distribute transactions to peers. The peers can also distribute their knowledge of validated transactions via the gossip channel.
If all orderers go down, the transactions will not be validated/packaged/distributed so the blockchain will be out of action until the orderers are restored.
"When we query a key by calling chaincode the value is returned."
Peers will still remain up and ready to sign/endorse transaction proposals, and querying the blockchain held at the peers will still work. Chaincodes are hosted by the peers. Orderers do not host chaincode.
Also see here https://github.com/hyperledger/fabric/blob/master/docs/source/orderer/ordering_service.md#ordering-service-implementations for the various modes the orderer can be run in: Raft mode, Kafka ordering, Solo ordering.
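For reference, the ordering mode is selected in configtx.yaml when the genesis block is generated. A trimmed sketch (the hostname and certificate paths are placeholders):

```yaml
Orderer: &OrdererDefaults
  # The consensus implementation: etcdraft (Raft); older releases also allow solo and kafka
  OrdererType: etcdraft
  EtcdRaft:
    Consenters:
      - Host: orderer.example.com
        Port: 7050
        ClientTLSCert: /path/to/orderer/tls/server.crt
        ServerTLSCert: /path/to/orderer/tls/server.crt
```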
I think the currently observed behaviour is expected, and in my view it is just fine.
Let's check the purpose of the orderer:
Order the transactions.
Cut a block and distribute it amongst the orgs when the batching criteria are met (max message count, block size, or batch timeout).
This also means the orderer is only needed when your Fabric network is processing transactions that intend to write data to the ledger. A query is not a transaction that writes to the ledger, so it doesn't need the orderer; for a query, the data is picked up from the peer's local database.
So what could be done is to send an alert to production support when your application detects that an orderer node is down (with some health check; see the sketch below), and have the application display a diminished capacity/limited operations message. While you work on bringing the orderer network back up, the system can still serve search queries.
From my view, that is just fine. But it is finally up to you. Cheers!
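One way to implement such a health check (assuming Fabric 1.4 or later with the operations service enabled on the orderer; the hostname and port below are illustrative) is to poll the orderer's /healthz endpoint:

```sh
# The operations listen address is set via ORDERER_OPERATIONS_LISTENADDRESS
# (e.g. 0.0.0.0:8443); alert when the health check fails
curl --fail --silent http://orderer1.iaorderer.com:8443/healthz \
  || echo "orderer1 appears to be down - raise an alert"
```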

Removing a majority of orderers from the OrdererAddresses section of a channel in Hyperledger Fabric

Consider the following situation:
I am running fabric-samples/first-network for HLF in Raft mode.
I use the CLI container to fetch the latest block for mychannel and edit the OrdererAddresses section by removing four orderers, namely orderer2.example.com, orderer3.example.com, orderer4.example.com, and orderer5.example.com.
I am assuming this should disturb the Raft protocol, because orderers are meant to communicate with one another by looking at the endpoints in the OrdererAddresses section.
Now, the issue is that, despite the above, Raft keeps working fine. I waited for 10 minutes, assuming Raft would break after the EvictionSuspicion timeout since the leader could no longer communicate with the other orderers. But this does not happen. I am still able to read blocks from mychannel, and I am able to submit new transactions (invoke operations) to the chaincode on that channel.
This suggests that OrdererAddresses is not consulted for this communication. Please correct me if I am wrong. Given this, I need to know:
What is the exact function of the OrdererAddresses section in Raft channels?
I learnt that Raft orderers communicate with one another using the host and port properties of the Consenters section for consensus messages, while the endpoints in the OrdererAddresses section are used for block replication. This can be verified here, as answered by Yacov M.
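For orientation, this is roughly where the two settings live in a channel configuration decoded with configtxlator (heavily trimmed; a sketch, not a complete config):

```json
{
  "channel_group": {
    "values": {
      "OrdererAddresses": {
        "value": { "addresses": ["orderer1.example.com:7050"] }
      }
    },
    "groups": {
      "Orderer": {
        "values": {
          "ConsensusType": {
            "value": {
              "type": "etcdraft",
              "metadata": {
                "consenters": [
                  {
                    "host": "orderer1.example.com",
                    "port": 7050,
                    "client_tls_cert": "<base64 cert>",
                    "server_tls_cert": "<base64 cert>"
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}
```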

Default time for orderers to detect change in their endpoints in system channel

I was trying to migrate my Hyperledger Fabric network (running a Raft ordering service) from one host to another.
In this process, I made sure that TLS communication was respected, which means I made the required changes in the system channel before the migration. I used the backup and the genesis block (of the old ordering service) to restore the network on the target host. One new thing I found was that when the orderer nodes started on the new host, it took 10 minutes for them to sync blocks and start the Raft election.
The question is: Is this default time configured in the orderer codebase, or is it some other functionality?
NOTE: I know that when an existing orderer node is added to an application channel, it takes 5 minutes by default for that orderer to detect the change. So, is the above situation something similar, or is it a different capability?
The complete orderer node (one that was started first on new host) logs can be found here.
Eviction suspicion is a mechanism which triggers after a default timeout of 10 minutes.

Does Orderer have Block(Ledger) data?

I built a Hyperledger Fabric network using the Kafka-based ordering service.
I thought that the orderer doesn't keep block data.
But when I checked /var/hyperledger/production/orderer/chains/mychannel on the orderer server, I found a blockfile_000000 file.
I checked this file using the "less" command.
There I found the key-value data that I had registered by invoking chaincode.
What is this file?
Does this mean that the orderer also maintains block data (i.e. a ledger)?
The orderer has the blockchain of all channels it is part of.
However, it doesn't have the world state and it doesn't inspect the content of the transactions.
It just writes the blocks to disk, to serve peers that pull the blocks from it.
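As an illustration of pulling blocks from an orderer, a block can be fetched directly with the peer CLI (the orderer address, channel name, and CA file path below are illustrative):

```sh
# Fetch the newest block of mychannel straight from an orderer over TLS
peer channel fetch newest mychannel_newest.block \
  -o orderer.example.com:7050 -c mychannel \
  --tls --cafile /path/to/orderer-tls-ca.crt
```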

HLF v1.0 node.js fabric-client and orderers availability

I have the following Fabric network topology: two orgs with two peers and two orderers per organisation (along with the required Kafka/ZooKeeper nodes).
Q: How do I set up the Node fabric-client to protect my app against the failure of a single orderer?
The documentation says that I can add multiple orderers to the list using channel.addOrderer(orderer), but it also says that
"SDK uses only first orderer from the list",
so my understanding is that a failure of the first orderer in the list will prevent the processing of subsequent transactions. Am I right?
You are correct, although you can easily rectify this situation. If you get a failure from sendTransaction that is related to that orderer node being down (e.g. SERVICE_UNAVAILABLE), you can use the removeOrderer method to remove the orderer and then call sendTransaction again (it will now use whatever the next orderer in the list was). You can also use addOrderer to add the orderer you removed back to the end of the list.
Version v1.2.0 of the Node SDK already includes this feature: the channel may have multiple orderers, and the sendTransaction API will try the first one, then the next, and so on until it succeeds in sending the transaction.
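A rough sketch of the manual failover described above, using the v1.x fabric-client API (the client setup, channel name, orderer URLs, and the prepared transaction request are all assumed and illustrative):

```javascript
// Assumes `client` is an initialised fabric-client Client instance and
// `request` holds the proposal and endorsement responses to be committed.
const channel = client.getChannel('mychannel');

const ordererUrls = [
  'grpcs://orderer1.example.com:7050',
  'grpcs://orderer2.example.com:7050',
];
const orderers = ordererUrls.map((url) => client.newOrderer(url /*, tlsOptions */));
orderers.forEach((orderer) => channel.addOrderer(orderer));

async function sendWithFailover(request) {
  for (const orderer of orderers) {
    try {
      // sendTransaction uses the first orderer currently registered on the channel
      const response = await channel.sendTransaction(request);
      if (response && response.status === 'SUCCESS') {
        return response;
      }
    } catch (err) {
      // e.g. SERVICE_UNAVAILABLE: fall through and retry with the next orderer
    }
    // Drop the failing orderer so the next one in the list becomes first
    channel.removeOrderer(orderer);
  }
  throw new Error('No orderer accepted the transaction');
}
```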
