Can’t set up yb-master with TLS encryption in YugabyteDB

[Question posted by a user on YugabyteDB Community Slack]
I am trying to set up yb-master nodes with TLS encryption. I followed the docs and applied the configuration accordingly. I am receiving the following error:
W0113 18:45:22.847903 28992 universe_key_client.cc:60] Rpc status: Network error (yb/rpc/secure_stream.cc:562): Handshake failed: Network error (yb/rpc/secure_stream.cc:882): Endpoint does not match, address: ip2, hostname: ip2, resp:
W0113 18:45:22.851953 28981 tcp_stream.cc:144] { local: ip1:7100 remote: ip3:59128 }: Shutting down with pending inbound data ({ capacity: 374400 pos: 0 size: 67 }, status = Network error (yb/rpc/secure_stream.cc:472): Insecure connection header: 5942)
W0113 18:45:22.852005 28981 tcp_stream.cc:144] { local: ip1:7100 remote: ip3:59128 }: Shutting down with pending inbound data ({ capacity: 374400 pos: 0 size: 67 }, status = Service unavailable (yb/rpc/reactor.cc:100): Shutdown connection (system error 108))
From what I see, the endpoints actually match. Does anyone have an idea what is going on here?

Do you have that IP address in your certificate? Do you connect over that IP? The IP address you connect with needs to be present in the cert. It should work after updating the certs.
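To confirm what is actually in the cert, you can inspect its Subject Alternative Names (the file name below is a placeholder for your node certificate):

# Print the Subject Alternative Names embedded in the node certificate
openssl x509 -in node.crt -noout -text | grep -A1 "Subject Alternative Name"
# The IP you dial (ip2 in the error above) must appear as an "IP Address:" entry,
# or the hostname you dial must appear as a "DNS:" entry.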

Related

rafthttp: dial tcp timeout on etcd 3-node cluster creation

I don't have access to the etcd part of the project's source code, but I do have access to /var/log/syslog.
The goal is to set up a 3-node cluster.
(1) The very first etcd error that comes up is:
rafthttp: failed to dial 76e7ffhh20007a98 on stream MsgApp v2 (dial tcp 10.0.0.134:2380: i/o timeout)
Before continuing, I should say that I can ping all three nodes from each of the nodes. I have also tried opening TCP port 2380 and still had no success, same error.
(2) Before that error I had the following messages from etcd, which in my opinion confirm that the cluster is set up correctly:
etcdserver/membership: added member 76e7ffhh20007a98 [https://server2:2380]
etcdserver/membership: added member 222e88db3803e816 [https://server1:2380]
etcdserver/membership: added member 999115e00e17123d [https://server3:2380]
In /etc/hosts file these DNS names are resolved as:
server2 10.0.0.135
server1 10.0.0.134
server3 10.0.0.136
(3) The initial setup, however, looks like this on each node:
embed: listening for peers on https://127.0.0.1:2380
embed: listening for client requests on 127.0.0.1:2379
So, to sum up, each node shows this initial setup log (3), then adds the members (2), and once these steps are done it fails with (1). As far as I know, etcd cluster creation follows this pattern: https://etcd.io/docs/v3.5/tutorials/how-to-setup-cluster/
Without access to the source code it is really hard to debug, but maybe someone has ideas on the error and what could cause it?
UPD: etcdctl cluster-health output (ETCDCTL_ENDPOINT is exported):
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout; error #1: dial tcp 127.0.0.1:4001: connect: connection refused
error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
error #1: dial tcp 127.0.0.1:4001: connect: connection refused
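For reference, the cluster-setup tutorial linked above starts each member roughly like this (values are illustrative, reusing the /etc/hosts entries above; TLS cert/key flags are omitted for brevity). The key point relative to log (3) is that --listen-peer-urls must be the node's own IP, not 127.0.0.1, or the other members' dials to port 2380 will time out:

# Example flags for server1 (10.0.0.134); repeat with the matching name/IP on the other nodes
etcd --name server1 \
  --listen-peer-urls https://10.0.0.134:2380 \
  --initial-advertise-peer-urls https://server1:2380 \
  --listen-client-urls https://10.0.0.134:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://server1:2379 \
  --initial-cluster-token my-etcd-cluster \
  --initial-cluster server1=https://server1:2380,server2=https://server2:2380,server3=https://server3:2380 \
  --initial-cluster-state new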

Istio Multicluster primary-remote on Azure AKS

I am trying to create an Istio multi-cluster primary-remote setup.
First I created two Azure AKS clusters, using Azure CNI for the network configuration. The settings of the clusters are as follows.
First cluster
vnet istioclusterone - 10.10.0.0/20
subnet default - 10.10.0.0/20
k8s service address range 10.100.0.0/16
DNS service ip - 10.100.0.10
Docker Bridge address - 172.17.0.1/16
DNS-prefix - app-cluster-dns
Second cluster
vnet istioclusterone - 10.11.0.0/20
subnet default - 10.11.0.0/20
k8s service address range 10.101.0.0/16
DNS service ip - 10.101.0.10
Docker Bridge address - 172.18.0.1/16
DNS-prefix - processing-cluster-dns
Other than this, I went with the default settings.
Next, I followed the articles below to set up the Istio multi-cluster:
Before you begin
Primary-remote
The last step in the second article, setting up cluster2 as the remote, failed.
I found the errors below in the logs of the istio-ingressgateway pod:
2022-04-11T07:51:00.352057Z warning envoy config StreamAggregatedResources gRPC config stream closed since 431s ago: 14, connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"cluster.local\")"
2022-04-11T07:51:08.514428Z warning envoy config StreamAggregatedResources gRPC config stream closed since 439s ago: 14, connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"cluster.local\")"
2022-04-11T07:51:12.462140Z warning envoy config StreamAggregatedResources gRPC config stream closed since 443s ago: 14, connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"cluster.local\")"
2022-04-11T07:51:39.950935Z warning envoy config StreamAggregatedResources gRPC config stream closed since 471s ago: 14, connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"cluster.local\")"
Has anyone tried this scenario? Please share your insights.
Thanks.
Update:
I used custom certs for both clusters, which solved the previous error.
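Roughly, that means giving both clusters a common root of trust by creating the cacerts secret in istio-system in each cluster, along the lines of the Istio plugin-CA guide (file names below are the ones used in that guide; your paths may differ):

# Run against each cluster's context so istiod in both clusters chains to the same root CA
kubectl create namespace istio-system
kubectl create secret generic cacerts -n istio-system \
  --from-file=ca-cert.pem \
  --from-file=ca-key.pem \
  --from-file=root-cert.pem \
  --from-file=cert-chain.pem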
I then created a gateway in both clusters:
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cluster-aware-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: tls
      protocol: TLS
    tls:
      mode: AUTO_PASSTHROUGH
    hosts:
    - "*.local"
Now I am getting a new error. See the logs below from the istio-ingressgateway-575ccb4d79 pod in cluster2:
2022-04-13T09:14:04.650502Z warning envoy config StreamAggregatedResources gRPC config stream closed since 60s ago: 14, connection error: desc = "transport: Error while dialing dial tcp <publicIPofEastWestgateway>:15012: i/o timeout"
2022-04-13T09:14:27.026016Z warning envoy config StreamAggregatedResources gRPC config stream closed since 83s ago: 14, connection error: desc = "transport: Error while dialing dial tcp <publicIPofEastWestgateway>:15012: i/o timeout"
What I understand here: I have an east-west gateway installed in cluster1, as in the documentation (linkToDoc).
cluster2 is trying to reach cluster1 using the public IP of the east-west gateway on port 15012, and that is failing.
I checked the security groups and the port is open. I tried telnet from a test pod within the cluster to check; it is failing.
Can anyone help me here?
Thanks
It looks like a firewall issue. Not sure if it will help, but try opening ports 15012 and 15443 outbound on the remote cluster, towards the east-west gateway ELB IP (primary cluster).
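To test reachability from the remote cluster first, something like this should work (context and pod names are placeholders; the IP is the east-west gateway's public IP from the logs above, repeat for port 15443):

# From cluster2, check whether the primary's east-west gateway port answers at all
kubectl --context=cluster2 run nettest --rm -it --image=busybox --restart=Never -- \
  telnet <publicIPofEastWestgateway> 15012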

Kubernetes error [::1]:6443: connect: cannot assign requested address

I got the following error:
controller.go:228] unable to sync kubernetes service: Post "https://[::1]:6443/api/v1/namespaces": dial tcp [::1]:6443: connect: cannot assign requested address
I have the following warnings in my kube cluster (3x3 masters/workers, on-prem on KVM) with 3 etcd members on the masters.
kubectl get events --field-selector type!=Normal -n kube-system
LAST SEEN TYPE REASON OBJECT MESSAGE
3m25s Warning Unhealthy pod/kube-apiserver-kube-master-1 Readiness probe failed: HTTP probe failed with statuscode: 500
3m24s Warning Unhealthy pod/kube-apiserver-kube-master-2 Readiness probe failed: HTTP probe failed with statuscode: 500
3m25s Warning Unhealthy pod/kube-apiserver-kube-master-2 Liveness probe failed: HTTP probe failed with statuscode: 500
3m27s Warning Unhealthy pod/kube-apiserver-kube-master-3 Readiness probe failed: HTTP probe failed with statuscode: 500
17m Warning Unhealthy pod/kube-apiserver-kube-master-3 Liveness probe failed: HTTP probe failed with statuscode: 500
This error does not affect my cluster or my services in any way. It has been there from the beginning. How do I solve it? :D
Somewhere you are assigning the [::1] address to endpoints.
The endpoint IPs must not be: loopback (127.0.0.0/8 for IPv4, ::1/128 for IPv6), or link-local (169.254.0.0/16 and 224.0.0.0/24 for IPv4, fe80::/64 for IPv6).
[::1] is the IPv6 equivalent of 127.0.0.1, the loopback address.
I had the same error.
My coworker deactivated IPv6 (to try something), and Kubernetes then tried to use IPv6.
After rebooting my master, IPv6 came back and it worked again.
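A quick way to check whether IPv6 has been disabled on the node (assuming a Linux master; 1 means disabled):

# 0 = IPv6 enabled, 1 = IPv6 disabled
sysctl net.ipv6.conf.all.disable_ipv6
sysctl net.ipv6.conf.lo.disable_ipv6
# With IPv6 enabled, the loopback interface should still carry ::1
ip -6 addr show dev lo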
I searched for a bit and found this article: https://kubernetes.io/blog/2021/12/08/dual-stack-networking-ga/, which basically says you can set ipFamilyPolicy to one of three options:
SingleStack
PreferDualStack
RequireDualStack
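For illustration, a Service pinned to a single IP family would look roughly like this (name, selector, and ports are placeholders):

# Sketch of a single-stack Service; ipFamilyPolicy and ipFamilies are the
# dual-stack fields described in the linked blog post
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  ipFamilyPolicy: SingleStack
  ipFamilies:
  - IPv4
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
EOF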

Debugging TLS handshake failure

I'm trying to access my peer through the fabric-network Node.js SDK.
However, I encounter an error during gateway.connect in the SDK, and the logs I find in the peer container are not helpful.
All I have, even with the grpc=debug logging mode, is:
peer0.catie-test | 2020-09-21 13:27:07.731 UTC [core.comm] ServerHandshake -> ERRO 087 TLS handshake failed with error remote error: tls: handshake failure server=PeerServer remoteaddress=172.17.0.1:49918
peer0.catie-test | 2020-09-21 13:27:07.731 UTC [grpc] handleRawConn -> DEBU 088 grpc: Server.Serve failed to complete security handshake from "172.17.0.1:49918": remote error: tls: handshake failure
Is there any way to get more helpful logs? I would like to know, for example, which keys are used for the TLS handshake check.
Edit with more info: configuration files and TLS verification
My peer is configured with TLS via the env variables:
CORE_PEER_TLS_ENABLED=true
CORE_PEER_TLS_KEY_FILE=/etc/hyperledger/crypto/peer/tls-msp/keystore/key.pem
CORE_PEER_TLS_CERT_FILE=/etc/hyperledger/crypto/peer/tls-msp/signcerts/cert.pem
CORE_PEER_TLS_ROOTCERT_FILE=/etc/hyperledger/crypto/peer/tls-msp/tlscacerts/tlsca.catie-test-cert.pem
I have the correct tlscacert of my peer on the client side, because the output from the peer side and from the client side is the same:
cat /etc/hyperledger/crypto/peer/tls-msp/tlscacerts/tlsca.catie-test-cert.pem # From the peer, output ZTd/o8LLw== at the end
cat /tmp/fabric-start-catie-test/building/artifacts/peer0.catie-test-crypto/tls-msp/tlscacerts/tlsca.catie-test-cert.pem # From the client, output ZTd/o8LLw== at the end
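A less error-prone comparison than eyeballing the end of the PEM is to compare fingerprints (same two paths as above); identical certificates produce identical SHA-256 fingerprints:

openssl x509 -noout -fingerprint -sha256 -in /etc/hyperledger/crypto/peer/tls-msp/tlscacerts/tlsca.catie-test-cert.pem
openssl x509 -noout -fingerprint -sha256 -in /tmp/fabric-start-catie-test/building/artifacts/peer0.catie-test-crypto/tls-msp/tlscacerts/tlsca.catie-test-cert.pem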
The path to the peer tlscacert is set in the client-side connection-profile.json:
"peers": {
"peer0.catie-test": {
"tlsCACerts": {
"path": "/tmp/fabric-start-catie-test/building/artifacts/peer0.catie-test-crypto/tls-msp/tlscacerts/tlsca.catie-test-cert.pem"
},
"grpcOptions":{
"ssl-target-name-override": "172.17.0.7",
"grpc.keepalive_time_ms": 10000
},
"url": "grpcs://172.17.0.4:7051",
"eventUrl": "grpcs://172.17.0.4:7053"
}
}
And I also checked that the tlsCAcert is the one that issued my peer cert:
openssl verify -CAfile $CORE_PEER_TLS_ROOTCERT_FILE $CORE_PEER_TLS_CERT_FILE # Output : /etc/hyperledger/crypto/peer/tls-msp/signcerts/cert.pem: OK
Edit 2: gRPC options, peer name instead of IP, and client logs
I also tried adding the grpcOptions to the peer section of the connection-profile.json (see the updated paragraph above), but it didn't change anything.
I also tried adding the peer name to my /etc/hosts to reach my peer via its name instead of its IP. That makes a warning disappear but doesn't solve my problem, and I prefer to work with IPs in my scripts.
Here are the logs of the Node.js SDK client in case they help diagnose the problem, but they only say that the Endorser must be connected. I think the client does reach my peer, because this TLS error shows up in my peer's logs.
(node:59350) [DEP0123] DeprecationWarning: Setting the TLS ServerName to an IP address is not permitted by RFC 6066. This will be ignored in a future version.
2020-09-23T06:42:20.704Z - error: [ServiceEndpoint]: Error: Failed to connect before the deadline on Endorser- name: peer0.catie-test, url:grpcs://172.17.0.7:7051, connected:false, connectAttempted:true
2020-09-23T06:42:20.705Z - error: [ServiceEndpoint]: waitForReady - Failed to connect to remote gRPC server peer0.catie-test url:grpcs://172.17.0.7:7051 timeout:3000
2020-09-23T06:42:20.708Z - error: [NetworkConfig]: buildPeer - Unable to connect to the endorser peer0.catie-test due to Error: Failed to connect before the deadline on Endorser- name: peer0.catie-test, url:grpcs://172.17.0.7:7051, connected:false, connectAttempted:true
at checkState (/home/rqueraud/CATIE/Myrmica/myrmica-start/node_modules/@grpc/grpc-js/build/src/client.js:69:26)
at Timeout._onTimeout (/home/rqueraud/CATIE/Myrmica/myrmica-start/node_modules/@grpc/grpc-js/build/src/channel.js:292:17)
at listOnTimeout (internal/timers.js:549:17)
at processTimers (internal/timers.js:492:7) {
connectFailed: true
}
(node:59350) UnhandledPromiseRejectionWarning: Error: Endorser must be connected
at Channel.addEndorser (/home/rqueraud/CATIE/Myrmica/myrmica-start/node_modules/fabric-common/lib/Channel.js:259:10)
at buildChannel (/home/rqueraud/CATIE/Myrmica/myrmica-start/node_modules/fabric-network/lib/impl/ccp/networkconfig.js:50:21)
at Object.loadFromConfig (/home/rqueraud/CATIE/Myrmica/myrmica-start/node_modules/fabric-network/lib/impl/ccp/networkconfig.js:34:19)
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async Gateway.connect (/home/rqueraud/CATIE/Myrmica/myrmica-start/node_modules/fabric-network/lib/gateway.js:279:13)
at async queryChaincode (/home/rqueraud/CATIE/Myrmica/myrmica-start/test/chaincode-sdk/index.js:41:5)
at async /home/rqueraud/CATIE/Myrmica/myrmica-start/test/chaincode-sdk/index.js:57:5
Edit 3: Docker IPs? Trying with EC2 instances.
As @Urko mentioned, my nodes are in fact Docker containers running docker-in-docker (dind) images. Inside these containers are other containers running the Hyperledger peer, cli, ... images.
I access them from the host, which is also where I run the fabric SDK Node.js client. I cannot access them via their container names; I think that is only possible in a docker-compose configuration, isn't it? I already tried (see Edit 2 above) adding their names to my /etc/hosts to reach them via a name instead of an IP, but it didn't change anything.
However, as my network startup is scripted, I deployed it this time using docker-machine on AWS instead of the dind Docker containers, so these are real instances reachable over the internet. But I still encounter the same errors. Here is the log from the peer, where you can see the connection coming from a public IP:
2020-09-24 08:32:57.653 UTC [core.comm] ServerHandshake -> ERRO 0d7 TLS handshake failed with error remote error: tls: handshake failure server=PeerServer remoteaddress=31.36.26.4:35462
It seems that the connection to your peer has been defined to be secured by TLS. So you should check your peer configuration to know which certificates are being used for TLS.
When you connect to any server using this protocol, the communication between the parties is encrypted using the certificate of the server (in this case, the peer is the server). So you need to configure your client to trust the server via the root CA that was used to issue the peer's TLS certificates.
The client is where you use the SDK, so you should configure it to trust the peer's TLS certificate. When you configure the connection to the blockchain nodes (peers and orderers), you define their address as well as their TLS certificate. This is an example that you can find at the following link; there, you have to define the value of the tlsCACerts param:
orderers:
  orderer.example.com:
    url: grpcs://localhost:7050
    grpcOptions:
      ssl-target-name-override: orderer.example.com
      grpc-max-send-message-length: 4194304
    tlsCACerts:
      path: test/fixtures/channel/crypto-config/ordererOrganizations/example.com/orderers/orderer.example.com/tlscacerts/example.com-cert.pem
peers:
  peer0.org1.example.com:
    url: grpcs://localhost:7051
    grpcOptions:
      ssl-target-name-override: peer0.org1.example.com
      grpc.keepalive_time_ms: 600000
    tlsCACerts:
      path: test/fixtures/channel/crypto-config/peerOrganizations/org1.example.com/peers/peer0.org1.example.com/tlscacerts/org1.example.com-cert.pem
----- Edited ----
Also, you have to check the value of the ssl-target-name-override param. It should be the same as your node's name, as you can see in the example file.
----- Edited ----
Why are you using those IPs?! I understand that those IPs are internal to the Docker network, so you should not use them. Could you try using your container names instead of the Docker network IPs?
----- Edited ----
Could you verify your ca-server configuration file and check that tls is set to true?
You are making a gRPC call to a peer server that is secured with TLS. If you fail to provide a valid TLS certificate, the server's TLS handshake will fail and you will not be able to establish the connection.
Please check that your network config file is properly set up, that you are using the same TLS CA certificate that was used to run the peer server, and that your TLS certificate path is correct.
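As a sanity check, one way to see which certificate the peer actually presents, and whether your CA file validates it, is a manual handshake (IP, port, and CA path taken from the question above):

# Attempt a TLS handshake against the peer and verify the presented chain with the client's CA file
openssl s_client -connect 172.17.0.4:7051 \
  -CAfile /tmp/fabric-start-catie-test/building/artifacts/peer0.catie-test-crypto/tls-msp/tlscacerts/tlsca.catie-test-cert.pem \
  -showcerts </dev/null
# Look for "Verify return code: 0 (ok)" at the end of the output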

Network Security Group for Filezilla(client) connection

I am new here.
A few days ago I attended an MS Azure event, and today I registered with Azure (free account).
VM environment: CentOS 7, Apache + PHP + MySQL + vsftpd + phpMyAdmin.
Everything is up and running, and I am able to visit "info.php" via its public IP address.
SELinux is disabled, firewalld is disabled.
My problem is that I am not able to connect to this server via FileZilla (PC client).
From the Windows command prompt (ftp/put) it works; I am able to upload files.
But via FileZilla:
Status: Connecting to 5x.1xx.1xx.7x:21...
Status: Connection established, waiting for welcome message...
Status: Insecure server, it does not support FTP over TLS.
Status: Logged in
Status: Retrieving directory listing...
Command: PWD
Response: 257 "/home/ftpuser"
Command: TYPE I
Response: 200 Switching to Binary mode.
Command: PORT 192,168,1,183,234,99
Response: 200 PORT command successful. Consider using PASV.
Command: LIST
Error: Connection timed out after 20 seconds of inactivity
Error: Failed to retrieve directory listing
Status: Disconnected from server
Status: Connecting to 5x.1xx.1xx.7x:21...
Status: Connection established, waiting for welcome message...
Status: Insecure server, it does not support FTP over TLS.
Status: Logged in
Status: Retrieving directory listing...
Command: PWD
Response: 257 "/home/ftpuser"
Command: TYPE I
Response: 200 Switching to Binary mode.
Command: PORT 192,168,1,183,234,137
Response: 200 PORT command successful. Consider using PASV.
Command: LIST
Error: Connection timed out after 20 seconds of inactivity
Error: Failed to retrieve directory listing
I believe this is because of the network security group settings for inbound and outbound rules; I need to open some ports, but I'm not sure which, because I tried allowing the whole range 1024-65535 and it still doesn't work.
If you use passive-mode FTP, you should open ports 20 and 21 plus the passive ports you need on the Azure NSG (inbound rules). You can check /etc/vsftpd.conf:
pasv_enable=YES
pasv_min_port=60001
pasv_max_port=60005
For this example, you should open ports 60001-60005 on the Azure NSG (inbound rules).
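For reference, opening that range with the Azure CLI would look roughly like this (resource group, NSG name, rule name, and priority are placeholders):

# Allow inbound FTP control and passive data ports through the NSG
az network nsg rule create \
  --resource-group myResourceGroup \
  --nsg-name myVmNsg \
  --name Allow-FTP-Passive \
  --priority 310 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --destination-port-ranges 21 60001-60005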
