Istio Multicluster primary-remote on Azure AKS

I am trying to create an Istio multicluster primary-remote setup.
First, I created two Azure AKS clusters, using Azure CNI for the network configuration. The cluster settings are as follows.
First cluster:
vnet istioclusterone - 10.10.0.0/20
subnet default - 10.10.0.0/20
k8s service address range 10.100.0.0/16
DNS service ip - 10.100.0.10
Docker Bridge address - 172.17.0.1/16
DNS-prefix - app-cluster-dns
Second cluster:
vnet istioclusterone - 10.11.0.0/20
subnet default - 10.11.0.0/20
k8s service address range 10.101.0.0/16
DNS service ip - 10.101.0.10
Docker Bridge address - 172.18.0.1/16
DNS-prefix - processing-cluster-dns
Everything else was left at the default settings.
Next, I followed the articles below to set up a multicluster Istio installation:
Before you begin
Primary-remote
The last step in the second article, configuring cluster2 as a remote, failed.
I found the errors below in the logs of the istio-ingressgateway pod.
2022-04-11T07:51:00.352057Z warning envoy config StreamAggregatedResources gRPC config stream closed since 431s ago: 14, connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"cluster.local\")"
2022-04-11T07:51:08.514428Z warning envoy config StreamAggregatedResources gRPC config stream closed since 439s ago: 14, connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"cluster.local\")"
2022-04-11T07:51:12.462140Z warning envoy config StreamAggregatedResources gRPC config stream closed since 443s ago: 14, connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"cluster.local\")"
2022-04-11T07:51:39.950935Z warning envoy config StreamAggregatedResources gRPC config stream closed since 471s ago: 14, connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"cluster.local\")"
Has anyone tried this scenario? Please share your insights.
Thanks.
Update:
I used custom certs for both clusters, and that solved the previous error.
Then I created a gateway in both clusters.
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: cluster-aware-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: tls
        protocol: TLS
      tls:
        mode: AUTO_PASSTHROUGH
      hosts:
        - "*.local"
Now I am getting a new error. See the logs below from pod istio-ingressgateway-575ccb4d79 in cluster2.
2022-04-13T09:14:04.650502Z warning envoy config StreamAggregatedResources gRPC config stream closed since 60s ago: 14, connection error: desc = "transport: Error while dialing dial tcp <publicIPofEastWestgateway>:15012: i/o timeout"
2022-04-13T09:14:27.026016Z warning envoy config StreamAggregatedResources gRPC config stream closed since 83s ago: 14, connection error: desc = "transport: Error while dialing dial tcp <publicIPofEastWestgateway>:15012: i/o timeout"
What I understand here: I have an east-west gateway installed in cluster1, as in the documentation (linkToDoc).
cluster2 is trying to reach cluster1 using the public IP of the east-west gateway on port 15012, and that is failing.
I checked the security groups, and the port is open. I also tried telnet from a test pod inside the cluster, and it fails.
Can anyone help me here?
Thanks

It looks like a firewall issue. Not sure if it will help, but try opening ports 15012 and 15443 for outbound traffic from the remote cluster to the east-west gateway's load balancer IP (primary cluster).
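The telnet test from a test pod can be repeated for both ports mentioned in this answer. A minimal Python sketch, assuming a Python interpreter is available in the test pod; the function names are hypothetical, and the host must be replaced with the real east-west gateway IP. A connect that times out (as in the logs above), rather than one that is refused, typically means packets are being dropped somewhere on the path, which fits the firewall explanation.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each returned address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, refused connections and DNS failures are all OSError subclasses
        return False


def check_ports(host: str, ports=(15012, 15443), timeout: float = 3.0) -> dict:
    """Map each port (default: istiod 15012 and cross-network 15443) to reachability."""
    return {port: tcp_reachable(host, port, timeout) for port in ports}
```

For example, check_ports("<publicIPofEastWestgateway>") run from a cluster2 pod returns a dict keyed by port, with False for every port the path blocks.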

Related

pulsar-admin sink localrun failure: ClientCnx Failed to authenticate the client. (tls auth enabled)

command:
bin/pulsar-admin sinks localrun -a connectors/pulsar-io-mongo-2.10.2.nar --tenant public --namespace default --inputs up-20wt --name mongo-sink --sink-config-file work/config.json --parallelism 1
log:
2023-01-06T14:56:27,441+0800 [pulsar-client-io-1-2] WARN org.apache.pulsar.client.impl.ConnectionPool - [[id: 0xb0292161, L:/127.0.0.1:57408 - R:localhost/127.0.0.1:6650]] Connection handshake failed: org.apache.pulsar.client.api.PulsarClientException$AuthenticationException: Unable to authenticate
ERROR org.apache.pulsar.client.impl.ClientCnx - [id: 0xb0292161, L:/127.0.0.1:57408 ! R:localhost/127.0.0.1:6650] Failed to authenticate the client
INFO org.apache.pulsar.client.impl.PulsarClientImpl - Client closing. URL: pulsar://localhost:6650
The deployment uses TLS mutual authentication: https://pulsar.apache.org/docs/2.10.x/security-tls-authentication
Both client.conf and broker.conf have TLS enabled, and the related key files are set up accordingly.
Most importantly, my Pulsar client (C++) can connect to the pulsar+ssl broker and publish messages successfully.
So:
Why does the mongo-io-sink created with pulsar-admin connect to the plaintext binary port 6650 instead of the TLS port 6651?
How can I fix this sink creation issue?
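The last log line (Client closing. URL: pulsar://localhost:6650) shows that the local runner built its client from a plaintext pulsar:// URL, not the pulsar+ssl:// URL the C++ client uses. A small Python sketch of how the URL scheme maps to the conventional broker ports (6650 plaintext, 6651 TLS); the function name is hypothetical, and it only illustrates the convention, not how pulsar-admin resolves its URL internally.

```python
from urllib.parse import urlparse

# Conventional Pulsar broker ports: plaintext binary protocol vs. TLS
DEFAULT_PORTS = {"pulsar": 6650, "pulsar+ssl": 6651}


def describe_service_url(url: str) -> tuple:
    """Return (uses_tls, port) for a pulsar:// or pulsar+ssl:// service URL."""
    parsed = urlparse(url)
    if parsed.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not a Pulsar service URL: {url}")
    uses_tls = parsed.scheme == "pulsar+ssl"
    # Fall back to the conventional port when the URL omits one
    port = parsed.port or DEFAULT_PORTS[parsed.scheme]
    return uses_tls, port
```

If the runner is indeed falling back to the plaintext URL, the localrun subcommand has its own broker and TLS options (for example --broker-service-url); bin/pulsar-admin sinks localrun --help lists the ones available in your version.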

Strimzi Kafka Listener Custom Cert configuration

I am trying to configure a custom certificate for a Strimzi Kafka listener, following the documentation: https://strimzi.io/docs/operators/latest/full/configuring.html#ref-alternative-subjects-certs-for-listeners-str
I want to expose the listeners outside the Azure Kubernetes Service cluster, within the private virtual network.
I provided a custom certificate and private key generated by an internal CA and referenced that secret in the Kafka configuration:
kubectl create secret generic kafka-tls --from-literal=listener.cer=$cert --from-literal=listener.key=$skey -n kafka
listeners:
  - name: external
    port: 9094
    type: loadbalancer
    tls: true
    authentication:
      type: tls
    # Listener TLS config
    configuration:
      brokerCertChainAndKey:
        secretName: kafka-tls
        certificate: listener.cer
        key: listener.key
      bootstrap:
        loadBalancerIP: 10.67.249.253
        annotations:
          service.beta.kubernetes.io/azure-load-balancer-internal: "true"
      brokers:
        - broker: 0
          loadBalancerIP: 10.67.249.251
          annotations:
            service.beta.kubernetes.io/azure-load-balancer-internal: "true"
        - broker: 1
          loadBalancerIP: 10.67.249.252
          annotations:
            service.beta.kubernetes.io/azure-load-balancer-internal: "true"
        - broker: 2
          loadBalancerIP: 10.67.249.250
          annotations:
            service.beta.kubernetes.io/azure-load-balancer-internal: "true"
authorization:
  type: simple
The certificate has the following records:
SAN:
*.kafka-datalake-prod-kafka-brokers *.kafka-datalake-prod-kafka-brokers.kafka.svc kafka-datalake-prod-kafka-bootstrap kafka-datalake-prod-kafka-bootstrap.kafka.svc kafka-datalake-prod-kafka-external-bootstrap kafka-datalake-prod-kafka-external-bootstrap.kafka.svc kafka-datalake-prod-azure.custom.domain
CN=kafka-datalake-produkty-prod-azure.custom.domain
I have also created an A record in the custom DNS for the given address: kafka-datalake-produkty-prod-azure.custom.domain 10.67.249.253
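Because the client dials the bootstrap LoadBalancer IP (10.67.249.253) directly, and the brokers are typically advertised by their own LoadBalancer IPs, hostname verification compares those IPs against the SAN list above, and DNS entries (wildcards included) never match an IP. A simplified Python sketch of that comparison; it illustrates RFC 6125-style matching rules and is not the actual matcher used by OpenSSL or any Kafka client. The function names are hypothetical; the SANs and IPs are the ones posted in this question.

```python
import ipaddress

# SAN entries listed in the question (their types are not shown, so all are strings)
POSTED_SANS = [
    "*.kafka-datalake-prod-kafka-brokers",
    "*.kafka-datalake-prod-kafka-brokers.kafka.svc",
    "kafka-datalake-prod-kafka-bootstrap",
    "kafka-datalake-prod-kafka-bootstrap.kafka.svc",
    "kafka-datalake-prod-kafka-external-bootstrap",
    "kafka-datalake-prod-kafka-external-bootstrap.kafka.svc",
    "kafka-datalake-prod-azure.custom.domain",
]

# Bootstrap and broker LoadBalancer IPs from the listener configuration
LISTENER_IPS = ["10.67.249.253", "10.67.249.251", "10.67.249.252", "10.67.249.250"]


def _is_ip(value: str) -> bool:
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def san_matches(san: str, endpoint: str) -> bool:
    """Simplified check of one SAN entry against the host or IP a client dials."""
    if _is_ip(endpoint):
        # An IP endpoint only matches an identical IP entry; DNS wildcards never apply
        return _is_ip(san) and ipaddress.ip_address(san) == ipaddress.ip_address(endpoint)
    san, endpoint = san.lower(), endpoint.lower()
    if san.startswith("*."):
        # A wildcard covers exactly one leftmost label
        label, _, rest = endpoint.partition(".")
        return bool(label) and rest == san[2:]
    return san == endpoint


def uncovered(sans: list, endpoints: list) -> list:
    """Return the endpoints that no SAN entry covers."""
    return [e for e in endpoints if not any(san_matches(s, e) for s in sans)]
```

With the posted values, uncovered(POSTED_SANS, LISTENER_IPS) reports all four listener IPs as uncovered.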
Then, I created a KafkaUser object:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: customuser
  namespace: kafka
  labels:
    strimzi.io/cluster: kafka-datalake-prod
spec:
  authentication:
    type: tls
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: notify.somecustomapp.prod.topic_name
          patternType: literal
        operations:
          - Create
          - Describe
          - Write
        # host: "*"
I then retrieve the secrets from the Kafka cluster on AKS:
kubectl get secret kafka-datalake-prod-cluster-ca-cert -n kafka -o jsonpath='{.data.ca\.crt}' | base64 -d > broker.crt
kubectl get secret customuser -n kafka -o jsonpath='{.data.user\.key}' | base64 -d > customuser.key
kubectl get secret customuser -n kafka -o jsonpath='{.data.user\.crt}' | base64 -d > customuser.crt
Communication fails: when I try to connect and send some messages with a producer that uses those three files to authenticate and authorize, I get the following error:
INFO:kafka.conn:<BrokerConnection node_id=bootstrap-0 host=10.67.249.253:9094 <connecting> [IPv4 ('10.67.249.253', 9094)]>: connecting to 10.67.249.253:9094 [('10.67.249.253', 9094) IPv4]
INFO:kafka.conn:Probing node bootstrap-0 broker version
INFO:kafka.conn:<BrokerConnection node_id=bootstrap-0 host=10.67.249.253:9094 <handshake> [IPv4 ('10.67.249.253', 9094)]>: Loading SSL CA from certs/prod/broker.crt
INFO:kafka.conn:<BrokerConnection node_id=bootstrap-0 host=10.67.249.253:9094 <handshake> [IPv4 ('10.67.249.253', 9094)]>: Loading SSL Cert from certs/prod/customuser.crt
INFO:kafka.conn:<BrokerConnection node_id=bootstrap-0 host=10.67.249.253:9094 <handshake> [IPv4 ('10.67.249.253', 9094)]>: Loading SSL Key from certs/prod/customuser.key
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)
What am I doing wrong?
Communication worked perfectly fine with the same connection method when the cluster and its listeners were using the default certificates generated by Strimzi.
All the best,
Krzysztof
@Turing85 @Jakub
Many thanks for your comments, especially the critical ones.
And thanks, Jakub, for pointing me towards using the CA of the custom certificate. To fix this, I had to:
Replace the value obtained from the kafka-datalake-prod-cluster-ca-cert secret with the full chain: the root CA, the intermediate signing cert, and the certificate itself.
Add the LoadBalancer IPs of the brokers to the SAN. This is stated in the documentation, but the way it is phrased misled me into thinking that adding hostnames/service names to the SAN was enough (https://strimzi.io/docs/operators/latest/full/configuring.html#tls_listener_san_examples, and later https://strimzi.io/docs/operators/latest/full/configuring.html#external_listener_san_examples).
After those changes, everything started to work.
Thank you for your help.
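The first fix, trusting the custom certificate's full chain instead of the Strimzi cluster CA, amounts to building a single PEM bundle. A minimal Python sketch, assuming the root CA, the intermediate and the listener certificate are available as separate PEM files; the function name and file names are hypothetical, and the file order is up to the caller (the resolution lists root, intermediate, certificate).

```python
from pathlib import Path

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"


def build_ca_bundle(pem_files: list, out_file: str) -> int:
    """Concatenate PEM certificates into one CA file; return how many it contains."""
    parts = []
    for name in pem_files:
        text = Path(name).read_text()
        if PEM_BEGIN not in text:
            raise ValueError(f"{name} does not contain a PEM certificate")
        # Normalise the trailing newline so consecutive PEM blocks don't run together
        parts.append(text.strip() + "\n")
    bundle = "".join(parts)
    Path(out_file).write_text(bundle)
    return bundle.count(PEM_BEGIN)
```

With kafka-python (which the log above appears to come from), the resulting file would be passed as the producer's ssl_cafile in place of broker.crt.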

rafthttp: dial tcp timeout on etcd 3-node cluster creation

I don't have access to the etcd part of the project's source code, but I do have access to /var/log/syslog.
The goal is to set up a 3-node cluster.
(1) The very first etcd error that comes up is:
rafthttp: failed to dial 76e7ffhh20007a98 on stream MsgApp v2 (dial tcp 10.0.0.134:2380: i/o timeout)
Before continuing: I can ping all three nodes from each of the nodes. I have also tried opening TCP port 2380, still without success (same error).
(2) Before that error, I had the following messages from etcd, which in my opinion confirm that the cluster is set up correctly:
etcdserver/membership: added member 76e7ffhh20007a98 [https://server2:2380]
etcdserver/membership: added member 222e88db3803e816 [https://server1:2380]
etcdserver/membership: added member 999115e00e17123d [https://server3:2380]
In the /etc/hosts file, these names resolve as follows:
server2 10.0.0.135
server1 10.0.0.134
server3 10.0.0.136
(3) The initial setup on each node, however, looks like this:
embed: listening for peers on https://127.0.0.1:2380
embed: listening for client requests on 127.0.0.1:2379
So, to sum up: each node logs the initial setup (3), then adds the members (2), and once these steps are done it fails with (1). As far as I know, etcd cluster creation follows this pattern: https://etcd.io/docs/v3.5/tutorials/how-to-setup-cluster/
Without the source code this is really hard to debug, but does anyone have ideas about the error and what could cause it?
UPD: etcdctl cluster-health output (ETCDCTL_ENDPOINT is exported):
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
; error #1: dial tcp 127.0.0.1:4001: connect: connection refused
error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
error #1: dial tcp 127.0.0.1:4001: connect: connection refused
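Log (3) suggests one plausible cause worth ruling out: each node listens for peers only on 127.0.0.1:2380, so a dial to 10.0.0.134:2380 from another node cannot reach it, even though ping works and the port is open in the firewall. In the linked tutorial, --listen-peer-urls and --listen-client-urls use each node's own IP. A small Python sketch that flags loopback-only listen addresses taken from such log lines; the function is hypothetical and does not read etcd's configuration itself.

```python
import ipaddress
from urllib.parse import urlsplit


def loopback_only(listen_urls: list) -> bool:
    """True if every listen address is loopback, i.e. unreachable from other nodes."""
    for url in listen_urls:
        # Log lines may omit the scheme ("127.0.0.1:2379"); "//" makes urlsplit see a host
        host = urlsplit(url if "://" in url else "//" + url).hostname or ""
        if host == "localhost":
            continue
        try:
            if not ipaddress.ip_address(host).is_loopback:
                # A routable address, or 0.0.0.0 (all interfaces)
                return False
        except ValueError:
            # Any other hostname may resolve to a routable address
            return False
    return True
```

Fed the two addresses from log (3), it returns True, meaning other members have nothing they can connect to.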

Can’t set up yb-master with TLS encryption in YugabyteDB

[Question posted by a user on YugabyteDB Community Slack]
I am trying to set up yb-master nodes with TLS encryption. I followed the docs and applied the configuration accordingly. I am receiving the following error:
W0113 18:45:22.847903 28992 universe_key_client.cc:60] Rpc status: Network error (yb/rpc/secure_stream.cc:562): Handshake failed: Network error (yb/rpc/secure_stream.cc:882): Endpoint does not match, address: ip2, hostname: ip2, resp:
W0113 18:45:22.851953 28981 tcp_stream.cc:144] { local: ip1:7100 remote: ip3:59128 }: Shutting down with pending inbound data ({ capacity: 374400 pos: 0 size: 67 }, status = Network error (yb/rpc/secure_stream.cc:472): Insecure connection header: 5942)
W0113 18:45:22.852005 28981 tcp_stream.cc:144] { local: ip1:7100 remote: ip3:59128 }: Shutting down with pending inbound data ({ capacity: 374400 pos: 0 size: 67 }, status = Service unavailable (yb/rpc/reactor.cc:100): Shutdown connection (system error 108))
From what I can see, the endpoints actually match. Does anyone have an idea about this?
Do you have that IP address in your certificate? Do you connect over that IP? The IP address must be in the certificate; it should work after you update the certs.
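Putting the node IPs into the certificate usually means listing them as IP-type Subject Alternative Names, which are distinct from DNS-type entries. A minimal Python sketch that builds such an OpenSSL subjectAltName value; the function name and the example addresses are hypothetical (the log only shows ip1/ip2/ip3), and YugabyteDB's own certificate-generation steps may differ.

```python
import ipaddress


def subject_alt_name(entries: list) -> str:
    """Build an OpenSSL subjectAltName value, tagging IPs as IP: and names as DNS:."""
    parts = []
    for entry in entries:
        try:
            ipaddress.ip_address(entry)
            parts.append(f"IP:{entry}")
        except ValueError:
            parts.append(f"DNS:{entry}")
    return "subjectAltName=" + ",".join(parts)
```

On OpenSSL 1.1.1 or later, the resulting string can be passed to openssl req with -addext when generating each node's certificate.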

Debugging TLS handshake failure

I'm trying to access my peer through the fabric-network Node.js SDK.
However, I get an error during gateway.connect in the SDK, and the logs I find in the peer container are not helpful.
All I have, even with grpc=debug logging, is:
peer0.catie-test | 2020-09-21 13:27:07.731 UTC [core.comm] ServerHandshake -> ERRO 087 TLS handshake failed with error remote error: tls: handshake failure server=PeerServer remoteaddress=172.17.0.1:49918
peer0.catie-test | 2020-09-21 13:27:07.731 UTC [grpc] handleRawConn -> DEBU 088 grpc: Server.Serve failed to complete security handshake from "172.17.0.1:49918": remote error: tls: handshake failure
Is there any way to get more helpful logs? For example, I would like to know which keys are used for the TLS handshake check.
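On the question of more helpful logs: in the peer log, "remote error: tls: handshake failure" means the alert was sent by the other side, so it is usually the client that aborted the handshake, and the useful detail lives there. A minimal Python sketch of a client-side probe that reports why the handshake fails; it uses only the standard ssl module, is not part of the Fabric SDK, and the function name is hypothetical.

```python
import socket
import ssl
from typing import Optional


def probe_tls(host: str, port: int, cafile: Optional[str] = None,
              server_hostname: Optional[str] = None, timeout: float = 5.0) -> str:
    """Try a TLS handshake as a client and describe the outcome."""
    # Verifies the server certificate against `cafile` (system CAs when None)
    ctx = ssl.create_default_context(cafile=cafile)
    # gRPC runs over HTTP/2, so offer "h2" via ALPN as a gRPC client would
    ctx.set_alpn_protocols(["h2"])
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=server_hostname or host) as tls:
                subject = dict(item for rdn in tls.getpeercert()["subject"] for item in rdn)
                return f"ok: {tls.version()}, server CN={subject.get('commonName')}"
    except ssl.SSLCertVerificationError as exc:
        # Subclass of SSLError, so it must be caught first
        return f"certificate verification failed: {exc.verify_message}"
    except ssl.SSLError as exc:
        return f"TLS error: {exc.reason or exc}"
    except OSError as exc:
        return f"connection failed: {exc}"
```

For example, probing 172.17.0.7:7051 with the CA file from the connection profile, once with server_hostname="peer0.catie-test" and once without, shows whether the failure is a trust problem or a name mismatch.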
Edit with more infos: Configuration files and TLS verification
My peer is configured for TLS with the environment variables:
CORE_PEER_TLS_ENABLED=true
CORE_PEER_TLS_KEY_FILE=/etc/hyperledger/crypto/peer/tls-msp/keystore/key.pem
CORE_PEER_TLS_CERT_FILE=/etc/hyperledger/crypto/peer/tls-msp/signcerts/cert.pem
CORE_PEER_TLS_ROOTCERT_FILE=/etc/hyperledger/crypto/peer/tls-msp/tlscacerts/tlsca.catie-test-cert.pem
I have the correct TLS CA cert of my peer on the client side, because the output on the peer and on the client side is the same:
cat /etc/hyperledger/crypto/peer/tls-msp/tlscacerts/tlsca.catie-test-cert.pem # From the peer, output ZTd/o8LLw== at the end
cat /tmp/fabric-start-catie-test/building/artifacts/peer0.catie-test-crypto/tls-msp/tlscacerts/tlsca.catie-test-cert.pem # From the client, output ZTd/o8LLw== at the end
The path to the peer's TLS CA cert is set in the client-side connection-profile.json:
"peers": {
  "peer0.catie-test": {
    "tlsCACerts": {
      "path": "/tmp/fabric-start-catie-test/building/artifacts/peer0.catie-test-crypto/tls-msp/tlscacerts/tlsca.catie-test-cert.pem"
    },
    "grpcOptions": {
      "ssl-target-name-override": "172.17.0.7",
      "grpc.keepalive_time_ms": 10000
    },
    "url": "grpcs://172.17.0.4:7051",
    "eventUrl": "grpcs://172.17.0.4:7053"
  }
}
I also checked that the TLS CA cert is the one that issued my peer cert:
openssl verify -CAfile $CORE_PEER_TLS_ROOTCERT_FILE $CORE_PEER_TLS_CERT_FILE # Output : /etc/hyperledger/crypto/peer/tls-msp/signcerts/cert.pem: OK
Edit 2: gRPC options, peer name instead of IP, and client logs
I also tried adding grpcOptions to the peer section of connection-profile.json (see the updated snippet above), but it didn't change anything.
I also tried adding the peer name to my /etc/hosts to reach my peer via its name instead of its IP. That makes a warning disappear but doesn't solve my problem, and I prefer to work with IPs in my scripts.
Here are the logs of the Node.js SDK client in case they help diagnose the problem. They only say that the Endorser must be connected, and I think it is, because it reaches my peer: I see this TLS error in my peer's logs.
(node:59350) [DEP0123] DeprecationWarning: Setting the TLS ServerName to an IP address is not permitted by RFC 6066. This will be ignored in a future version.
2020-09-23T06:42:20.704Z - error: [ServiceEndpoint]: Error: Failed to connect before the deadline on Endorser- name: peer0.catie-test, url:grpcs://172.17.0.7:7051, connected:false, connectAttempted:true
2020-09-23T06:42:20.705Z - error: [ServiceEndpoint]: waitForReady - Failed to connect to remote gRPC server peer0.catie-test url:grpcs://172.17.0.7:7051 timeout:3000
2020-09-23T06:42:20.708Z - error: [NetworkConfig]: buildPeer - Unable to connect to the endorser peer0.catie-test due to Error: Failed to connect before the deadline on Endorser- name: peer0.catie-test, url:grpcs://172.17.0.7:7051, connected:false, connectAttempted:true
    at checkState (/home/rqueraud/CATIE/Myrmica/myrmica-start/node_modules/@grpc/grpc-js/build/src/client.js:69:26)
    at Timeout._onTimeout (/home/rqueraud/CATIE/Myrmica/myrmica-start/node_modules/@grpc/grpc-js/build/src/channel.js:292:17)
    at listOnTimeout (internal/timers.js:549:17)
    at processTimers (internal/timers.js:492:7) {
  connectFailed: true
}
(node:59350) UnhandledPromiseRejectionWarning: Error: Endorser must be connected
    at Channel.addEndorser (/home/rqueraud/CATIE/Myrmica/myrmica-start/node_modules/fabric-common/lib/Channel.js:259:10)
    at buildChannel (/home/rqueraud/CATIE/Myrmica/myrmica-start/node_modules/fabric-network/lib/impl/ccp/networkconfig.js:50:21)
    at Object.loadFromConfig (/home/rqueraud/CATIE/Myrmica/myrmica-start/node_modules/fabric-network/lib/impl/ccp/networkconfig.js:34:19)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at async Gateway.connect (/home/rqueraud/CATIE/Myrmica/myrmica-start/node_modules/fabric-network/lib/gateway.js:279:13)
    at async queryChaincode (/home/rqueraud/CATIE/Myrmica/myrmica-start/test/chaincode-sdk/index.js:41:5)
    at async /home/rqueraud/CATIE/Myrmica/myrmica-start/test/chaincode-sdk/index.js:57:5
Edit 3: Docker IPs? Trying with EC2 instances.
As @Urko mentioned, my nodes are in fact Docker containers running docker-in-docker (dind) images. Inside these containers run other containers with the Hyperledger peer, cli, ... images.
I access them from the host, which is also where I run the Fabric Node.js SDK client. I cannot access them via their container names; I think that is only possible in a docker-compose configuration, isn't it? I already tried (see Edit 2 above) adding their names to my /etc/hosts to reach them by name instead of IP, but it didn't change anything.
However, since my network startup is scripted, this time I deployed it with docker-machine on AWS instead of the dind containers, so these are real instances reachable on the internet. But I still get the same errors. Here is the log from the peer, where you can see the connection comes from a public IP:
2020-09-24 08:32:57.653 UTC [core.comm] ServerHandshake -> ERRO 0d7 TLS handshake failed with error remote error: tls: handshake failure server=PeerServer remoteaddress=31.36.26.4:35462
It seems that the connection to your peer has been configured to be secured with TLS, so your peer configuration must specify which certificates it uses for TLS.
As with any server using this protocol, the server (here, the peer) presents its TLS certificate during the handshake, and the client only proceeds if it trusts that certificate. So you need to configure your client to trust the server through the root CA that was used to issue the peer's TLS certificates.
The client is where you use the SDK, so that is where you configure trust in the peer's TLS certificate. When you configure the connection to the blockchain nodes (peers and orderers), you define their address as well as their TLS CA certificate. Here is an example from the following link; there, you have to set the tlsCACerts param:
orderers:
  orderer.example.com:
    url: grpcs://localhost:7050
    grpcOptions:
      ssl-target-name-override: orderer.example.com
      grpc-max-send-message-length: 4194304
    tlsCACerts:
      path: test/fixtures/channel/crypto-config/ordererOrganizations/example.com/orderers/orderer.example.com/tlscacerts/example.com-cert.pem
peers:
  peer0.org1.example.com:
    url: grpcs://localhost:7051
    grpcOptions:
      ssl-target-name-override: peer0.org1.example.com
      grpc.keepalive_time_ms: 600000
    tlsCACerts:
      path: test/fixtures/channel/crypto-config/peerOrganizations/org1.example.com/peers/peer0.org1.example.com/tlscacerts/org1.example.com-cert.pem
----- Edited ----
Also, check the value of the ssl-target-name-override param. It should match your node's name, as you can see in the example file.
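The client log in the question already hints at this: Node warns (DEP0123) that the TLS ServerName is an IP address, which RFC 6066 does not permit, and the posted profile sets ssl-target-name-override to 172.17.0.7. A small Python sketch that checks a connection profile, loaded as a dict (for example with json.load), against this rule of thumb; the function name is hypothetical, and what ultimately matters is that the override matches a name in the peer's TLS certificate.

```python
import ipaddress


def override_problems(profile: dict) -> list:
    """List peers whose ssl-target-name-override is an IP or differs from the peer name."""
    problems = []
    for name, peer in profile.get("peers", {}).items():
        override = peer.get("grpcOptions", {}).get("ssl-target-name-override")
        if override is None:
            continue
        try:
            ipaddress.ip_address(override)
        except ValueError:
            # A DNS name: check it against the peer's key in the profile
            if override != name:
                problems.append(f"{name}: override {override!r} does not match the peer name")
            continue
        # IP literals are not valid SNI server names (RFC 6066)
        problems.append(f"{name}: override {override!r} is an IP address")
    return problems
```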
----- Edited ----
Why are you using those IPs? As I understand it, those IPs are internal to the Docker network, so you should not use them. Could you try using your container names instead of the Docker network IPs?
----- Edited ----
Could you check your ca-server configuration file and verify that tls is set to true?
You are making a gRPC call to the peer server, which is secured with TLS. If you fail to provide a valid TLS certificate, the TLS handshake with the server will fail and the connection will not be established.
Please check that your network config file is correct, that you are using the same TLS certificate that is used to run the peer server, and that your TLS certificate path is correct.
