I have a cluster on Google Kubernetes Engine and I use spark-submit to run Spark jobs. (I don't invoke spark-submit directly; I launch the submission from Java code, but both paths essentially invoke the same Scala class, SparkSubmit.)
In my case, there are two clusters I can connect to from my laptop using the gcloud command, e.g.
gcloud container clusters get-credentials cluster-1
gcloud container clusters get-credentials cluster-2
When I connect to cluster-1 and spark-submit submits to cluster-1, it works. But after I run the second gcloud command and still submit to cluster-1, it no longer works, and the following stack trace appears (abridged):
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:194)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:208)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:148)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
I've been searching for a while without success. My guess is that when spark-submit launches, it looks for some Kubernetes-related credential on the local machine, and switching contexts with the two gcloud commands above messed that up.
I'm just curious: when we do spark-submit, how exactly does the remote Kubernetes API server know who I am? What is the authentication process involved in all this?
Thank you in advance.
A "PKIX path building failed" error means Java tried to open an SSL connection but was unable to find a chain of certificates (the path) that validates the certificate the server offered.
The code you're running from does not trust the certificate offered by the cluster. The clusters are probably using self-signed certificates.
When run from the command line, Java looks for the chain in the truststore located at jre/lib/security/cacerts. When run as part of a larger environment (Tomcat, GlassFish, etc.), it uses that environment's certificate truststore.
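To check what a given truststore contains, you can list it with keytool; the sketch below assumes the default password "changeit" and a pre-Java-9 layout where cacerts lives under jre/lib/security (newer JDKs keep it under lib/security):
# List the CAs in the default JRE truststore (default password "changeit").
keytool -list -keystore "$JAVA_HOME/jre/lib/security/cacerts" -storepass changeit | head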
Since you started spark-submit manually, you're likely missing an option telling the JVM where to find the keystore (server certificate and private key) and truststore (CA certificates). These are usually specified as:
-Djavax.net.ssl.trustStore=/somepath/truststore.jks
-Djavax.net.ssl.keyStore=/somepath/keystore.jks
If you're running on Java 9+, you will also need to specify the StoreType:
-Djavax.net.ssl.keyStoreType=<TYPE>
-Djavax.net.ssl.trustStoreType=<TYPE>
Up through Java 8, the keystores were always JKS. Since Java 9 they can also be PKCS12.
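As a minimal sketch of how those properties reach the JVM when you launch SparkSubmit yourself from Java (the paths and password below are placeholders, not values from your setup):
# Hypothetical paths; the system properties apply to any JVM, including one
# that invokes org.apache.spark.deploy.SparkSubmit directly.
java \
  -Djavax.net.ssl.trustStore=/somepath/truststore.jks \
  -Djavax.net.ssl.trustStorePassword=changeit \
  -cp "$SPARK_HOME/jars/*" \
  org.apache.spark.deploy.SparkSubmit --help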
In the case of a self-signed certificate, you can export it from the keystore and import it into the truststore as a trusted certificate. There are several sites with instructions on how to do this; I find Jakob Jenkov's site quite readable.
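A minimal sketch of those keytool steps (alias, paths, and passwords are placeholders):
# Export the self-signed certificate from the keystore...
keytool -exportcert -alias myserver -file myserver.crt \
  -keystore /somepath/keystore.jks -storepass changeit
# ...then import it into the truststore as a trusted certificate.
keytool -importcert -alias myserver -file myserver.crt \
  -keystore /somepath/truststore.jks -storepass changeit -trustcacerts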
If you want to see what the gcloud container clusters get-credentials cluster-1 command does, you can start from scratch and look at the contents of ~/.kube/config:
rm -rf ~/.kube
gcloud container clusters get-credentials cluster-1
cat ~/.kube/config
gcloud container clusters get-credentials cluster-2
cat ~/.kube/config
Something is probably mismatched or conflicting, perhaps the users or contexts. For example, you may have credentials for both clusters but be using the context for cluster-1 to access cluster-2:
$ kubectl config get-contexts
$ kubectl config get-clusters
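For example, to confirm which context kubectl (and spark-submit via the fabric8 client) will use, and to switch explicitly to the one for cluster-1 (the context name below is a placeholder taken from the get-contexts output):
$ kubectl config current-context
$ kubectl config use-context access-to-cluster-1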
The structure of the ~/.kube/config file should be something like this:
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <redacted> or file
    server: https://<IP>:6443
  name: cluster-1
- cluster:
    certificate-authority: <redacted> or file
    server: https://<IP>:8443
  name: cluster-2
contexts:
- context:
    cluster: cluster-1
    user: youruser
  name: access-to-cluster-1
- context:
    cluster: cluster-2
    user: youruser
  name: access-to-cluster-2
current-context: access-to-cluster-1
kind: Config
preferences: {}
users:
- name: ....
  user:
    ...
- name: ....
  user:
    ...
In the Spark code, it looks like it uses the io.fabric8.kubernetes.client.KubernetesClient library; see, for example, KubernetesDriverBuilder.scala.
I'm running the CIS kube-bench tool on the master node and trying to resolve this error:
[FAIL] 1.2.6 Ensure that the --kubelet-certificate-authority argument is set as appropriate (Automated).
I understand that I need to update the API server manifest YAML file so that the --kubelet-certificate-authority flag points to the right CA file; however, I'm not sure which one is the right CA certificate for the kubelet.
These are the files in my PKI directory:
apiserver-etcd-client.crt
apiserver-etcd-client.key
apiserver-kubelet-client.crt
apiserver-kubelet-client.key
apiserver.crt
apiserver.key
ca.crt
ca.key
etcd
front-proxy-ca.crt
front-proxy-ca.key
front-proxy-client.crt
front-proxy-client.key
sa.key
sa.pub
There are three very similar discussions on this topic. I won't walk you through all the steps, because they are well covered in the documentation and in the related questions below; this is only a high-level overview.
How Do I Properly Set --kubelet-certificate-authority apiserver parameter?
Kubernetes kubelet-certificate-authority on premise with kubespray causes certificate validation error for master node
Your actions:
Follow the Kubernetes documentation and set up the TLS connection between the apiserver and the kubelets:
These connections terminate at the kubelet's HTTPS endpoint. By default, the apiserver does not verify the kubelet's serving certificate, which makes the connection subject to man-in-the-middle attacks and unsafe to run over untrusted and/or public networks.
Enable Kubelet authentication and Kubelet authorization
Then, edit the API server pod specification file /etc/kubernetes/manifests/kube-apiserver.yaml on the master node and set the --kubelet-certificate-authority parameter to the path to the cert file for the certificate authority.
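A quick way to confirm the flag is in place after editing (assuming the default manifest path on a kubeadm-built cluster):
# The kube-apiserver static pod restarts automatically when the manifest changes.
grep -- '--kubelet-certificate-authority' /etc/kubernetes/manifests/kube-apiserver.yaml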
From @Matt's answer:
Use /etc/kubernetes/ssl/ca.crt to sign new certificate for kubelet with valid IP SANs.
Set --kubelet-certificate-authority=/etc/kubernetes/ssl/ca.crt (valid CA).
In /var/lib/kubelet/config.yaml (the kubelet config file), set tlsCertFile and tlsPrivateKeyFile to point to the newly created kubelet crt and key files.
And from the clarifications: yes, you have to generate certificates for the kubelets and sign them with the provided certificate authority, located on the master at /etc/kubernetes/ssl/ca.crt.
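A rough sketch of that signing step with openssl (node name, IP, and output paths are placeholders, and this assumes the CA key sits next to the crt; on a kubeadm-built cluster the CA usually lives under /etc/kubernetes/pki/ instead of /etc/kubernetes/ssl/):
# Generate a key and CSR for the kubelet's serving certificate, then sign it
# with the cluster CA so its DNS/IP SANs match the node.
openssl genrsa -out kubelet.key 2048
openssl req -new -key kubelet.key -out kubelet.csr -subj "/CN=<node-name>"
openssl x509 -req -in kubelet.csr -CA /etc/kubernetes/ssl/ca.crt -CAkey /etc/kubernetes/ssl/ca.key \
  -CAcreateserial -out kubelet.crt -days 365 \
  -extfile <(printf "subjectAltName=DNS:<node-name>,IP:<node-ip>")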
By default, Kubernetes has three different parent CAs (kubernetes-ca, etcd-ca, kubernetes-front-proxy-ca). You are looking for kubernetes-ca, because the kubelet uses kubernetes-ca; you can check the documentation. The default path for kubernetes-ca is /etc/kubernetes/pki/ca.crt, but you can also verify it via the kubelet ConfigMap with the command below:
kubectl get configmap -n kube-system $(kubectl get configmaps -n kube-system | grep kubelet | awk '{print $1}') -o yaml | grep -i clientca
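You can also inspect the CA file itself to sanity-check which certificate the flag will point at:
# Print the subject and validity window of the default kubernetes-ca certificate.
openssl x509 -in /etc/kubernetes/pki/ca.crt -noout -subject -dates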
This is probably a stupid question, but I could not find anything useful on the topic.
I am following this tutorial to set up an automatic CI/CD pipeline:
https://rancher.com/blog/2018/2018-08-07-cicd-pipeline-k8s-autodevops-rancher-and-gitlab/
I get stuck on the Token part. I get this error:
unable to recognize "http://x.co/rm082018": Get http://localhost:8080/api?timeout=32s: dial tcp 127.0.0.1:8080: connect: connection refused
It seems kubectl is not properly configured. If I call kubectl version I get the following output:
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"66049e3b21efe110454d67df4fa62b08ea79a19b", GitTreeState:"clean", BuildDate:"2019-05-16T16:23:09Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?
It seems I would have to copy the admin.conf file into my home directory. However, this file does not exist, since kubeadm is not installed on the Rancher server. Later I tried installing kubeadm myself, calling kubeadm init, and copying the resulting admin.conf file.
The error is still there.
So my questions are:
How can I fix this?
Do I have to fix this, or can I get the token some other way?
Is the kubectl error normal behaviour, since Rancher should handle all of this on its own?
Thanks in advance for any answers.
The kubectl command output indicates that no kubeconfig was found on your host. You have to do one of the following (examples follow the list):
place a kubeconfig named config under ~/.kube/
export an environment variable named KUBECONFIG with the kubeconfig location as its value
use the kubectl command with the --kubeconfig ... argument
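For example (the kubeconfig path below is a placeholder for wherever you download the cluster's kubeconfig from Rancher):
# Option 1: copy it to the default location.
mkdir -p ~/.kube && cp /path/to/kubeconfig ~/.kube/config
# Option 2: point kubectl at it via the environment.
export KUBECONFIG=/path/to/kubeconfig
# Option 3: pass it explicitly on each invocation.
kubectl --kubeconfig /path/to/kubeconfig get nodes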
Happy hacking
Regards
I am trying to install a Kubernetes cluster on CentOS 7.3 servers. After some progress I got stuck installing the CNI plugin. To install the plugin I need to pass a parameter extracted from the "kubectl version" command output. However, the command returns an error when fetching the server version:
[root@bigdev1 ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:36:33Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Error from server (NotFound): the server could not find the requested resource
Actually, I started with the default documentation (https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/) using kubeadm 1.7.3 (and Docker 17), but got stuck on a check:
[root@bigdev1 ~]# kubeadm init --pod-network-cidr=10.244.0.0/16
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.7.4
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks
[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.03.1-ce. Max validated version: 1.12
[preflight] Starting the kubelet service
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Generated CA certificate and key.
[certificates] Generated API server certificate and key.
[certificates] API Server serving cert is signed for DNS names [bigdev1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.109.20]
[certificates] Generated API server kubelet client certificate and key.
[certificates] Generated service account token signing key and public key.
[certificates] Generated front-proxy CA certificate and key.
[certificates] Generated front-proxy client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[apiclient] Created API client, waiting for the control plane to become ready
(waits here forever)
Then I downgraded Docker to 1.12.6 and Kubernetes to 1.6.0 after modifying the kubeadm config, and also stopped passing the CIDR parameter to kubeadm init.
I would be glad for any suggestions that clear up this issue, or for the output of the command below:
kubectl version | base64 | tr -d '\n'
Thanks in advance.
Not sure which document you are following. I would recommend using kubeadm to configure the cluster:
https://kubernetes.io/docs/setup/independent/install-kubeadm/
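As a rough sketch of the documented flow (using the same pod-network CIDR you already passed; adjust to whichever CNI plugin you choose):
# Initialize the control plane, then configure kubectl for the current user
# so that `kubectl version` can reach the API server.
kubeadm init --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl version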
This should give you the result of the command:
kubectl version 2>&1| base64 | tr -d '\n'
I'm attempting to test a single-node dev cluster for OpenShift which I've created. I cannot run any basic commands on the cluster because I haven't properly set up privileged accounts.
In particular I need to:
run pods which make containers which query service endpoints
query the apiserver through an insecure endpoint
run commands like kubectl get pods
Is there a default account somewhere I can use which can do all of these things? I'd prefer not to manually set up a bunch of complicated user accounts for a low-level development task such as this.
Below are a few somewhat silly attempts I've made to do this, just as examples.
First, I created an "admin" account like this:
sudo -u vagrant $oc login https://localhost:8443 -u=admin -p=admin --config=/data/src/github.com/openshift/origin/openshift.local.config/master/openshift-registry.kubeconfig
Then I went ahead and hacked around in a few attempts to log in as an admin:
[vagrant@localhost ~]$ sudo chmod 777 /openshift.local.config/master/admin.kubeconfig
[vagrant@localhost ~]$ oc login
Server [https://localhost:8443]:
The server uses a certificate signed by an unknown authority.
You can bypass the certificate check, but any data you send to the server could be intercepted by others.
Use insecure connections? (y/n): y
Authentication required for https://localhost:8443 (openshift)
Username: admin
Password: admin
Login successful.
Using project "project1".
[vagrant@localhost ~]$ oc get nodes --config=/openshift.local.config/master/admin.kubeconfig
This leads to the following error:
Error from server: User "admin" cannot list all nodes in the cluster
I also get this error when leaving the config out:
[vagrant@localhost ~]$ oc get nodes
Error from server: User "admin" cannot list all nodes in the cluster
Is there any easy way to list nodes and do basic kube operations in a standalone development cluster for openshift?
You don't log in when you are using administrative credentials. You simply set KUBECONFIG=admin.kubeconfig. Logging in takes you through a different flow; there is no magic "admin" user.
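A minimal sketch, reusing the admin kubeconfig path from your own commands:
# Use the cluster-admin credentials directly instead of `oc login`.
export KUBECONFIG=/openshift.local.config/master/admin.kubeconfig
oc get nodes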
I have followed this tutorial to set up a Hadoop 2.2.0 multi-node cluster on Amazon EC2. I have had a number of issues with ssh and scp which I was able to either resolve or work around with the help of articles on Stack Overflow, but unfortunately I could not resolve the latest problem.
I am attaching the core configuration files core-site.xml, hdfs-site.xml, etc. I am also attaching a log file with the dump output from running the start-dfs.sh command. That is the final step for starting the cluster, and it gives a mix of errors that I don't have a clue what to do with.
So I have 4 nodes, all created from exactly the same AMI: Ubuntu 12.04 64-bit t2.micro instances with 8 GB storage.
Namenode
SecondaryNode (SNN)
Slave1
Slave2
The configuration is almost the same as suggested in the tutorial mentioned above.
I have been able to connect with WinSCP and ssh from one instance to the other. I have copied all the configuration files, masters, slaves, and .pem files for security purposes, and the instances seem to be accessible from one another.
If someone could please look at the log, config files, and .bashrc file and let me know what I am doing wrong.
The same security group, HadoopEC2SecurityGroup, is used for all the instances. All TCP traffic is allowed and the ssh port is open (screenshot in the attached zipped folder). I am able to ssh from the Namenode to the secondary namenode (SNN), and the same goes for the slaves, which means ssh is working; but when I start HDFS everything goes down. The error log is not throwing any useful exceptions either. All the files and screenshots can be found in a zipped folder here.
An excerpt from the error output on the console looks like:
Starting namenodes on [OpenJDK 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c ', or link it with '-z noexecstack'.
ec2-54-72-106-167.eu-west-1.compute.amazonaws.com]
You: ssh: Could not resolve hostname you: Name or service not known
have: ssh: Could not resolve hostname have: Name or service not known
loaded: ssh: Could not resolve hostname loaded: Name or service not known
VM: ssh: Could not resolve hostname vm: Name or service not known
library: ssh: Could not resolve hostname library: Name or service not known
Server: ssh: Could not resolve hostname server: Name or service not known
warning:: ssh: Could not resolve hostname warning:: Name or service not known
which: ssh: Could not resolve hostname which: Name or service not known
guard.: ssh: Could not resolve hostname guard.: Name or service not known
have: ssh: Could not resolve hostname have: Name or service not known
might: ssh: Could not resolve hostname might: Name or service not known
.....
Add the following entries to .bashrc where HADOOP_HOME is your hadoop folder:
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
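After adding those lines, reload the shell configuration and restart HDFS so the native-library settings take effect (this assumes the standard sbin scripts under $HADOOP_HOME):
# Reload .bashrc, then restart HDFS.
source ~/.bashrc
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh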
Hadoop 2.2.0 : "name or service not known" Warning
hadoop 2.2.0 64-bit installing but cannot start