I am running Cassandra on Kubernetes (3 instances) and want to expose it to the outside, because my application is not yet in Kubernetes. So I created a load-balanced Service like so:
apiVersion: v1
kind: Service
metadata:
  namespace: getquanty
  labels:
    app: cassandra
  name: cassandra
  annotations:
    kubernetes.io/tls-acme: "true"
spec:
  clusterIP:
  ports:
  - port: 9042
    name: cql
    nodePort: 30001
  - port: 7000
    name: intra-node
    nodePort: 30002
  - port: 7001
    name: tls-intra-node
    nodePort: 30003
  - port: 7199
    name: jmx
    nodePort: 30004
  selector:
    app: cassandra
  type: LoadBalancer
This is the result:
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cassandra 10.55.249.88 GIVEN_IP_GCE_LB 9042:30001/TCP,7000:30002/TCP,7001:30003/TCP,7199:30004/TCP 26m
I am able to connect from the shell (cqlsh GIVEN_IP_GCE_LB), but when I try to add data to Cassandra using the DataStax driver for Node.js, I get this:
message: 'Cannot achieve consistency level SERIAL',
info: 'Represents an error message from the server',
code: 4096,
consistencies: 8,
required: 1,
alive: 0,
coordinator: '35.187.166.68:9042' },
'10.52.4.32:9042': 'Host considered as DOWN',
'10.52.2.15:9042': 'Host considered as DOWN' },
info: 'Represents an error when a query cannot be performed because no host is available or could be reached by the driver.',
message: 'All host(s) tried for query failed. First host tried, 35.187.166.68:9042: ResponseError: Cannot achieve consistency level SERIAL. See innerErrors.' }
My first thought was that I needed to expose the other ports too, so I did (intra-node, tls-intra-node, jmx), but I got the same error.
Kubernetes gives you access to a proxy, so I tried to proxy from my machine using the constructed URL for the pod to test whether I have access, but I cannot connect using cqlsh:
http://127.0.0.1:8001/api/v1/namespaces/qq/pods/cassandra-0:cql/proxy
I am out of ideas. The one thing left to try is to expose every instance (make a Service for every instance), which is very ugly, but it would let me connect to the nodes from the outside until I migrate the application to Kubernetes.
Does anyone have ideas on how to expose Cassandra nodes to the internet and make the DataStax driver aware of all the nodes? Thank you for your time.
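The per-instance idea mentioned in the question could look roughly like the sketch below: one Service per pod, selecting a single pod through the statefulset.kubernetes.io/pod-name label that the StatefulSet controller adds to each of its pods. This assumes Cassandra runs as a StatefulSet named cassandra in the getquanty namespace (so the pods are cassandra-0, cassandra-1, and so on); the Service name cassandra-0-external is made up for illustration.

```yaml
# Sketch only: exposes a single Cassandra pod; repeat once per pod.
apiVersion: v1
kind: Service
metadata:
  name: cassandra-0-external      # hypothetical name
  namespace: getquanty
spec:
  type: LoadBalancer
  selector:
    # Label set automatically on pods created by a StatefulSet
    statefulset.kubernetes.io/pod-name: cassandra-0
  ports:
  - name: cql
    port: 9042
    targetPort: 9042
```

Even with one Service per pod, the driver discovers the other nodes from the addresses that Cassandra itself advertises, so the nodes would probably also need to advertise externally reachable addresses, or the client would need some form of address translation.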
After more reading I found out that the replication strategy was causing the problem. NetworkTopologyStrategy is meant for clusters that span multiple data centers; I have only one, so I changed the replication to SimpleStrategy with a replication factor equal to the number of nodes I had. Now everything works as expected.
EDIT 1:
Putting databases on Kube is not a good solution. I ended up making a standalone cluster, added it to the same network as Kube, and was able to access it from the Kube pods.
Kube is made to manage applications and make them 'elastic'. I don't think people really need to scale databases as quickly as applications; furthermore, scaling a database is not the same operation as scaling a stateless application.
You need to use a headless service for the replication controller you created.
Your service should be something like:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
spec:
  clusterIP: None
  ports:
  - port: 9042
  selector:
    app: cassandra
Also, you can refer to the link below to bring up a Cassandra cluster.
https://github.com/kubernetes/kubernetes/tree/master/examples/storage/cassandra
I would recommend running the Cassandra pods via a replication controller, StatefulSet or DaemonSet, because then Kubernetes manages restarting and rescheduling the pods whenever required.
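As a rough illustration of the StatefulSet option paired with the headless service above, a minimal sketch might look like this. The image tag, the storage size, and the use of the CASSANDRA_SEEDS variable of the official cassandra image are assumptions, not something taken from the question.

```yaml
# Sketch only: a StatefulSet backed by the headless Service named "cassandra".
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
  labels:
    app: cassandra
spec:
  serviceName: cassandra          # must match the headless Service name
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
      - name: cassandra
        image: cassandra:3.11     # assumed image and tag
        ports:
        - containerPort: 9042
          name: cql
        - containerPort: 7000
          name: intra-node
        env:
        - name: CASSANDRA_SEEDS   # first pod acts as the seed node
          value: cassandra-0.cassandra
        volumeMounts:
        - name: data
          mountPath: /var/lib/cassandra
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi           # assumed size
```

With a headless Service, each pod gets a stable DNS name such as cassandra-0.cassandra inside the namespace, which is what the seed setting relies on.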
I am using Azure Kubernetes and trying to set TCP keepalive on a per-container basis.
Is there a way of achieving that?
You could do this via sysctls on the pod manifest in AKS/Kubernetes:
spec:
  securityContext:
    sysctls:
    - name: "net.ipv4.tcp_keepalive_time"
      value: "45"
Here is also further documentation:
https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/
https://docs.syseleven.de/metakube/de/tutorials/confiugre-unsafe-sysctls
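For context, a complete Pod manifest using this setting might look like the sketch below. The pod name, the nginx image, and the two extra keepalive sysctls with their values are illustrative assumptions. Note that sysctls in the pod securityContext apply to the whole pod (all containers share the network namespace), not to a single container, and depending on the Kubernetes version these sysctls may be treated as unsafe and have to be allowed on the node before the pod is admitted.

```yaml
# Sketch only: pod-wide TCP keepalive tuning via sysctls.
apiVersion: v1
kind: Pod
metadata:
  name: keepalive-demo            # hypothetical name
spec:
  securityContext:
    sysctls:
    - name: net.ipv4.tcp_keepalive_time     # idle seconds before the first probe
      value: "45"
    - name: net.ipv4.tcp_keepalive_intvl    # seconds between probes (assumed value)
      value: "15"
    - name: net.ipv4.tcp_keepalive_probes   # failed probes before the connection is dropped (assumed value)
      value: "3"
  containers:
  - name: app
    image: nginx                  # placeholder image
```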
I am trying to use Spark on a Kubernetes cluster (existing setup: EMR + YARN). Our use case is to handle a large number of jobs, including short-lived ones (a few seconds to 15 minutes). We also have peak hours when many workers need to run to handle hundreds of jobs running concurrently.
So what I want to achieve is running a master and a few fixed workers (say 5) all the time, and increasing the workers to 40-50 at peak time. I would also prefer to use dynamic allocation.
I am setting it up as below:
Master image (spark-master:X)
FROM <BASE spark 3.1 Image build using dev/make-distribution.sh -Pkubernetes in spark>
ENTRYPOINT ["/opt/spark/sbin/start-master.sh", "-p", "8081", "<A long running server command that can accept get traffic on 8080 to submit jobs>"]
Worker image (spark-worker:X)
FROM <BASE spark 3.1 Image build using dev/make-distribution.sh -Pkubernetes in spark>
ENTRYPOINT ["/opt/spark/sbin/start-worker.sh", "spark://spark-master:8081" ,"-p", "8081", "<A long running server command to keep up the worker>"]
Deployments
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-master-server
spec:
  replicas: 1
  selector:
    matchLabels:
      component: spark-master-server
  template:
    metadata:
      labels:
        component: spark-master-server
    spec:
      containers:
      - name: spark-master-server
        image: spark-master:X
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8081
---
apiVersion: v1
kind: Service
metadata:
  name: spark-master
spec:
  type: ClusterIP
  ports:
  - port: 8081
    targetPort: 8081
  selector:
    component: spark-master-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-worker-instance
spec:
  replicas: 3
  selector:
    matchLabels:
      component: spark-worker-instance
  template:
    metadata:
      labels:
        component: spark-worker-instance
    spec:
      containers:
      - name: spark-worker-server
        image: spark-worker:X
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8081
Questions
Is this setup recommended ?
How can we submit a new job from within the Kubernetes cluster in the absence of YARN (k8s)?
The reason we are trying not to create a master and driver dynamically per job (as in the example at http://spark.apache.org/docs/latest/running-on-kubernetes.html) is that it may add overhead for a large number of small jobs.
Is this setup recommended ?
Don't think so.
Dynamic Resource Allocation is a property of a single Spark application "to dynamically adjust the resources your application occupies based on the workload."
Dynamic Resource Allocation scales an application's resource requests regardless of how many nodes are available in the cluster. As long as resources are available and the cluster manager can assign them to a Spark application, the application is free to take them.
What you seem to be trying to set up is scaling the cluster itself up and down. In your case it's Spark Standalone, and although it's technically possible with ReplicaSets (just a guess), I've never heard of any earlier attempts at it. You're on your own, as Spark Standalone does not support it out of the box.
I think that is overkill, since you're building a multi-layer cluster environment: using a cluster manager (Kubernetes) to host another cluster manager (Spark Standalone) for Spark applications. Given that Spark on Kubernetes supports Dynamic Allocation out of the box, your only worry should be how to "throw in" more CPU and memory on demand by resizing the Kubernetes cluster. You should rely on the ability of Kubernetes to resize itself up and down rather than on Spark Standalone on Kubernetes.
The Spark on k8s operator may provide at least one mechanism to dynamically provision the resources you need, so that scaling can be done safely based on demand.
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
My thinking is that instead of performing a direct spark submit to a master in a static spark cluster, you could make a call to the k8s API to provision the required instance; or otherwise define these as a cron schedule.
In the dynamic provisioning scenario, one thing to consider is how your workloads get distributed across the cluster. We can’t use simple HPA rules for this, as it may not be safe to tear down a worker based on CPU/memory levels. Spawning a separate cluster for each on-demand workload bypasses this but may not be optimal. I would be interested to hear how you get on.
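To make the operator suggestion a little more concrete, here is a hedged sketch of what a SparkApplication resource with dynamic allocation might look like. The image, main class, jar path, and service account name are all hypothetical, and the executor bounds simply mirror the 5 to 50 workers mentioned in the question; check the operator's API documentation for the fields supported by your operator version.

```yaml
# Sketch only: one Spark job submitted through the Spark on k8s operator.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: example-job                         # hypothetical name
spec:
  type: Scala
  mode: cluster
  image: my-registry/spark:3.1.1            # hypothetical image
  mainClass: com.example.ExampleJob         # hypothetical class
  mainApplicationFile: local:///opt/spark/jars/example-job.jar   # hypothetical path
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    memory: 1g
    serviceAccount: spark                   # assumed service account
  executor:
    cores: 1
    memory: 2g
  dynamicAllocation:
    enabled: true
    initialExecutors: 5
    minExecutors: 5
    maxExecutors: 50
```

Creating such a resource through the Kubernetes API (or on a schedule) takes the place of a direct spark-submit against a static master.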
We are using Spark and Cassandra in an application that is deployed on bare metal/VMs. To connect Spark to Cassandra, we use the following properties to enable SSL:
spark.cassandra.connection.ssl.keyStore.password
spark.cassandra.connection.ssl.keyStore.type
spark.cassandra.connection.ssl.protocol
spark.cassandra.connection.ssl.trustStore.path
spark.cassandra.connection.ssl.trustStore.password
spark.cassandra.connection.ssl.trustStore.type
spark.cassandra.connection.ssl.clientAuth.enabled
Now I am trying to migrate the same application to Kubernetes. I have the following questions:
Do I need to change the above properties in order to connect Spark to a Cassandra cluster in Kubernetes?
Will the above properties work, or did I miss something?
Can anyone point me to a document or link that can help?
Yes, these properties will continue to work when you run your job on Kubernetes. The only thing that you need to take into account is that all properties with name ending with .path need to point to the actual files with trust & key stores. On Kubernetes, you need to take care of exposing them as secrets, mounted as files. First you need to create a secret, like this:
apiVersion: v1
data:
  spark.truststore: base64-encoded truststore
kind: Secret
metadata:
  name: spark-truststore
type: Opaque
and then in the spec, point to it:
spec:
  containers:
  - image: nginx
    name: nginx
    volumeMounts:
    - mountPath: "/some/path"
      name: spark-truststore
      readOnly: true
  volumes:
  - name: spark-truststore
    secret:
      secretName: spark-truststore
and point the configuration option to the resulting path, like: /some/path/spark.truststore
I'm a noob with Azure deployment, Kubernetes and HA implementation. When I implement health probes as part of my app deployment, the health probes fail and I end up with either a 503 (service unavailable) or 502 (bad gateway) error when I try accessing the app via the URL. When I remove the health probes, I can successfully access the app using its URL.
I use the following YAML deployment configuration when implementing the health probes, which is used by an Azure DevOps pipeline. The app takes under 5 minutes to become available, so I set the initialDelaySeconds for the health probes to 300s.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myApp
spec:
  ...
  template:
    metadata:
      labels:
        app: myApp
    spec:
      ...
      containers:
      - name: myApp
        ...
        ports:
        - containerPort: 5000
        ...
        readinessProbe:
          tcpSocket:
            port: 5000
          initialDelaySeconds: 300
          periodSeconds: 5
          successThreshold: 1
          failureThreshold: 3
        livenessProbe:
          tcpSocket:
            port: 5000
          periodSeconds: 30
          initialDelaySeconds: 300
          successThreshold: 1
          failureThreshold: 3
        ...
When I perform the deployment and describe the pod, I see the following listed under 'Events' at the bottom of the output:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 2m1s (x288 over 86m) kubelet, aks-vm-id-appears-here Readiness probe failed: dial tcp 10.123.1.23:5000: connect: connection refused
(this is confusing as it states the age as 2m1s - but the initialDelaySeconds is greater than this - so I'm not sure why it reports this as the age)
The readiness probe subsequently fails with the same error. The IP number matches the IP of my pod and I see this under Containers in the pod description:
Containers:
....
Port: 5000/TCP
The failure of the liveness and readiness probes results in the pod being continually terminated and restarted.
The app has a default index.html page, so I believe the health probe should receive a 200 response if it's able to connect.
Because the health probe is failing, the pod IP doesn't get assigned to the endpoints object and therefore isn't assigned against the service.
If I comment out the readinessProbe and livenessProbe from the deployment, the app runs successfully when I use the URL via the browser, and the pod IP gets successfully assigned as an endpoint that the service can communicate with. The endpoint address is in the form 10.123.1.23:5000 - i.e. port 5000 seems to be the correct port for the pod.
I don't understand why the health probe would be failing to connect? It looks correct to me that it should be trying to connect on an IP that looks like 10.123.1.23:5000.
It's possible that the port is taking longer than 300s to become open, but I don't know of a way to check that. If I enter a bash session on the pod, watch isn't available (I read that watch ss -lnt can be used to examine port availability).
The following answer suggests increasing initialDelaySeconds but I already tried that - https://stackoverflow.com/a/51932875/1549918
I saw this question ("Liveness and readiness probe connection refused"), but resource utilisation (e.g. CPU/RAM) is not the issue.
UPDATE
If I curl from a replica of the pod to https://10.123.1.23:5000, I get a similar error (Failed to connect to ...the IP.. port 5000: Connection refused). Why could this be failing? I read something that suggests that attempting this connection from another pod may indicate reachability for the health probes also.
If you are unsure whether your application is starting correctly, replace it with a known good image, e.g. httpd:
change the ports to 80 and the image to httpd.
You might also want to increase the timeout for the health check, which defaults to 1 second, e.g. timeoutSeconds: 5.
In addition, if your image is a web application, it would be better to use an HTTP probe.
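The known-good-image test described above could be sketched as follows. The Deployment name and labels are made up for illustration; the stock httpd image serves a default page on port 80, so an HTTP probe against / should pass if the cluster-side plumbing is healthy.

```yaml
# Sketch only: swap the application for httpd to check the probe setup itself.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: probe-test                # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: probe-test
  template:
    metadata:
      labels:
        app: probe-test
    spec:
      containers:
      - name: httpd
        image: httpd
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:
            path: /
            port: 80
          periodSeconds: 5
          timeoutSeconds: 5       # raised from the 1-second default
          failureThreshold: 3
```

If the probe passes with httpd but fails with the real image, the problem is more likely in how the application starts or listens than in the probe configuration.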
Your statement
The app has a default index.html page, so I believe the health probe should receive a 200 response if it's able to connect.
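is incorrect.
You are doing a tcpSocket check, which only verifies that a TCP connection to the port can be opened; it never sends an HTTP request, so no 200 response is involved. Try to switch to: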
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /
    port: 5000
    scheme: HTTP
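For an application that can take several minutes to start, a startupProbe is one alternative to a large initialDelaySeconds. The fragment below is a sketch that would sit under the container entry; the thresholds are assumptions chosen to allow up to 10 minutes of startup, and the liveness and readiness probes only begin once the startup probe has succeeded.

```yaml
# Sketch only: container-level probe fragment for a slow-starting HTTP app.
startupProbe:
  httpGet:
    path: /
    port: 5000
    scheme: HTTP
  periodSeconds: 10
  failureThreshold: 60            # 60 attempts x 10 s = up to 600 s to start
livenessProbe:
  httpGet:
    path: /
    port: 5000
    scheme: HTTP
  periodSeconds: 30
  timeoutSeconds: 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /
    port: 5000
    scheme: HTTP
  periodSeconds: 5
  timeoutSeconds: 5
  failureThreshold: 3
```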
I have a Node.js microservice running on GKE that serves HTML/JS assets on our clients' sites. The configuration for this is as follows: Ingress > NodePort Service > Pods (4 replicas). If for some reason the pods are failing or the node goes down, requests to the microservice take the full 30s to time out. That causes delayed page load times for our clients. What I need is for the NodePort service to cut off the connection or respond with a 502 error after 2 seconds in the event of a failure.
I've tried two ways of manipulating the same setting. The first is creating a BackendConfig following the docs: https://cloud.google.com/kubernetes-engine/docs/how-to/configure-backend-service
My config looks like this:
apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
  name: timeout-config
spec:
  timeoutSec: 2
Then I connected it to my service like this:
apiVersion: v1
kind: Service
metadata:
  annotations:
    beta.cloud.google.com/backend-config: '{"ports": {"80":"timeout-config"}}'
  labels:
    run: <MICROSERVICE>
  name: <MICROSERVICE>
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 3000
  selector:
    run: <MICROSERVICE>
  type: NodePort
I tested this configuration by having my microservice alternate between returning 200 and 502 for the health check endpoint every 30 seconds. That caused the pod to be restarted about every 30s which would cut off communication with the pod. I expected that once it was being restarted that the request would timeout and default to the 2-second setting I had configured. However, it still took 30 seconds to receive the 502 error.
The second method I tried was to set the timeout to 2 seconds using gcloud. I did so by following the docs here: https://github.com/kubernetes/ingress-gce/blob/e72479ba461fedae5fc5bf64999f28ba3125004d/examples/websocket/README.md#change-backend-timeout
That method did not work either. What other methods can I use to get my service to timeout after 2 seconds on GKE?
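For reference, the same BackendConfig in its GA form might look like the sketch below, here with an explicit health check so that failing backends are taken out of rotation sooner. This is only an illustration, not a confirmed fix for the 30-second delay: the health check values and the /healthz path are assumptions, and newer GKE versions use the cloud.google.com/backend-config annotation on the Service rather than the beta one.

```yaml
# Sketch only: GA BackendConfig with a backend timeout and a faster health check.
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: timeout-config
spec:
  timeoutSec: 2                   # backend service response timeout
  healthCheck:
    type: HTTP
    requestPath: /healthz         # hypothetical health endpoint
    checkIntervalSec: 2
    timeoutSec: 2
    healthyThreshold: 1
    unhealthyThreshold: 2
---
apiVersion: v1
kind: Service
metadata:
  name: <MICROSERVICE>
  annotations:
    cloud.google.com/backend-config: '{"default": "timeout-config"}'
spec:
  type: NodePort
  selector:
    run: <MICROSERVICE>
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 3000
```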