Spark UI History server on Kubernetes? - apache-spark

I launch an application on a Kubernetes cluster with spark-submit, and I can only see the Spark UI by going to http://driver-pod:port.
How can I start the Spark History Server on the cluster?
How can I make sure that all running Spark jobs are registered with the Spark History Server?
Is this possible?

Yes, it is possible. Briefly, you need to ensure the following:
Make sure all your applications store event logs in a specific location (filesystem, S3, HDFS, etc.).
Deploy the history server in your cluster with access to that event log location.
By default, Spark reads event logs only from a filesystem path, so I will describe that case in detail using the Spark operator:
Create a PVC with a volume type that supports ReadWriteMany mode, for example an NFS volume. The following snippet assumes you already have a storage class for NFS (nfs-volume) configured:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spark-pvc
  namespace: spark-apps
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 5Gi
  storageClassName: nfs-volume
Make sure all your Spark applications have event logging enabled and pointing at the correct path:
sparkConf:
  "spark.eventLog.enabled": "true"
  "spark.eventLog.dir": "file:/mnt"
Mount the event log volume into each application pod (you can also use the operator's mutating webhook to centralize this). An example manifest with this configuration is shown below:
---
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-java-pi
  namespace: spark-apps
spec:
  type: Java
  mode: cluster
  image: gcr.io/spark-operator/spark:v2.4.4
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar"
  imagePullPolicy: Always
  sparkVersion: 2.4.4
  sparkConf:
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "file:/mnt"
  restartPolicy:
    type: Never
  volumes:
    - name: spark-data
      persistentVolumeClaim:
        claimName: spark-pvc
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 2.4.4
    serviceAccount: spark
    volumeMounts:
      - name: spark-data
        mountPath: /mnt
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 2.4.4
    volumeMounts:
      - name: spark-data
        mountPath: /mnt
Install the Spark history server with the shared volume mounted. You will then be able to see the events in the history server UI:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-history-server
  namespace: spark-apps
spec:
  replicas: 1
  # apps/v1 requires an explicit selector that matches the pod template labels
  selector:
    matchLabels:
      app: spark-history-server
  template:
    metadata:
      name: spark-history-server
      labels:
        app: spark-history-server
    spec:
      containers:
        - name: spark-history-server
          image: gcr.io/spark-operator/spark:v2.4.0
          resources:
            requests:
              memory: "512Mi"
              cpu: "100m"
          command:
            - /sbin/tini
            - -s
            - --
            - /opt/spark/bin/spark-class
            - -Dspark.history.fs.logDirectory=/data/
            - org.apache.spark.deploy.history.HistoryServer
          ports:
            - name: http
              protocol: TCP
              containerPort: 18080
          readinessProbe:
            timeoutSeconds: 4
            httpGet:
              path: /
              port: http
          livenessProbe:
            timeoutSeconds: 4
            httpGet:
              path: /
              port: http
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: spark-pvc
            readOnly: true
Feel free to configure a Service or an Ingress for accessing the UI.
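As a minimal sketch of such a Service, assuming the Deployment keeps the app: spark-history-server label and the container port named http (18080) from the manifest above. The Service name and the ClusterIP type are illustrative choices, not part of the original answer:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spark-history-server   # hypothetical name
  namespace: spark-apps
spec:
  type: ClusterIP              # assumption: put an Ingress in front, or change the type, for external access
  selector:
    app: spark-history-server  # must match the pod label in the Deployment
  ports:
    - name: http
      protocol: TCP
      port: 18080
      targetPort: http         # the named container port from the Deployment
```

With a Service like this in place, something like kubectl port-forward svc/spark-history-server 18080:18080 -n spark-apps should make the UI reachable at http://localhost:18080 for a quick check.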
You can also use Google Cloud Storage, Azure Blob Storage, or AWS S3 as the event log location. This requires installing some extra jars, so I would recommend having a look at the Lightbend Spark history server image and charts.
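For the S3 case, the application side might look roughly like the fragment below. This is only a sketch: the bucket name and prefix are hypothetical, it assumes the image already contains the hadoop-aws and AWS SDK jars, and credentials are deliberately left out because how they are supplied depends on your environment.

```yaml
sparkConf:
  "spark.eventLog.enabled": "true"
  # hypothetical bucket and prefix; replace with your own
  "spark.eventLog.dir": "s3a://my-spark-logs/events"
  # tells Hadoop which filesystem implementation handles s3a:// URLs
  "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem"
```

The history server would then need the same location in spark.history.fs.logDirectory (for example -Dspark.history.fs.logDirectory=s3a://my-spark-logs/events) instead of the mounted /data/ path, and no shared PVC is required.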

Related

Memory Sev3 alerts in Azure kubernetes cluster with default resource allocations for 2 pods - mySQL db and Spring app

A few days ago I deployed 2 services into an Azure Kubernetes cluster. I set up the cluster with 1 node, with these virtual machine parameters: B2s: 2 cores, 4 GB RAM, 8 GB temporary storage. Then I placed 2 pods on the same node:
MySQL database with a 4 GiB persistent volume, 5 tables at the moment
Spring Boot Java application
There are no replicas.
Take a look at the kubectl output for the deployed pods:
The purpose is to create an internal application for the company where I work, to be used by the company team. There won't be a lot of data in the DB.
When we started to test the connection to the API from the front end, I received a memory alert like the one below:
The MySQL deployment YAML file looks like this:
apiVersion: v1
kind: Service
metadata:
  name: mysql-db-testing-service
  namespace: testing
spec:
  type: LoadBalancer
  ports:
    - port: 3307
      targetPort: 3306
  selector:
    app: mysql-db-testing
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql-db-testing
  namespace: testing
spec:
  selector:
    matchLabels:
      app: mysql-db-testing
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql-db-testing
    spec:
      containers:
        - name: mysql-db-container-testing
          image: mysql:8.0.31
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysqldb-secret-testing
                  key: password
          ports:
            - containerPort: 3306
              name: mysql-port
          volumeMounts:
            - mountPath: "/var/lib/mysql"
              name: mysql-persistent-storage
      volumes:
        - name: mysql-persistent-storage
          persistentVolumeClaim:
            claimName: azure-managed-disk-pvc-mysql-testing
      nodeSelector:
        env: preprod
Spring app deployment yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-app-api-testing
  namespace: testing
  labels:
    app: spring-app-api-testing
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spring-app-api-testing
  template:
    metadata:
      labels:
        app: spring-app-api-testing
    spec:
      containers:
        - name: spring-app-api-testing
          image: techradaracr.azurecr.io/technology-radar-be:$(Build.BuildId)
          env:
            - name: MYSQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysqldb-secret-testing
                  key: password
            - name: MYSQL_PORT
              valueFrom:
                configMapKeyRef:
                  name: spring-app-testing-config-map
                  key: mysql_port
            - name: MYSQL_HOST
              valueFrom:
                configMapKeyRef:
                  name: spring-app-testing-config-map
                  key: mysql_host
      nodeSelector:
        env: preprod
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: spring-app-api-testing
    k8s-app: spring-app-api-testing
  name: spring-app-api-testing-service
  namespace: testing
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 8080
  type: LoadBalancer
  selector:
    app: spring-app-api-testing
First I deployed the MySQL database, then the Java Spring API.
I guess the problem is the default resource allocation, with the MySQL DB using 90% of the overall RAM. That's why I'm receiving the memory alert.
I know that there are sections for resource allocation in the YAML config:
resources:
  requests:
    cpu: 250m
    memory: 64Mi
  limits:
    cpu: 500m
    memory: 256Mi
for minimum and maximum CPU and memory resources. The question is how much I should allocate to the Spring app and how much to the MySQL database in order to avoid memory problems.
I would be grateful for help.
First of all, running the whole cluster on only one VM defeats the purpose of using Kubernetes, especially since you are using a small SKU for the VMSS. Have you considered running the application outside of k8s?
To answer your question, there is no fixed formula or set of values for requests and limits. The values you choose depend on the specific requirements of your application and the resources available in your cluster.
In particular, consider the workload characteristics, the cluster capacity, performance (if the values are too small, the application will struggle), and cost.
Please refer to the best practices here: https://learn.microsoft.com/en-us/azure/aks/developer-best-practices-resource-management
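To make the shape of such a configuration concrete, here is a sketch of how requests and limits could be attached to the two containers from the question. All numbers are illustrative starting points that I am assuming for a 4 GB node, not recommendations; measure actual usage (for example with kubectl top pod) and adjust. The JAVA_TOOL_OPTIONS setting is likewise an assumption about how the Spring app's JVM heap might be tied to the container limit.

```yaml
# Fragment for the MySQL container (illustrative values only)
containers:
  - name: mysql-db-container-testing
    resources:
      requests:
        cpu: 250m
        memory: 512Mi
      limits:
        cpu: 500m
        memory: 1Gi
---
# Fragment for the Spring Boot container (illustrative values only)
containers:
  - name: spring-app-api-testing
    env:
      # lets the JVM size its heap relative to the container memory limit
      - name: JAVA_TOOL_OPTIONS
        value: "-XX:MaxRAMPercentage=75.0"
    resources:
      requests:
        cpu: 250m
        memory: 512Mi
      limits:
        cpu: 500m
        memory: 1Gi
```

Keeping the sum of the memory requests well below the node's allocatable memory leaves room for the system pods that AKS runs on every node.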

Azure Kubernetes : Azure Disks or Azure Files as data volumes?

I have an Azure Kubernetes cluster and I need to mount a data volume for an application like the one below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:5.6
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-db-password
                  key: db-password
          ports:
            - containerPort: 3306
              name: mysql
          volumeMounts:
            - name: mysql-persistent-storage
              mountPath: /var/lib/mysql
            - name: usermanagement-dbcreation-script
              mountPath: /docker-entrypoint-initdb.d
      volumes:
        - name: mysql-persistent-storage
          persistentVolumeClaim:
            claimName: azure-managed-disk-pvc
        - name: usermanagement-dbcreation-script
          configMap:
            name: usermanagement-dbcreation-script
I see that there are two options for creating the Persistent Volume: one based on Azure Disks and one based on Azure Files.
I want to know the difference between Azure Disks and Azure Files with respect to Persistent Volumes in Azure Kubernetes, and when I should use Azure Disks vs Azure Files.
For something like MySQL (exclusive access to files) you are better off using Azure Disks. That is pretty much a regular disk attached to the pod, whereas Azure Files is mostly meant for cases where you need ReadWriteMany access rather than ReadWriteOnce.
https://learn.microsoft.com/en-us/azure/aks/concepts-storage
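The difference shows up directly in the claim. The two PVCs below are a sketch: the claim names and sizes are hypothetical, and the storage class names managed-csi and azurefile-csi are the built-in classes I would expect on a recent AKS cluster, so check kubectl get storageclass before relying on them.

```yaml
# Azure Disk: block storage attached to a single node (ReadWriteOnce)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-disk-pvc          # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: managed-csi # assumption: built-in AKS class
  resources:
    requests:
      storage: 5Gi
---
# Azure Files: SMB/NFS share that many pods can mount at once (ReadWriteMany)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-files-pvc          # hypothetical name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-csi # assumption: built-in AKS class
  resources:
    requests:
      storage: 5Gi
```

For a database, the first form is usually the one to pick; the second suits content that several replicas need to read and write concurrently.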

Create a new volume when pod restart in a statefulset

I'm using Azure AKS to create a StatefulSet with a volume using the Azure Disk provisioner.
I'm trying to find a way to write my StatefulSet YAML file so that when a pod restarts, it gets a new volume and the old volume is deleted.
I know I can delete volumes manually, but is there any way to tell Kubernetes to do this via the StatefulSet YAML?
Here is my YAML:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: janusgraph
  labels:
    app: janusgraph
spec:
  ...
  ...
  template:
    metadata:
      labels:
        app: janusgraph
    spec:
      containers:
        - name: janusgraph
          ...
          ...
          volumeMounts:
            - name: data
              mountPath: /var/lib/janusgraph
          livenessProbe:
            httpGet:
              port: 8182
              path: ?gremlin=g.V(123).count()
            initialDelaySeconds: 120
            periodSeconds: 10
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "default"
        resources:
          requests:
            storage: 7Gi
If you want your data to be deleted when the pod is replaced, you can use an ephemeral volume such as emptyDir.
When a Pod is removed from its node for any reason, the data in the emptyDir is deleted permanently. Note that a container crash or restart inside the same Pod does not clear it.
Sample:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 3 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: nginx
          image: k8s.gcr.io/nginx-slim:0.8
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: www
              mountPath: /usr/share/nginx/html
      volumes:
        - name: www
          emptyDir: {}
N.B.:
By default, emptyDir volumes are stored on whatever medium is backing the node - that might be disk or SSD or network storage, depending on your environment. However, you can set the emptyDir.medium field to "Memory" to tell Kubernetes to mount a tmpfs (RAM-backed filesystem) for you instead.
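A sketch of the RAM-backed variant, reusing the www volume name from the sample above. The sizeLimit value is an arbitrary illustration; keep in mind that data written to a memory-backed emptyDir counts against the container's memory limit.

```yaml
volumes:
  - name: www
    emptyDir:
      medium: Memory    # mount a tmpfs instead of using node disk
      sizeLimit: 256Mi  # illustrative cap on how much the volume may hold
```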

Kubernetes: Cassandra(stateful set) deployment on GCP

Has anyone tried deploying Cassandra (POC) on GCP using Kubernetes (not GKE)? If so, can you please share info on how to get it working?
You could start by looking at IBM's Scalable-Cassandra-deployment-on-Kubernetes.
For seed discovery you can use a headless service, similar to this: Multi-node Cassandra Cluster Made Easy with Kubernetes.
Some difficulties:
fast local storage for K8s is still in beta; of course, you can use what K8s already has; some users report that they use Ceph RBD with 8 C* nodes, each of them holding 2 TB of data on K8s.
at some point you will realize that you need a C* operator; here are some good starting points: Instaclustr's Cassandra Operator and Pantheon Systems' Cassandra Operator.
you need a way to gracefully scale in stateful applications (this should also be covered by the operator; this is a solution if you don't want an operator, but you still need to use a controller).
You could also check the Cassandra mailing list, since there are people there already running Cassandra on K8s in production.
I have implemented Cassandra on Kubernetes. Please find my deployment and service YAML files below:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
spec:
  clusterIP: None
  ports:
    - port: 9042
  selector:
    app: cassandra
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
  labels:
    app: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      terminationGracePeriodSeconds: 1800
      containers:
        - name: cassandra
          image: gcr.io/google-samples/cassandra:v12
          imagePullPolicy: Always
          ports:
            - containerPort: 7000
              name: intra-node
            - containerPort: 7001
              name: tls-intra-node
            - containerPort: 7199
              name: jmx
            - containerPort: 9042
              name: cql
          resources:
            limits:
              cpu: "500m"
              memory: 1Gi
            requests:
              cpu: "500m"
              memory: 1Gi
          securityContext:
            capabilities:
              add:
                - IPC_LOCK
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - nodetool drain
          env:
            - name: MAX_HEAP_SIZE
              value: 512M
            - name: HEAP_NEWSIZE
              value: 100M
            - name: CASSANDRA_SEEDS
              value: "cassandra-0.cassandra.default.svc.cluster.local"
            - name: CASSANDRA_CLUSTER_NAME
              value: "K8Demo"
            - name: CASSANDRA_DC
              value: "DC1-K8Demo"
            - name: CASSANDRA_RACK
              value: "Rack1-K8Demo"
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          readinessProbe:
            exec:
              command:
                - /bin/bash
                - -c
                - /ready-probe.sh
            initialDelaySeconds: 15
            timeoutSeconds: 5
          volumeMounts:
            - name: cassandra-data
              mountPath: /cassandra_data
  volumeClaimTemplates:
    - metadata:
        name: cassandra-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "fast"
        resources:
          requests:
            storage: 5Gi
Hope this helps.
Use Helm:
On Mac:
brew install helm@2
brew link --force helm@2
helm init
To Avoid Kubernetes Helm permission Hell:
from: https://github.com/helm/helm/issues/2224:
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
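The same two objects can also be kept as a manifest and applied with kubectl apply -f. This is a declarative sketch of what the two kubectl create commands above produce, using the same names; note that binding Tiller to cluster-admin grants it very broad permissions.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller-cluster-rule
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
```

For Tiller to actually run under this account, Helm 2 would typically be initialized with helm init --service-account tiller.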
Cassandra
incubator:
helm repo add incubator https://charts.helm.sh/incubator
helm install --namespace "cassandra" -n "cassandra" incubator/cassandra
helm status "cassandra"
helm delete --purge "cassandra"
bitnami:
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install --namespace "cassandra" -n "my-deployment" bitnami/cassandra
helm status "my-deployment"
helm delete --purge "my-deployment"

How to mount a volume with a windows container in kubernetes?

I'm trying to mount a persistent volume into my Windows container, but I always get this error:
Unable to mount volumes for pod "mssql-with-pv-deployment-3263067711-xw3mx_default(....)": timeout expired waiting for volumes to attach/mount for pod "default"/"mssql-with-pv-deployment-3263067711-xw3mx". list of unattached/unmounted volumes=[blobdisk01]
I've created a GitHub gist with the console output of "get events" and "describe sc | pvc | po"; maybe someone will find the solution with it.
Below are the scripts that I'm using for deployment.
My StorageClass:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azure-disk-sc
provisioner: kubernetes.io/azure-disk
parameters:
  skuname: Standard_LRS
My PersistentVolumeClaim:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-disk-pvc
spec:
  storageClassName: azure-disk-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
and the deployment of my container:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: mssql-with-pv-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: mssql-with-pv
    spec:
      nodeSelector:
        beta.kubernetes.io/os: windows
      terminationGracePeriodSeconds: 10
      containers:
        - name: mssql-with-pv
          image: testacr.azurecr.io/sql/mssql-server-windows-developer
          ports:
            - containerPort: 1433
          env:
            - name: ACCEPT_EULA
              value: "Y"
            - name: SA_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mssql
                  key: SA_PASSWORD
          volumeMounts:
            - mountPath: "c:/volume"
              name: blobdisk01
      volumes:
        - name: blobdisk01
          persistentVolumeClaim:
            claimName: azure-disk-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: mssql-with-pv-deployment
spec:
  selector:
    app: mssql-with-pv
  ports:
    - protocol: TCP
      port: 1433
      targetPort: 1433
  type: LoadBalancer
What am I doing wrong? Is there another way to mount a volume?
Thanks for any help :)
I would try:
Change the API version to v1: https://kubernetes.io/docs/concepts/storage/storage-classes/#azure-disk
Run kubectl get events to see if you get a more detailed error (when I used NFS, watching the events is how I figured out the reason)
Maybe it is this bug, which I read about in this post?
You will need a new volume on the D: drive; it looks like folders in C: are not supported for Windows containers, see here:
https://github.com/kubernetes/kubernetes/issues/65060
Demos:
https://github.com/andyzhangx/demo/tree/master/windows/azuredisk
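Applied to the Deployment from the question, the suggestion would amount to something like the fragment below. This is an unverified sketch based on my reading of the linked demo: the "D:" mount path is the assumption being tested, and the container and volume names are taken from the question.

```yaml
# Pod template fragment: mount the disk as a drive rather than a folder under C:
spec:
  containers:
    - name: mssql-with-pv
      volumeMounts:
        - mountPath: "D:"        # assumption: a whole drive letter instead of c:/volume
          name: blobdisk01
  volumes:
    - name: blobdisk01
      persistentVolumeClaim:
        claimName: azure-disk-pvc
```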
