I'd like to use Prometheus to monitor Spark 3.
First, I deployed Prometheus and Spark 3 via Helm, and they are both up and running.
Then I followed the blog post
Spark 3.0 Monitoring with Prometheus to get Spark 3 to expose its metrics by uncommenting these lines in metrics.properties:
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
My first spark-submit job works fine, and I am able to see all metrics_executor_* metrics in the Prometheus UI:
spark-submit
…
--conf spark.ui.prometheus.enabled=true \
--conf spark.kubernetes.driver.annotation.prometheus.io/scrape=true \
--conf spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/executors/prometheus/ \
--conf spark.kubernetes.driver.annotation.prometheus.io/port=4040 \
…
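To double-check what that first job exposes, here is a hedged verification sketch (the driver pod name is a placeholder; the paths mirror the annotations above and the metrics.properties sink):
# forward the driver UI port (4040, as annotated above)
kubectl port-forward pod/<driver-pod> 4040:4040 &
# executor metrics served on the driver UI when spark.ui.prometheus.enabled=true
curl -s http://localhost:4040/metrics/executors/prometheus/ | head
# driver metrics from the PrometheusServlet sink configured in metrics.properties
curl -s http://localhost:4040/metrics/prometheus/ | head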
However, when I tried to submit another job to get metrics from the master and workers, the desired metrics_master* and metrics_worker* metrics were not shown in Prometheus:
spark-submit
…
--conf spark.ui.prometheus.enabled=true \
--conf spark.kubernetes.driver.annotation.prometheus.io/scrape=true \
--conf spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/executors/prometheus/ \
--conf spark.kubernetes.driver.annotation.prometheus.io/port=4040 \
--conf spark.kubernetes.driver.annotation.prometheus.io/scrape=true \
--conf spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/prometheus/ \
--conf spark.kubernetes.driver.annotation.prometheus.io/port=8081 \
--conf spark.kubernetes.driver.annotation.prometheus.io/scrape=true \
--conf spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/master/prometheus/ \
--conf spark.kubernetes.driver.annotation.prometheus.io/port=8080 \
--conf spark.kubernetes.driver.annotation.prometheus.io/scrape=true \
--conf spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/applications/prometheus/ \
--conf spark.kubernetes.driver.annotation.prometheus.io/port=8080 \
…
It seems the relevant Spark services in Kubernetes only expose port 4040, although I nominated ports 8080 and 8081 in the spark-submit. That could be the reason why metrics_master* and metrics_worker* are absent from Prometheus.
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
my-spark-py-042f06864dacaf4d-driver-svc ClusterIP None <none> 7078/TCP,7079/TCP,4040/TCP 4m46s
my-spark-py84edeb8643cde90e-driver-svc ClusterIP None <none> 7078/TCP,7079/TCP,4040/TCP 46h
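Worth noting: the repeated --conf flags above reuse the same three annotation keys (prometheus.io/scrape, prometheus.io/path, prometheus.io/port), so only one path/port pair can actually land on the driver pod. A hedged way to confirm what was applied (pod name is a placeholder; service name taken from the listing above):
kubectl get pod <driver-pod> -o jsonpath='{.metadata.annotations}'
kubectl get svc my-spark-py-042f06864dacaf4d-driver-svc -o jsonpath='{.spec.ports}'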
Is there any fix for it?
Related
I am trying to run an ETL job using Apache Spark (Java) in a Kubernetes cluster. The application is running, and data is getting inserted into the database (MySQL), but the application is not visible in the Spark Web UI.
The command I used for submitting the application is:
./spark-submit --class com.xxxx.etl.EtlApplication \
--name MyETL \
--master k8s://XXXXXXXXXX.xxx.us-west-2.eks.amazonaws.com:443 \
--conf "spark.kubernetes.container.image=YYYYYY.yyy.ecr.us-west-2.amazonaws.com/spark-poc:32" \
--conf "spark.kubernetes.driverEnv.SPARK_MASTER_URL=spark://my-spark-master-headless.default.svc.cluster.local:7077" \
--conf "spark.kubernetes.authenticate.driver.serviceAccountName=my-spark" \
--conf "spark.kubernetes.driver.request.cores=256m" \
--conf "spark.kubernetes.driver.limit.cores=512m" \
--conf "spark.kubernetes.executor.request.cores=256m" \
--conf "spark.kubernetes.executor.limit.cores=512m" \
--deploy-mode cluster \
local:///opt/bitnami/spark/examples/jars/EtlApplication-with-dependencies.jar 1000
I use a Jenkins job to build my code and move the jar to the /opt/bitnami/spark/examples/jars folder in the container inside the cluster.
The job is seen running in the pod when I check with kubectl get pods, and it is visible at localhost:4040 after mapping the port to localhost using kubectl port-forward pod/myetl-df26f5843cb88da7-driver 4040:4040
I tried the same spark-submit command with the Spark example jar (which came with the Spark installation in the container):
./spark-submit --class org.apache.spark.examples.SparkPi \
--conf "spark.kubernetes.container.image=YYYYYY.yyy.ecr.us-west-2.amazonaws.com/spark-poc:5" \
--master k8s://XXXXXXXXXX.xxx.us-west-2.eks.amazonaws.com:443 \
--conf "spark.kubernetes.driverEnv.SPARK_MASTER_URL=spark://my-spark-master-headless.default.svc.cluster.local:7077" \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=my-spark \
--deploy-mode cluster \
local:///opt/bitnami/spark/examples/jars/spark-examples_2.12-3.3.0.jar 1000
This time the application is listed in the Spark Web UI. I tried several options, and on removing the line --conf "spark.kubernetes.driverEnv.SPARK_MASTER_URL=spark://my-spark-master-headless.default.svc.cluster.local:7077", the SparkPi example application is also not displayed in the Spark Web UI.
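One hedged way to check whether that driverEnv setting reaches the driver at all is to inspect the environment of the running driver pod (pod name borrowed from the port-forward example above; substitute your actual driver pod):
kubectl exec myetl-df26f5843cb88da7-driver -- env | grep SPARK_MASTER_URL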
Am I missing something? Do I need to change my Java code to accept spark.kubernetes.driverEnv.SPARK_MASTER_URL? I tried several options but nothing works.
Thanks in advance.
We are running a Spark cluster on Kubernetes. When we submitted jobs as below, the driver pod and executor pods were all up and running. However, the application failed to work as expected; the root cause we suspect is that it failed to find the source path specified by the "py-files" parameter. As we observed, the driver pod has a warning MountVolume.Setup failed for volume "spark-conf-volume".
Would you please advise?
bin/spark-submit \
--master k8s://https://k8s-master-ip:6443 \
--deploy-mode cluster \
--name algo-vm \
--py-files hdfs://{our_ip}:9000/testdata/src.zip \
--conf spark.executor.instances=2 \
--conf spark.driver.port=10000 \
--conf spark.port.maxRetries=1 \
--conf spark.blockManager.port=20000 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.pyspark.pythonVersion=3 \
--conf spark.kubernetes.container.image={our_ip}/sutpc/k8s-spark-242-entry/spark-py:1.0 \
--jars hdfs://hdfs-master-ip:9000/jar/spark-sql-kafka-0-10_2.11-2.4.5.jar,hdfs://{our_ip}:9000/jar/kafka-clients-0.11.0.2.jar \
hdfs://{our_ip}:9000/testdata/spark_main.py
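To dig into the MountVolume warning mentioned above, a hedged diagnostic sketch (the driver pod name is a placeholder):
# show the driver pod's events, including the spark-conf-volume mount warning
kubectl describe pod <driver-pod>
# the same warning should also appear in the namespace events
kubectl get events --field-selector involvedObject.name=<driver-pod>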
I have a PySpark job present locally on my laptop. If I want to submit it to my Minikube cluster using spark-submit, any idea how to pass the Python file?
I'm using the following command, but it isn't working:
./spark-submit \
--master k8s://https://192.168.64.6:8443 \
--deploy-mode cluster \
--name amazon-data-review \
--conf spark.kubernetes.namespace=jupyter \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.driver.limit.cores=1 \
--conf spark.executor.cores=1 \
--conf spark.executor.memory=500m \
--conf spark.kubernetes.container.image=prateek/spark-ubuntu-2.4.5 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.container.image.pullSecrets=dockerlogin \
--conf spark.eventLog.enabled=true \
--conf spark.eventLog.dir=s3a://prateek/spark-hs/ \
--conf spark.hadoop.fs.s3a.access.key=xxxxx \
--conf spark.hadoop.fs.s3a.secret.key=xxxxx \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.fast.upload=true \
/Users/prateek/apache-spark/amazon_data_review.py
I'm getting the following error:
python3: can't open file '/Users/prateek/apache-spark/amazon_data_review.py': [Errno 2] No such file or directory
Is it required to keep the file within the Docker image itself? Can't we run it by keeping the file locally on the laptop?
Spark on Kubernetes doesn't support submitting locally stored files with spark-submit.
What you could do to make it work in cluster mode is to build a Spark Docker image based on prateek/spark-ubuntu-2.4.5 with amazon_data_review.py put inside of it (e.g. using a Docker COPY /Users/prateek/apache-spark/amazon_data_review.py /amazon_data_review.py statement).
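For example, a minimal hedged sketch of that image build (the tag and build-context layout are assumptions):
# hypothetical Dockerfile: the base image from the question plus the PySpark script,
# assuming the script has been copied into the build context first
cat > Dockerfile <<'EOF'
FROM prateek/spark-ubuntu-2.4.5
COPY amazon_data_review.py /amazon_data_review.py
EOF
docker build -t prateek/spark-ubuntu-2.4.5-app .
docker push prateek/spark-ubuntu-2.4.5-app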
Then just refer to it in the spark-submit command using the local:// file system, e.g.:
spark-submit \
--master ... \
--conf ... \
...
local:///amazon_data_review.py
The alternative is to store that file in an http(s):// or hdfs://-accessible location.
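For the hdfs:// route, a hedged sketch (the namenode address and target path are placeholders):
hdfs dfs -put /Users/prateek/apache-spark/amazon_data_review.py hdfs://<namenode>:9000/apps/amazon_data_review.py
Then pass hdfs://<namenode>:9000/apps/amazon_data_review.py as the application file in spark-submit.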
It's solved. Running it in client mode worked:
--deploy-mode client
My Mac OS/X Version : 10.15.3
Minikube Version: 1.9.2
I start Minikube using the following command without any extra configuration:
minikube start --driver=virtualbox
--image-repository='registry.cn-hangzhou.aliyuncs.com/google_containers' --cpus 4 --memory 4096 --alsologtostderr
I downloaded spark-2.4.5-bin-hadoop2.7 from the official Spark website and built the Spark images with the following commands:
eval $(minikube docker-env)
./bin/docker-image-tool.sh -m -t 2.4.5 build
Then I run SparkPi using the following commands from my local machine, where Spark 2.4.5 is stored:
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=admin --serviceaccount=default:spark --namespace=default
./bin/spark-submit \
--master k8s://https://192.168.99.104:8443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.container.image=spark:2.4.5 \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar
I get the following error; the full log can be found at full log.
Can anyone explain this error and how to solve it?
Please check the Kubernetes version you launched with Minikube.
Spark v2.4.5's fabric8 Kubernetes client v4.6.1 is compatible with Kubernetes API versions up to 1.15.2 (refer to this answer).
You can launch a specific Kubernetes API version with Minikube by adding the --kubernetes-version flag to the minikube start command (refer to the docs).
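For example, a hedged sketch pinning Minikube to a 1.15.x API server (the exact version value is an assumption based on the compatibility note above):
minikube start --driver=virtualbox --kubernetes-version=v1.15.2 --cpus 4 --memory 4096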
Also, the issue might be caused by an OkHttp library bug described in a comment on this question.
Another Spark image (from gcr.io/spark-operator/spark) worked for me, without downgrading the Kubernetes version:
bin/spark-submit \
--master k8s://https://192.168.99.100:8443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.driver.cores=1 \
--conf spark.driver.memory=512m \
--conf spark.executor.instances=2 \
--conf spark.executor.memory=512m \
--conf spark.executor.cores=1 \
--conf spark.kubernetes.container.image=gcr.io/spark-operator/spark:v2.4.5 \
--conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar
I am trying to submit a Spark job to a Kubernetes cluster in cluster mode, from a client inside the cluster, with the --packages attribute so that dependencies are downloaded by the driver and executors, but it is not working: it refers to a path on the submitting client. (kubectl proxy is on.)
Here are the submit options:
/usr/local/bin/spark-submit \
--verbose \
--master=k8s://http://127.0.0.1:8001 \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.namespace=spark \
--conf spark.kubernetes.container.image= <...> \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.pyspark.pythonVersion=3 \
--conf spark.kubernetes.driver.secretKeyRef.AWS_ACCESS_KEY_ID=datazone-s3-secret:AWS_ACCESS_KEY_ID \
--conf spark.kubernetes.driver.secretKeyRef.AWS_SECRET_ACCESS_KEY=datazone-s3-secret:AWS_SECRET_ACCESS_KEY \
--packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3 \
s3.py 10
In the logs I can see that the packages refer to my local file system.
Spark config:
(spark.kubernetes.namespace,spark)
(spark.jars,file:///Users/<my username>/.ivy2/jars/com.amazonaws_aws-java-sdk-1.7.4.jar,file:///Users/<my username>/.ivy2/jars/org.apache.hadoop_hadoop-aws-2.7.3.jar,file:///Users/<my username>/.ivy2/jars/joda-time_joda-time-2.10.5.jar, ....
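To confirm what actually reaches the driver pod, a hedged check (namespace taken from the submit above; the driver pod name is a placeholder); it should echo the same file:///Users/... paths:
kubectl -n spark logs <driver-pod> | grep -iE 'spark\.jars|ivy'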
Has anyone faced this problem?