Send spark driver logs running in k8s to Splunk - apache-spark

I am trying to run a sample spark job in kubernetes by following the steps mentioned here: https://spark.apache.org/docs/latest/running-on-kubernetes.html.
I am trying to send the spark driver and executor logs to Splunk.
Does spark provide any configuration to do the same?
How do I send the Splunk configurations like the HEC endpoint, port, token, etc in the spark-submit command?
I did try passing them as arguments to the Spark driver, like this:
bin/spark-submit \
--deploy-mode cluster \
--class org.apache.spark.examples.JavaSparkPi \
--master k8s://http://127.0.0.1:8001 \
--conf spark.executor.instances=2 \
--conf spark.app.name=spark-pi \
--conf spark.kubernetes.container.image=gcr.io/spark-operator/spark:v2.4.4 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=<account> \
--conf spark.kubernetes.docker.image.pullPolicy=Always \
--conf spark.kubernetes.namespace=default \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar \
--log-driver=splunk \
--log-opt splunk-url=<url:port> \
--log-opt splunk-token=<token> \
--log-opt splunk-index=<index> \
--log-opt splunk-sourcetype=<sourceType> \
--log-opt splunk-format=json
but the logs were not forwarded to the desired index.
I am using spark version 2.4.4 to run spark-submit.
Thanks in advance for any inputs!!

Hi, and welcome to Stack Overflow.
I searched the web for a while for cases similar to yours that combine Spark and Splunk. It looks like you may be mixing up several things. Judging by the Docker docs on the Splunk logging driver, you are trying to reproduce the same steps with spark-submit: --log-driver and --log-opt are options of docker run, not of spark-submit. Unfortunately, it doesn't work that way.
Basically, all the options after local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar in your command are passed as program arguments to the org.apache.spark.examples.JavaSparkPi#main method, which (unless you customize it) simply ignores them.
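To make the split concrete, here is a minimal bash sketch of how the command line gets divided. It is not spark-submit's real parser: split_submit_args is a hypothetical helper, and it assumes every "--option" takes exactly one value (true for --class, --master, --deploy-mode, --conf, and --name, but not for flags such as --verbose).

```shell
#!/usr/bin/env bash
# Simplified illustration only; split_submit_args is a hypothetical helper,
# not part of Spark. Assumption: each "--option" is followed by one value.
split_submit_args() {
  local resource=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --*)
        # An option and its value: consumed by spark-submit itself
        shift
        if [ "$#" -gt 0 ]; then shift; fi
        ;;
      *)
        # First bare token: the application jar (primary resource)
        resource="$1"
        shift
        break
        ;;
    esac
  done
  echo "resource: $resource"
  # Whatever is left is handed to the application's main method
  echo "app args: $*"
}

split_submit_args \
  --deploy-mode cluster \
  --class org.apache.spark.examples.JavaSparkPi \
  --conf spark.executor.instances=2 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar \
  --log-driver=splunk \
  --log-opt "splunk-url=<url:port>"
# prints:
# resource: local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar
# app args: --log-driver=splunk --log-opt splunk-url=<url:port>
```

Everything listed under "app args" never reaches Spark's or Docker's logging configuration; it only shows up in the args array of the main method.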
What you need to do is connect your Kubernetes cluster to the Splunk API. One way of doing that is to install the Splunk Connector on your Kubernetes cluster. Depending on the specifics of your environment there may be other ways, but reading the docs is a good place to start.
Hope this points you in the right direction.

Related

Why am I not able to run sparkPi example on a Kubernetes (K8s) cluster?

I have a K8s cluster up and running, on VMs inside VMWare Workstation, as of now. I'm trying to deploy a Spark application natively using the official documentation from here. However, I also landed on this article which made it clearer, I felt.
Now, earlier my setup was running inside nested VMs, basically my machine is on Win10 and I had an Ubuntu VM inside which I had 3 more VMs running for the cluster (not the best idea, I know).
When I tried to run my setup by following the article mentioned, I first created a service account inside the cluster called spark, then created a clusterrolebinding called spark-role, gave edit as the clusterrole and assigned it to the spark service account so that Spark driver pod has sufficient permissions.
I then try to run the example SparkPi job using this command line:
bin/spark-submit \
--master k8s://https://<k8-cluster-ip>:<k8-cluster-port> \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=kmaster:5000/spark:latest \
--conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar 100
It fails within a few seconds of creating the driver pod: the pod goes into the Running state and about 3 seconds later goes into the Error state.
When I run kubectl logs spark-pi-driver, this is the log I get.
The second Caused by: is always one of the following:
Caused by: java.net.SocketException: Broken pipe (Write failed) or,
Caused by: okhttp3.internal.http2.ConnectionShutdownException
Log #2 for reference.
After running into dead-ends with this, I tried giving --deploy-mode client to see if it makes a difference and get more verbose logs. You can read the difference between client and cluster mode from here.
On deploying the job as client mode it still fails, however, now I see that each time the driver pod (now running not as a pod but as a process on the local machine) tries to create an executor pod, it goes into a loop infinitely trying to create an executor pod with a count-number appended to the pod name, as the last one goes into a terminated state. Also, now I can see the Spark UI on the 4040 port but the job doesn't move forward as it's stuck on trying to create even a single executor pod.
I get this log.
To me, this makes it pretty apparent that it's a resource crunch maybe?
So to be sure, I deleted the nested VMs, set up 2 new VMs on my main machine, connected them using a NAT network, and set up the same K8s cluster.
But now when I try to do the exact same thing, it fails with the same error (Broken pipe/ConnectionShutdownException), except that now it fails even at creating the driver pod.
This is the log for reference.
Now I can't even fetch logs as to why it fails, because it's never even created.
I've broken my head over this and can't figure out why it's failing. Now, I tried out a lot of things to rule them out but so far nothing has worked except one (which is a completely different solution).
I tried the spark-on-k8-operator from GCP from here and it worked for me. I wasn't able to see the Spark UI as it runs briefly but it prints the Pi value in the shell window, so I know it works.
I'm guessing, that even this spark-on-k8s-operator 'internally' does the same thing but I really need to be able to deploy it natively, or at least know why it fails.
Any help here will be appreciated (I know it's a long post). Thank you.
Make sure the Kubernetes version you are deploying is compatible with the Spark version you are using.
Apache Spark uses the Kubernetes Client library to communicate with the Kubernetes cluster.
As of today, the latest LTS Spark version is 2.4.5, which includes Kubernetes Client version 4.6.3.
Check the compatibility matrix of the Kubernetes Client: here
The supported Kubernetes versions go all the way up to v1.17.0.
In my personal experience, Apache Spark 2.4.5 works well with Kubernetes v1.15.3. I have had problems with more recent versions.
When an unsupported Kubernetes version is used, you get logs like the ones you describe:
Caused by: java.net.SocketException: Broken pipe (Write failed) or,
Caused by: okhttp3.internal.http2.ConnectionShutdownException
I faced the exact same issue with v1.18.0; downgrading to v1.15.3 made it work:
minikube start --cpus=4 --memory=4048 --kubernetes-version v1.15.3
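If you want to catch this before submitting, a small version check along these lines may help. It is only a sketch: version_le is a hypothetical helper, it relies on sort -V (available in GNU coreutils), and the v1.17.0 ceiling is just the figure quoted above for that client version, so treat it as an assumption for any other setup.

```shell
#!/usr/bin/env bash
# version_le A B: succeeds when version A <= version B (a leading "v" is ignored).
# Hypothetical helper; relies on "sort -V" for version-aware ordering.
version_le() {
  local a="${1#v}" b="${2#v}"
  [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$a" ]
}

MAX_SUPPORTED="v1.17.0"    # figure quoted in the answer above (assumption elsewhere)
CLUSTER_VERSION="v1.18.0"  # example value; in practice take it from your cluster

if version_le "$CLUSTER_VERSION" "$MAX_SUPPORTED"; then
  echo "ok: $CLUSTER_VERSION is within the supported range"
else
  echo "warning: $CLUSTER_VERSION is newer than $MAX_SUPPORTED"
fi
# prints: warning: v1.18.0 is newer than v1.17.0
```

With CLUSTER_VERSION set to v1.15.3 the check passes, which matches the downgrade that worked here.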
The Spark on K8s operator example uses a Spark image (from gcr.io) that works. You can find the image tag in spark-on-k8s-operator/examples/spark-pi.yaml:
spec:
...
image: "gcr.io/spark-operator/spark:v2.4.5"
...
I replaced the image setting in the bin/spark-submit command with that image, and it worked for me:
bin/spark-submit \
--master k8s://https://192.168.99.100:8443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.driver.cores=1 \
--conf spark.driver.memory=512m \
--conf spark.executor.instances=2 \
--conf spark.executor.memory=512m \
--conf spark.executor.cores=1 \
--conf spark.kubernetes.container.image=gcr.io/spark-operator/spark:v2.4.5 \
--conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar

Some spark-submit config options not reflected in k8s pod

I'm using spark-submit to create a spark driver pod on my k8s cluster. When I run
bin/spark-submit \
--master k8s://https://my-cluster-url:443 \
--deploy-mode cluster \
--name spark-test \
--class com.my.main.Class \
--conf spark.executor.instances=3 \
--conf spark.kubernetes.allocation.batch.size=3 \
--conf spark.kubernetes.namespace=my-namespace \
--conf spark.kubernetes.container.image.pullSecrets=my-cr-secret \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.my-pvc.mount.path=/var/service/src/main/resources/ \
--conf spark.kubernetes.container.image=my-registry.io/spark-test:test-3.0.0 \
local:///var/service/my-service-6.3.0-RELEASE.jar
spark-submit successfully creates a pod in my k8s cluster. However, many of the config options I specified are not applied. For example, the pod does not have a volume mounted at /var/service/src/main/resources/ despite the existence of a persistentVolumeClaim on the cluster called my-pvc. Further, the pod has not been given the specified image pull secret my-cr-secret, causing an ImagePullBackOff error. On the other hand, the pod is properly created in the my-namespace namespace with the pull policy Always.
I have attempted this using Spark 3.0.0 and 2.4.5.
Why are some config options not reflected in the pod created on my cluster?
Figured out the issue:
I currently have spark 2.3.1 installed locally and the variable SPARK_HOME points to /usr/local/spark. For this current project I downloaded a distribution of spark 2.4.5. I was in the 2.4.5 directory and running bin/spark-submit, which should have (as far as I can tell) pointed to the spark-submit bundled in 2.4.5. However, running bin/spark-submit --version revealed that the version being run was 2.3.1. The configurations that were being ignored in my question above were not available in 2.3.1.
Simply changing SPARK_HOME to the new directory fixed the issue.
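For anyone hitting the same thing, here is a rough bash sketch of the lookup that the bin/spark-submit launcher script performs: an exported SPARK_HOME wins, and only when it is unset does the script fall back to its own install directory. resolve_spark_home is a hypothetical helper written for illustration, and /opt/spark-2.4.5 is an assumed download location, not a path from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the launcher's precedence, simplified:
# an exported SPARK_HOME overrides the directory the script lives in.
resolve_spark_home() {
  local script_path="$1"
  if [ -n "${SPARK_HOME:-}" ]; then
    echo "$SPARK_HOME"
  else
    # Fall back to the parent of the script's bin/ directory
    dirname "$(dirname "$script_path")"
  fi
}

# Old SPARK_HOME still exported: the 2.3.1 install is used
SPARK_HOME=/usr/local/spark resolve_spark_home /opt/spark-2.4.5/bin/spark-submit
# prints: /usr/local/spark

# SPARK_HOME unset: the directory you are actually running from is used
( unset SPARK_HOME; resolve_spark_home /opt/spark-2.4.5/bin/spark-submit )
# prints: /opt/spark-2.4.5
```

Running echo "$SPARK_HOME" together with bin/spark-submit --version before submitting is a quick way to confirm which installation will actually run.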

Using a k8s cluster as spark cluster manager on Spark 2.3.0

I was trying to submit an example job to a k8s cluster from the binary release of Spark 2.3.0; the submit command is shown below. However, I keep getting a "wrong master" error. I am quite sure my k8s cluster is working fine.
bin/spark-submit \
--master k8s://https://<k8s-master-ip> \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=3 \
--conf spark.kubernetes.container.image=<image-built-from-dockerfile> \
--conf spark.kubernetes.driver.pod.name=spark-pi-driver \
local:///opt/examples/jars/spark-examples_2.11-2.3.0.jar
This is the error I get:
Error: Master must either be yarn or start with spark, mesos, local
And this is the output of kubectl cluster-info:
Kubernetes master is running at https://192.168.0.10:6443
KubeDNS is running at https://192.168.0.10:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
My English is not great, so please excuse any grammar mistakes; I will do my best to answer your question. My fix was to check $SPARK_HOME and change it to the path of the "apache-spark-on-k8s" directory, because spark-submit uses ${SPARK_HOME} by default to run your command. Maybe, like me, you have two Spark installations on the same machine, so the command always uses the original one. I hope this answer helps.

Oozie spark action Log4j configuration

I am working on Oozie, using a Spark action on a Hortonworks 2.5 cluster. I have configured this job in YARN client mode, with master=yarn and mode=client.
My log4j configuration is shown below.
log4j.appender.RollingAppender.File=/opt/appName/logs/appNameInfo.log
log4j.appender.debugFileAppender.File=/opt/appName/logs/appNameDebug.log
log4j.appender.errorFileAppender.File=/opt/appName/logs/appNameError.log
The expectation is that once the Oozie job is triggered, the application's info, debug, and error logs appear in the locations above, respectively.
Below is the spark-opts tag in my workflow.xml:
<spark-opts>--driver-memory 4G --executor-memory 4G --num-executors 6 --executor-cores 3 --files /tmp/logs/appName/log4j.properties --conf spark.driver.extraJavaOptions='-Dlog4j.configuration=file:/tmp/logs/appName/log4j.properties' --conf spark.executor.extraJavaOptions='-Dlog4j.configuration=file:/tmp/logs/appName/log4j.properties'</spark-opts>
Once I trigger the Oozie coordinator, I cannot see my application logs in /opt/appName/logs/ as configured in log4j.properties.
The same configuration works with a plain spark-submit when I run it from the node where /tmp/logs/appName/log4j.properties is available. With Oozie, the job is not able to write to the location configured in the log4j.properties file.
Should this log4j.properties file be in HDFS? If so, how do I provide it in spark-opts? Would it be an hdfs:// path?
Can someone please look into the issue?
Copy this log4j.properties file to the oozie.sharelib.path location in HDFS; Spark should then be able to copy it into the final YARN container.

execute Spark jobs, with Livy, using `--master yarn-cluster` without making systemwide changes

I'd like to execute a Spark job, via an HTTP call from outside the cluster using Livy, where the Spark jar already exists in HDFS.
I'm able to spark-submit the job from shell on the cluster nodes, e.g.:
spark-submit --class io.woolford.Main --master yarn-cluster hdfs://hadoop01:8020/path/to/spark-job.jar
Note that the --master yarn-cluster is necessary to access HDFS where the jar resides.
I'm also able to submit commands, via Livy, using curl. For example, this request:
curl -X POST --data '{"file": "/path/to/spark-job.jar", "className": "io.woolford.Main"}' -H "Content-Type: application/json" hadoop01:8998/batches
... executes the following command on the cluster:
spark-submit --class io.woolford.Main hdfs://hadoop01:8020/path/to/spark-job.jar
This is the same as the command that works, minus the --master yarn-cluster params. This was verified by tailing /var/log/livy/livy-livy-server.out.
So, I just need to modify the curl command to include --master yarn-cluster when it's executed by Livy. At first glance, it seems like this should be possible by adding arguments to the JSON dictionary. Unfortunately, these aren't passed through.
Does anyone know how to pass --master yarn-cluster to Livy so that jobs are executed on YARN without making systemwide changes?
I recently tried something similar to what you describe. I needed to send an HTTP request to Livy's API, with Livy already installed in a (YARN) cluster, and have Livy start a Spark job.
My call to Livy did not include --master yarn-cluster, but it seems to work for me. Maybe you can try putting your JAR file on the local filesystem instead of in the cluster?
spark.master = yarn-cluster
Set this in the Spark conf; for me, that is /etc/spark2/conf/spark-defaults.conf.
