Kubernetes + Spark Job not Progressing/Stuck - apache-spark

I am trying to run PySpark code on a Kubernetes cluster.
The application flow should be: read data -> cache -> perform multiple actions, but the job is not progressing at all. It is stuck on this log message:
WatchConnectionManager: The resource version -some number- no longer exists. Scheduling a reconnect.
What could be the problem?

This looks like an issue within Spark itself. It should be fixed in versions 3.0.2 and 3.1.0:
https://issues.apache.org/jira/browse/SPARK-24266
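To check quickly whether a given Spark version already includes the fix, something like the following could help. It is only a sketch: the function names are made up for illustration, and the threshold simply encodes the two fix versions quoted above (3.0.2 and 3.1.0).

```python
import re

# Assumption: 3.0.2 is the earliest release carrying the fix, as quoted above;
# 3.1.0 and later compare greater and are therefore covered as well.
FIRST_FIXED = (3, 0, 2)


def parse_version(version: str) -> tuple:
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised Spark version: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_watch_reconnect_fix(version: str) -> bool:
    """Return True when the version is 3.0.2 or later."""
    return parse_version(version) >= FIRST_FIXED
```

With PySpark installed, `has_watch_reconnect_fix(pyspark.__version__)` reports on the local client library. The Spark version inside the driver and executor images matters as well, so it is worth checking those separately.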

Related

Cannot get PySpark working in Kubernetes getting (Initial job has not accepted any resources)

I'm trying to use the following Helm Chart for Spark on Kubernetes
https://github.com/bitnami/charts/tree/main/bitnami/spark
The documentation is of course spotty but I've muddled along. I have it installed with custom values that assign things like resource limits. I'm accessing the master through a NodePort and the WebUI through a port forward. I am NOT using spark-submit; I'm writing Python code to drive the Spark cluster as follows:
import pyspark
sc = pyspark.SparkContext(appName="Testy", master="spark://<IP>:<PORT>")
This Python code is running locally on my Windows laptop; the Kubernetes cluster is on a separate set of servers. It connects and I can see the app appear in the WebUI, but the second it tries to do something I get the following:
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
The master seems to be in a cycle of removing and launching executors, and the 3 workers each just fail to run a launch command. Interestingly, the command has the hostname of my laptop in it:
"--driver-url" "spark://CoarseGrainedScheduler#<laptop hostname>:60557"
Got to imagine that's not right. So in this setup, where should I actually be running the Python code? On the Kubernetes cluster? Can I run it locally on my laptop? These details are of course missing from the docs. I'm new to Spark so I'm just looking for the absolute basics. My preferred workflow would be to develop code locally on my laptop and then run it on the Kubernetes cluster I have access to.
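The `--driver-url` in that launch command is the address the driver advertises to the executors. When the driver runs on the laptop, the workers have to connect back to it. A minimal sketch of pinning that address follows, assuming the laptop has an IP address the worker pods can route to and that the chosen ports are open in its firewall. The IP address and port numbers are placeholders, not values from the question, and the helper names are hypothetical.

```python
def client_mode_conf(driver_host: str, driver_port: int = 40000,
                     block_manager_port: int = 40001) -> dict:
    """Build Spark properties that fix the address and ports the driver advertises."""
    return {
        # Address executors use to reach the driver (must be routable from the pods).
        "spark.driver.host": driver_host,
        # Local interface the driver listens on.
        "spark.driver.bindAddress": "0.0.0.0",
        # Fixed ports instead of random ones, so they can be opened in a firewall.
        "spark.driver.port": str(driver_port),
        "spark.driver.blockManager.port": str(block_manager_port),
    }


def make_context(master_url: str, driver_host: str):
    """Create a SparkContext with the properties above (requires pyspark)."""
    import pyspark

    conf = pyspark.SparkConf().setAppName("Testy").setMaster(master_url)
    conf.setAll(list(client_mode_conf(driver_host).items()))
    return pyspark.SparkContext(conf=conf)
```

For example, `make_context("spark://<IP>:<PORT>", "10.0.0.5")`, where `10.0.0.5` stands in for the laptop's address. If the cluster cannot reach the laptop at all, running the Python code inside the cluster is the usual alternative.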

Unable to run hop pipelines on Spark running on Kubernetes

I am looking for help running Hop pipelines on a Spark cluster that runs on Kubernetes.
I have a Spark master deployed with 3 worker nodes on Kubernetes.
I am using the hop-run.sh command to run a pipeline on that cluster, and I am facing the exception below:
java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder
It looks like fat.jar is not being picked up by Spark when running the hop-run.sh command.
I also tried running the same pipeline with the spark-submit command, but I am not sure how to pass references to pipelines and workflows to Spark running on Kubernetes, even though I am able to add the fat jar to the classpath (this can be seen in the logs).
Any kind of help is appreciated.
Thanks
Could it be that you are using version 1.0?
We had a missing jar for S3 VFS which has been resolved in 1.1
https://issues.apache.org/jira/browse/HOP-3327
For more information on how to use spark-submit you can take a look at the following documentation:
https://hop.apache.org/manual/latest/pipeline/pipeline-run-configurations/beam-spark-pipeline-engine.html#_running_with_spark_submit
The locations of the fat jar, the pipeline, and the required metadata export can all be VFS locations, so there is no need to place them on the cluster itself.

Azure HDInsights Spark Cluster Install External Libraries

I have an HDInsight Spark cluster. I installed tensorflow using a script action. The installation went fine (Success).
But now when I go and create a Jupyter notebook, I get:
import tensorflow
Starting Spark application
The code failed because of a fatal error:
Session 8 unexpectedly reached final status 'dead'. See logs:
YARN Diagnostics:
Application killed by user..
Some things to try:
a) Make sure Spark has enough available resources for Jupyter to create a Spark context. For instructions on how to assign resources see http://go.microsoft.com/fwlink/?LinkId=717038
b) Contact your cluster administrator to make sure the Spark magics library is configured correctly.
I don't know how to fix this error... I tried some things like looking at logs but they are not helping.
I just want to connect to my data and train a model using tensorflow.
This looks like an error with Spark application resources. Check the resources available on your cluster and close any applications that you don't need. Please see more details here: https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-resource-manager#kill-running-applications
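To see which applications are currently holding resources, the YARN ResourceManager REST API can be queried and the response summarised. The following is a rough sketch: the function names are hypothetical, the base URL of the ResourceManager is left as a parameter because it depends on the cluster, and authentication (which an HDInsight gateway normally requires) is omitted.

```python
import json
from urllib.request import urlopen


def running_apps(payload: dict) -> list:
    """Return (id, name, allocatedMB, allocatedVCores) for RUNNING apps, largest first."""
    # YARN returns {"apps": null} when there are no matching applications.
    apps = (payload.get("apps") or {}).get("app") or []
    rows = [
        (app["id"], app.get("name", ""),
         app.get("allocatedMB", 0), app.get("allocatedVCores", 0))
        for app in apps
        if app.get("state") == "RUNNING"
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def fetch_apps(rm_base_url: str) -> dict:
    """Fetch running applications; rm_base_url is an assumption supplied by the caller."""
    with urlopen(f"{rm_base_url}/ws/v1/cluster/apps?states=RUNNING") as response:
        return json.load(response)
```

Applications at the top of the list that are no longer needed (for example, abandoned notebook sessions) are candidates for being killed through the YARN UI described in the linked page.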

Rest API for Spark2.3 submit on kubernetes(version 1.8.*) cluster

I'm using a Kubernetes cluster on AWS to run Spark jobs, with Spark 2.3. Now I want to run spark-submit from an AWS Lambda function against the k8s master. Is there any REST interface for running spark-submit on the k8s master?
Unfortunately, this is not possible with Spark 2.3 if you are using the native Kubernetes support.
According to the description in the deployment instructions, the submission process consists of several steps:
Spark creates a Spark driver running within a Kubernetes pod.
The driver creates executors which are also running within Kubernetes pods.
The driver connects to them and executes application code.
When the application completes, executor pods terminate and are cleaned up, but the driver pod persists its logs and remains in “completed” state in the Kubernetes API until it’s eventually garbage collected or manually cleaned up.
So, in fact, there is nowhere to submit a job to until you start a submission process, which launches the first Spark pod (the driver) for you. Once the application completes, everything is terminated.
Please also see similar answer for this question under the link
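Since the submission process itself has to be started by the client, one workaround is to invoke spark-submit from code. The sketch below only assembles the command line for a cluster-mode submission against the Kubernetes API server. The function name, API server URL, image name, and jar path are placeholders, and it assumes the caller has a Spark distribution available and network access to the API server.

```python
def spark_submit_args(api_server: str, image: str, main_class: str,
                      app_jar: str, name: str = "spark-job",
                      executors: int = 2) -> list:
    """Assemble a spark-submit command for native Kubernetes cluster mode."""
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",   # API server URL, e.g. https://<host>:<port>
        "--deploy-mode", "cluster",          # the driver runs in a pod
        "--name", name,
        "--class", main_class,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_jar,                             # e.g. a local:// path inside the image
    ]
```

The resulting list can be passed to `subprocess.run(args, check=True)`. Whether this is practical inside a Lambda function depends on packaging a Spark distribution and a Java runtime with it, which is not covered here.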

HDInsight SparkHistory on Azure shows no applications

I have created a Spark HDInsight Cluster on Azure. The cluster was used to run different jobs (either Spark or Hive).
Until a month ago, the history of the jobs could be seen in the Spark History Server dashboard. It seems that following the update that introduced Spark 1.6.0, this dashboard is no longer showing any applications.
I have also tried to bypass this issue by executing the PowerShell cmdlet get-azurehdinsightjob as suggested here. The output is again an empty list of applications.
I would appreciate any help as this dashboard used to work and now all my experiments are stalled.
I managed to solve the issue by deleting everything inside wasb:///hdp/spark-events. Maybe the issue was related to the size of the folder, as no other log files could be appended.
All the following jobs are now appearing successfully in the Spark History Server dashboard.
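For anyone wanting to repeat this from a cluster node, the check and the cleanup can be expressed as `hdfs dfs` commands. This is only a sketch with hypothetical helper names: it builds the commands rather than running them, since the delete is irreversible for the history it removes, and it assumes the `hdfs` client on the node is configured for the cluster's wasb storage.

```python
# Path taken from the answer above.
EVENT_LOG_DIR = "wasb:///hdp/spark-events"


def du_command(path: str = EVENT_LOG_DIR) -> list:
    """Command that reports the total size of the event log folder."""
    return ["hdfs", "dfs", "-du", "-s", "-h", path]


def cleanup_command(path: str = EVENT_LOG_DIR) -> list:
    """Command that deletes everything inside the folder but keeps the folder itself."""
    return ["hdfs", "dfs", "-rm", "-r", f"{path}/*"]
```

Each list can be run with `subprocess.run(command, check=True)`. Checking the size first helps confirm whether the folder has actually grown large before deleting anything.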
