Spark session gets killed after a few minutes - apache-spark

I am using a GCP VM with Ubuntu installed on it, along with Hadoop, YARN, and Spark services. The issue is that my Spark session gets killed after some time, even though all the processes listed by jps are still running. Attaching a screenshot for your reference. Can someone please help me with this?

Related

Spark driver failure recovery

I want to know how Spark restarts the driver in case of failure. My understanding is that since the driver node has failed, all the computations will be lost, so the restart will mean re-submitting the application. I want to know how the driver program is restarted with YARN as the resource manager; I know that Mesos has a standalone driver node, and that standalone mode has the --supervise flag, but I'm not sure about YARN. Any explanation will help. I need the answer for a non-streaming application. Sorry for the big question.
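For what it's worth, a hedged sketch of how this is usually handled on YARN, assuming cluster deploy mode (the application file name is hypothetical): YARN does not supervise the driver the way standalone mode's --supervise does; it re-attempts the whole application, up to spark.yarn.maxAppAttempts times.

```python
import subprocess

# Sketch only: with YARN there is no --supervise flag. In cluster mode the
# driver runs inside the ApplicationMaster, so when the driver fails YARN can
# start a fresh application attempt (i.e. the job is effectively re-submitted
# from scratch) up to spark.yarn.maxAppAttempts times.
subprocess.run([
    "spark-submit",
    "--master", "yarn",
    "--deploy-mode", "cluster",
    "--conf", "spark.yarn.maxAppAttempts=2",  # allow one automatic retry
    "my_batch_job.py",                        # hypothetical application file
])
```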

Job inside spark application is complete but I still see the status as running, why?

I am running a Spark application that has completed all its jobs, but the status of the application in the YARN cluster portal is still RUNNING (for more than 30 minutes). Please let me know why this is happening.
Screenshot: Spark UI showing my jobs as completed
Screenshot: Spark application status still RUNNING
I had the same problem with Spark 2.4.8 running on K8s. I didn't understand why, but I solved it by stopping the context manually:
spark.sparkContext.stop()
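A minimal PySpark sketch of that fix, assuming the application builds its own SparkSession (the app name and workload are placeholders):

```python
from pyspark.sql import SparkSession

# Sketch only: stop the context explicitly once the work is done so the
# application transitions to FINISHED instead of staying RUNNING in the
# cluster manager's UI.
spark = SparkSession.builder.appName("finishes-cleanly").getOrCreate()

try:
    spark.range(1_000_000).selectExpr("sum(id)").show()  # placeholder workload
finally:
    spark.stop()  # stops the underlying SparkContext, same as spark.sparkContext.stop()
```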

Spark jobs not showing up in Hadoop UI in Google Cloud

I created a cluster in Google Cloud and submitted a Spark job. Then I connected to the UI following these instructions: I created an SSH tunnel and used it to open the Hadoop web interface. But the job is not showing up.
Some extra information:
If I connect to the master node of the cluster via SSH and run spark-shell, this "job" does show up in the Hadoop web interface.
I'm pretty sure I did this before and I could see my jobs (both running and already finished). I don't know what happened in between for them to stop appearing.
The problem was that I was running my jobs in local mode. My code had a .master("local[*]") that was causing this. After removing it, the jobs showed up in the Hadoop UI as before.
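A before/after sketch of that change, assuming the master is meant to come from spark-submit rather than the code (the app name is a placeholder):

```python
from pyspark.sql import SparkSession

# Before (problematic): hard-coding local mode keeps the job off YARN, so it
# never appears in the Hadoop web interface.
# spark = SparkSession.builder.master("local[*]").appName("my-job").getOrCreate()

# After: leave the master unset and let spark-submit (e.g. --master yarn)
# decide where the job runs, so YARN tracks it and the UI lists it.
spark = SparkSession.builder.appName("my-job").getOrCreate()
```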

Spark streaming fails to launch worker on worker failure

I'm trying to set up a Spark cluster and I've come across an annoying bug...
When I submit a Spark application, it runs fine on the workers until I kill one (for example by using stop-slave.sh on the worker node).
When the worker is killed, Spark will then try to relaunch an executor on an available worker node, but it fails every time (I know because the web UI displays either FAILED or LAUNCHING for the executor; it never succeeds).
I can't seem to find any help, even in the documentation, so can someone assure me that Spark can and will try to relaunch a worker on an available node if one is killed (either on the same node where the worker previously ran, or on another available node if the node where it previously ran is unreachable)?
Here's the output from the worker node:
Screenshot: Spark worker error
Thank you for your help!
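For context, a hedged sketch for the standalone cluster manager: the master does try to relaunch executors on the remaining workers, but it gives up on the application after spark.deploy.maxExecutorRetries consecutive failures. The master URL and file name below are hypothetical.

```python
import subprocess

# Sketch only (standalone mode): if relaunched executors keep failing
# back-to-back, the master eventually removes the application. Raising the
# retry limit keeps it trying while the underlying cause (e.g. a worker that
# cannot reach the driver) is investigated.
subprocess.run([
    "spark-submit",
    "--master", "spark://master-host:7077",       # hypothetical master URL
    "--conf", "spark.deploy.maxExecutorRetries=50",
    "my_streaming_job.py",                        # hypothetical application file
])
```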

Exiting Java gateway on Azure HDInsight with PySpark

I am using Azure HDInsight and PySpark.
Now a previously working snippet fails with the exception
"Java gateway process exited before sending the driver its port number".
The pyspark source contains at that point the comment "In Windows, ensure the Java child processes do not linger after Python has exited.".
Even restarting the HDInsight instance doesn't fix the issue.
Does anybody have an idea how to fix it?
I ran into the same problem. I logged into my HDInsight cluster via RDP and restarted the IPython service, which seems to have fixed the issue.
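For reference, a minimal sketch of the environment check that is often behind this gateway error; the paths are hypothetical, and restarting the notebook/IPython service as described above may be all that is needed.

```python
import os
from pyspark.sql import SparkSession

# Sketch only: PySpark raises "Java gateway process exited before sending the
# driver its port number" when it cannot launch the JVM gateway. A common
# check is that JAVA_HOME and SPARK_HOME point at valid installations; the
# paths below are placeholders for illustration.
os.environ.setdefault("JAVA_HOME", "/usr/lib/jvm/java-8-openjdk-amd64")
os.environ.setdefault("SPARK_HOME", "/usr/hdp/current/spark2-client")

spark = SparkSession.builder.appName("gateway-check").getOrCreate()
print(spark.version)
```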
