Spark gives error(Initial Job has not been accepted) - apache-spark

I am trying spark on amazon EC2 but i am facing this issue.
When i tried to run with local spark worker it works fine but when i tried with only EC2 instance it give error of Initial Job has been not accepted.
How I resolve this error?

This error aises when your all resources are either held by another Job or the spark cluster has not enough resources to run your application. Try to tune the resources and check the configuration from Spark UI.

Related

Unable to run hop pipelines on Spark running on Kubernetes

I am looking for help in running hop pipelines on Spark cluster, running on kubernetes.
I have spark master deployed with 3 worker nodes on kubernetes
I am using hop-run.sh command to run pipeline on spark running on kubernetes.
Facing Below exception
-java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder
Looks like fat.jar is not getting associated with the spark when running hop-run.sh command.
I tried running same with spark-submit command too but not sure how to pass references of pipelines and workflows to Spark running on kubernetes, though I am able to add fat jar to the classpath (can be seen in logs)
Any kind of help is appreciated.
Thanks
like
Could it be that you are using version 1.0?
We had a missing jar for S3 VFS which has been resolved in 1.1
https://issues.apache.org/jira/browse/HOP-3327
For more information on how to use spark-submit you can take a look at the following documentation:
https://hop.apache.org/manual/latest/pipeline/pipeline-run-configurations/beam-spark-pipeline-engine.html#_running_with_spark_submit
The location to the fat-jar the pipeline and the required metadata-export can all be VFS locations so no need to place those on the cluster itself.

Communicate to cluster that Spark History server is running

I have a working Spark cluster, with a master node and some worker nodes running on Kubernetes. This cluster has been used for multiple spark submit jobs and is operational.
On the master node, I have started up a Spark History server using the $SPARK_HOME/sbin/start-history-server.sh script and some configs to determine where the History Server's logs should be written:
spark.eventLog.enabled=true
spark.eventLog.dir=...
spark.history.fs.logDirectory=...
spark.hadoop.fs.s3a.access.key=...
spark.hadoop.fs.s3a.secret.key=...
spark.hadoop.fs.s3a.endpoint=...
spark.hadoop.fs.s3a.path.style.access=true
This was done a while after the cluster was operational. The server is writing the logs to an external DB (minIO using the s3a protocol).
Now, whenever I submit spark jobs it seems like nothing is being written away in the location I'm specifying.
I'm wondering about the following: How can the workers know I have started up the spark history server on the master node? Do I need to communicate this to the workers somehow?
Possible causes that I have checked:
No access/permissions to write to minIO: This shouldn't be the case as I'm running spark submit jobs that read/write files to the same minIO using the same settings
Logs folder does not exist: I was getting these errors before, but then I created a location for the files to be written away and since then I'm not getting issues
spark.eventLog.dir should be the same as spark.history.fs.logDirectory: they are
Just found out the answer: the way your workers will know where to store the logs is by supplying the following configs to your spark-submit job:
spark.eventLog.enabled=true
spark.eventLog.dir=...
spark.history.fs.logDirectory=...
It is probably also enough to have these in your spark-defaults.conf on the driver program, which is why I couldn't find a lot of info on this as I didn't add it to my spark-defaults.conf.

Does Spark 2.4.4 support forwarding Delegation Tokens when master is k8s?

I'm currently in the process of setting up a Kerberized environment for submitting Spark Jobs using Livy in Kubernetes.
What I've achieved so far:
Running Kerberized HDFS Cluster
Livy using SPNEGO
Livy submitting Jobs to k8s and spawning Spark executors
KNIME is able to interact with Namenode and Datanodes from outside the k8s Cluster
To achieve this I used the following Versions for the involved components:
Spark 2.4.4
Livy 0.5.0 (The currently only supported version by KNIME)
Namenode and Datanode 2.8.1
Kubernetes 1.14.3
What I'm currently struggling with:
Accessing HDFS from the Spark executors
The error message I'm currently getting, when trying to access HDFS from the executor is the following:
org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "livy-session-0-1575455179568-exec-1/10.42.3.242"; destination host is: "hdfs-namenode-0.hdfs-namenode.hdfs.svc.cluster.local":8020;
The following is the current state:
KNIME connects to HDFS after having successfully challenged against the KDC (using Keytab + Principal) --> Working
KNIME puts staging jars to HDFS --> Working
KNIME requests new Session from Livy (SPNEGO challenge) --> Working
Livy submits Spark Job with k8s master / spawns executors --> Working
KNIME submits tasks to Livy which should be executed by the executors --> Basically working
When trying to access HDFS to read a file the error mentioned before occurs --> The problem
Since KNIME is placing jar files on HDFS which have to be included in the dependencies for the Spark Jobs it is important to be able to access HDFS. (KNIME requires this to be able to retrieve preview data from DataSets for example)
I tried to find a solution to this but unfortunately, haven't found any useful resources yet.
I had a look at the code an checked UserGroupInformation.getCurrentUser().getTokens().
But that collection seems to be empty. That's why I assume that there are not Delegation Tokens available.
Has anybody ever achieved running something like this and can help me with this?
Thank you all in advance!
For everybody struggeling with this:
It took a while to find the reason on why this is not working, but basically it is related to Spark's Kubernetes implementation as of 2.4.4.
There is no override defined for CoarseGrainedSchedulerBackend's fetchHadoopDelegationTokens in KubernetesClusterSchedulerBackend.
There has been the pull request which will solve this by passing secrets to executors containing the delegation tokens.
It was already pulled into master and is available in Spark 3.0.0-preview but is not, at least not yet, available in the Spark 2.4 branch.

Spark jobs not showing up in Hadoop UI in Google Cloud

I created a cluster in Google Cloud and submitted a spark job. Then I connected to the UI following these instructions: I created an ssh tunnel and used it to open the Hadoop web interface. But the job is not showing up.
Some extra information:
If I connect to the master node of the cluster via ssh and run spark-shell, this "job" does show up in the hadoop web interface.
I'm pretty sure I did this before and I could see my jobs (both running and already finished). I don't know what happened in between for them to stop appearing.
The problem was that I was running my jobs in local mode. My code had a .master("local[*]") that was causing this. After removing it, the jobs showed up in the Hadoop UI as before.

Rest API for Spark2.3 submit on kubernetes(version 1.8.*) cluster

Im using kubernetes cluster on AWS to run spark jobs ,im using spark 2.3 ,now i want to run spark-submit from AWS lambda function to k8s master,would like to know if there is any REST interface to run Spark submit on k8s Master?
Unfortunately, it is not possible for Spark 2.3, in case you are using native Kubernetes support.
Based on description from deployment instruction, submission process contains several steps:
Spark creates a Spark driver running within a Kubernetes pod.
The driver creates executors which are also running within Kubernetes pods
The driver connects to them, and executes application code
When the application completes, executor pods terminate and are cleaned up, but the driver pod persists its logs and remains in “completed” state in the Kubernetes API until it’s eventually garbage collected or manually cleaned up.
So, in fact, you have no place to submit a job until you start a submission process, which will launch the first Spark's pod (driver) for you. Only once application completes, everything is terminated.
Please also see similar answer for this question under the link

Resources