Error when running spark application with zeppelin - apache-spark

When I run the above Spark application with Zeppelin on a YARN cluster in cluster mode, I get the following error:
Where might the problem be? Thanks

Related

Spark Structured Streaming job failing in cluster mode

I am using spark-sql-2.4.1 in my application.
While writing data to an HDFS folder, I am facing this issue in my spark-streaming application.
Error:
yarn.Client: Deleted staging directory hdfs://dev/user/xyz/.sparkStaging/application_1575699597805_47
20/02/24 14:02:15 ERROR yarn.Client: Application diagnostics message: User class threw exception: org.apache.hadoop.security.AccessControlException: Permission denied: user= xyz, access=WRITE, inode="/tmp/hadoop-admin":admin:supergroup:drwxr-xr-x
.
.
.
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=xyz, access=WRITE, inode="/tmp/hadoop-admin":admin:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:350)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:251)
I face this issue only when I run in yarn-cluster mode, i.e.
--master yarn \
--deploy-mode cluster \
But when I run in yarn-client mode it runs fine, i.e.
--master yarn \
--deploy-mode client \
What is the root cause of this problem?
The fundamental question here: why is it trying to write to "/tmp/hadoop-admin/" instead of the respective user directory, i.e. hdfs://qa2/user/xyz/?
I have come across this fix:
https://issues.apache.org/jira/browse/SPARK-26825
How can I implement it in my spark-sql application?
The only difference between the working --deploy-mode client and the failing --deploy-mode cluster cases is the location of the driver. In client deploy mode, the driver runs on the machine where you execute spark-submit (usually an edge node that is configured to use the YARN cluster but is not part of it), while in cluster deploy mode the driver runs inside the YARN cluster (on one of the nodes under YARN's control).
It looks like you've got a misconfigured edge node.
I'd not be surprised if a regular, Spark SQL-only application failed too; this likely has nothing to do with the streaming query (Spark Structured Streaming) and would happen for any Spark application.
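If the cluster-side Hadoop configuration cannot be corrected directly, one possible workaround (a sketch under assumptions, not the confirmed SPARK-26825 fix) is to redirect Hadoop's scratch directory away from /tmp/hadoop-admin from within the application. It assumes the failing write goes through hadoop.tmp.dir, and the target path below is illustrative:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("streaming-to-hdfs") // hypothetical application name
  // spark.hadoop.* keys are copied into the Hadoop Configuration used by Spark;
  // /user/xyz/tmp is an illustrative directory the xyz user can write to
  .config("spark.hadoop.hadoop.tmp.dir", "/user/xyz/tmp")
  .getOrCreate()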

Running spark application in local mode

I'm trying to start my Spark application in local mode using spark-submit. I am using Spark 2.0.2, Hadoop 2.6 & Scala 2.11.8 on Windows. The application runs fine from within my IDE (IntelliJ), and I can also start it on a cluster with actual, physical executors.
The command I'm running is
spark-submit --class [MyClassName] --master local[*] target/[MyApp]-jar-with-dependencies.jar [Params]
Spark starts up as usual, but then terminates with
java.io.IOException: Failed to connect to /192.168.88.1:56370
What am I missing here?
Check which port you are using: if on a cluster, log in to the master node and include:
--master spark://XXXX:7077
You can always find it in the Spark UI on port 8080.
Also check your Spark builder config: if you have already set the master there, it takes priority over the launch command, e.g.:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .appName("myapp")
  .master("local[*]") // overrides whatever --master was passed on the command line
  .getOrCreate()
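Since a master hard-coded in the builder wins over the spark-submit flag, a minimal sketch (not code from the question) is to drop .master(...) so that --master local[*] from the command line is the setting that actually applies:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .appName("myapp")
  .getOrCreate() // master now comes from spark-submit (--master local[*])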

Too many HashedWheelTimer errors in Spark application

I encountered this error when running a Spark Streaming application. It happened only once, and the application had been running for over 500 hours.
Here are some dependencies and their versions:
Spark: 1.5.1
Spark-Cassandra-Connector 1.5.0-M3
Thanks

SparkDeploySchedulerBackend Error: Application has been killed. All masters are unresponsive

While starting the Spark shell:
bin>./spark-shell
I get the following error :
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Welcome to SPARK VERSION 1.3.0
Using Scala version 2.10.4 (Java HotSpot(TM) Server VM, Java 1.7.0_75)
Type in expressions to have them evaluated.
Type :help for more information.
15/05/10 12:12:21 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/05/10 12:12:21 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
I installed Spark by following this link: http://www.philchen.com/2015/02/16/how-to-install-apache-spark-and-cassandra-stack-on-ubuntu
You should supply your Spark cluster's master URL when starting spark-shell.
At least:
bin/spark-shell --master spark://master-ip:7077
All the options make up a long list; you can find the suitable ones yourself:
bin/spark-shell --help
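The same master URL can also be set from application code instead of the command line. A minimal sketch using the Spark 1.x API this question is about; master-ip is a placeholder for your actual master host:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("shell-equivalent")       // hypothetical application name
  .setMaster("spark://master-ip:7077")  // same URL as the --master flag
val sc = new SparkContext(conf)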
I am assuming that you are running this in standalone/local mode.
Run your spark-shell with the following line. It indicates you are using all the available cores of your master, which is the local machine.
bin/spark-shell --master local[*]
http://spark.apache.org/docs/1.2.1/submitting-applications.html#master-urls
You also need to start the Spark master and slave before issuing the spark-submit command:
start-master.sh
start-slave.sh spark://spark:7077
then use
spark-submit --master spark://spark:7077
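For completeness, a minimal application that could be packaged and submitted with the command above (a sketch only; the object and app names are hypothetical). It deliberately does not hard-code a master, so the --master spark://spark:7077 flag is honored:
import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  def main(args: Array[String]): Unit = {
    // no setMaster here: the --master flag passed to spark-submit applies
    val sc = new SparkContext(new SparkConf().setAppName("MyApp"))
    println(sc.parallelize(1 to 10).sum()) // trivial job to verify the cluster connection
    sc.stop()
  }
}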
Look at your log files for "permission denied" errors... It may happen that your client service doesn't have the proper authority to access your master folders.

Spark cluster setup error

After some research on the internet, I found I can use
sbin/start-master.sh
to start the Spark master service on my Ubuntu Linux computers,
and use
bin/spark-class org.apache.spark.deploy.worker.Worker spark://...
to bring the slave node service up and running.
The good news is that I can see the local web page with the workers shown as alive.
However, after that, I tried to launch the shell ...
MASTER=spark://localhost:7077 bin/spark-shell
but it returned:
sparkMaster#localhost:7077 ...
So I modified the command to
MASTER=spark://sparkuser#localhost:7077 bin/spark-shell
where sparkuser is the user connected to the two nodes.
However, with that modification, I got:
ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.
ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
and when I tried
MASTER=local-cluster[3,2,1024] bin/spark-shell
The Spark logo did appear in the shell, but I was afraid the slave nodes were not binding in.
Did I miss anything in the Spark cluster setup?
Just launch spark-shell on the cluster with the --master flag, as follows:
./bin/spark-shell --master spark://localhost:7077
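Once the shell is up, a quick sanity check (using the sc the shell pre-creates, not code from the original answer) confirms it actually attached to the standalone master instead of falling back to local mode:
// inside spark-shell
sc.master                         // should print spark://localhost:7077
sc.parallelize(1 to 100).count()  // trivial job; it should appear in the master web UI on port 8080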
