Yarn-Cluster mode - ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms - apache-spark

In my PySpark program I have:
from pyspark import SparkConf, SparkContext, SQLContext
conf = SparkConf()
conf.setAppName("spark_name")
conf.set("spark.dynamicAllocation.enabled", "true")
conf.set("spark.shuffle.service.enabled", "true")
sc = SparkContext(conf=conf)
I am running my PySpark program as:
./spark-submit --master yarn-cluster PySparkProgramPath
The job then fails with:
Exception in thread "main" org.apache.spark.SparkException: Application application_1482268372614_0318 finished with failed status
After running yarn logs -applicationId application_1482268372614_0318, the logs show these errors:
ERROR ApplicationMaster: User class threw exception: java.io.IOException: Cannot run program
ERROR ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors.
[EDIT] I do not face this error when I submit the job in yarn-client mode.

Related

How to connect to remote Cassandra server through pyspark for write operation?

I am trying to connect to a remote Cassandra server through PySpark, but the write operation to Cassandra fails when the script runs as a cron job. The same code works on the server in a Jupyter notebook, but not through the cron job.
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--master local[*] --packages com.datastax.spark:spark-cassandra-connector_2.12:2.5.0 --conf spark.cassandra.connection.host=127.0.0.1 --conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions pyspark-shell'
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext("local", "keyspace_name")
sqlContext = SQLContext(sc)
Data_to_Write.write.format("org.apache.spark.sql.cassandra").mode('append').options(table="tablename", keyspace="keyspace_name").save()
I see this error in the Cassandra logs: ERROR [Messaging-EventLoop-3-3] 2020-08-05 09:24:36,606 OutboundConnectionInitiator.java:373 - Failed to handshake with peer xx.xxx.xxx.xxx:9042(xx.xxx.xxx.xxx:9042) org.apache.cassandra.net.Crc$InvalidCrc

ERROR : User did not initialize spark context

Log error:
TestSuccessfull
2018-08-20 04:52:15 INFO ApplicationMaster:54 - Final app status: FAILED, exitCode: 13
2018-08-20 04:52:15 ERROR ApplicationMaster:91 - Uncaught exception:
java.lang.IllegalStateException: User did not initialize spark context!
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:498)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:800)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:799)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:824)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
2018-08-20 04:52:15 INFO SparkContext:54 - Invoking stop() from shutdown hook
Error log on the console after the submit command:
2018-08-20 05:47:35 INFO Client:54 - Application report for application_1534690018301_0035 (state: ACCEPTED)
2018-08-20 05:47:36 INFO Client:54 - Application report for application_1534690018301_0035 (state: ACCEPTED)
2018-08-20 05:47:37 INFO Client:54 - Application report for application_1534690018301_0035 (state: FAILED)
2018-08-20 05:47:37 INFO Client:54 -
client token: N/A
diagnostics: Application application_1534690018301_0035 failed 2 times due to AM Container for appattempt_1534690018301_0035_000002 exited with exitCode: 13
Failing this attempt.Diagnostics: [2018-08-20 05:47:36.454]Exception from container-launch.
Container id: container_1534690018301_0035_02_000001
Exit code: 13
My code:
val sparkConf = new SparkConf().setAppName("Gathering Data")
val sc = new SparkContext(sparkConf)
Submit command:
spark-submit --class spark_basic.Test_Local --master yarn --deploy-mode cluster /home/IdeaProjects/target/Spark-1.0-SNAPSHOT.jar
Description:
I have installed Spark on Hadoop in pseudo-distributed mode.
spark-shell works fine; the problem only occurs when I use cluster mode.
My code also works fine: I am able to print output, but at the end it gives this error.
I presume your code has a line which sets the master to local, for example:
SparkConf.setMaster("local[*]")
If so, try commenting out that line and try again, since you will be setting the master to yarn in your spark-submit command:
/usr/cdh/current/spark-client/bin/spark-submit --class com.test.sparkApp --master yarn --deploy-mode cluster --num-executors 40 --executor-cores 4 --driver-memory 17g --executor-memory 22g --files /usr/cdh/current/spark-client/conf/hive-site.xml /home/user/sparkApp.jar
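For illustration, a driver skeleton without any hard-coded master (so the --master yarn from the command above takes effect) might look like this; the object and app names are placeholders:
import org.apache.spark.{SparkConf, SparkContext}

object SparkApp {
  def main(args: Array[String]): Unit = {
    // No setMaster here: the master comes from spark-submit (--master yarn --deploy-mode cluster).
    val conf = new SparkConf().setAppName("spark_name")
    val sc = new SparkContext(conf)
    // ... job logic ...
    sc.stop()
  }
}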
Finally I got it working with this spark-submit command:
/home/mahendra/Marvaland/SparkEcho/spark-2.3.0-bin-hadoop2.7/bin/spark-submit --master yarn --class spark_basic.Test_Local /home/mahendra/IdeaProjects/SparkTraining/target/SparkTraining-1.0-SNAPSHOT.jar
and this Spark session:
val spark = SparkSession.builder()
.appName("DataETL")
.master("local[1]")
.enableHiveSupport()
.getOrCreate()
Thanks @cricket_007.
This error may occur if you are submitting the spark job like this:
spark-submit --class some.path.com.Main --master yarn --deploy-mode cluster some_spark.jar (passing master and deploy-mode as CLI arguments) while at the same time calling new SparkContext in your code.
Either get the context with val sc = SparkContext.getOrCreate(), or do not pass the master and deploy-mode arguments to spark-submit if you want to create the context with new SparkContext yourself.
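A minimal sketch of the getOrCreate variant (object and app names are placeholders):
import org.apache.spark.{SparkConf, SparkContext}

object Main {
  def main(args: Array[String]): Unit = {
    // Reuses an existing context if one has already been prepared for the application,
    // and leaves master and deploy mode to the spark-submit command line.
    val sc = SparkContext.getOrCreate(new SparkConf().setAppName("some-app"))
    // ... job logic ...
  }
}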

Why does Spark on YARN in cluster mode fail with "Exception in thread "Driver" java.lang.NullPointerException"?

I'm using emr-5.4.0 with Spark 2.1.0. I understand what a NullPointerException is; this question is about why it was thrown in this particular case.
I cannot really figure out why I got a NullPointerException in the driver thread.
I got this weird job failure with this error:
18/03/29 20:07:52 INFO ApplicationMaster: Starting the user application in a separate Thread
18/03/29 20:07:52 INFO ApplicationMaster: Waiting for spark context initialization...
Exception in thread "Driver" java.lang.NullPointerException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
18/03/29 20:07:52 ERROR ApplicationMaster: Uncaught exception:
java.lang.IllegalStateException: SparkContext is null but app is still running!
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:415)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:254)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:766)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:764)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
18/03/29 20:07:52 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.lang.IllegalStateException: SparkContext is null but app is still running!)
18/03/29 20:07:52 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: java.lang.IllegalStateException: SparkContext is null but app is still running!)
18/03/29 20:07:52 INFO ApplicationMaster: Deleting staging directory hdfs://<ip-address>.ec2.internal:8020/user/hadoop/.sparkStaging/application_1522348295743_0010
18/03/29 20:07:52 INFO ShutdownHookManager: Shutdown hook called
End of LogType:stderr
I submitted the job like this:
spark-submit --deploy-mode cluster --master yarn --num-executors 40 --executor-cores 16 --executor-memory 100g --driver-cores 8 --driver-memory 100g --class <package.class_name> --jars <s3://s3_path/some_lib.jar> <s3://s3_path/class.jar>
And my class looks like this:
class MyClass {
  def main(args: Array[String]): Unit = {
    val c = new MyClass()
    c.process()
  }

  def process(): Unit = {
    val sparkConf = new SparkConf().setAppName("my-test")
    val sparkSession: SparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
    import sparkSession.implicits._
    ....
  }
  ...
}
Change class MyClass to object MyClass and you're done.
While we're at it, I'd also change class MyClass to object MyClass extends App and remove def main(args: Array[String]): Unit (as given by extends App).
I've reported an improvement for Spark 2.3.0 - [SPARK-23830] Spark on YARN in cluster deploy mode fail with NullPointerException when a Spark application is a Scala class not object - to have it reported nicely to an end user.
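As a sketch of that fix (assuming the rest of process() stays unchanged), the skeleton becomes:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object MyClass {
  def main(args: Array[String]): Unit = {
    process()
  }

  def process(): Unit = {
    val sparkConf = new SparkConf().setAppName("my-test")
    val sparkSession: SparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
    import sparkSession.implicits._
    // ...
  }
}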
Digging deeper into how Spark on YARN works, the following message is printed when the ApplicationMaster of a Spark application starts the driver (you used --deploy-mode cluster --master yarn with spark-submit):
ApplicationMaster: Starting the user application in a separate Thread
Right after the INFO message you should see another:
ApplicationMaster: Waiting for spark context initialization...
This is part of the driver initialization when the ApplicationMaster runs.
The exception Exception in thread "Driver" java.lang.NullPointerException comes from the following code:
val mainMethod = userClassLoader.loadClass(args.userClass)
.getMethod("main", classOf[Array[String]])
My understanding is that mainMethod resolves to an instance method here (a Scala class, unlike an object, has no static main), so the following line, which invokes it with a null target object, triggers the NullPointerException:
mainMethod.invoke(null, userArgs.toArray)
The thread is indeed called Driver (as in Exception in thread "Driver" java.lang.NullPointerException), as set in these lines:
userThread.setContextClassLoader(userClassLoader)
userThread.setName("Driver")
userThread.start()
The line numbers differ since I used Spark 2.3.0 to reference the lines while you use emr-5.4.0 with Spark 2.1.0.
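To illustrate the mechanism outside Spark, here is a minimal, self-contained sketch (not Spark code; class and object names are made up) showing that reflectively invoking an instance main with a null target throws the same NullPointerException:
import java.lang.reflect.Method

// A class (not an object), so its main compiles to an instance method, like MyClass above.
class ClassWithMain {
  def main(args: Array[String]): Unit = println("hello")
}

object ReflectionDemo {
  def main(args: Array[String]): Unit = {
    val mainMethod: Method = classOf[ClassWithMain].getMethod("main", classOf[Array[String]])
    // Invoking an instance method with a null target throws java.lang.NullPointerException,
    // which is what the ApplicationMaster's "Driver" thread hits.
    mainMethod.invoke(null, Array.empty[String])
  }
}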

Spark-Submit is throwing exception when running in yarn-cluster mode

I have a simple Spark app for learning purposes. This Scala program parallelizes a data List and writes the RDD to a file in Hadoop.
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object HelloSpark {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("HelloSPark1").setMaster(args(0))
    val sc = new SparkContext(conf)
    val i = List(1,4,2,11,23,45,67,8,909,5,1,8,"agarwal",19,11,12,34,8031,"aditya")
    val b = sc.parallelize(i,3)
    b.saveAsTextFile(args(1))
  }
}
I create a jar file, and when I run it on my cluster with --master yarn and --deploy-mode cluster using the following command, it throws an error:
spark-submit --class "HelloSpark" --master yarn --deploy-mode cluster sparkappl_2.11-1.0.jar yarn /user/letsbigdata9356/sparktest/run6
client token: N/A
diagnostics: Application application_1483332319047_3791 failed 2 times due to AM Container for appattempt_1483332319047_3791_000002 exited with exitCode: 15
For more detailed output, check application tracking page:http://a.cloudxlab.com:8088/cluster/app/application_1483332319047_3791Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e77_1483332319047_3791_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
at org.apache.hadoop.util.Shell.run(Shell.java:487)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 15
Failing this attempt. Failing the application.
Container exited with a non-zero exit code 15
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1484231621733
final status: FAILED
tracking URL: http://a.cloudxlab.com:8088/cluster/app/application_1483332319047_3791
user: letsbigdata9356
Exception in thread "main" org.apache.spark.SparkException: Application application_1483332319047_3791 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:974)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1020)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:685)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/01/12 14:34:10 INFO ShutdownHookManager: Shutdown hook called
But when I run it using the following command in yarn-client mode or in local mode, it works fine:
spark-submit --class "HelloSpark" sparkappl_2.11-1.0.jar yarn-client /user/letsbigdata9356/sparktest/run5
or
spark-submit --class "HelloSpark" sparkappl_2.11-1.0.jar local /user/letsbigdata9356/sparktest/run7
I am new to Spark; could you please help me resolve and learn more about this issue?
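For reference, a variant of the same program that leaves the master entirely to spark-submit (the approach suggested in the answers earlier on this page) might look like the sketch below; since the master is no longer read from args(0), the output path moves to args(0):
import org.apache.spark.{SparkConf, SparkContext}

object HelloSpark {
  def main(args: Array[String]): Unit = {
    // Master is supplied by spark-submit (--master yarn --deploy-mode cluster), not in code.
    val conf = new SparkConf().setAppName("HelloSpark")
    val sc = new SparkContext(conf)
    val data = List(1, 4, 2, 11, 23, 45, 67, 8, 909, 5, 1, 8, "agarwal", 19, 11, 12, 34, 8031, "aditya")
    sc.parallelize(data, 3).saveAsTextFile(args(0))
    sc.stop()
  }
}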

Apache spark job failed to deploy on Yarn

I'm trying to deploy a Spark job to my YARN cluster, but I'm getting an exception I don't really understand.
Here is the stack trace:
15/07/29 14:07:13 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
Exception in thread "Yarn application state monitor" org.apache.spark.SparkException: Error asking standalone scheduler to shut down executors
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stopExecutors(CoarseGrainedSchedulerBackend.scala:261)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stop(CoarseGrainedSchedulerBackend.scala:266)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:158)
at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:416)
at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1411)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1644)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:139)
Caused by: java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stopExecutors(CoarseGrainedSchedulerBackend.scala:257)
... 6 more
15/07/29 14:07:13 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
Here is my config:
SparkConf sparkConf = new SparkConf(true).setAppName("SparkQueryApp")
.setMaster("yarn-client")// "yarn-cluster" or "yarn-client"
.set("es.nodes", "10.0.0.207")
.set("es.nodes.discovery", "false")
.set("es.cluster", "wp-es-reporting-prod")
.set("es.scroll.size", "5000")
.setJars(JavaSparkContext.jarOfClass(Demo.class))
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.set("spark.logConf", "true");
Any idea why?

Resources