Spark cannot find given python environment - apache-spark

I'm trying to submit a Spark application to a remote Hadoop cluster, following the article here.
Here is how I submit the job:
PYSPARK_PYTHON=./PY_ENV/prod_env3/bin/python /home/hadoop/spark-1.6.0-bin-hadoop2.6/bin/spark-submit \
--master yarn \
--name run.py \
--deploy-mode cluster \
--executor-memory 2g \
--executor-cores 1 \
--num-executors 3 \
--jars /home/hadoop/projects/cms_counter/spark-streaming-kafka-assembly_2.10-1.6.0.jar \
--conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./PY_ENV/prod_env3/bin/python \
--archives /opt/anaconda/envs/prod_env3.zip#PY_ENV \
/home/hadoop/run.py
Here I can see that the environment archive, script and jars are uploaded to .sparkStaging/application_1490199711887_0131 on HDFS:
17/03/27 11:55:42 INFO ConfiguredRMFailoverProxyProvider: Failing over to rm188
17/03/27 11:55:42 INFO Client: Requesting a new application from cluster with 3 NodeManagers
17/03/27 11:55:42 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (52586 MB per container)
17/03/27 11:55:42 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
17/03/27 11:55:42 INFO Client: Setting up container launch context for our AM
17/03/27 11:55:42 INFO Client: Setting up the launch environment for our AM container
17/03/27 11:55:42 INFO Client: Preparing resources for our AM container
17/03/27 11:55:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/03/27 11:55:43 INFO Client: Uploading resource file:/home/hadoop/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar -> hdfs://nameservicehighavail/user/root/.sparkStaging/application_1490199711887_0131/spark-assembly-1.6.0-hadoop2.6.0.jar
17/03/27 11:55:45 INFO Client: Uploading resource file:/home/hadoop/projects/cms_counter/spark-streaming-kafka-assembly_2.10-1.6.0.jar -> hdfs://nameservicehighavail/user/root/.sparkStaging/application_1490199711887_0131/spark-streaming-kafka-assembly_2.10-1.6.0.jar
17/03/27 11:55:45 INFO Client: Uploading resource file:/opt/anaconda/envs/prod_env3.zip#PY_ENV -> hdfs://nameservicehighavail/user/root/.sparkStaging/application_1490199711887_0131/prod_env3.zip
17/03/27 11:55:46 INFO Client: Uploading resource file:/home/hadoop/run.py -> hdfs://nameservicehighavail/user/root/.sparkStaging/application_1490199711887_0131/run.py
17/03/27 11:55:46 INFO Client: Uploading resource file:/home/hadoop/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip -> hdfs://nameservicehighavail/user/root/.sparkStaging/application_1490199711887_0131/pyspark.zip
17/03/27 11:55:46 INFO Client: Uploading resource file:/home/hadoop/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip -> hdfs://nameservicehighavail/user/root/.sparkStaging/application_1490199711887_0131/py4j-0.9-src.zip
17/03/27 11:55:46 INFO Client: Uploading resource file:/tmp/spark-7c8130fc-454f-4920-95ce-30211cea3576/__spark_conf__8359653165366110281.zip -> hdfs://nameservicehighavail/user/root/.sparkStaging/application_1490199711887_0131/__spark_conf__8359653165366110281.zip
17/03/27 11:55:46 INFO SecurityManager: Changing view acls to: root
17/03/27 11:55:46 INFO SecurityManager: Changing modify acls to: root
17/03/27 11:55:46 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/03/27 11:55:46 INFO Client: Submitting application 131 to ResourceManager
17/03/27 11:55:46 INFO YarnClientImpl: Submitted application application_1490199711887_0131
17/03/27 11:55:47 INFO Client: Application report for application_1490199711887_0131 (state: ACCEPTED)
And I can confirm that the files exist there:
[root@d83 ~]# hadoop fs -ls .sparkStaging/application_1490199711887_0131
Found 7 items
-rw-r--r-- 3 root supergroup 24766 2017-03-27 11:54 .sparkStaging/application_1490199711887_0131/__spark_conf__8359653165366110281.zip
-rw-r--r-- 3 root supergroup 36034763 2017-03-27 11:54 .sparkStaging/application_1490199711887_0131/prod_env3.zip
-rw-r--r-- 3 root supergroup 44846 2017-03-27 11:54 .sparkStaging/application_1490199711887_0131/py4j-0.9-src.zip
-rw-r--r-- 3 root supergroup 355358 2017-03-27 11:54 .sparkStaging/application_1490199711887_0131/pyspark.zip
-rw-r--r-- 3 root supergroup 2099 2017-03-27 11:54 .sparkStaging/application_1490199711887_0131/run.py
-rw-r--r-- 3 root supergroup 187548272 2017-03-27 11:54 .sparkStaging/application_1490199711887_0131/spark-assembly-1.6.0-hadoop2.6.0.jar
-rw-r--r-- 3 root supergroup 13350134 2017-03-27 11:54 .sparkStaging/application_1490199711887_0131/spark-streaming-kafka-assembly_2.10-1.6.0.jar
Yet Spark still tells me that it cannot find the given Python environment:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/sde/yarn/nm/usercache/root/filecache/1454/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/03/27 11:54:10 INFO ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
17/03/27 11:54:11 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1490199711887_0131_000001
17/03/27 11:54:11 INFO SecurityManager: Changing view acls to: yarn,root
17/03/27 11:54:11 INFO SecurityManager: Changing modify acls to: yarn,root
17/03/27 11:54:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root); users with modify permissions: Set(yarn, root)
17/03/27 11:54:12 INFO ApplicationMaster: Starting the user application in a separate Thread
17/03/27 11:54:12 INFO ApplicationMaster: Waiting for spark context initialization
17/03/27 11:54:12 INFO ApplicationMaster: Waiting for spark context initialization ...
17/03/27 11:54:12 ERROR ApplicationMaster: User class threw exception: java.io.IOException: Cannot run program "./PY_ENV/prod_env3/bin/python": error=2, No such file or directory
java.io.IOException: Cannot run program "./PY_ENV/prod_env3/bin/python": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:82)
at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
... 7 more
17/03/27 11:54:12 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.io.IOException: Cannot run program "./PY_ENV/prod_env3/bin/python": error=2, No such file or directory)
17/03/27 11:54:22 ERROR ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application.
17/03/27 11:54:22 INFO ShutdownHookManager: Shutdown hook called
Clearly I must be missing something; any lead would be appreciated.

Turns out I zipped the wrong Python environment (zip -r prod_env3.zip prod_env), so the archive contained prod_env/ rather than prod_env3/, and ./PY_ENV/prod_env3/bin/python did not exist inside it. Sorry for the inconvenience.
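For anyone hitting the same error, a quick sanity check before submitting is to rebuild the archive from the envs directory and confirm the interpreter sits at the relative path PYSPARK_PYTHON points to (a sketch; adjust the env name and paths to your setup):
# Rebuild the archive so the top-level folder is prod_env3, matching ./PY_ENV/prod_env3/bin/python
cd /opt/anaconda/envs
zip -qr prod_env3.zip prod_env3
# Verify the interpreter exists at the expected relative path inside the archive
unzip -l prod_env3.zip | grep 'prod_env3/bin/python'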

Related

Spark on Yarn error : Yarn application has already ended! It might have been killed or unable to launch application master

While starting spark-shell --master yarn --deploy-mode client I am getting the error:
Yarn application has already ended! It might have been killed or unable to launch application master.
Here is the complete log from Yarn:
19/08/28 00:54:55 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Container: container_1566921956926_0010_01_000001 on rhel7-cloudera-dev_33917
===============================================================================
LogType:stderr
Log Upload Time:28-Aug-2019 00:46:31
LogLength:523
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/yarn/local/usercache/rhel/filecache/26/__spark_libs__5634501618166443611.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/etc/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
LogType:stdout
Log Upload Time:28-Aug-2019 00:46:31
LogLength:5597
Log Contents:
2019-08-28 00:46:19 INFO SignalUtils:54 - Registered signal handler for TERM
2019-08-28 00:46:19 INFO SignalUtils:54 - Registered signal handler for HUP
2019-08-28 00:46:19 INFO SignalUtils:54 - Registered signal handler for INT
2019-08-28 00:46:19 INFO SecurityManager:54 - Changing view acls to: yarn,rhel
2019-08-28 00:46:19 INFO SecurityManager:54 - Changing modify acls to: yarn,rhel
2019-08-28 00:46:19 INFO SecurityManager:54 - Changing view acls groups to:
2019-08-28 00:46:19 INFO SecurityManager:54 - Changing modify acls groups to:
2019-08-28 00:46:19 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, rhel); groups with view permissions: Set(); users with modify permissions: Set(yarn, rhel); groups with modify permissions: Set()
2019-08-28 00:46:20 INFO ApplicationMaster:54 - Preparing Local resources
2019-08-28 00:46:21 INFO ApplicationMaster:54 - ApplicationAttemptId: appattempt_1566921956926_0010_000001
2019-08-28 00:46:21 INFO ApplicationMaster:54 - Waiting for Spark driver to be reachable.
2019-08-28 00:46:21 INFO ApplicationMaster:54 - Driver now available: rhel7-cloudera-dev:34872
2019-08-28 00:46:21 INFO TransportClientFactory:267 - Successfully created connection to rhel7-cloudera-dev/192.168.56.112:34872 after 107 ms (0 ms spent in bootstraps)
2019-08-28 00:46:22 INFO ApplicationMaster:54 -
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> {{PWD}}{{PWD}}/spark_conf{{PWD}}/spark_libs/$HADOOP_CONF_DIR$HADOOP_COMMON_HOME/share/hadoop/common/$HADOOP_COMMON_HOME/share/hadoop/common/lib/$HADOOP_HDFS_HOME/share/hadoop/hdfs/$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/$HADOOP_YARN_HOME/share/hadoop/yarn/$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
$HADOOP_COMMON_HOME/$HADOOP_COMMON_HOME/lib/$HADOOP_HDFS_HOME/$HADOOP_HDFS_HOME/lib/$HADOOP_MAPRED_HOME/$HADOOP_MAPRED_HOME/lib/$HADOOP_YARN_HOME/$HADOOP_YARN_HOME/lib/$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib//etc/hadoop-2.6.0/etc/hadoop:/etc/hadoop-2.6.0/share/hadoop/common/lib/:/etc/hadoop-2.6.0/share/hadoop/common/:/etc/hadoop-2.6.0/share/hadoop/hdfs:/etc/hadoop-2.6.0/share/hadoop/hdfs/lib/:/etc/hadoop-2.6.0/share/hadoop/hdfs/:/etc/hadoop-2.6.0/share/hadoop/yarn/lib/:/etc/hadoop-2.6.0/share/hadoop/yarn/:/etc/hadoop-2.6.0/share/hadoop/mapreduce/lib/:/etc/hadoop-2.6.0/share/hadoop/mapreduce/:/etc/hadoop-2.6.0/contrib/capacity-scheduler/.jar{{PWD}}/spark_conf/hadoop_conf
SPARK_DIST_CLASSPATH -> /etc/hadoop-2.6.0/etc/hadoop:/etc/hadoop-2.6.0/share/hadoop/common/lib/:/etc/hadoop-2.6.0/share/hadoop/common/:/etc/hadoop-2.6.0/share/hadoop/hdfs:/etc/hadoop-2.6.0/share/hadoop/hdfs/lib/:/etc/hadoop-2.6.0/share/hadoop/hdfs/:/etc/hadoop-2.6.0/share/hadoop/yarn/lib/:/etc/hadoop-2.6.0/share/hadoop/yarn/:/etc/hadoop-2.6.0/share/hadoop/mapreduce/lib/:/etc/hadoop-2.6.0/share/hadoop/mapreduce/:/etc/hadoop-2.6.0/contrib/capacity-scheduler/.jar
SPARK_YARN_STAGING_DIR -> *********(redacted)
SPARK_USER -> *********(redacted)
SPARK_CONF_DIR -> /etc/spark/conf
SPARK_HOME -> /etc/spark
command:
{{JAVA_HOME}}/bin/java \
-server \
-Xmx1024m \
-Djava.io.tmpdir={{PWD}}/tmp \
'-Dspark.driver.port=34872' \
-Dspark.yarn.app.container.log.dir=<LOG_DIR> \
-XX:OnOutOfMemoryError='kill %p' \
org.apache.spark.executor.CoarseGrainedExecutorBackend \
--driver-url \
spark://CoarseGrainedScheduler@rhel7-cloudera-dev:34872 \
--executor-id \
<executorId> \
--hostname \
<hostname> \
--cores \
1 \
--app-id \
application_1566921956926_0010 \
--user-class-path \
file:$PWD/app.jar \
1><LOG_DIR>/stdout \
2><LOG_DIR>/stderr
resources:
__spark_libs__ -> resource { scheme: "hdfs" host: "rhel7-cloudera-dev" port: 9000 file: "/user/rhel/.sparkStaging/application_1566921956926_0010/__spark_libs__5634501618166443611.zip" } size: 232107209 timestamp: 1566933362350 type: ARCHIVE visibility: PRIVATE
__spark_conf__ -> resource { scheme: "hdfs" host: "rhel7-cloudera-dev" port: 9000 file: "/user/rhel/.sparkStaging/application_1566921956926_0010/__spark_conf__.zip" } size: 208377 timestamp: 1566933365411 type: ARCHIVE visibility: PRIVATE
===============================================================================
2019-08-28 00:46:22 INFO RMProxy:98 - Connecting to ResourceManager at /0.0.0.0:8030
2019-08-28 00:46:22 INFO YarnRMClient:54 - Registering the ApplicationMaster
2019-08-28 00:46:22 INFO YarnAllocator:54 - Will request 2 executor container(s), each with 1 core(s) and 1408 MB memory (including 384 MB of overhead)
2019-08-28 00:46:22 INFO YarnAllocator:54 - Submitted 2 unlocalized container requests.
2019-08-28 00:46:22 INFO ApplicationMaster:54 - Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
2019-08-28 00:46:22 ERROR ApplicationMaster:43 - RECEIVED SIGNAL TERM
2019-08-28 00:46:23 INFO ApplicationMaster:54 - Final app status: UNDEFINED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
2019-08-28 00:46:23 INFO ShutdownHookManager:54 - Shutdown hook called
Container: container_1566921956926_0010_02_000001 on rhel7-cloudera-dev_33917
===============================================================================
LogType:stderr
Log Upload Time:28-Aug-2019 00:46:31
LogLength:3576
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/yarn/local/usercache/rhel/filecache/26/__spark_libs__5634501618166443611.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/etc/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.io.IOException: Failed on local exception: java.io.IOException; Host Details : local host is: "rhel7-cloudera-dev/192.168.56.112"; destination host is: "rhel7-cloudera-dev":9000;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1474)
at org.apache.hadoop.ipc.Client.call(Client.java:1401)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1977)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7$$anonfun$apply$3.apply(ApplicationMaster.scala:235)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7$$anonfun$apply$3.apply(ApplicationMaster.scala:232)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7.apply(ApplicationMaster.scala:232)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7.apply(ApplicationMaster.scala:197)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:800)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:799)
at org.apache.spark.deploy.yarn.ApplicationMaster.<init>(ApplicationMaster.scala:197)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:823)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:854)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.io.IOException
at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:935)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967)
Caused by: java.lang.InterruptedException
... 2 more
LogType:stdout
Log Upload Time:28-Aug-2019 00:46:31
LogLength:975
Log Contents:
2019-08-28 00:46:26 INFO SignalUtils:54 - Registered signal handler for TERM
2019-08-28 00:46:26 INFO SignalUtils:54 - Registered signal handler for HUP
2019-08-28 00:46:26 INFO SignalUtils:54 - Registered signal handler for INT
2019-08-28 00:46:27 INFO SecurityManager:54 - Changing view acls to: yarn,rhel
2019-08-28 00:46:27 INFO SecurityManager:54 - Changing modify acls to: yarn,rhel
2019-08-28 00:46:27 INFO SecurityManager:54 - Changing view acls groups to:
2019-08-28 00:46:27 INFO SecurityManager:54 - Changing modify acls groups to:
2019-08-28 00:46:27 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, rhel); groups with view permissions: Set(); users with modify permissions: Set(yarn, rhel); groups with modify permissions: Set()
2019-08-28 00:46:28 INFO ApplicationMaster:54 - Preparing Local resources
2019-08-28 00:46:28 ERROR ApplicationMaster:43 - RECEIVED SIGNAL TERM
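For reference, the container logs above come from YARN's log aggregation; something along these lines pulls them, plus the ResourceManager's view of the application (a sketch using the application id from the output above):
# Dump all aggregated container logs for the failed application
yarn logs -applicationId application_1566921956926_0010 > app_logs.txt
# Ask the ResourceManager for the application's final status and diagnostics
yarn application -status application_1566921956926_0010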
Any suggestions on how to resolve this issue?

spark-submit: unable to get driver status

I'm running a job on a test Spark standalone cluster in cluster mode, but I'm finding myself unable to monitor the status of the driver.
Here is a minimal example using spark-2.4.3 (master and one worker running on the same node, started by running sbin/start-all.sh on a freshly unarchived installation using the default conf, no conf/slaves set), executing spark-submit from the node itself:
$ spark-submit --master spark://ip-172-31-15-245:7077 --deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
/home/ubuntu/spark/examples/jars/spark-examples_2.11-2.4.3.jar 100
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.NativeCodeLoader).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/06/27 09:08:28 INFO SecurityManager: Changing view acls to: ubuntu
19/06/27 09:08:28 INFO SecurityManager: Changing modify acls to: ubuntu
19/06/27 09:08:28 INFO SecurityManager: Changing view acls groups to:
19/06/27 09:08:28 INFO SecurityManager: Changing modify acls groups to:
19/06/27 09:08:28 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); groups with view permissions: Set(); users with modify permissions: Set(ubuntu); groups with modify permissions: Set()
19/06/27 09:08:28 INFO Utils: Successfully started service 'driverClient' on port 36067.
19/06/27 09:08:28 INFO TransportClientFactory: Successfully created connection to ip-172-31-15-245/172.31.15.245:7077 after 29 ms (0 ms spent in bootstraps)
19/06/27 09:08:28 INFO ClientEndpoint: Driver successfully submitted as driver-20190627090828-0008
19/06/27 09:08:28 INFO ClientEndpoint: ... waiting before polling master for driver state
19/06/27 09:08:33 INFO ClientEndpoint: ... polling master for driver state
19/06/27 09:08:33 INFO ClientEndpoint: State of driver-20190627090828-0008 is RUNNING
19/06/27 09:08:33 INFO ClientEndpoint: Driver running on 172.31.15.245:41057 (worker-20190627083412-172.31.15.245-41057)
19/06/27 09:08:33 INFO ShutdownHookManager: Shutdown hook called
19/06/27 09:08:33 INFO ShutdownHookManager: Deleting directory /tmp/spark-34082661-f0de-4c56-92b7-648ea24fa59c
> spark-submit --master spark://ip-172-31-15-245:7077 --status driver-20190627090828-0008
19/06/27 09:09:27 WARN RestSubmissionClient: Unable to connect to server spark://ip-172-31-15-245:7077.
Exception in thread "main" org.apache.spark.deploy.rest.SubmitRestConnectionException: Unable to connect to server
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$requestSubmissionStatus$3.apply(RestSubmissionClient.scala:165)
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$requestSubmissionStatus$3.apply(RestSubmissionClient.scala:148)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.deploy.rest.RestSubmissionClient.requestSubmissionStatus(RestSubmissionClient.scala:148)
at org.apache.spark.deploy.SparkSubmit.requestStatus(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:88)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.deploy.rest.SubmitRestConnectionException: No response from server
at org.apache.spark.deploy.rest.RestSubmissionClient.readResponse(RestSubmissionClient.scala:285)
at org.apache.spark.deploy.rest.RestSubmissionClient.org$apache$spark$deploy$rest$RestSubmissionClient$$get(RestSubmissionClient.scala:195)
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$requestSubmissionStatus$3.apply(RestSubmissionClient.scala:152)
... 11 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at org.apache.spark.deploy.rest.RestSubmissionClient.readResponse(RestSubmissionClient.scala:278)
... 13 more
Spark is in good health (I'm able to run other jobs after the one above), and driver-20190627090828-0008 appears as "FINISHED" in the web UI.
Is there something I am missing?
UPDATE:
In the master log, all I get is:
19/07/01 09:40:24 INFO master.Master: 172.31.15.245:42308 got disassociated, removing it.
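One avenue worth checking (a sketch, and an assumption on my part: the --status query goes through the standalone master's REST submission server, which listens on port 6066 by default rather than the 7077 RPC port, and which must be enabled):
# Query the driver status against the REST submission port instead of the RPC port
spark-submit --master spark://ip-172-31-15-245:6066 --status driver-20190627090828-0008
# If the REST server is disabled, it can be switched on in conf/spark-defaults.conf (master restart needed):
# spark.master.rest.enabled true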

Property spark.yarn.jars - how to deal with it?

My knowledge of Spark is limited and you will sense it after reading this question. I have just one node, and Spark, Hadoop and YARN are installed on it.
I was able to code and run a word-count problem in cluster mode with the command below:
spark-submit --class com.sanjeevd.sparksimple.wordcount.JobRunner
--master yarn
--deploy-mode cluster
--driver-memory=2g
--executor-memory 2g
--executor-cores 1
--num-executors 1
SparkSimple-0.0.1-SNAPSHOT.jar
hdfs://sanjeevd.br:9000/user/spark-test/word-count/input
hdfs://sanjeevd.br:9000/user/spark-test/word-count/output
It works just fine.
Now I understand that 'Spark on YARN' requires the Spark jar files to be available on the cluster, and if I don't do anything then every time I run my program it copies hundreds of jar files from $SPARK_HOME to each node (in my case it's just one node). I can see that the code's execution pauses for some time while the copying finishes. See below -
16/12/12 17:24:03 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/12/12 17:24:06 INFO yarn.Client: Uploading resource file:/tmp/spark-a6cc0d6e-45f9-4712-8bac-fb363d6992f2/__spark_libs__11112433502351931.zip -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0001/__spark_libs__11112433502351931.zip
16/12/12 17:24:08 INFO yarn.Client: Uploading resource file:/home/sanjeevd/personal/Spark-Simple/target/SparkSimple-0.0.1-SNAPSHOT.jar -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0001/SparkSimple-0.0.1-SNAPSHOT.jar
16/12/12 17:24:08 INFO yarn.Client: Uploading resource file:/tmp/spark-a6cc0d6e-45f9-4712-8bac-fb363d6992f2/__spark_conf__6716604236006329155.zip -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0001/__spark_conf__.zip
Spark's documentation suggests setting the spark.yarn.jars property to avoid this copying, so I set the property below in my spark-defaults.conf file.
spark.yarn.jars hdfs://sanjeevd.br:9000//user/spark/share/lib
http://spark.apache.org/docs/latest/running-on-yarn.html#preparations
To make Spark runtime jars accessible from YARN side, you can specify spark.yarn.archive or spark.yarn.jars. For details please refer to Spark Properties. If neither spark.yarn.archive nor spark.yarn.jars is specified, Spark will create a zip file with all jars under $SPARK_HOME/jars and upload it to the distributed cache.
By the way, I have copied all the jar files from the local /opt/spark/jars to HDFS /user/spark/share/lib. There are 206 of them.
This makes my job fail. Below is the error -
spark-submit --class com.sanjeevd.sparksimple.wordcount.JobRunner --master yarn --deploy-mode cluster --driver-memory=2g --executor-memory 2g --executor-cores 1 --num-executors 1 SparkSimple-0.0.1-SNAPSHOT.jar hdfs://sanjeevd.br:9000/user/spark-test/word-count/input hdfs://sanjeevd.br:9000/user/spark-test/word-count/output
16/12/12 17:43:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/12 17:43:07 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/12/12 17:43:07 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/12/12 17:43:07 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (5120 MB per container)
16/12/12 17:43:07 INFO yarn.Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
16/12/12 17:43:07 INFO yarn.Client: Setting up container launch context for our AM
16/12/12 17:43:07 INFO yarn.Client: Setting up the launch environment for our AM container
16/12/12 17:43:07 INFO yarn.Client: Preparing resources for our AM container
16/12/12 17:43:07 INFO yarn.Client: Uploading resource file:/home/sanjeevd/personal/Spark-Simple/target/SparkSimple-0.0.1-SNAPSHOT.jar -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0005/SparkSimple-0.0.1-SNAPSHOT.jar
16/12/12 17:43:07 INFO yarn.Client: Uploading resource file:/tmp/spark-fae6a5ad-65d9-4b64-9ba6-65da1310ae9f/__spark_conf__7881471844385719101.zip -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0005/__spark_conf__.zip
16/12/12 17:43:08 INFO spark.SecurityManager: Changing view acls to: sanjeevd
16/12/12 17:43:08 INFO spark.SecurityManager: Changing modify acls to: sanjeevd
16/12/12 17:43:08 INFO spark.SecurityManager: Changing view acls groups to:
16/12/12 17:43:08 INFO spark.SecurityManager: Changing modify acls groups to:
16/12/12 17:43:08 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(sanjeevd); groups with view permissions: Set(); users with modify permissions: Set(sanjeevd); groups with modify permissions: Set()
16/12/12 17:43:08 INFO yarn.Client: Submitting application application_1481592214176_0005 to ResourceManager
16/12/12 17:43:08 INFO impl.YarnClientImpl: Submitted application application_1481592214176_0005
16/12/12 17:43:09 INFO yarn.Client: Application report for application_1481592214176_0005 (state: ACCEPTED)
16/12/12 17:43:09 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1481593388442
final status: UNDEFINED
tracking URL: http://sanjeevd.br:8088/proxy/application_1481592214176_0005/
user: sanjeevd
16/12/12 17:43:10 INFO yarn.Client: Application report for application_1481592214176_0005 (state: FAILED)
16/12/12 17:43:10 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1481592214176_0005 failed 1 times due to AM Container for appattempt_1481592214176_0005_000001 exited with exitCode: 1
For more detailed output, check application tracking page:http://sanjeevd.br:8088/cluster/app/application_1481592214176_0005Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1481592214176_0005_01_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1481593388442
final status: FAILED
tracking URL: http://sanjeevd.br:8088/cluster/app/application_1481592214176_0005
user: sanjeevd
16/12/12 17:43:10 INFO yarn.Client: Deleting staging directory hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0005
Exception in thread "main" org.apache.spark.SparkException: Application application_1481592214176_0005 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1132)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1175)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/12/12 17:43:10 INFO util.ShutdownHookManager: Shutdown hook called
16/12/12 17:43:10 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-fae6a5ad-65d9-4b64-9ba6-65da1310ae9f
Do you know what I am doing wrong? The task's log says the following -
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
I understand the error that the ApplicationMaster class is not found, but my question is why it is not found - where is this class supposed to be? I don't have an assembly jar since I'm using Spark 2.0.1, which no longer bundles an assembly.
What does this have to do with the spark.yarn.jars property? This property is meant to help Spark run on YARN, and that should be it. What else do I need to do when using spark.yarn.jars?
Thanks for reading this question, and thanks in advance for your help.
You could also use the spark.yarn.archive option and set it to the location of an archive (which you create) containing all the JARs in the $SPARK_HOME/jars/ folder, at the root level of the archive. For example (see the sketch after these steps):
1. Create the archive: jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
2. Upload it to HDFS: hdfs dfs -put spark-libs.jar /some/path/.
2a. For a large cluster, increase the replication count of the Spark archive so that you reduce the number of times a NodeManager does a remote copy: hdfs dfs -setrep -w 10 hdfs:///some/path/spark-libs.jar (change the number of replicas in proportion to the total number of NodeManagers).
3. Set spark.yarn.archive to hdfs:///some/path/spark-libs.jar.
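Putting those steps together (a sketch; /some/path is the placeholder from the steps above, and the replication factor is just an example):
# Build an archive containing everything under $SPARK_HOME/jars at the archive root
jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
# Upload it to HDFS
hdfs dfs -mkdir -p /some/path
hdfs dfs -put spark-libs.jar /some/path/
# Optional, for large clusters: raise replication so NodeManagers do fewer remote copies
hdfs dfs -setrep -w 10 /some/path/spark-libs.jar
# Then in spark-defaults.conf:
# spark.yarn.archive hdfs:///some/path/spark-libs.jar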
I was finally able to make sense of this property. I found by trial and error that the correct syntax of this property is
spark.yarn.jars=hdfs://xx:9000/user/spark/share/lib/*.jar
I hadn't put *.jar at the end, and my path simply ended with /lib. I also tried pointing to an actual jar like this - spark.yarn.jars=hdfs://sanjeevd.brickred:9000/user/spark/share/lib/spark-yarn_2.11-2.0.1.jar - but no luck. All it said was that it was unable to load the ApplicationMaster.
I posted my response to a similar question asked by someone at https://stackoverflow.com/a/41179608/2332121
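For completeness, a minimal sketch of the working spark.yarn.jars setup described above (the HDFS location and namenode address mirror the ones used earlier; adjust them to your cluster):
# Copy the local Spark jars to HDFS once
hdfs dfs -mkdir -p /user/spark/share/lib
hdfs dfs -put /opt/spark/jars/*.jar /user/spark/share/lib/
# spark-defaults.conf -- note the trailing /*.jar glob; a bare directory path does not work
# spark.yarn.jars hdfs://sanjeevd.br:9000/user/spark/share/lib/*.jar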
If you look at the spark.yarn.jars documentation, it says the following:
List of libraries containing Spark code to distribute to YARN containers. By default, Spark on YARN will use Spark jars installed locally, but the Spark jars can also be in a world-readable location on HDFS. This allows YARN to cache it on nodes so that it doesn't need to be distributed each time an application runs. To point to jars on HDFS, for example, set this configuration to hdfs:///some/path. Globs are allowed.
This means that you are actually overriding SPARK_HOME/jars and telling YARN to pick up all the jars required for the application run from your path. If you set the spark.yarn.jars property, all the jars that Spark depends on to run must be present in this path. If you look inside the spark-assembly.jar present in SPARK_HOME/lib, the org.apache.spark.deploy.yarn.ApplicationMaster class is there, so make sure that all the Spark dependencies are present in the HDFS path that you specify as spark.yarn.jars.
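As a quick way to follow that advice, you can pull one of the uploaded jars back and confirm the ApplicationMaster class is really in the set that spark.yarn.jars points at (a sketch; the jar name is the spark-yarn module jar mentioned above, and jar tf needs a JDK on the path):
# Fetch the YARN module jar from the HDFS lib directory and look for the AM class
hdfs dfs -get /user/spark/share/lib/spark-yarn_2.11-2.0.1.jar /tmp/
jar tf /tmp/spark-yarn_2.11-2.0.1.jar | grep 'org/apache/spark/deploy/yarn/ApplicationMaster.class'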

EMR 5.0 + Spark getting stuck at an endless loop

I am trying to deploy Spark 2.0 Streaming on Amazon EMR 5.0.
It seems that the application gets stuck in an endless loop of
"INFO Client: Application report for application_14111979683_1111 (state: ACCEPTED)"
and then exits.
Here is how I am trying to submit it through the command line:
aws emr add-steps --cluster-id --steps
Type=Spark,Name="Spark
Program",ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--class,,s3://.jar]
Any idea?
Thanks,
Eran
16/08/30 15:43:27 INFO SecurityManager: Changing view acls to: hadoop
16/08/30 15:43:27 INFO SecurityManager: Changing modify acls to: hadoop
16/08/30 15:43:27 INFO SecurityManager: Changing view acls groups to:
16/08/30 15:43:27 INFO SecurityManager: Changing modify acls groups to:
16/08/30 15:43:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
16/08/30 15:43:27 INFO Client: Submitting application application_14111979683_1111 to ResourceManager
16/08/30 15:43:27 INFO YarnClientImpl: Submitted application application_14111979683_1111
16/08/30 15:43:28 INFO Client: Application report for application_14111979683_1111 (state: ACCEPTED)
16/08/30 15:43:28 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1472571807467
final status: UNDEFINED
tracking URL: http://xxxxxx:20888/proxy/application_14111979683_1111/
user: hadoop
16/08/30 15:43:29 INFO Client: Application report for application_14111979683_1111 (state: ACCEPTED)
and this is the exception thrown:
16/08/31 08:14:48 INFO Client:
client token: N/A
diagnostics: Application application_1472630652740_0001 failed 2 times due to AM Container for appattempt_1472630652740_0001_000002 exited with exitCode: 13
For more detailed output, check application tracking page:http://ip-10-0-0-8.eu-west-1.compute.internal:8088/cluster/app/application_1472630652740_0001Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1472630652740_0001_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
EMR is actually a wrapper for YARN, so we need to add "--master yarn" as an argument to the deployment command line.
Example:
aws emr add-steps --cluster-id j-XXXXXXXXX --steps Type=Spark,Name="Spark Program",ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--class,com.xxx.MyMainClass,s3://]
Another thing that is needed is removing 'sparkConf.setMaster("local[*]")' from the initialization of the Spark conf.

spark-submit yarn-client run failed

I'm using yarn-client to run a Spark program.
I've built the Spark-on-YARN environment.
The script is:
./bin/spark-submit --class WordCountTest \
--master yarn-client \
--num-executors 1 \
--executor-cores 1 \
--queue root.hadoop \
/root/Desktop/test2.jar \
10
When running it, I get the following exception:
15/05/12 17:42:01 INFO spark.SparkContext: Running Spark version 1.3.1
15/05/12 17:42:01 WARN spark.SparkConf:
SPARK_CLASSPATH was detected (set to ':/usr/local/hadoop/hadoop-2.5.2/share/hadoop/common/hadoop-lzo-0.4.20-SNAPSHOT.jar').
This is deprecated in Spark 1.0+.
Please instead use:
- ./spark-submit with --driver-class-path to augment the driver classpath
- spark.executor.extraClassPath to augment the executor classpath
15/05/12 17:42:01 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to ':/usr/local/hadoop/hadoop-2.5.2/share/hadoop/common/hadoop-lzo-0.4.20-SNAPSHOT.jar' as a work-around.
15/05/12 17:42:01 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to ':/usr/local/hadoop/hadoop-2.5.2/share/hadoop/common/hadoop-lzo-0.4.20-SNAPSHOT.jar' as a work-around.
15/05/12 17:42:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/05/12 17:42:02 INFO spark.SecurityManager: Changing view acls to: root
15/05/12 17:42:02 INFO spark.SecurityManager: Changing modify acls to: root
15/05/12 17:42:02 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/05/12 17:42:02 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/05/12 17:42:02 INFO Remoting: Starting remoting
15/05/12 17:42:03 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@master:49338]
15/05/12 17:42:03 INFO util.Utils: Successfully started service 'sparkDriver' on port 49338.
15/05/12 17:42:03 INFO spark.SparkEnv: Registering MapOutputTracker
15/05/12 17:42:03 INFO spark.SparkEnv: Registering BlockManagerMaster
15/05/12 17:42:03 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-57f5fb29-784d-4730-92b8-c2e8be97c038/blockmgr-752988bc-b2d0-42f7-891d-5d3edbb4526d
15/05/12 17:42:03 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
15/05/12 17:42:04 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-2f2a46eb-9259-4c6e-b9af-7159efb0b3e9/httpd-3c50fe1e-430e-4077-9cd0-58246e182d98
15/05/12 17:42:04 INFO spark.HttpServer: Starting HTTP Server
15/05/12 17:42:04 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/05/12 17:42:04 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:41749
15/05/12 17:42:04 INFO util.Utils: Successfully started service 'HTTP file server' on port 41749.
15/05/12 17:42:04 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/05/12 17:42:05 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/05/12 17:42:05 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/05/12 17:42:05 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/05/12 17:42:05 INFO ui.SparkUI: Started SparkUI at http://master:4040
15/05/12 17:42:05 INFO spark.SparkContext: Added JAR file:/root/Desktop/test2.jar at http://192.168.147.201:41749/jars/test2.jar with timestamp 1431423725289
15/05/12 17:42:05 WARN cluster.YarnClientSchedulerBackend: NOTE: SPARK_WORKER_MEMORY is deprecated. Use SPARK_EXECUTOR_MEMORY or --executor-memory through spark-submit instead.
15/05/12 17:42:06 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.147.201:8032
15/05/12 17:42:06 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
15/05/12 17:42:06 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/05/12 17:42:06 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/05/12 17:42:06 INFO yarn.Client: Setting up container launch context for our AM
15/05/12 17:42:06 INFO yarn.Client: Preparing resources for our AM container
15/05/12 17:42:07 WARN yarn.Client: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/05/12 17:42:07 INFO yarn.Client: Uploading resource file:/usr/local/spark/spark-1.3.1-bin-hadoop2.5.0-cdh5.3.2/lib/spark-assembly-1.3.1-hadoop2.5.0-cdh5.3.2.jar -> hdfs://master:9000/user/root/.sparkStaging/application_1431423592173_0003/spark-assembly-1.3.1-hadoop2.5.0-cdh5.3.2.jar
15/05/12 17:42:11 INFO yarn.Client: Setting up the launch environment for our AM container
15/05/12 17:42:11 WARN yarn.Client: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/05/12 17:42:11 INFO spark.SecurityManager: Changing view acls to: root
15/05/12 17:42:11 INFO spark.SecurityManager: Changing modify acls to: root
15/05/12 17:42:11 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/05/12 17:42:11 INFO yarn.Client: Submitting application 3 to ResourceManager
15/05/12 17:42:11 INFO impl.YarnClientImpl: Submitted application application_1431423592173_0003
15/05/12 17:42:12 INFO yarn.Client: Application report for application_1431423592173_0003 (state: FAILED)
15/05/12 17:42:12 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1431423592173_0003 submitted by user root to unknown queue: root.hadoop
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hadoop
start time: 1431423731271
final status: FAILED
tracking URL: N/A
user: root
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:113)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:59)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:381)
at WordCountTest$.main(WordCountTest.scala:14)
at WordCountTest.main(WordCountTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
My code is very simple, as follows:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object WordCountTest {
  def main(args: Array[String]): Unit = {
    // Quiet down the noisy Spark and Jetty loggers
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)
    val sparkConf = new SparkConf().setAppName("WordCountTest Prog")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)
    val file = sc.textFile("/data/test/pom.xml")
    val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    println(counts) // prints the RDD's toString, not its contents
    counts.saveAsTextFile("/data/test/pom_count.txt")
  }
}
I've been debugging this problem for 2 days. Help! Thanks.
Try changing the queue name to hadoop.
In my case, changing "--queue thequeue" to "--queue default" made it work.
When running:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 --queue thequeue lib/spark-examples*.jar 10
the error above was reported; you only need to change "--queue thequeue" to "--queue default".
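Applied to the submission script from the question, the only change is the queue name (a sketch, assuming the default queue, or a plain hadoop queue per the first suggestion, actually exists on your scheduler):
# Same submission as before, but targeting a queue that exists
./bin/spark-submit --class WordCountTest \
--master yarn-client \
--num-executors 1 \
--executor-cores 1 \
--queue default \
/root/Desktop/test2.jar \
10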
