How to run spark-submit in virtualenv for pyspark? - apache-spark

Is there a way to run spark-submit (Spark v2.3.2 from HDP 3.1.0) while in a virtualenv? I have a Python file that uses Python 3 (and some specific libraries) in a virtualenv (to isolate library versions from the rest of the system). I would like to run this file with /bin/spark-submit, but when I attempt to do so I get...
[me@airflowetl tests]$ source ../venv/bin/activate; /bin/spark-submit sparksubmit.test.py
File "/bin/hdp-select", line 255
print "ERROR: Invalid package - " + name
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("ERROR: Invalid package - " + name)?
ls: cannot access /usr/hdp//hadoop/lib: No such file or directory
Exception in thread "main" java.lang.IllegalStateException: hdp.version is not set while running Spark under HDP, please set through HDP_VERSION in spark-env.sh or add a java-opts file in conf with -Dhdp.version=xxx
at org.apache.spark.launcher.Main.main(Main.java:118)
I also tried...
(venv) [me@airflowetl tests]$ export HADOOP_CONF_DIR=/etc/hadoop/conf; spark-submit --master yarn --deploy-mode cluster sparksubmit.test.py
19/12/12 13:50:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/12 13:50:20 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
....
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
...or (from here https://www.hackingnote.com/en/spark/trouble-shooting/NoClassDefFoundError-ClientConfig)...
(venv) [airflow@airflowetl tests]$ spark-submit --master yarn --deploy-mode client --conf spark.hadoop.yarn.timeline-service.enabled=false sparksubmit.test.py
19/12/12 15:22:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/12 15:22:49 INFO spark.SparkContext: Running Spark version 2.4.4
19/12/12 15:22:49 INFO spark.SparkContext: Submitted application: hph_etl_TEST
19/12/12 15:22:49 INFO spark.SecurityManager: Changing view acls to: airflow
19/12/12 15:22:49 INFO spark.SecurityManager: Changing modify acls to: airflow
19/12/12 15:22:49 INFO spark.SecurityManager: Changing view acls groups to:
19/12/12 15:22:49 INFO spark.SecurityManager: Changing modify acls groups to:
19/12/12 15:22:49 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(airflow); groups with view permissions: Set(); users with modify permissions: Set(airflow); groups with modify permissions: Set()
19/12/12 15:22:49 INFO util.Utils: Successfully started service 'sparkDriver' on port 45232.
19/12/12 15:22:50 INFO spark.SparkEnv: Registering MapOutputTracker
19/12/12 15:22:50 INFO spark.SparkEnv: Registering BlockManagerMaster
19/12/12 15:22:50 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/12/12 15:22:50 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/12/12 15:22:50 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-320366b6-609a-497b-ac40-119d11682044
19/12/12 15:22:50 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
19/12/12 15:22:50 INFO spark.SparkEnv: Registering OutputCommitCoordinator
19/12/12 15:22:50 INFO util.log: Logging initialized #2663ms
19/12/12 15:22:50 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
19/12/12 15:22:50 INFO server.Server: Started #2763ms
19/12/12 15:22:50 INFO server.AbstractConnector: Started ServerConnector#50a3c656{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
19/12/12 15:22:50 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#306c15f1{/jobs,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#2b566f8d{/jobs/json,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#1b5ef515{/jobs/job,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#59f7a5e2{/jobs/job/json,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#41c58356{/stages,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#2d5f2026{/stages/json,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#324ca89a{/stages/stage,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#6f487c61{/stages/stage/json,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#3897116a{/stages/pool,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#68ab090f{/stages/pool/json,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#42ea3278{/storage,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#6eedf530{/storage/json,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#6e71a5c6{/storage/rdd,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#5e222a76{/storage/rdd/json,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#4dc8aa38{/environment,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#4c8d82c4{/environment/json,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#2fb15106{/executors,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#608faf1c{/executors/json,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#689e405f{/executors/threadDump,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#48a5742a{/executors/threadDump/json,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#6db93559{/static,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#4d7ed508{/,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#5510f12d{/api,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#6d87de7{/jobs/job/kill,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#62595660{/stages/stage/kill,null,AVAILABLE,#Spark}
19/12/12 15:22:50 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://airflowetl.local:4040
19/12/12 15:22:51 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
19/12/12 15:22:51 INFO client.RMProxy: Connecting to ResourceManager at hw001.local/172.18.4.46:8050
19/12/12 15:22:51 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
19/12/12 15:22:51 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (15360 MB per container)
19/12/12 15:22:51 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
19/12/12 15:22:51 INFO yarn.Client: Setting up container launch context for our AM
19/12/12 15:22:51 INFO yarn.Client: Setting up the launch environment for our AM container
19/12/12 15:22:51 INFO yarn.Client: Preparing resources for our AM container
19/12/12 15:22:51 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
19/12/12 15:22:53 INFO yarn.Client: Uploading resource file:/tmp/spark-4e600acd-2d34-4271-b01c-25f312906f93/__spark_libs__8368679994314392346.zip -> hdfs://hw001.local:8020/user/airflow/.sparkStaging/application_1572898343646_0029/__spark_libs__8368679994314392346.zip
19/12/12 15:22:54 INFO yarn.Client: Uploading resource file:/home/airflow/projects/hph_etl_airflow/venv/lib/python3.6/site-packages/pyspark/python/lib/pyspark.zip -> hdfs://hw001.local:8020/user/airflow/.sparkStaging/application_1572898343646_0029/pyspark.zip
19/12/12 15:22:55 INFO yarn.Client: Uploading resource file:/home/airflow/projects/hph_etl_airflow/venv/lib/python3.6/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip -> hdfs://hw001.local:8020/user/airflow/.sparkStaging/application_1572898343646_0029/py4j-0.10.7-src.zip
19/12/12 15:22:55 INFO yarn.Client: Uploading resource file:/tmp/spark-4e600acd-2d34-4271-b01c-25f312906f93/__spark_conf__5403285055443058510.zip -> hdfs://hw001.local:8020/user/airflow/.sparkStaging/application_1572898343646_0029/__spark_conf__.zip
19/12/12 15:22:55 INFO spark.SecurityManager: Changing view acls to: airflow
19/12/12 15:22:55 INFO spark.SecurityManager: Changing modify acls to: airflow
19/12/12 15:22:55 INFO spark.SecurityManager: Changing view acls groups to:
19/12/12 15:22:55 INFO spark.SecurityManager: Changing modify acls groups to:
19/12/12 15:22:55 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(airflow); groups with view permissions: Set(); users with modify permissions: Set(airflow); groups with modify permissions: Set()
19/12/12 15:22:56 INFO yarn.Client: Submitting application application_1572898343646_0029 to ResourceManager
19/12/12 15:22:56 INFO impl.YarnClientImpl: Submitted application application_1572898343646_0029
19/12/12 15:22:56 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1572898343646_0029 and attemptId None
19/12/12 15:22:57 INFO yarn.Client: Application report for application_1572898343646_0029 (state: ACCEPTED)
19/12/12 15:22:57 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1576200176385
final status: UNDEFINED
tracking URL: http://hw001.local:8088/proxy/application_1572898343646_0029/
user: airflow
19/12/12 15:22:58 INFO yarn.Client: Application report for application_1572898343646_0029 (state: FAILED)
19/12/12 15:22:58 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1572898343646_0029 failed 2 times due to AM Container for appattempt_1572898343646_0029_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2019-12-12 15:22:58.214]Exception from container-launch.
Container id: container_e02_1572898343646_0029_02_000001
Exit code: 1
[2019-12-12 15:22:58.215]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/hadoop/yarn/local/usercache/airflow/appcache/application_1572898343646_0029/container_e02_1572898343646_0029_02_000001/launch_container.sh: line 38: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/3.1.0.0-78/hadoop/*:/usr/hdp/3.1.0.0-78/hadoop/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:$PWD/__spark_conf__/__hadoop_conf__: bad substitution
....
I am not sure what to make of this or how to proceed, and I did not fully understand the error message after googling it.
Does anyone with more experience have further debugging tips or fixes for this?

spark-submit is a bash script that launches Java classes, so running it inside a virtualenv will not necessarily help (although you can see in the logs that files were uploaded from the environment).
The first error occurs because hdp-select requires Python 2, but it looks like it ran with Python 3 (probably because your venv was active).
If you want to carry your Python environment to the executors and driver, you would probably want to use the --py-files option instead, or set up the same Python environment on each Spark node.
Also, you seem to be running Spark 2.4.4, not 2.3.2 as you say, which could explain the NoClassDefFoundError if you are mixing Spark versions (in particular, pyspark from pip does not download any scheduler-specific packages, such as those for the YARN timeline service).
That said, your last attempt did submit the application to YARN, and you can find the real exception under
http://hw001.local:8088/proxy/application_1572898343646_0029
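To make these points concrete, here is a minimal, untested sketch of one way to launch the HDP spark-submit without the venv on PATH while still pointing PySpark at the venv interpreter. The helper name build_spark_submit and its parameters are hypothetical. The sketch assumes that the venv interpreter exists at the same path on every node, that HDP_VERSION is honoured as the launcher error message suggests, and that passing -Dhdp.version to the driver and the YARN application master addresses the "bad substitution" on ${hdp.version}; none of this has been verified against this cluster.

```python
import os


def build_spark_submit(script, venv_dir, hdp_version, base_env=None):
    """Return (command, env) for running the HDP spark-submit outside the venv.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    env = dict(os.environ if base_env is None else base_env)

    # Undo the effect of `source venv/bin/activate` so that Python 2 tools
    # such as hdp-select resolve `python` from the system PATH again.
    venv_bin = os.path.join(venv_dir, "bin")
    env.pop("VIRTUAL_ENV", None)
    env["PATH"] = os.pathsep.join(
        p for p in env.get("PATH", "").split(os.pathsep) if p and p != venv_bin
    )

    # PYSPARK_PYTHON tells PySpark which interpreter to use for the job.
    env["PYSPARK_PYTHON"] = os.path.join(venv_bin, "python")
    # The launcher error in the question asks for HDP_VERSION to be set.
    env["HDP_VERSION"] = hdp_version

    java_opt = "-Dhdp.version=" + hdp_version
    command = [
        "/bin/spark-submit",  # the HDP launcher, not the one installed by pip
        "--master", "yarn",
        "--deploy-mode", "client",
        "--driver-java-options", java_opt,
        "--conf", "spark.yarn.am.extraJavaOptions=" + java_opt,
        script,
    ]
    return command, env
```

With the values visible in the logs above, this could be called as build_spark_submit("sparksubmit.test.py", "/home/airflow/projects/hph_etl_airflow/venv", "3.1.0.0-78"), and the returned pair passed to subprocess.run(command, env=env, check=True). Keeping the command construction separate from the launch makes it easy to print and inspect the exact invocation before running it.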

Related

Access a Hive table with a pyspark .py file

I get data from a SQL table using this code when I run it in the pyspark terminal on a GCP machine:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("appName").getOrCreate()
sc = spark.sparkContext
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df= sqlContext.sql('select * from mytable limit 100')
print 'number of rows = ', df.count()
It works when the code is copied and pasted into the pyspark terminal window, but it gives this error when the file is run as a .py script from the terminal:
19/01/21 03:38:43 INFO spark.SparkContext: Running Spark version 2.2.1
19/01/21 03:38:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/01/21 03:38:43 INFO spark.SparkContext: Submitted application: appName
19/01/21 03:38:43 INFO spark.SecurityManager: Changing view acls to: xxxxxxx
19/01/21 03:38:43 INFO spark.SecurityManager: Changing modify acls to: xxxxxxx
19/01/21 03:38:43 INFO spark.SecurityManager: Changing view acls groups to:
19/01/21 03:38:43 INFO spark.SecurityManager: Changing modify acls groups to:
19/01/21 03:38:43 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxxxxxx); groups with view permissions: Set(); users with modify permissions: Set(xxxxxxx); groups with modify permissions: Set()
19/01/21 03:38:44 INFO util.Utils: Successfully started service 'sparkDriver' on port 00000.
19/01/21 03:38:44 INFO spark.SparkEnv: Registering MapOutputTracker
19/01/21 03:38:44 INFO spark.SparkEnv: Registering BlockManagerMaster
19/01/21 03:38:44 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/01/21 03:38:44 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/01/21 03:38:44 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-bdcf00db-e6fc-4a6f-a64d-59def40ca89c
19/01/21 03:38:44 INFO memory.MemoryStore: MemoryStore started with capacity 4.3 GB
19/01/21 03:38:44 INFO spark.SparkEnv: Registering OutputCommitCoordinator
19/01/21 03:38:44 INFO util.log: Logging initialized #3180ms
19/01/21 03:38:44 INFO server.Server: jetty-9.3.z-SNAPSHOT
19/01/21 03:38:44 INFO server.Server: Started #3277ms
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4046. Attempting port 4047.
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4047. Attempting port 4048.
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4048. Attempting port 4049.
19/01/21 03:38:44 INFO server.AbstractConnector: Started ServerConnector#aaa850a{HTTP/1.1,[http/1.1]}{0.0.0.0:0000}
19/01/21 03:38:44 INFO util.Utils: Successfully started service 'SparkUI' on port 0000.
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/jobs,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/jobs/json,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/jobs/job,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/jobs/job/json,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/stages,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/stages/json,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/stages/stage,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/stages/stage/json,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/stages/pool,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/stages/pool/json,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/storage,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/storage/json,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/storage/rdd,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/storage/rdd/json,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/environment,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/environment/json,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/executors,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/executors/json,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/executors/threadDump,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/executors/threadDump/json,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/static,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/api,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/jobs/job/kill,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#eqqwe23231q2w{/stages/stage/kill,null,AVAILABLE,#Spark}
19/01/21 03:38:44 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://00.00.00.00:0000
19/01/21 03:38:44 INFO util.Utils: Using initial executors = 8, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
19/01/21 03:38:44 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.6.10-hadoop2
19/01/21 03:38:45 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
19/01/21 03:38:46 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1
19/01/21 03:38:46 INFO retry.RetryInvocationHandler: Exception while invoking getClusterMetrics of class ApplicationClientProtocolPBClientImpl over rm1 after 1 fail over attempts. Trying to fail over after sleeping for 829ms.
java.net.ConnectException: Call From mytable/ipaddress to mytable:0000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy15.getClusterMetrics(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:206)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487)
at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:156)
at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:156)
at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:155)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:173)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:236)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 32 more
19/01/21 03:38:46 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
19/01/21 03:38:46 INFO yarn.Client: Requesting a new application from cluster with 80 NodeManagers
19/01/21 03:38:46 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (45056 MB per container)
19/01/21 03:38:46 INFO yarn.Client: Will allocate AM container, with 24576 MB memory including 2234 MB overhead
19/01/21 03:38:46 INFO yarn.Client: Setting up container launch context for our AM
19/01/21 03:38:46 INFO yarn.Client: Setting up the launch environment for our AM container
19/01/21 03:38:46 INFO yarn.Client: Preparing resources for our AM container
19/01/21 03:38:48 INFO yarn.Client: Uploading resource file:/opt/hadoop/spark/python/lib/pyspark.zip -> hdfs://name-dataproc/user/xxxxxxx/.sparkStaging/application_1547596846411_1167/pyspark.zip
19/01/21 03:38:48 INFO yarn.Client: Uploading resource file:/opt/hadoop/spark/python/lib/py4j-0.10.4-src.zip -> hdfs://name-dataproc/user/xxxxxxx/.sparkStaging/application_1547596846411_1167/py4j-0.10.4-src.zip
19/01/21 03:38:48 INFO yarn.Client: Uploading resource file:/tmp/spark-1c0d417f-4fd6-411a-9480-0fc147d7c9a8/__spark_conf__2865868052747382300.zip -> hdfs://name-dataproc/user/xxxxxxx/.sparkStaging/application_1547596846411_1167/__spark_conf__.zip
19/01/21 03:38:48 INFO spark.SecurityManager: Changing view acls to: xxxxxxx
19/01/21 03:38:48 INFO spark.SecurityManager: Changing modify acls to: xxxxxxx
19/01/21 03:38:48 INFO spark.SecurityManager: Changing view acls groups to:
19/01/21 03:38:48 INFO spark.SecurityManager: Changing modify acls groups to:
19/01/21 03:38:48 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxxxxxx); groups with view permissions: Set(); users with modify permissions: Set(xxxxxxx); groups with modify permissions: Set()
19/01/21 03:38:48 INFO yarn.Client: Submitting application application_1547596846411_1167 to ResourceManager
19/01/21 03:38:48 INFO impl.YarnClientImpl: Submitted application application_1547596846411_1167
19/01/21 03:38:48 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1547596846411_1167 and attemptId None
19/01/21 03:38:49 INFO yarn.Client: Application report for application_1547596846411_1167 (state: ACCEPTED)
19/01/21 03:38:49 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: long_running
start time: 1548063528733
final status: UNDEFINED
tracking URL: http://name-dataproc-.:0000/proxy/application_1547596846411_1167/
user: xxxxxxx
19/01/21 03:38:50 INFO yarn.Client: Application report for application_1547596846411_1167 (state: ACCEPTED)
19/01/21 03:38:51 INFO yarn.Client: Application report for application_1547596846411_1167 (state: ACCEPTED)
19/01/21 03:38:52 INFO yarn.Client: Application report for application_1547596846411_1167 (state: ACCEPTED)
19/01/21 03:38:52 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
19/01/21 03:38:52 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,
19/01/21 03:38:53 INFO cluster.YarnClientSchedulerBackend: Application application_1547596846411_1167 has started running.
19/01/21 03:38:53 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34040.
19/01/21 03:38:53 INFO netty.NettyBlockTransferService: Server created on 00.000.00.00:23930
19/01/21 03:38:53 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/01/21 03:38:53 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, ip-address, port, None)
19/01/21 03:38:53 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-address:port with 4.3 GB RAM, BlockManagerId(driver, 10.206.52.22, 46766, None)
19/01/21 03:38:53 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, ip-address, port, None)
19/01/21 03:38:53 INFO storage.BlockManager: external shuffle service port = 0000
19/01/21 03:38:53 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, ip-address, port, None)
19/01/21 03:38:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#dfsdfsdfgs{/metrics/json,null,AVAILABLE,#Spark}
19/01/21 03:38:54 INFO scheduler.EventLoggingListener: Logging events to hdfs://name-dataproc/user/spark/eventlog/application_1547596846411_1167
19/01/21 03:38:54 INFO util.Utils: Using initial executors = 8, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
19/01/21 03:38:54 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
19/01/21 03:38:54 INFO internal.SharedState: loading hive config file: file:/opt/hadoop/conf/hive-site.xml
19/01/21 03:38:54 INFO internal.SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('gs://place/place/path').
19/01/21 03:38:54 INFO internal.SharedState: Warehouse path is 'gs://place/place/path'.
19/01/21 03:38:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#sdfsdgs{/SQL,null,AVAILABLE,#Spark}
19/01/21 03:38:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#sdfsdfs{/SQL/json,null,AVAILABLE,#Spark}
19/01/21 03:38:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#sdfsdf{/SQL/execution,null,AVAILABLE,#Spark}
19/01/21 03:38:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#sdfsdf{/SQL/execution/json,null,AVAILABLE,#Spark}
19/01/21 03:38:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler#dsfgsdgd{/static/sql,null,AVAILABLE,#Spark}
19/01/21 03:38:55 INFO gcs.GoogleHadoopFileSystemBase: GCS Metadata Cache is enabled: this isn't necessary and in fact is probably detrimental to your job!
19/01/21 03:38:55 INFO state.StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
19/01/21 03:38:55 INFO execution.SparkSqlParser: Parsing command: select * from mytable limit 100
Traceback (most recent call last):
  File "/home/xxxxxx/spark_job_example.py", line 8, in <module>
    df= sqlContext.sql('select * from mytable limit 100')
  File "/opt/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 384, in sql
  File "/opt/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 603, in sql
  File "/opt/hadoop/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/opt/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: u"Table or view not found: `mytable`.`myttable`; line 1 pos 14;\n'GlobalLimit 100\n+- 'LocalLimit 100\n +- 'Project [*]\n +- 'UnresolvedRelation `mytable`.`table`\n"
19/01/21 03:38:56 INFO spark.SparkContext: Invoking stop() from shutdown hook
19/01/21 03:38:56 INFO server.AbstractConnector: Stopped Spark#fec850a{HTTP/1.1,[http/1.1]}{0.0.0.0:4049}
19/01/21 03:38:56 INFO ui.SparkUI: Stopped Spark web UI at http://10.206.52.22:4049
19/01/21 03:38:56 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
19/01/21 03:38:56 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
19/01/21 03:38:56 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
19/01/21 03:38:56 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
19/01/21 03:38:56 INFO cluster.YarnClientSchedulerBackend: Stopped
19/01/21 03:38:56 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/01/21 03:38:56 INFO memory.MemoryStore: MemoryStore cleared
19/01/21 03:38:56 INFO storage.BlockManager: BlockManager stopped
19/01/21 03:38:56 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
19/01/21 03:38:56 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/01/21 03:38:56 INFO spark.SparkContext: Successfully stopped SparkContext
19/01/21 03:38:56 INFO util.ShutdownHookManager: Shutdown hook called
19/01/21 03:38:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-1c0d417f-4fd6-411a-9480-0fc147d7c9a8
19/01/21 03:38:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-1c0d417f-4fd6-411a-9480-0fc147d7c9a8/pyspark-82d123ce-18ce-43ce-b631-8638bf5ffbfb
I would appreciate any help.

Error initializing SparkContext; container logs show: ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM

When I start Spark on YARN with the command "spark-shell --master yarn-client", I get an error saying:
ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.NullPointerException
The full error I got when starting the Spark shell on YARN is further below. The YARN container logs are here:
Container: container_1463670715317_0002_01_000001 on masternode_52694
============================================================================
LogType:stderr
Log Upload Time:Thu May 19 16:19:54 +0100 2016
LogLength:5748
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-hadoopadmin/nm-local-dir/usercache/hadoopadmin/filecache/13/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/05/19 16:19:44 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
16/05/19 16:19:45 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1463670715317_0002_000001
16/05/19 16:19:46 INFO spark.SecurityManager: Changing view acls to: hadoopadmin
16/05/19 16:19:46 INFO spark.SecurityManager: Changing modify acls to: hadoopadmin
16/05/19 16:19:46 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopadmin); users with modify permissions: Set(hadoopadmin)
16/05/19 16:19:46 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
16/05/19 16:19:46 INFO yarn.ApplicationMaster: Driver now available: 10.17.0.50:43771
16/05/19 16:19:47 INFO yarn.ApplicationMaster$AMEndpoint: Add WebUI Filter. AddWebUIFilter(org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,Map(PROXY_HOSTS -> masternode, PROXY_URI_BASES -> http://masternode:8088/proxy/application_1463670715317_0002),/proxy/application_1463670715317_0002)
16/05/19 16:19:47 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
16/05/19 16:19:47 INFO yarn.YarnRMClient: Registering the ApplicationMaster
16/05/19 16:19:47 INFO yarn.YarnAllocator: Will request 2 executor containers, each with 1 cores and 1408 MB memory including 384 MB overhead
16/05/19 16:19:47 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
16/05/19 16:19:47 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
16/05/19 16:19:47 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
16/05/19 16:19:47 INFO impl.AMRMClientImpl: Received new token for : masternode:52694
16/05/19 16:19:47 INFO yarn.YarnAllocator: Launching container container_1463670715317_0002_01_000002 for on host masternode
16/05/19 16:19:47 INFO yarn.YarnAllocator: Launching ExecutorRunnable. driverUrl: spark://CoarseGrainedScheduler#10.17.0.50:43771, executorHostname: masternode
16/05/19 16:19:47 INFO yarn.ExecutorRunnable: Starting Executor Container
16/05/19 16:19:47 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
16/05/19 16:19:47 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
16/05/19 16:19:47 INFO yarn.ExecutorRunnable: Setting up ContainerLaunchContext
16/05/19 16:19:47 INFO yarn.ExecutorRunnable: Preparing Local resources
16/05/19 16:19:47 INFO yarn.ExecutorRunnable: Prepared Local resources Map(_spark_.jar -> resource
{ scheme: "hdfs" host: "localhost" port: 9000 file: "/user/hadoopadmin/.sparkStaging/application_1463670715317_0002/spark-assembly-1.6.1-hadoop2.6.0.jar" }
size: 187698038 timestamp: 1463671182405 type: FILE visibility: PRIVATE)
16/05/19 16:19:48 INFO yarn.ExecutorRunnable:
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> PWD<CPS>PWD/_spark_.jar<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/
SPARK_LOG_URL_STDERR -> http://masternode:8042/node/containerlogs/container_1463670715317_0002_01_000002/hadoopadmin/stderr?start=-4096
SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1463670715317_0002
SPARK_YARN_CACHE_FILES_FILE_SIZES -> 187698038
SPARK_USER -> hadoopadmin
SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE
SPARK_YARN_MODE -> true
SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1463671182405
SPARK_LOG_URL_STDOUT -> http://masternode:8042/node/containerlogs/container_1463670715317_0002_01_000002/hadoopadmin/stdout?start=-4096
SPARK_YARN_CACHE_FILES -> hdfs://localhost:9000/user/hadoopadmin/.sparkStaging/application_1463670715317_0002/spark-assembly-1.6.1-hadoop2.6.0.jar#_spark_.jar
command:
JAVA_HOME/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms1024m -Xmx1024m -Djava.io.tmpdir=PWD/tmp '-Dspark.driver.port=43771' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler#10.17.0.50:43771 --executor-id 1 --hostname masternode --cores 1 --app-id application_1463670715317_0002 --user-class-path file:$PWD/__app__.jar 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
===============================================================================
16/05/19 16:19:48 INFO impl.ContainerManagementProtocolProxy: Opening proxy : masternode:52694
16/05/19 16:19:48 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
16/05/19 16:19:48 INFO yarn.ApplicationMaster: Final app status: UNDEFINED, exit Code: 0, (reason: Shutdown hook called before final status was reported.)
16/05/19 16:19:48 INFO util.ShutdownHookManager: Shutdown hook called
End of LogType:stderr
LogType:stdout
Log Upload Time:Thu May 19 16:19:54 +0100 2016
LogLength:0
Log Contents:
End of LogType:stdout
Container: container_1463670715317_0002_02_000002 on masternode_52694
============================================================================
LogType:stderr
Log Upload Time:Thu May 19 16:19:54 +0100 2016
LogLength:737
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-hadoopadmin/nm-local-dir/usercache/hadoopadmin/filecache/13/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/05/19 16:19:54 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
16/05/19 16:19:54 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
End of LogType:stderr
LogType:stdout
Log Upload Time:Thu May 19 16:19:54 +0100 2016
LogLength:0
Log Contents:
End of LogType:stdout
hadoopadmin#master:~$
The full output when I try to start Spark with "spark-shell --master yarn-client":
hadoopadmin#master:~$ spark-shell --master yarn-client
16/05/19 16:19:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/19 16:19:33 INFO spark.SecurityManager: Changing view acls to: hadoopadmin
16/05/19 16:19:33 INFO spark.SecurityManager: Changing modify acls to: hadoopadmin
16/05/19 16:19:33 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopadmin); users with modify permissions: Set(hadoopadmin)
16/05/19 16:19:33 INFO spark.HttpServer: Starting HTTP Server
16/05/19 16:19:33 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/05/19 16:19:33 INFO server.AbstractConnector: Started SocketConnector#0.0.0.0:37052
16/05/19 16:19:33 INFO util.Utils: Successfully started service 'HTTP class server' on port 37052.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
16/05/19 16:19:37 INFO spark.SparkContext: Running Spark version 1.6.1
16/05/19 16:19:37 INFO spark.SecurityManager: Changing view acls to: hadoopadmin
16/05/19 16:19:37 INFO spark.SecurityManager: Changing modify acls to: hadoopadmin
16/05/19 16:19:37 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopadmin); users with modify permissions: Set(hadoopadmin)
16/05/19 16:19:38 INFO util.Utils: Successfully started service 'sparkDriver' on port 43771.
16/05/19 16:19:38 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/05/19 16:19:38 INFO Remoting: Starting remoting
16/05/19 16:19:38 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem#10.17.0.50:57722]
16/05/19 16:19:38 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 57722.
16/05/19 16:19:38 INFO spark.SparkEnv: Registering MapOutputTracker
16/05/19 16:19:38 INFO spark.SparkEnv: Registering BlockManagerMaster
16/05/19 16:19:38 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-e8de3854-2526-4725-8c73-edb3fce2df33
16/05/19 16:19:38 INFO storage.MemoryStore: MemoryStore started with capacity 511.1 MB
16/05/19 16:19:38 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/05/19 16:19:39 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/05/19 16:19:39 INFO server.AbstractConnector: Started SelectChannelConnector#0.0.0.0:4040
16/05/19 16:19:39 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/05/19 16:19:39 INFO ui.SparkUI: Started SparkUI at http://10.17.0.50:4040
16/05/19 16:19:39 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/05/19 16:19:39 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/05/19 16:19:39 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/05/19 16:19:39 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/05/19 16:19:39 INFO yarn.Client: Setting up container launch context for our AM
16/05/19 16:19:39 INFO yarn.Client: Setting up the launch environment for our AM container
16/05/19 16:19:39 INFO yarn.Client: Preparing resources for our AM container
16/05/19 16:19:40 INFO yarn.Client: Uploading resource file:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar -> hdfs://localhost:9000/user/hadoopadmin/.sparkStaging/application_1463670715317_0002/spark-assembly-1.6.1-hadoop2.6.0.jar
16/05/19 16:19:42 INFO yarn.Client: Uploading resource file:/tmp/spark-942afe6a-95ca-4b8b-b06f-e9e3ac6aa751/__spark_conf__5009784131719458516.zip -> hdfs://localhost:9000/user/hadoopadmin/.sparkStaging/application_1463670715317_0002/__spark_conf__5009784131719458516.zip
16/05/19 16:19:42 INFO spark.SecurityManager: Changing view acls to: hadoopadmin
16/05/19 16:19:42 INFO spark.SecurityManager: Changing modify acls to: hadoopadmin
16/05/19 16:19:42 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopadmin); users with modify permissions: Set(hadoopadmin)
16/05/19 16:19:42 INFO yarn.Client: Submitting application 2 to ResourceManager
16/05/19 16:19:42 INFO impl.YarnClientImpl: Submitted application application_1463670715317_0002
16/05/19 16:19:43 INFO yarn.Client: Application report for application_1463670715317_0002 (state: ACCEPTED)
16/05/19 16:19:43 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1463671182634
final status: UNDEFINED
tracking URL: http://masternode:8088/proxy/application_1463670715317_0002/
user: hadoopadmin
16/05/19 16:19:44 INFO yarn.Client: Application report for application_1463670715317_0002 (state: ACCEPTED)
16/05/19 16:19:45 INFO yarn.Client: Application report for application_1463670715317_0002 (state: ACCEPTED)
16/05/19 16:19:46 INFO yarn.Client: Application report for application_1463670715317_0002 (state: ACCEPTED)
16/05/19 16:19:47 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/05/19 16:19:47 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> masternode, PROXY_URI_BASES -> http://masternode:8088/proxy/application_1463670715317_0002), /proxy/application_1463670715317_0002
16/05/19 16:19:47 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
16/05/19 16:19:47 INFO yarn.Client: Application report for application_1463670715317_0002 (state: RUNNING)
16/05/19 16:19:47 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.17.0.50
ApplicationMaster RPC port: 0
queue: default
start time: 1463671182634
final status: UNDEFINED
tracking URL: http://masternode:8088/proxy/application_1463670715317_0002/
user: hadoopadmin
16/05/19 16:19:47 INFO cluster.YarnClientSchedulerBackend: Application application_1463670715317_0002 has started running.
16/05/19 16:19:47 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 49183.
16/05/19 16:19:47 INFO netty.NettyBlockTransferService: Server created on 49183
16/05/19 16:19:47 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/05/19 16:19:47 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.17.0.50:49183 with 511.1 MB RAM, BlockManagerId(driver, 10.17.0.50, 49183)
16/05/19 16:19:47 INFO storage.BlockManagerMaster: Registered BlockManager
16/05/19 16:19:51 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/05/19 16:19:51 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> masternode, PROXY_URI_BASES -> http://masternode:8088/proxy/application_1463670715317_0002), /proxy/application_1463670715317_0002
16/05/19 16:19:51 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
16/05/19 16:19:54 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
16/05/19 16:19:54 INFO ui.SparkUI: Stopped Spark web UI at http://10.17.0.50:4040
16/05/19 16:19:54 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
16/05/19 16:19:54 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
16/05/19 16:19:54 INFO cluster.YarnClientSchedulerBackend: Stopped
16/05/19 16:19:54 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/05/19 16:19:54 INFO storage.MemoryStore: MemoryStore cleared
16/05/19 16:19:54 INFO storage.BlockManager: BlockManager stopped
16/05/19 16:19:54 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
16/05/19 16:19:54 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/05/19 16:19:54 INFO spark.SparkContext: Successfully stopped SparkContext
16/05/19 16:19:54 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/05/19 16:19:54 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/05/19 16:19:54 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
16/05/19 16:20:09 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
16/05/19 16:20:09 ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.NullPointerException
at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
at $line3.$read$$iwC$$iwC.<init>(<console>:15)
at $line3.$read$$iwC.<init>(<console>:24)
at $line3.$read.<init>(<console>:26)
at $line3.$read$.<init>(<console>:30)
at $line3.$read$.<clinit>(<console>)
at $line3.$eval$.<init>(<console>:7)
at $line3.$eval$.<clinit>(<console>)
at $line3.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:125)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/05/19 16:20:09 INFO spark.SparkContext: SparkContext already stopped.
java.lang.NullPointerException
at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
at $iwC$$iwC.<init>(<console>:15)
at $iwC.<init>(<console>:24)
at <init>(<console>:26)
at .<init>(<console>:30)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at ... org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
<console>:16: error: not found: value sqlContext
import sqlContext.implicits._
^
<console>:16: error: not found: value sqlContext
import sqlContext.sql
^
Something exceeded its memory budget. There were no helpful errors, but that is what it was for me. Try raising parameters such as MaxPermSize and memoryOverhead.
https://mail-archives.apache.org/mod_mbox/spark-user/201507.mbox/%3c55A372C5.9050801#googlemail.com%3e
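
As a rough illustration of the suggestion above, here is a minimal Python sketch that assembles a spark-shell command line with larger YARN memory overheads. The property names spark.yarn.executor.memoryOverhead and spark.yarn.am.memoryOverhead are the Spark 1.x names. The helper name build_spark_shell_command and the 768 MB values are assumptions chosen only for illustration (the logs in the question show an overhead of 384 MB). MaxPermSize is left out because the JVM in the question is Java 8, which ignores that option.

```python
import shlex


def build_spark_shell_command(executor_overhead_mb=768, am_overhead_mb=768):
    """Return an argv list for spark-shell on YARN (client mode).

    The overhead values are illustrative assumptions, not tuned numbers.
    """
    conf = {
        # Extra off-heap memory (in MB) requested per executor container.
        "spark.yarn.executor.memoryOverhead": str(executor_overhead_mb),
        # Extra off-heap memory (in MB) for the YARN Application Master
        # when running in client mode.
        "spark.yarn.am.memoryOverhead": str(am_overhead_mb),
    }
    argv = ["spark-shell", "--master", "yarn-client"]
    for key, value in sorted(conf.items()):
        argv += ["--conf", "{}={}".format(key, value)]
    return argv


if __name__ == "__main__":
    # Print a copy-pasteable shell command.
    print(" ".join(shlex.quote(arg) for arg in build_spark_shell_command()))
```

Whether this helps depends on which container is being killed, so it may be worth checking the NodeManager logs for the reason before changing values.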

Kafka message consumption with spark

I am using the HDP-2.3 sandbox to consume Kafka messages by running a spark-submit job.
I am putting some messages into Kafka as below:
kafka-console-producer.sh --broker-list sandbox.hortonworks.com:6667 --topic webevent
OR
kafka-console-producer.sh --broker-list sandbox.hortonworks.com:6667 --topic test --new-producer < myfile.txt
Now I need to consume the above messages from a Spark job, as shown below:
./bin/spark-submit --master spark://192.168.255.150:7077 --executor-memory 512m --class org.apache.spark.examples.streaming.JavaDirectKafkaWordCount lib/spark-examples-1.4.1-hadoop2.4.0.jar 192.168.255.150:2181 webevent 10
where 2181 is the ZooKeeper port.
I am getting the error shown below (please guide me on how to consume these messages from Kafka):
16/05/02 15:21:30 INFO SparkContext: Running Spark version 1.3.1
16/05/02 15:21:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/02 15:21:31 INFO SecurityManager: Changing view acls to: root
16/05/02 15:21:31 INFO SecurityManager: Changing modify acls to: root
16/05/02 15:21:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/05/02 15:21:31 INFO Slf4jLogger: Slf4jLogger started
16/05/02 15:21:31 INFO Remoting: Starting remoting
16/05/02 15:21:32 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver#sandbox.hortonworks.com:53950]
16/05/02 15:21:32 INFO Utils: Successfully started service 'sparkDriver' on port 53950.
16/05/02 15:21:32 INFO SparkEnv: Registering MapOutputTracker
16/05/02 15:21:32 INFO SparkEnv: Registering BlockManagerMaster
16/05/02 15:21:32 INFO DiskBlockManager: Created local directory at /tmp/spark-c70b08b9-41a3-42c8-9d83-bc4258e299c6/blockmgr-c2d86de6-34a7-497c-8018-d3437a100e87
16/05/02 15:21:32 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
16/05/02 15:21:32 INFO HttpFileServer: HTTP File server directory is /tmp/spark-a8f7ade9-292c-42c4-9e54-43b3b3495b0c/httpd-65d36d04-1e2a-4e69-8d20-295465100070
16/05/02 15:21:32 INFO HttpServer: Starting HTTP Server
16/05/02 15:21:32 INFO Server: jetty-8.y.z-SNAPSHOT
16/05/02 15:21:32 INFO AbstractConnector: Started SocketConnector#0.0.0.0:37014
16/05/02 15:21:32 INFO Utils: Successfully started service 'HTTP file server' on port 37014.
16/05/02 15:21:32 INFO SparkEnv: Registering OutputCommitCoordinator
16/05/02 15:21:32 INFO Server: jetty-8.y.z-SNAPSHOT
16/05/02 15:21:32 INFO AbstractConnector: Started SelectChannelConnector#0.0.0.0:4040
16/05/02 15:21:32 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/05/02 15:21:32 INFO SparkUI: Started SparkUI at http://sandbox.hortonworks.com:4040
16/05/02 15:21:33 INFO SparkContext: Added JAR file:/usr/hdp/2.3.0.0-2130/spark/lib/spark-examples-1.4.1-hadoop2.4.0.jar at http://192.168.255.150:37014/jars/spark-examples-1.4.1-hadoop2.4.0.jar with timestamp 1462202493866
16/05/02 15:21:34 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster#192.168.255.150:7077/user/Master...
16/05/02 15:21:34 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160502152134-0000
16/05/02 15:21:34 INFO AppClient$ClientActor: Executor added: app-20160502152134-0000/0 on worker-20160502150437-sandbox.hortonworks.com-36920 (sandbox.hortonworks.com:36920) with 1 cores
16/05/02 15:21:34 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160502152134-0000/0 on hostPort sandbox.hortonworks.com:36920 with 1 cores, 512.0 MB RAM
16/05/02 15:21:34 INFO AppClient$ClientActor: Executor updated: app-20160502152134-0000/0 is now RUNNING
16/05/02 15:21:34 INFO AppClient$ClientActor: Executor updated: app-20160502152134-0000/0 is now LOADING
16/05/02 15:21:34 INFO NettyBlockTransferService: Server created on 43440
16/05/02 15:21:34 INFO BlockManagerMaster: Trying to register BlockManager
16/05/02 15:21:34 INFO BlockManagerMasterActor: Registering block manager sandbox.hortonworks.com:43440 with 265.4 MB RAM, BlockManagerId(<driver>, sandbox.hortonworks.com, 43440)
16/05/02 15:21:34 INFO BlockManagerMaster: Registered BlockManager
16/05/02 15:21:35 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/05/02 15:21:35 INFO VerifiableProperties: Verifying properties
16/05/02 15:21:35 INFO VerifiableProperties: Property group.id is overridden to
16/05/02 15:21:35 INFO VerifiableProperties: Property zookeeper.connect is overridden to
16/05/02 15:21:35 INFO SimpleConsumer: Reconnect due to socket error: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
Error: application failed with exception
org.apache.spark.SparkException: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)
at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)
at scala.util.Either.fold(Either.scala:97)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:415)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:532)
at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
at org.apache.spark.examples.streaming.JavaDirectKafkaWordCount.main(JavaDirectKafkaWordCount.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:577)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:174)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Or, when I use this:
./bin/spark-submit --master spark://192.168.255.150:7077 --executor-memory 512m --class org.apache.spark.examples.streaming.JavaDirectKafkaWordCount lib/spark-examples-1.4.1-hadoop2.4.0.jar 192.168.255.150:6667 webevent 10
where 6667 is the port Kafka's message producer uses, I get this error:
16/05/02 15:27:26 INFO SimpleConsumer: Reconnect due to socket error: java.nio.channels.ClosedChannelException
Error: application failed with exception
org.apache.spark.SparkException: java.nio.channels.ClosedChannelException
at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)
at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)
I don't know if this helps:
./bin/spark-submit --class consumer.kafka.client.Consumer --master spark://192.168.255.150:7077 --executor-memory 1G lib/kafka-spark-consumer-1.0.6.jar 10
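Both traces fail inside KafkaUtils.createDirectStream while the SimpleConsumer is talking to the address given on the command line. One thing worth ruling out first is whether each host:port in that argument accepts TCP connections at all from the machine running the driver. The following is a minimal Python sketch of such a check; parse_brokers and reachable are hypothetical helper names, not part of Spark or Kafka, and a successful connect only shows that the port is open, not that a Kafka broker is answering on it.

```python
import socket


def parse_brokers(broker_list):
    """Split 'host1:port1,host2:port2' into [(host, port), ...]."""
    brokers = []
    for entry in broker_list.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # rpartition splits on the last ':' so the port is always the tail.
        host, _, port = entry.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError("expected host:port, got %r" % entry)
        brokers.append((host, int(port)))
    return brokers


def reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

For the command in the question, calling reachable on each pair returned by parse_brokers("192.168.255.150:6667") from the driver host would show whether the port is open from there.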

Scala Spark App submitted to yarn-cluster and unregistered with SUCCEEDED without doing anything

Goal
Run our Scala Spark app jar in yarn-cluster mode. It works in standalone cluster mode and in yarn-client mode, but for some reason it does not run to completion in yarn-cluster mode.
Details
The last thing the code seems to execute is assigning the initial value to the DataFrame when reading the input file; it does not appear to do anything after that. None of the logs look abnormal, and there are no warnings or errors. The application suddenly gets unregistered with status SUCCEEDED and everything gets killed. In any other deployment mode (e.g. yarn-client, standalone cluster mode) everything runs smoothly to completion.
15/07/22 15:57:00 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
I have also run this job on Spark 1.3.x and 1.4.x, on a vanilla Spark/YARN cluster and on a CDH 5.4.3 cluster, all with the same results. What could possibly be the issue?
The job was run with the command below, and the input file is accessible through HDFS.
bin/spark-submit --master yarn-cluster --class AssocApp ../associationRulesScala/target/scala-2.10/AssociationRule_2.10.4-1.0.0.SNAPSHOT.jar hdfs://sparkMaster-hk:9000/user/root/BreastCancer.csv
Code snippets
This is the code in the area where the DataFrame is loaded. It emits the log message "Uploading Dataframe..." but nothing else after that. Refer to the driver logs below.
//...
logger.info("Uploading Dataframe from %s".format(filename))
sparkParams.sqlContext.csvFile(filename)
MDC.put("jobID",jobID.takeRight(3))
logger.info("Extracting Unique Vals from each of %d columns...".format(frame.columns.length))
private val uniqueVals = frame.columns.zipWithIndex.map(colname => (colname._2, colname._1, frame.select(colname._1).distinct.cache)).
//...
Driver logs
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-root/nm-local-dir/usercache/root/filecache/60/spark-assembly-1.4.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/07/22 15:56:52 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
15/07/22 15:56:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1434116948302_0097_000001
15/07/22 15:56:55 INFO spark.SecurityManager: Changing view acls to: root
15/07/22 15:56:55 INFO spark.SecurityManager: Changing modify acls to: root
15/07/22 15:56:55 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/07/22 15:56:55 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
15/07/22 15:56:55 INFO yarn.ApplicationMaster: Waiting for spark context initialization
15/07/22 15:56:55 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
15/07/22 15:56:56 INFO AssocApp$: Starting new Association Rules calculation. From File: hdfs://sparkMaster-hk:9000/user/root/BreastCancer.csv
15/07/22 15:56:56 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
15/07/22 15:56:57 INFO associationRules.primaryPackageSpark: Uploading Dataframe from hdfs://sparkMaster-hk:9000/user/root/BreastCancer.csv
15/07/22 15:56:57 INFO spark.SparkContext: Running Spark version 1.4.0
15/07/22 15:56:57 INFO spark.SecurityManager: Changing view acls to: root
15/07/22 15:56:57 INFO spark.SecurityManager: Changing modify acls to: root
15/07/22 15:56:57 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/07/22 15:56:57 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/07/22 15:56:57 INFO Remoting: Starting remoting
15/07/22 15:56:57 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@119.81.232.13:41459]
15/07/22 15:56:57 INFO util.Utils: Successfully started service 'sparkDriver' on port 41459.
15/07/22 15:56:57 INFO spark.SparkEnv: Registering MapOutputTracker
15/07/22 15:56:57 INFO spark.SparkEnv: Registering BlockManagerMaster
15/07/22 15:56:57 INFO storage.DiskBlockManager: Created local directory at /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1434116948302_0097/blockmgr-f0e66040-1fdb-4a05-87e1-160194829f84
15/07/22 15:56:57 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
15/07/22 15:56:58 INFO spark.HttpFileServer: HTTP File server directory is /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1434116948302_0097/httpd-79b304a1-3cf4-4951-9e22-bbdfac435824
15/07/22 15:56:58 INFO spark.HttpServer: Starting HTTP Server
15/07/22 15:56:58 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/22 15:56:58 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:36021
15/07/22 15:56:58 INFO util.Utils: Successfully started service 'HTTP file server' on port 36021.
15/07/22 15:56:58 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/07/22 15:56:58 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/07/22 15:56:58 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/22 15:56:58 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:53274
15/07/22 15:56:58 INFO util.Utils: Successfully started service 'SparkUI' on port 53274.
15/07/22 15:56:58 INFO ui.SparkUI: Started SparkUI at http://119.XX.XXX.XX:53274
15/07/22 15:56:58 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/07/22 15:56:59 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34498.
15/07/22 15:56:59 INFO netty.NettyBlockTransferService: Server created on 34498
15/07/22 15:56:59 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/22 15:56:59 INFO storage.BlockManagerMasterEndpoint: Registering block manager 119.81.232.13:34498 with 267.3 MB RAM, BlockManagerId(driver, 119.81.232.13, 34498)
15/07/22 15:56:59 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/22 15:56:59 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as AkkaRpcEndpointRef(Actor[akka://sparkDriver/user/YarnAM#-819146876])
15/07/22 15:56:59 INFO client.RMProxy: Connecting to ResourceManager at sparkMaster-hk/119.81.232.24:8030
15/07/22 15:56:59 INFO yarn.YarnRMClient: Registering the ApplicationMaster
15/07/22 15:57:00 INFO yarn.YarnAllocator: Will request 2 executor containers, each with 1 cores and 1408 MB memory including 384 MB overhead
15/07/22 15:57:00 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
15/07/22 15:57:00 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
15/07/22 15:57:00 INFO yarn.ApplicationMaster: Started progress reporter thread - sleep time : 5000
15/07/22 15:57:00 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
15/07/22 15:57:00 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
15/07/22 15:57:00 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1434116948302_0097
15/07/22 15:57:00 INFO storage.DiskBlockManager: Shutdown hook called
15/07/22 15:57:00 INFO util.Utils: Shutdown hook called
15/07/22 15:57:00 INFO util.Utils: Deleting directory /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1434116948302_0097/httpd-79b304a1-3cf4-4951-9e22-bbdfac435824
15/07/22 15:57:00 INFO util.Utils: Deleting directory /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1434116948302_0097/userFiles-e01b4dd2-681c-4108-aec6-879774652c7a
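In the driver log above, "Final app status: SUCCEEDED, exitCode: 0" is logged at 15:56:56, before the SparkContext starts at 15:56:57. One possible reading is that the application's main method returned while the real work was still pending elsewhere: in yarn-cluster mode the ApplicationMaster runs the user class in a separate thread (see "Starting the user application in a separate Thread") and reports the final status once main returns. The sketch below is a plain-Python simulation of that pattern, not Spark code. The names run_user_main, main_fire_and_forget and main_blocking are hypothetical, and the sleep stands in for the DataFrame job.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Shared executor that plays the role of "some other thread" doing the job.
_pool = ThreadPoolExecutor(max_workers=1)


def slow_job(events):
    """Stand-in for the real work (for example, loading a DataFrame)."""
    time.sleep(0.2)
    events.append("job finished")


def main_fire_and_forget(events):
    """Schedules the job and returns immediately, without waiting for it."""
    _pool.submit(slow_job, events)


def main_blocking(events):
    """Schedules the job and blocks until it has completed."""
    _pool.submit(slow_job, events).result()


def run_user_main(user_main):
    """Simulated launcher: runs user_main in a thread and records the
    final status as soon as user_main returns."""
    events = []
    worker = threading.Thread(target=user_main, args=(events,))
    worker.start()
    worker.join()
    events.append("final status: SUCCEEDED")
    return events
```

With main_fire_and_forget the status is recorded before the job finishes; with main_blocking the job finishes first. If the Scala application starts its work from a Future, an actor, or a similar asynchronous construct, waiting for that work before main returns would be the analogous change.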

spark-submit yarn-client run failed

I'm using yarn-client mode to run a Spark program.
I've built a Spark-on-YARN environment.
The script is:
./bin/spark-submit --class WordCountTest \
--master yarn-client \
--num-executors 1 \
--executor-cores 1 \
--queue root.hadoop \
/root/Desktop/test2.jar \
10
When running it, I get the following exception:
15/05/12 17:42:01 INFO spark.SparkContext: Running Spark version 1.3.1
15/05/12 17:42:01 WARN spark.SparkConf:
SPARK_CLASSPATH was detected (set to ':/usr/local/hadoop/hadoop-2.5.2/share/hadoop/common/hadoop-lzo-0.4.20-SNAPSHOT.jar').
This is deprecated in Spark 1.0+.
Please instead use:
- ./spark-submit with --driver-class-path to augment the driver classpath
- spark.executor.extraClassPath to augment the executor classpath
15/05/12 17:42:01 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to ':/usr/local/hadoop/hadoop-2.5.2/share/hadoop/common/hadoop-lzo-0.4.20-SNAPSHOT.jar' as a work-around.
15/05/12 17:42:01 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to ':/usr/local/hadoop/hadoop-2.5.2/share/hadoop/common/hadoop-lzo-0.4.20-SNAPSHOT.jar' as a work-around.
15/05/12 17:42:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/05/12 17:42:02 INFO spark.SecurityManager: Changing view acls to: root
15/05/12 17:42:02 INFO spark.SecurityManager: Changing modify acls to: root
15/05/12 17:42:02 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/05/12 17:42:02 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/05/12 17:42:02 INFO Remoting: Starting remoting
15/05/12 17:42:03 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@master:49338]
15/05/12 17:42:03 INFO util.Utils: Successfully started service 'sparkDriver' on port 49338.
15/05/12 17:42:03 INFO spark.SparkEnv: Registering MapOutputTracker
15/05/12 17:42:03 INFO spark.SparkEnv: Registering BlockManagerMaster
15/05/12 17:42:03 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-57f5fb29-784d-4730-92b8-c2e8be97c038/blockmgr-752988bc-b2d0-42f7-891d-5d3edbb4526d
15/05/12 17:42:03 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
15/05/12 17:42:04 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-2f2a46eb-9259-4c6e-b9af-7159efb0b3e9/httpd-3c50fe1e-430e-4077-9cd0-58246e182d98
15/05/12 17:42:04 INFO spark.HttpServer: Starting HTTP Server
15/05/12 17:42:04 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/05/12 17:42:04 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:41749
15/05/12 17:42:04 INFO util.Utils: Successfully started service 'HTTP file server' on port 41749.
15/05/12 17:42:04 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/05/12 17:42:05 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/05/12 17:42:05 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/05/12 17:42:05 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/05/12 17:42:05 INFO ui.SparkUI: Started SparkUI at http://master:4040
15/05/12 17:42:05 INFO spark.SparkContext: Added JAR file:/root/Desktop/test2.jar at http://192.168.147.201:41749/jars/test2.jar with timestamp 1431423725289
15/05/12 17:42:05 WARN cluster.YarnClientSchedulerBackend: NOTE: SPARK_WORKER_MEMORY is deprecated. Use SPARK_EXECUTOR_MEMORY or --executor-memory through spark-submit instead.
15/05/12 17:42:06 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.147.201:8032
15/05/12 17:42:06 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
15/05/12 17:42:06 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/05/12 17:42:06 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/05/12 17:42:06 INFO yarn.Client: Setting up container launch context for our AM
15/05/12 17:42:06 INFO yarn.Client: Preparing resources for our AM container
15/05/12 17:42:07 WARN yarn.Client: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/05/12 17:42:07 INFO yarn.Client: Uploading resource file:/usr/local/spark/spark-1.3.1-bin-hadoop2.5.0-cdh5.3.2/lib/spark-assembly-1.3.1-hadoop2.5.0-cdh5.3.2.jar -> hdfs://master:9000/user/root/.sparkStaging/application_1431423592173_0003/spark-assembly-1.3.1-hadoop2.5.0-cdh5.3.2.jar
15/05/12 17:42:11 INFO yarn.Client: Setting up the launch environment for our AM container
15/05/12 17:42:11 WARN yarn.Client: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/05/12 17:42:11 INFO spark.SecurityManager: Changing view acls to: root
15/05/12 17:42:11 INFO spark.SecurityManager: Changing modify acls to: root
15/05/12 17:42:11 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/05/12 17:42:11 INFO yarn.Client: Submitting application 3 to ResourceManager
15/05/12 17:42:11 INFO impl.YarnClientImpl: Submitted application application_1431423592173_0003
15/05/12 17:42:12 INFO yarn.Client: Application report for application_1431423592173_0003 (state: FAILED)
15/05/12 17:42:12 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1431423592173_0003 submitted by user root to unknown queue: root.hadoop
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hadoop
start time: 1431423731271
final status: FAILED
tracking URL: N/A
user: root
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:113)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:59)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:381)
at WordCountTest$.main(WordCountTest.scala:14)
at WordCountTest.main(WordCountTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
My code is very simple:
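import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object WordCountTest {
  def main(args: Array[String]): Unit = {
    // Quiet Spark and Jetty logging
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)
    val sparkConf = new SparkConf().setAppName("WordCountTest Prog")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)
    val file = sc.textFile("/data/test/pom.xml")
    // Split each line on spaces and count occurrences of each word
    val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    // Prints the RDD's description, not its contents
    println(counts)
    counts.saveAsTextFile("/data/test/pom_count.txt")
  }
}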
I've been debugging this problem for 2 days. Help! Thanks.
Try changing the queue name to hadoop.
In my case, changing "--queue thequeue" to "--queue default" made it work.
Running:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 --queue thequeue lib/spark-examples*.jar 10
fails with this error; simply change "--queue thequeue" to "--queue default".
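The application report in the question already names the cause: its diagnostics line says the application was submitted "to unknown queue: root.hadoop". As a small illustration, the sketch below pulls the queue name out of such a report with a regular expression, so a wrapper script around spark-submit could fail with a clearer message. The function name unknown_queue is hypothetical, and the pattern is based only on the wording of the log line in the question; other YARN versions or schedulers may phrase it differently.

```python
import re

# Pattern matches the diagnostics wording seen in the question's log.
_UNKNOWN_QUEUE = re.compile(r"to unknown queue:\s*(\S+)")


def unknown_queue(report_text):
    """Return the rejected queue name from a YARN application report,
    or None if the report does not mention an unknown queue."""
    match = _UNKNOWN_QUEUE.search(report_text)
    return match.group(1) if match else None


report = (
    "diagnostics: Application application_1431423592173_0003 "
    "submitted by user root to unknown queue: root.hadoop\n"
    "final status: FAILED\n"
)
print(unknown_queue(report))  # prints: root.hadoop
```

A queue name reported this way is one the scheduler does not recognise, which is consistent with the answers above that switch to a queue name the cluster actually defines.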

Resources