Integrating Kafka 2.11-0.10.0.1 with Spark Streaming 2.1.1

I'm trying to run the KafkaWordCount example in Spark Streaming, using Spark 2.1.1 in standalone cluster mode. The Kafka version on the server I'm trying to integrate with is 2.11-0.10.0.1 (the Scala 2.11 build of Kafka 0.10.0.1). According to https://spark.apache.org/docs/latest/streaming-kafka-integration.html, there are two separate packages: one for Kafka 0.8.2.1 or higher and another for 0.10.0 or higher.
I've added the following JARs to the jars folder in the Spark home directory (/usr/local/spark/jars):
kafka_2.11-0.10.0.1.jar
spark-streaming-kafka-0-10-assembly_2.11-2.1.1.jar
spark-streaming-kafka-0-10_2.11-2.1.1.jar
Running this command:
/usr/local/spark/bin/spark-submit --num-executors 1 --executor-memory 20G --total-executor-cores 4 --class org.apache.spark.examples.streaming.KafkaWordCount /usr/local/spark/examples/jars/spark-examples_2.11-2.1.1.jar 10.0.16.96:2181 group_test topic 6
fails with:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$
Is there another JAR that I missed?
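The class named in the error, org.apache.spark.streaming.kafka.KafkaUtils$, belongs to Spark's Kafka 0.8 integration (spark-streaming-kafka-0-8). The 0-10 integration puts its classes under org.apache.spark.streaming.kafka010 instead, and kafka_2.11-0.10.0.1.jar is Kafka itself, so none of the three JARs listed above would be expected to provide it. Since JARs are ZIP archives, one quick way to check which JAR on disk (if any) contains a class is to look for its .class entry. The helper below is a hypothetical diagnostic sketch, not part of Spark:

```python
import zipfile
from typing import Iterable, List


def class_to_entry(class_name: str) -> str:
    """Map a JVM class name such as 'a.b.C$' to its JAR entry path 'a/b/C$.class'."""
    return class_name.replace(".", "/") + ".class"


def jars_containing(class_name: str, jar_paths: Iterable[str]) -> List[str]:
    """Return the JARs, in the given order, that contain the compiled class."""
    entry = class_to_entry(class_name)
    hits = []
    for jar in jar_paths:
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar)
        except (zipfile.BadZipFile, OSError):
            # Skip paths that are missing or are not readable ZIP/JAR archives.
            continue
    return hits
```

For example, calling jars_containing("org.apache.spark.streaming.kafka.KafkaUtils$", sorted(glob.glob("/usr/local/spark/jars/*.jar"))) should return an empty list if none of the copied JARs provides the class the example needs.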
Logs:
/usr/local/spark/bin/spark-submit --num-executors 1 --executor-memory 20G --total-executor-cores 4 --class org.apache.spark.examples.streaming.KafkaWordCount /usr/local/spark/examples/jars/spark-examples_2.11-2.1.1.jar 10.0.16.96:2181 group_test streams 6
Warning: Ignoring non-spark config property: fs.s3.awsAccessKeyId=<redacted>
Warning: Ignoring non-spark config property: fs.s3.awsSecretAccessKey=<redacted>
17/07/11 08:04:31 INFO spark.SparkContext: Running Spark version 2.1.1
17/07/11 08:04:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/11 08:04:31 INFO spark.SecurityManager: Changing view acls to: mahendra
17/07/11 08:04:31 INFO spark.SecurityManager: Changing modify acls to: mahendra
17/07/11 08:04:31 INFO spark.SecurityManager: Changing view acls groups to:
17/07/11 08:04:31 INFO spark.SecurityManager: Changing modify acls groups to:
17/07/11 08:04:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mahendra); groups with view permissions: Set(); users with modify permissions: Set(mahendra); groups with modify permissions: Set()
17/07/11 08:04:32 INFO util.Utils: Successfully started service 'sparkDriver' on port 38173.
17/07/11 08:04:32 INFO spark.SparkEnv: Registering MapOutputTracker
17/07/11 08:04:32 INFO spark.SparkEnv: Registering BlockManagerMaster
17/07/11 08:04:32 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/07/11 08:04:32 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/07/11 08:04:32 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-241eda29-1cb3-4364-859c-79ba86689fbf
17/07/11 08:04:32 INFO memory.MemoryStore: MemoryStore started with capacity 5.2 GB
17/07/11 08:04:32 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/07/11 08:04:32 INFO util.log: Logging initialized @1581ms
17/07/11 08:04:32 INFO server.Server: jetty-9.2.z-SNAPSHOT
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@a7e2d9d{/jobs,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@754777cd{/jobs/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2b52c0d6{/jobs/job,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@372ea2bc{/jobs/job/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4cc76301{/stages,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f08c4b{/stages/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f19b8b3{/stages/stage,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7de0c6ae{/stages/stage/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@a486d78{/stages/pool,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@cdc3aae{/stages/pool/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7ef2d7a6{/storage,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5dcbb60{/storage/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4c36250e{/storage/rdd,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@21526f6c{/storage/rdd/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@49f5c307{/environment,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@299266e2{/environment/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5471388b{/executors,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66ea1466{/executors/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1601e47{/executors/threadDump,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3bffddff{/executors/threadDump/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66971f6b{/static,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@50687efb{/,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@517bd097{/api,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@142eef62{/jobs/job/kill,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4a9cc6cb{/stages/stage/kill,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO server.ServerConnector: Started Spark@6de54b40{HTTP/1.1}{0.0.0.0:4040}
17/07/11 08:04:32 INFO server.Server: Started @1696ms
17/07/11 08:04:32 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/07/11 08:04:32 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.16.15:4040
17/07/11 08:04:32 INFO spark.SparkContext: Added JAR file:/usr/local/spark/examples/jars/spark-examples_2.11-2.1.1.jar at spark://10.0.16.15:38173/jars/spark-examples_2.11-2.1.1.jar with timestamp 1499760272476
17/07/11 08:04:32 INFO client.StandaloneAppClient$ClientEndpoint: Connecting to master spark://ip-10-0-16-15.ap-southeast-1.compute.internal:7077...
17/07/11 08:04:32 INFO client.TransportClientFactory: Successfully created connection to ip-10-0-16-15.ap-southeast-1.compute.internal/10.0.16.15:7077 after 27 ms (0 ms spent in bootstraps)
17/07/11 08:04:32 INFO cluster.StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20170711080432-0038
17/07/11 08:04:32 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20170711080432-0038/0 on worker-20170707101056-10.0.16.51-40051 (10.0.16.51:40051) with 4 cores
17/07/11 08:04:32 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20170711080432-0038/0 on hostPort 10.0.16.51:40051 with 4 cores, 20.0 GB RAM
17/07/11 08:04:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35723.
17/07/11 08:04:32 INFO netty.NettyBlockTransferService: Server created on 10.0.16.15:35723
17/07/11 08:04:32 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/07/11 08:04:32 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.16.15, 35723, None)
17/07/11 08:04:32 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20170711080432-0038/0 is now RUNNING
17/07/11 08:04:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.0.16.15:35723 with 5.2 GB RAM, BlockManagerId(driver, 10.0.16.15, 35723, None)
17/07/11 08:04:32 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.16.15, 35723, None)
17/07/11 08:04:32 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.16.15, 35723, None)
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@34448e6c{/metrics/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO cluster.StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
17/07/11 08:04:33 WARN fs.FileSystem: Cannot load filesystem
java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2631)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2650)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:172)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:357)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.streaming.StreamingContext.checkpoint(StreamingContext.scala:238)
at org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCount.scala:54)
at org.apache.spark.examples.streaming.KafkaWordCount.main(KafkaWordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StorageStatistics
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
at java.lang.Class.getConstructor0(Class.java:3075)
at java.lang.Class.newInstance(Class.java:412)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
... 24 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.StorageStatistics
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 29 more
17/07/11 08:04:33 WARN spark.SparkContext: Spark is not running in local mode, therefore the checkpoint directory must not be on the local filesystem. Directory 'file:/home/mahendra/checkpoint' appears to be on the local filesystem.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$
at org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCount.scala:57)
at org.apache.spark.examples.streaming.KafkaWordCount.main(KafkaWordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtils$
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 11 more
17/07/11 08:04:33 INFO spark.SparkContext: Invoking stop() from shutdown hook
17/07/11 08:04:33 INFO server.ServerConnector: Stopped Spark@6de54b40{HTTP/1.1}{0.0.0.0:4040}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4a9cc6cb{/stages/stage/kill,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@142eef62{/jobs/job/kill,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@517bd097{/api,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@50687efb{/,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66971f6b{/static,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3bffddff{/executors/threadDump/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1601e47{/executors/threadDump,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66ea1466{/executors/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5471388b{/executors,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@299266e2{/environment/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@49f5c307{/environment,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@21526f6c{/storage/rdd/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4c36250e{/storage/rdd,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5dcbb60{/storage/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7ef2d7a6{/storage,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@cdc3aae{/stages/pool/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@a486d78{/stages/pool,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7de0c6ae{/stages/stage/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3f19b8b3{/stages/stage,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2f08c4b{/stages/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4cc76301{/stages,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@372ea2bc{/jobs/job/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2b52c0d6{/jobs/job,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@754777cd{/jobs/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@a7e2d9d{/jobs,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO ui.SparkUI: Stopped Spark web UI at http://10.0.16.15:4040
17/07/11 08:04:33 INFO cluster.StandaloneSchedulerBackend: Shutting down all executors
17/07/11 08:04:33 INFO cluster.CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
17/07/11 08:04:33 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/07/11 08:04:33 INFO memory.MemoryStore: MemoryStore cleared
17/07/11 08:04:33 INFO storage.BlockManager: BlockManager stopped
17/07/11 08:04:33 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/07/11 08:04:33 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/07/11 08:04:33 INFO spark.SparkContext: Successfully stopped SparkContext
17/07/11 08:04:33 INFO util.ShutdownHookManager: Shutdown hook called
17/07/11 08:04:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a7875c5c-cdfc-486e-bf7d-7fe0a7cff228
Thanks !
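Because KafkaWordCount creates its stream through the 0.8 API (the missing class lives in org.apache.spark.streaming.kafka, not kafka010), one likely fix is to let spark-submit fetch the 0.8 connector with --packages instead of copying JARs by hand; the integration guide cited in the question describes that package as supporting brokers 0.8.2.1 or higher, which should cover a 0.10.0.1 broker. The sketch below only assembles the command from the question with that flag added. The coordinate org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.1 assumes the Scala 2.11 build of Spark 2.1.1 seen in the log, and the helper names are hypothetical; treat it as an untested suggestion, not a confirmed fix.

```python
import shlex
from typing import List


def kafka08_coordinate(scala_version: str = "2.11", spark_version: str = "2.1.1") -> str:
    """Maven coordinate of Spark's Kafka 0.8 integration for a given Scala/Spark build."""
    return f"org.apache.spark:spark-streaming-kafka-0-8_{scala_version}:{spark_version}"


def kafka_word_count_argv(zk_quorum: str, group: str, topics: str, num_threads: int,
                          spark_home: str = "/usr/local/spark") -> List[str]:
    """Rebuild the question's spark-submit command, adding the connector via --packages."""
    return [
        f"{spark_home}/bin/spark-submit",
        "--packages", kafka08_coordinate(),
        "--num-executors", "1",
        "--executor-memory", "20G",
        "--total-executor-cores", "4",
        "--class", "org.apache.spark.examples.streaming.KafkaWordCount",
        f"{spark_home}/examples/jars/spark-examples_2.11-2.1.1.jar",
        # KafkaWordCount arguments: <zkQuorum> <group> <topics> <numThreads>
        zk_quorum, group, topics, str(num_threads),
    ]


# Print a shell-ready command line for the values used in the question.
argv = kafka_word_count_argv("10.0.16.96:2181", "group_test", "topic", 6)
print(" ".join(shlex.quote(arg) for arg in argv))
```

If you go this route, it is probably worth removing the hand-copied kafka_2.11-0.10.0.1.jar and spark-streaming-kafka-0-10 JARs from /usr/local/spark/jars: the Spark 0.10 integration guide advises against manually adding Kafka client dependencies because different versions may be incompatible in hard-to-diagnose ways.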

Related

Getting NoClassDefFoundError using Spark with spark-cassandra-connector 3.1.0

I've been trying to submit a Spark application, but I get the following exception:
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/jars/spark-unsafe_2.12-3.2.0.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
21/11/13 13:17:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-11-13T13:17:46+0330 - INFO - Great Expectations logging enabled at 20 level by JupyterUX module.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/11/13 13:17:47 INFO SparkContext: Running Spark version 3.2.0
21/11/13 13:17:47 INFO ResourceUtils: ==============================================================
21/11/13 13:17:47 INFO ResourceUtils: No custom resources configured for spark.driver.
21/11/13 13:17:47 INFO ResourceUtils: ==============================================================
21/11/13 13:17:47 INFO SparkContext: Submitted application: examstat
21/11/13 13:17:47 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
21/11/13 13:17:47 INFO ResourceProfile: Limiting resource is cpu
21/11/13 13:17:47 INFO ResourceProfileManager: Added ResourceProfile id: 0
21/11/13 13:17:47 INFO SecurityManager: Changing view acls to: alisaberi
21/11/13 13:17:47 INFO SecurityManager: Changing modify acls to: alisaberi
21/11/13 13:17:47 INFO SecurityManager: Changing view acls groups to:
21/11/13 13:17:47 INFO SecurityManager: Changing modify acls groups to:
21/11/13 13:17:47 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(alisaberi); groups with view permissions: Set(); users with modify permissions: Set(alisaberi); groups with modify permissions: Set()
21/11/13 13:17:47 INFO Utils: Successfully started service 'sparkDriver' on port 62135.
21/11/13 13:17:47 INFO SparkEnv: Registering MapOutputTracker
21/11/13 13:17:47 INFO SparkEnv: Registering BlockManagerMaster
21/11/13 13:17:47 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/11/13 13:17:47 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/11/13 13:17:47 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
21/11/13 13:17:47 INFO DiskBlockManager: Created local directory at /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/blockmgr-e6d2444c-2aa6-4690-ac82-7a4ab1d86b6b
21/11/13 13:17:47 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
21/11/13 13:17:47 INFO SparkEnv: Registering OutputCommitCoordinator
21/11/13 13:17:47 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/11/13 13:17:47 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.3:4040
21/11/13 13:17:47 INFO SparkContext: Added JAR file:///Users/alisaberi/Desktop/test-great-expectations/spark-cassandra-connector-assembly_2.12-3.1.0.jar at spark://192.168.1.3:62135/jars/spark-cassandra-connector-assembly_2.12-3.1.0.jar with timestamp 1636796867038
21/11/13 13:17:47 INFO Executor: Starting executor ID driver on host 192.168.1.3
21/11/13 13:17:47 INFO Executor: Fetching spark://192.168.1.3:62135/jars/spark-cassandra-connector-assembly_2.12-3.1.0.jar with timestamp 1636796867038
21/11/13 13:17:47 INFO TransportClientFactory: Successfully created connection to /192.168.1.3:62135 after 42 ms (0 ms spent in bootstraps)
21/11/13 13:17:47 INFO Utils: Fetching spark://192.168.1.3:62135/jars/spark-cassandra-connector-assembly_2.12-3.1.0.jar to /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-3961cb18-dacf-4940-a5ff-36d1bbc2c3bb/userFiles-89f4f184-ba26-4a28-b83f-52cec85d7563/fetchFileTemp11862606911562884947.tmp
21/11/13 13:17:48 INFO Executor: Adding file:/private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-3961cb18-dacf-4940-a5ff-36d1bbc2c3bb/userFiles-89f4f184-ba26-4a28-b83f-52cec85d7563/spark-cassandra-connector-assembly_2.12-3.1.0.jar to class loader
21/11/13 13:17:48 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 62138.
21/11/13 13:17:48 INFO NettyBlockTransferService: Server created on 192.168.1.3:62138
21/11/13 13:17:48 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/11/13 13:17:48 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.3, 62138, None)
21/11/13 13:17:48 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.3:62138 with 434.4 MiB RAM, BlockManagerId(driver, 192.168.1.3, 62138, None)
21/11/13 13:17:48 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.3, 62138, None)
21/11/13 13:17:48 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.3, 62138, None)
21/11/13 13:17:48 WARN SparkSession: Cannot use com.datastax.spark.connector.CassandraSparkExtensions to configure session extensions.
java.lang.NoClassDefFoundError: com/datastax/spark/connector/util/Logging
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1016)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:151)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:825)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:723)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:646)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:604)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:576)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:468)
at org.apache.spark.util.Utils$.classForName(Utils.scala:216)
at org.apache.spark.sql.SparkSession$.$anonfun$applyExtensions$1(SparkSession.scala:1194)
at org.apache.spark.sql.SparkSession$.$anonfun$applyExtensions$1$adapted(SparkSession.scala:1192)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$applyExtensions(SparkSession.scala:1192)
at org.apache.spark.sql.SparkSession.<init>(SparkSession.scala:104)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:64)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.util.Logging
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:606)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
... 33 more
21/11/13 13:17:48 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
21/11/13 13:17:48 INFO SharedState: Warehouse path is 'file:/Users/alisaberi/Desktop/test-great-expectations/spark-warehouse'.
/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/context.py:77: FutureWarning: Deprecated in 3.0.0. Use SparkSession.builder.getOrCreate() instead.
Traceback (most recent call last):
File "/Users/alisaberi/Desktop/test-great-expectations/test.py", line 33, in <module>
sqlContext.read\
File "/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 164, in load
File "/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1309, in __call__
File "/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
File "/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o56.load.
: java.lang.NoClassDefFoundError: com/datastax/spark/connector/util/Logging
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1016)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:151)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:825)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:723)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:646)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:604)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.apache.spark.sql.cassandra.DefaultSource.getTable(DefaultSource.scala:55)
at org.apache.spark.sql.cassandra.DefaultSource.inferSchema(DefaultSource.scala:72)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:233)
at scala.Option.map(Option.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.util.Logging
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:606)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
... 28 more
21/11/13 13:17:49 INFO SparkContext: Invoking stop() from shutdown hook
21/11/13 13:17:49 INFO SparkUI: Stopped Spark web UI at http://192.168.1.3:4040
21/11/13 13:17:49 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/11/13 13:17:49 INFO MemoryStore: MemoryStore cleared
21/11/13 13:17:49 INFO BlockManager: BlockManager stopped
21/11/13 13:17:49 INFO BlockManagerMaster: BlockManagerMaster stopped
21/11/13 13:17:49 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/11/13 13:17:49 INFO SparkContext: Successfully stopped SparkContext
21/11/13 13:17:49 INFO ShutdownHookManager: Shutdown hook called
21/11/13 13:17:49 INFO ShutdownHookManager: Deleting directory /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-ef03b69b-8170-49e1-a24f-af46ff8ada7d
21/11/13 13:17:49 INFO ShutdownHookManager: Deleting directory /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-3961cb18-dacf-4940-a5ff-36d1bbc2c3bb/pyspark-42c7c117-c948-4b16-82a6-39017769cff9
21/11/13 13:17:49 INFO ShutdownHookManager: Deleting directory /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-3961cb18-dacf-4940-a5ff-36d1bbc2c3bb
The application uses spark-cassandra-connector to read from Cassandra. Here is the code:
from pyspark.sql import SQLContext, SparkSession
from pyspark.context import SparkContext
spark = SparkSession\
.builder\
.appName("Test")\
.master('local[*]') \
.config('spark.cassandra.connection.host', 'localhost') \
.getOrCreate()
spark.read\
.format("org.apache.spark.sql.cassandra")\
.options(table="gps", keyspace="test")\
.load().show()
I've tried two different approaches to submit the application:
$SPARK_HOME/bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0 ./test.py
$SPARK_HOME/bin/spark-submit --jars /Full/Path/to/spark-cassandra-connector-assembly_2.12-3.1.0.jar ./test.py
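spark-submit treats everything after the application file as arguments to the application itself, so options such as `--packages` and `--jars` only take effect when they come before `./test.py`. A second easy-to-miss detail is the Scala suffix in the connector's coordinate (`_2.12`), which has to match the Scala version of the Spark build. The sketch below is a hypothetical helper, not from the original post; it assumes a Scala 2.12 Spark build (the default for the prebuilt Spark 3.2.0 downloads) and an illustrative SPARK_HOME of `/opt/spark`.

```python
import shlex


def scala_suffix(coordinate):
    """'group:artifact_2.12:version' -> '2.12' (Scala binary version of the artifact)."""
    _group, artifact, _version = coordinate.split(":")
    _, sep, suffix = artifact.rpartition("_")
    if not sep:
        raise ValueError(f"no Scala suffix in artifact {artifact!r}")
    return suffix


def build_submit_cmd(spark_home, script, packages=(), jars=(), spark_scala="2.12"):
    """Build a spark-submit argv with every option placed before the script."""
    for coord in packages:
        if scala_suffix(coord) != spark_scala:
            raise ValueError(f"{coord} targets Scala {scala_suffix(coord)}, "
                             f"but this Spark build uses Scala {spark_scala}")
    cmd = [f"{spark_home}/bin/spark-submit"]
    if packages:
        cmd += ["--packages", ",".join(packages)]  # Maven coordinates, comma-separated
    if jars:
        cmd += ["--jars", ",".join(jars)]  # local jar paths, comma-separated
    # Anything after the application file is passed to the script itself
    # (sys.argv[1:]), not to spark-submit, so the script must come last.
    cmd.append(script)
    return cmd


print(shlex.join(build_submit_cmd(
    "/opt/spark",  # hypothetical SPARK_HOME
    "./test.py",
    packages=["com.datastax.spark:spark-cassandra-connector_2.12:3.1.0"],
)))
```

The helper refuses to build a command whose connector targets a different Scala version, since that mismatch typically surfaces only later as a class-loading error.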
Also, when I run the same code in the pyspark shell, it works fine.
Spark 3.2.0
spark-cassandra-connector 3.1.0
cassandra 4.0.1
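A `ClassNotFoundException` like the one above means the named class was not on the classpath of the JVM that tried to load it. A jar is an ordinary zip archive, so one quick check is whether the jar passed with `--jars` actually contains `com/datastax/spark/connector/util/Logging.class`. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical.

```python
import zipfile


def class_entry(class_name):
    """Convert a fully qualified class name 'a.b.C' to the jar entry 'a/b/C.class'."""
    return class_name.replace(".", "/") + ".class"


def jar_contains_class(jar_path, class_name):
    """Return True if the jar (a zip archive) contains the compiled class."""
    with zipfile.ZipFile(jar_path) as jar:
        return class_entry(class_name) in jar.namelist()
```

For example, `jar_contains_class("/Full/Path/to/spark-cassandra-connector-assembly_2.12-3.1.0.jar", "com.datastax.spark.connector.util.Logging")`. If it returns True, the class is in the jar, and attention shifts to how the jar reaches the driver and executor classpaths rather than to the jar's contents.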

How to run spark-submit in virtualenv for pyspark?

Is there a way to run spark-submit (Spark v2.3.2 from HDP 3.1.0) while in a virtualenv? I have a Python file that uses python3 (and some specific libs) in a virtualenv (to isolate lib versions from the rest of the system). I would like to run this file with /bin/spark-submit, but when I try, I get...
[me@airflowetl tests]$ source ../venv/bin/activate; /bin/spark-submit sparksubmit.test.py
File "/bin/hdp-select", line 255
print "ERROR: Invalid package - " + name
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("ERROR: Invalid package - " + name)?
ls: cannot access /usr/hdp//hadoop/lib: No such file or directory
Exception in thread "main" java.lang.IllegalStateException: hdp.version is not set while running Spark under HDP, please set through HDP_VERSION in spark-env.sh or add a java-opts file in conf with -Dhdp.version=xxx
at org.apache.spark.launcher.Main.main(Main.java:118)
also tried...
(venv) [me@airflowetl tests]$ export HADOOP_CONF_DIR=/etc/hadoop/conf; spark-submit --master yarn --deploy-mode cluster sparksubmit.test.py
19/12/12 13:50:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/12 13:50:20 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
....
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
...or (from here https://www.hackingnote.com/en/spark/trouble-shooting/NoClassDefFoundError-ClientConfig)...
(venv) [airflow@airflowetl tests]$ spark-submit --master yarn --deploy-mode client --conf spark.hadoop.yarn.timeline-service.enabled=false sparksubmit.test.py
19/12/12 15:22:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/12 15:22:49 INFO spark.SparkContext: Running Spark version 2.4.4
19/12/12 15:22:49 INFO spark.SparkContext: Submitted application: hph_etl_TEST
19/12/12 15:22:49 INFO spark.SecurityManager: Changing view acls to: airflow
19/12/12 15:22:49 INFO spark.SecurityManager: Changing modify acls to: airflow
19/12/12 15:22:49 INFO spark.SecurityManager: Changing view acls groups to:
19/12/12 15:22:49 INFO spark.SecurityManager: Changing modify acls groups to:
19/12/12 15:22:49 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(airflow); groups with view permissions: Set(); users with modify permissions: Set(airflow); groups with modify permissions: Set()
19/12/12 15:22:49 INFO util.Utils: Successfully started service 'sparkDriver' on port 45232.
19/12/12 15:22:50 INFO spark.SparkEnv: Registering MapOutputTracker
19/12/12 15:22:50 INFO spark.SparkEnv: Registering BlockManagerMaster
19/12/12 15:22:50 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/12/12 15:22:50 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/12/12 15:22:50 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-320366b6-609a-497b-ac40-119d11682044
19/12/12 15:22:50 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
19/12/12 15:22:50 INFO spark.SparkEnv: Registering OutputCommitCoordinator
19/12/12 15:22:50 INFO util.log: Logging initialized @2663ms
19/12/12 15:22:50 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
19/12/12 15:22:50 INFO server.Server: Started @2763ms
19/12/12 15:22:50 INFO server.AbstractConnector: Started ServerConnector@50a3c656{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
19/12/12 15:22:50 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@306c15f1{/jobs,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2b566f8d{/jobs/json,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1b5ef515{/jobs/job,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@59f7a5e2{/jobs/job/json,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@41c58356{/stages,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2d5f2026{/stages/json,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@324ca89a{/stages/stage,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6f487c61{/stages/stage/json,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3897116a{/stages/pool,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68ab090f{/stages/pool/json,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@42ea3278{/storage,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6eedf530{/storage/json,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e71a5c6{/storage/rdd,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5e222a76{/storage/rdd/json,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4dc8aa38{/environment,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4c8d82c4{/environment/json,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2fb15106{/executors,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@608faf1c{/executors/json,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@689e405f{/executors/threadDump,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@48a5742a{/executors/threadDump/json,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6db93559{/static,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4d7ed508{/,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5510f12d{/api,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6d87de7{/jobs/job/kill,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@62595660{/stages/stage/kill,null,AVAILABLE,@Spark}
19/12/12 15:22:50 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://airflowetl.local:4040
19/12/12 15:22:51 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
19/12/12 15:22:51 INFO client.RMProxy: Connecting to ResourceManager at hw001.local/172.18.4.46:8050
19/12/12 15:22:51 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
19/12/12 15:22:51 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (15360 MB per container)
19/12/12 15:22:51 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
19/12/12 15:22:51 INFO yarn.Client: Setting up container launch context for our AM
19/12/12 15:22:51 INFO yarn.Client: Setting up the launch environment for our AM container
19/12/12 15:22:51 INFO yarn.Client: Preparing resources for our AM container
19/12/12 15:22:51 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
19/12/12 15:22:53 INFO yarn.Client: Uploading resource file:/tmp/spark-4e600acd-2d34-4271-b01c-25f312906f93/__spark_libs__8368679994314392346.zip -> hdfs://hw001.local:8020/user/airflow/.sparkStaging/application_1572898343646_0029/__spark_libs__8368679994314392346.zip
19/12/12 15:22:54 INFO yarn.Client: Uploading resource file:/home/airflow/projects/hph_etl_airflow/venv/lib/python3.6/site-packages/pyspark/python/lib/pyspark.zip -> hdfs://hw001.local:8020/user/airflow/.sparkStaging/application_1572898343646_0029/pyspark.zip
19/12/12 15:22:55 INFO yarn.Client: Uploading resource file:/home/airflow/projects/hph_etl_airflow/venv/lib/python3.6/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip -> hdfs://hw001.local:8020/user/airflow/.sparkStaging/application_1572898343646_0029/py4j-0.10.7-src.zip
19/12/12 15:22:55 INFO yarn.Client: Uploading resource file:/tmp/spark-4e600acd-2d34-4271-b01c-25f312906f93/__spark_conf__5403285055443058510.zip -> hdfs://hw001.local:8020/user/airflow/.sparkStaging/application_1572898343646_0029/__spark_conf__.zip
19/12/12 15:22:55 INFO spark.SecurityManager: Changing view acls to: airflow
19/12/12 15:22:55 INFO spark.SecurityManager: Changing modify acls to: airflow
19/12/12 15:22:55 INFO spark.SecurityManager: Changing view acls groups to:
19/12/12 15:22:55 INFO spark.SecurityManager: Changing modify acls groups to:
19/12/12 15:22:55 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(airflow); groups with view permissions: Set(); users with modify permissions: Set(airflow); groups with modify permissions: Set()
19/12/12 15:22:56 INFO yarn.Client: Submitting application application_1572898343646_0029 to ResourceManager
19/12/12 15:22:56 INFO impl.YarnClientImpl: Submitted application application_1572898343646_0029
19/12/12 15:22:56 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1572898343646_0029 and attemptId None
19/12/12 15:22:57 INFO yarn.Client: Application report for application_1572898343646_0029 (state: ACCEPTED)
19/12/12 15:22:57 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1576200176385
final status: UNDEFINED
tracking URL: http://hw001.local:8088/proxy/application_1572898343646_0029/
user: airflow
19/12/12 15:22:58 INFO yarn.Client: Application report for application_1572898343646_0029 (state: FAILED)
19/12/12 15:22:58 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1572898343646_0029 failed 2 times due to AM Container for appattempt_1572898343646_0029_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2019-12-12 15:22:58.214]Exception from container-launch.
Container id: container_e02_1572898343646_0029_02_000001
Exit code: 1
[2019-12-12 15:22:58.215]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/hadoop/yarn/local/usercache/airflow/appcache/application_1572898343646_0029/container_e02_1572898343646_0029_02_000001/launch_container.sh: line 38: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/3.1.0.0-78/hadoop/*:/usr/hdp/3.1.0.0-78/hadoop/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:$PWD/__spark_conf__/__hadoop_conf__: bad substitution
....
I'm not sure what to make of this or how to proceed, and I didn't fully understand the error message even after googling it.
Does anyone with more experience have further debugging tips or fixes for this?
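The `bad substitution` at the end of the diagnostics is reported by bash itself. The generated launch_container.sh still contains the literal text `${hdp.version}`, and a dot is not allowed in a shell variable name, so bash refuses to expand it. As far as I understand, Hadoop normally replaces that placeholder from an `hdp.version` property (for example a `-Dhdp.version=...` Java option) when the configuration is read, which is also what the earlier `hdp.version is not set` message points at. The sketch below is a hypothetical helper using only the standard library. It scans a classpath string for `${...}` placeholders that bash cannot expand.

```python
import re

# Bash accepts ${name} only when name is a valid identifier (letters, digits,
# underscore, not starting with a digit) or a positional parameter number.
_PLACEHOLDER = re.compile(r"\$\{([^}]*)\}")
_VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+")


def bad_substitutions(text):
    """Return the contents of ${...} placeholders that bash would reject.

    Only plain ${name} references are understood; parameter-expansion
    operators such as ${var:-default} would also be reported.
    """
    return [name for name in _PLACEHOLDER.findall(text)
            if not _VALID_NAME.fullmatch(name)]
```

Fed the classpath from the log above, it reports `hdp.version` twice: once in the `/usr/hdp/${hdp.version}/hadoop/lib` directory and once in the hadoop-lzo jar name.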
spark-submit is a bash script that launches Java classes, so using a virtualenv wouldn't necessarily help (although you can see in the logs that files were uploaded from the environment).
The first error is because hdp-select requires Python 2, but it looks like it ran with Python 3 (probably due to your venv).
If you want to carry your Python environment to the executors and driver, you'd probably want to use the --py-files option instead, or set up the same Python environment on each Spark node.
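For pure-Python dependencies, one way to follow the `--py-files` suggestion is to zip the packages installed in the virtualenv and pass the archive to spark-submit, which places it on the PYTHONPATH of the driver and executors. The following is a minimal sketch with the standard library's `zipfile`; the helper name and paths are hypothetical. Python cannot import compiled extension modules (`.so` files) from a zip archive, so libraries such as numpy still have to be installed on every node.

```python
import os
import zipfile


def zip_packages(site_packages, package_names, out_path):
    """Zip top-level packages/modules from a site-packages directory so the
    archive can be shipped with `spark-submit --py-files`.

    Archive paths are relative to site_packages, so `import <name>` works
    once the zip file is on sys.path.
    """
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in package_names:
            root = os.path.join(site_packages, name)
            if os.path.isfile(root + ".py"):
                # Single-module distribution such as six.py
                zf.write(root + ".py", name + ".py")
            elif os.path.isdir(root):
                for dirpath, _dirnames, filenames in os.walk(root):
                    for filename in filenames:
                        if filename.endswith(".pyc"):
                            continue  # skip cached bytecode; the .py sources are enough
                        full_path = os.path.join(dirpath, filename)
                        zf.write(full_path, os.path.relpath(full_path, site_packages))
            else:
                raise FileNotFoundError(f"{name!r} not found in {site_packages}")
    return out_path
```

The resulting archive would then be passed as `spark-submit --py-files deps.zip sparksubmit.test.py`.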
Also, you seem to be running Spark 2.4.4, not 2.3.2 as you say, which could explain the NoClassDefFoundError if you're mixing Spark versions (in particular, pyspark installed from pip doesn't download any scheduler-specific packages, like the YARN timeline client).
But the last attempt did get the application submitted to YARN, and you can find the real exception under
http://hw001.local:8088/proxy/application_1572898343646_0029
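For the HDP-specific errors, a commonly cited workaround is to pass the HDP version explicitly as a Java option to the driver and the YARN ApplicationMaster. This is an assumption on my part, not something confirmed in this thread. The same sketch points PySpark at the virtualenv interpreter through `PYSPARK_PYTHON` instead of activating the venv, so that `hdp-select` presumably still finds Python 2 on the `PATH`. It only assembles the command and environment for client mode, the mode of the last attempt. The version string `3.1.0.0-78` comes from the classpath in the log above. The function name is hypothetical, and the venv interpreter path must exist on every node that runs executors.

```python
import os


def hdp_client_submit(script, venv_python, hdp_version,
                      spark_submit="/bin/spark-submit", base_env=None):
    """Return (argv, env) to run a PySpark script on HDP's YARN in client mode
    with the cluster's own spark-submit and a virtualenv interpreter."""
    java_opt = f"-Dhdp.version={hdp_version}"
    argv = [
        spark_submit,
        "--master", "yarn",
        "--deploy-mode", "client",
        # Provide hdp.version to the driver and YARN AM JVMs so Hadoop can
        # expand ${hdp.version} in the container classpaths
        "--conf", f"spark.driver.extraJavaOptions={java_opt}",
        "--conf", f"spark.yarn.am.extraJavaOptions={java_opt}",
        # Interpreter for the Python workers on the executors
        "--conf", f"spark.executorEnv.PYSPARK_PYTHON={venv_python}",
        script,  # the application file goes after every option
    ]
    # Copy the caller's environment as-is and only set the driver's interpreter.
    # Run this from a shell where the venv is NOT activated, so that `python`
    # on PATH is still the Python 2 that hdp-select expects.
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_PYTHON"] = venv_python
    return argv, env
```

Usage would be along the lines of `argv, env = hdp_client_submit("sparksubmit.test.py", "/home/airflow/projects/hph_etl_airflow/venv/bin/python", "3.1.0.0-78")` followed by `subprocess.run(argv, env=env, check=True)`. The interpreter path is inferred from the upload paths in the log.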

Access a Hive table from a pyspark .py file

I can get data from a SQL table with this code when I run it in the pyspark terminal on a GCP machine:
from pyspark.sql import SparkSession, SQLContext
spark = SparkSession.builder.appName("appName").getOrCreate()
sc = spark.sparkContext
sqlContext = SQLContext(sc)
df = sqlContext.sql('select * from mytable limit 100')
print('number of rows = %d' % df.count())
It works when the code is copied and pasted into the pyspark terminal window, but it gives this error when the file is run as a .py script from the terminal:
19/01/21 03:38:43 INFO spark.SparkContext: Running Spark version 2.2.1
19/01/21 03:38:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/01/21 03:38:43 INFO spark.SparkContext: Submitted application: appName
19/01/21 03:38:43 INFO spark.SecurityManager: Changing view acls to: xxxxxxx
19/01/21 03:38:43 INFO spark.SecurityManager: Changing modify acls to: xxxxxxx
19/01/21 03:38:43 INFO spark.SecurityManager: Changing view acls groups to:
19/01/21 03:38:43 INFO spark.SecurityManager: Changing modify acls groups to:
19/01/21 03:38:43 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxxxxxx); groups with view permissions: Set(); users with modify permissions: Set(xxxxxxx); groups with modify permissions: Set()
19/01/21 03:38:44 INFO util.Utils: Successfully started service 'sparkDriver' on port 00000.
19/01/21 03:38:44 INFO spark.SparkEnv: Registering MapOutputTracker
19/01/21 03:38:44 INFO spark.SparkEnv: Registering BlockManagerMaster
19/01/21 03:38:44 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/01/21 03:38:44 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/01/21 03:38:44 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-bdcf00db-e6fc-4a6f-a64d-59def40ca89c
19/01/21 03:38:44 INFO memory.MemoryStore: MemoryStore started with capacity 4.3 GB
19/01/21 03:38:44 INFO spark.SparkEnv: Registering OutputCommitCoordinator
19/01/21 03:38:44 INFO util.log: Logging initialized @3180ms
19/01/21 03:38:44 INFO server.Server: jetty-9.3.z-SNAPSHOT
19/01/21 03:38:44 INFO server.Server: Started @3277ms
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4046. Attempting port 4047.
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4047. Attempting port 4048.
19/01/21 03:38:44 WARN util.Utils: Service 'SparkUI' could not bind on port 4048. Attempting port 4049.
19/01/21 03:38:44 INFO server.AbstractConnector: Started ServerConnector@aaa850a{HTTP/1.1,[http/1.1]}{0.0.0.0:0000}
19/01/21 03:38:44 INFO util.Utils: Successfully started service 'SparkUI' on port 0000.
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/jobs,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/jobs/json,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/jobs/job,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/jobs/job/json,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/stages,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/stages/json,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/stages/stage,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/stages/stage/json,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/stages/pool,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/stages/pool/json,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/storage,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/storage/json,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/storage/rdd,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/storage/rdd/json,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/environment,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/environment/json,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/executors,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/executors/json,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/executors/threadDump,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/executors/threadDump/json,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/static,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/api,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/jobs/job/kill,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@eqqwe23231q2w{/stages/stage/kill,null,AVAILABLE,@Spark}
19/01/21 03:38:44 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://00.00.00.00:0000
19/01/21 03:38:44 INFO util.Utils: Using initial executors = 8, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
19/01/21 03:38:44 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.6.10-hadoop2
19/01/21 03:38:45 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
19/01/21 03:38:46 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1
19/01/21 03:38:46 INFO retry.RetryInvocationHandler: Exception while invoking getClusterMetrics of class ApplicationClientProtocolPBClientImpl over rm1 after 1 fail over attempts. Trying to fail over after sleeping for 829ms.
java.net.ConnectException: Call From mytable/ipaddress to mytable:0000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy15.getClusterMetrics(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:206)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487)
at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:156)
at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:156)
at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:155)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:173)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:236)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 32 more
19/01/21 03:38:46 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
19/01/21 03:38:46 INFO yarn.Client: Requesting a new application from cluster with 80 NodeManagers
19/01/21 03:38:46 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (45056 MB per container)
19/01/21 03:38:46 INFO yarn.Client: Will allocate AM container, with 24576 MB memory including 2234 MB overhead
19/01/21 03:38:46 INFO yarn.Client: Setting up container launch context for our AM
19/01/21 03:38:46 INFO yarn.Client: Setting up the launch environment for our AM container
19/01/21 03:38:46 INFO yarn.Client: Preparing resources for our AM container
19/01/21 03:38:48 INFO yarn.Client: Uploading resource file:/opt/hadoop/spark/python/lib/pyspark.zip -> hdfs://name-dataproc/user/xxxxxxx/.sparkStaging/application_1547596846411_1167/pyspark.zip
19/01/21 03:38:48 INFO yarn.Client: Uploading resource file:/opt/hadoop/spark/python/lib/py4j-0.10.4-src.zip -> hdfs://name-dataproc/user/xxxxxxx/.sparkStaging/application_1547596846411_1167/py4j-0.10.4-src.zip
19/01/21 03:38:48 INFO yarn.Client: Uploading resource file:/tmp/spark-1c0d417f-4fd6-411a-9480-0fc147d7c9a8/__spark_conf__2865868052747382300.zip -> hdfs://name-dataproc/user/xxxxxxx/.sparkStaging/application_1547596846411_1167/__spark_conf__.zip
19/01/21 03:38:48 INFO spark.SecurityManager: Changing view acls to: xxxxxxx
19/01/21 03:38:48 INFO spark.SecurityManager: Changing modify acls to: xxxxxxx
19/01/21 03:38:48 INFO spark.SecurityManager: Changing view acls groups to:
19/01/21 03:38:48 INFO spark.SecurityManager: Changing modify acls groups to:
19/01/21 03:38:48 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxxxxxx); groups with view permissions: Set(); users with modify permissions: Set(xxxxxxx); groups with modify permissions: Set()
19/01/21 03:38:48 INFO yarn.Client: Submitting application application_1547596846411_1167 to ResourceManager
19/01/21 03:38:48 INFO impl.YarnClientImpl: Submitted application application_1547596846411_1167
19/01/21 03:38:48 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1547596846411_1167 and attemptId None
19/01/21 03:38:49 INFO yarn.Client: Application report for application_1547596846411_1167 (state: ACCEPTED)
19/01/21 03:38:49 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: long_running
start time: 1548063528733
final status: UNDEFINED
tracking URL: http://name-dataproc-.:0000/proxy/application_1547596846411_1167/
user: xxxxxxx
19/01/21 03:38:50 INFO yarn.Client: Application report for application_1547596846411_1167 (state: ACCEPTED)
19/01/21 03:38:51 INFO yarn.Client: Application report for application_1547596846411_1167 (state: ACCEPTED)
19/01/21 03:38:52 INFO yarn.Client: Application report for application_1547596846411_1167 (state: ACCEPTED)
19/01/21 03:38:52 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
19/01/21 03:38:52 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,
19/01/21 03:38:53 INFO cluster.YarnClientSchedulerBackend: Application application_1547596846411_1167 has started running.
19/01/21 03:38:53 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34040.
19/01/21 03:38:53 INFO netty.NettyBlockTransferService: Server created on 00.000.00.00:23930
19/01/21 03:38:53 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/01/21 03:38:53 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, ip-address, port, None)
19/01/21 03:38:53 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-address:port with 4.3 GB RAM, BlockManagerId(driver, 10.206.52.22, 46766, None)
19/01/21 03:38:53 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, ip-address, port, None)
19/01/21 03:38:53 INFO storage.BlockManager: external shuffle service port = 0000
19/01/21 03:38:53 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, ip-address, port, None)
19/01/21 03:38:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@dfsdfsdfgs{/metrics/json,null,AVAILABLE,@Spark}
19/01/21 03:38:54 INFO scheduler.EventLoggingListener: Logging events to hdfs://name-dataproc/user/spark/eventlog/application_1547596846411_1167
19/01/21 03:38:54 INFO util.Utils: Using initial executors = 8, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
19/01/21 03:38:54 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
19/01/21 03:38:54 INFO internal.SharedState: loading hive config file: file:/opt/hadoop/conf/hive-site.xml
19/01/21 03:38:54 INFO internal.SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('gs://place/place/path').
19/01/21 03:38:54 INFO internal.SharedState: Warehouse path is 'gs://place/place/path'.
19/01/21 03:38:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@sdfsdgs{/SQL,null,AVAILABLE,@Spark}
19/01/21 03:38:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@sdfsdfs{/SQL/json,null,AVAILABLE,@Spark}
19/01/21 03:38:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@sdfsdf{/SQL/execution,null,AVAILABLE,@Spark}
19/01/21 03:38:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@sdfsdf{/SQL/execution/json,null,AVAILABLE,@Spark}
19/01/21 03:38:54 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@dsfgsdgd{/static/sql,null,AVAILABLE,@Spark}
19/01/21 03:38:55 INFO gcs.GoogleHadoopFileSystemBase: GCS Metadata Cache is enabled: this isn't necessary and in fact is probably detrimental to your job!
19/01/21 03:38:55 INFO state.StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
19/01/21 03:38:55 INFO execution.SparkSqlParser: Parsing command: select * from mytable limit 100
Traceback (most recent call last):
File "/home/xxxxxx/spark_job_example.py", line 8, in <module>
df= sqlContext.sql('select * from mytable limit 100')
File "/opt/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 384, in sql
File "/opt/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 603, in sql
File "/opt/hadoop/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/opt/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: u"Table or view not found: `mytable`.`myttable`; line 1 pos 14;\n'GlobalLimit 100\n+- 'LocalLimit 100\n +- 'Project [*]\n +- 'UnresolvedRelation `mytable`.`table`\n"
19/01/21 03:38:56 INFO spark.SparkContext: Invoking stop() from shutdown hook
19/01/21 03:38:56 INFO server.AbstractConnector: Stopped Spark@fec850a{HTTP/1.1,[http/1.1]}{0.0.0.0:4049}
19/01/21 03:38:56 INFO ui.SparkUI: Stopped Spark web UI at http://10.206.52.22:4049
19/01/21 03:38:56 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
19/01/21 03:38:56 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
19/01/21 03:38:56 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
19/01/21 03:38:56 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
19/01/21 03:38:56 INFO cluster.YarnClientSchedulerBackend: Stopped
19/01/21 03:38:56 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/01/21 03:38:56 INFO memory.MemoryStore: MemoryStore cleared
19/01/21 03:38:56 INFO storage.BlockManager: BlockManager stopped
19/01/21 03:38:56 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
19/01/21 03:38:56 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/01/21 03:38:56 INFO spark.SparkContext: Successfully stopped SparkContext
19/01/21 03:38:56 INFO util.ShutdownHookManager: Shutdown hook called
19/01/21 03:38:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-1c0d417f-4fd6-411a-9480-0fc147d7c9a8
19/01/21 03:38:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-1c0d417f-4fd6-411a-9480-0fc147d7c9a8/pyspark-82d123ce-18ce-43ce-b631-8638bf5ffbfb
I appreciate any help

Spark job in client mode throws an error

I am trying to run a Spark job on a server. A job that only does plain println operations runs without errors, but this one fails and I can't understand the error.
I am deploying the code in YARN client mode. Many people have suggested running chmod 777 on the warehouse directories, or disabling/enabling .enableHiveSupport(), but neither works. I have tried many times to run and deploy this code in client mode with spark-submit and it still fails, even though the same code works perfectly in Eclipse. Any help is appreciated. Thanks.
Code:
package com.issuer.pack3.spark
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.apache.spark.storage.StorageLevel._
import org.apache.spark.sql.hive.HiveContext
object SparkApplication3 {
  def main(args: Array[String]): Unit = {
    val warehouseLocation = "/hadoop/spark-2.2.1/spark-warehouse"
    val sparksessionobject = SparkSession
      .builder()
      // A master set in code takes precedence over spark-submit's --master yarn,
      // so this job actually runs in local mode
      .master("local[*]")
      .appName("SparkSession1")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .config("spark.executor.memory", "10g")
      // In client mode the driver JVM is already running at this point, so this
      // has no effect; pass --driver-memory to spark-submit instead
      .config("spark.driver.memory", "10g")
      .config("spark.sql.shuffle.partitions", "10000")
      .config("spark.driver.maxResultSize", "200g")
      .config("spark.memory.offHeap.enabled", "true")
      .config("spark.memory.offHeap.size", "200g")
      .config("spark.debug.maxToStringFields", "100")
      // .enableHiveSupport()
      .getOrCreate()
    // Read the input CSV, using the first row as the header and inferring column types
    val joined_acc_custinfo_trips = sparksessionobject.sqlContext.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("/home/user/input/part-00000.csv")
    // createOrReplaceTempView replaces the deprecated registerTempTable
    joined_acc_custinfo_trips.createOrReplaceTempView("joined_acc_custinfo_trips")
    val query9 = "--SQL QUERY IS HERE--"
    val res06 = sparksessionobject.sqlContext.sql(query9)
    // Write the result as a single JSON output file
    res06.repartition(1).write.json("/hadoop/OP/part1/")
    println("-------------------------------------END OF FIRST STAGE-------------------------------------------")
    println("-------------------------------------END OF FIRST STAGE-------------------------------------------")
    println("-------------------------------------END OF FIRST STAGE-------------------------------------------")
    println("-------------------------------------END OF FIRST STAGE-------------------------------------------")
    println("-------------------------------------END OF FIRST STAGE-------------------------------------------")
  }
}
Error:
[user@Analytic ~]$ spark-submit --master yarn --deploy-mode client --class "com.issuer.pack3.spark.SparkApplication3" /home/user/app2.jar
18/08/20 17:25:21 INFO spark.SparkContext: Running Spark version 2.2.1
18/08/20 17:25:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/08/20 17:25:21 INFO spark.SparkContext: Submitted application: SparkSession1
18/08/20 17:25:21 INFO spark.SecurityManager: Changing view acls to: bhaskar
18/08/20 17:25:21 INFO spark.SecurityManager: Changing modify acls to: bhaskar
18/08/20 17:25:21 INFO spark.SecurityManager: Changing view acls groups to:
18/08/20 17:25:21 INFO spark.SecurityManager: Changing modify acls groups to:
18/08/20 17:25:21 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(bhaskar); groups with view permissions: Set(); users with modify permissions: Set(bhaskar); groups with modify permissions: Set()
18/08/20 17:25:22 INFO util.Utils: Successfully started service 'sparkDriver' on port 40090.
18/08/20 17:25:22 INFO spark.SparkEnv: Registering MapOutputTracker
18/08/20 17:25:22 INFO spark.SparkEnv: Registering BlockManagerMaster
18/08/20 17:25:22 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/08/20 17:25:22 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/08/20 17:25:22 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-2af112f6-030a-4f01-91fc-8e69f1281fde
18/08/20 17:25:22 INFO memory.MemoryStore: MemoryStore started with capacity 200.4 GB
18/08/20 17:25:22 INFO spark.SparkEnv: Registering OutputCommitCoordinator
18/08/20 17:25:22 INFO util.log: Logging initialized @1480ms
18/08/20 17:25:22 INFO server.Server: jetty-9.3.z-SNAPSHOT
18/08/20 17:25:22 INFO server.Server: Started @1542ms
18/08/20 17:25:22 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
18/08/20 17:25:22 INFO server.AbstractConnector: Started ServerConnector@5cad8b7d{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
18/08/20 17:25:22 INFO util.Utils: Successfully started service 'SparkUI' on port 4041.
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@492fc69e{/jobs,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6d2260db{/jobs/json,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@49bf29c6{/jobs/job,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7668d560{/jobs/job/json,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@126be319{/stages,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c371e13{/stages/json,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1e34c607{/stages/stage,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@9257031{/stages/stage/json,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7726e185{/stages/pool,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@282308c3{/stages/pool/json,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1db0ec27{/storage,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@d4ab71a{/storage/json,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1af05b03{/storage/rdd,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1ad777f{/storage/rdd/json,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@438bad7c{/environment,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4fdf8f12{/environment/json,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@54f5f647{/executors,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5a6d5a8f{/executors/json,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@315ba14a{/executors/threadDump,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@27f0ad19{/executors/threadDump/json,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@38d5b107{/static,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@77e2a6e2{/,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@199e4c2b{/api,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2c1dc8e{/jobs/job/kill,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4e7095ac{/stages/stage/kill,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.70.13:4041
18/08/20 17:25:22 INFO spark.SparkContext: Added JAR file:/home/bhaskar/app2.jar at spark://192.168.70.13:40090/jars/app2.jar with timestamp 1534766122528
18/08/20 17:25:22 INFO executor.Executor: Starting executor ID driver on host localhost
18/08/20 17:25:22 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41973.
18/08/20 17:25:22 INFO netty.NettyBlockTransferService: Server created on 192.168.70.13:41973
18/08/20 17:25:22 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/08/20 17:25:22 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.70.13, 41973, None)
18/08/20 17:25:22 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.70.13:41973 with 200.4 GB RAM, BlockManagerId(driver, 192.168.70.13, 41973, None)
18/08/20 17:25:22 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.70.13, 41973, None)
18/08/20 17:25:22 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.70.13, 41973, None)
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5560bcdf{/metrics/json,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO internal.SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('/hadoop/spark-2.2.1/spark-warehouse').
18/08/20 17:25:22 INFO internal.SharedState: Warehouse path is '/hadoop/spark-2.2.1/spark-warehouse'.
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4c98a6d5{/SQL,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7f02251{/SQL/json,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1bcf67e8{/SQL/execution,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53692008{/SQL/execution/json,null,AVAILABLE,@Spark}
18/08/20 17:25:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3a4ba480{/static/sql,null,AVAILABLE,@Spark}
18/08/20 17:25:23 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/08/20 17:25:23 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/08/20 17:25:23 INFO metastore.ObjectStore: ObjectStore, initialize called
18/08/20 17:25:23 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/08/20 17:25:23 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
18/08/20 17:25:25 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
18/08/20 17:25:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/08/20 17:25:25 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/08/20 17:25:26 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
18/08/20 17:25:26 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
18/08/20 17:25:26 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/08/20 17:25:26 INFO metastore.ObjectStore: Initialized ObjectStore
18/08/20 17:25:26 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/08/20 17:25:26 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
18/08/20 17:25:26 INFO metastore.HiveMetaStore: Added admin role in metastore
18/08/20 17:25:26 INFO metastore.HiveMetaStore: Added public role in metastore
18/08/20 17:25:26 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
18/08/20 17:25:26 INFO metastore.HiveMetaStore: 0: get_all_databases
18/08/20 17:25:26 INFO HiveMetaStore.audit: ugi=bhaskar ip=unknown-ip-addr cmd=get_all_databases
18/08/20 17:25:27 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
18/08/20 17:25:27 INFO HiveMetaStore.audit: ugi=bhaskar ip=unknown-ip-addr cmd=get_functions: db=default pat=*
18/08/20 17:25:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
Exception in thread "main" java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
at org.apache.spark.sql.DataFrameReader.<init>(DataFrameReader.scala:689)
at org.apache.spark.sql.SparkSession.read(SparkSession.scala:645)
at org.apache.spark.sql.SQLContext.read(SQLContext.scala:504)
at com.issuer.pack3.spark.SparkApplication3$.main(SparkApplication3.scala:58)
at com.issuer.pack3.spark.SparkApplication3.main(SparkApplication3.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.net.ConnectException: Call From Analytic/192.168.70.13 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused;
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:105)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:93)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059)
... 19 more
Caused by: java.lang.RuntimeException: java.net.ConnectException: Call From Analytic/192.168.70.13 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:191)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:362)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:266)
at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
... 28 more
Caused by: java.net.ConnectException: Call From Analytic/192.168.70.13 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy19.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy20.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:596)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 42 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 62 more
18/08/20 17:25:27 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/08/20 17:25:27 INFO server.AbstractConnector: Stopped Spark@5cad8b7d{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
18/08/20 17:25:27 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.70.13:4041
18/08/20 17:25:27 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/08/20 17:25:27 INFO memory.MemoryStore: MemoryStore cleared
18/08/20 17:25:27 INFO storage.BlockManager: BlockManager stopped
18/08/20 17:25:27 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
18/08/20 17:25:27 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/08/20 17:25:27 INFO spark.SparkContext: Successfully stopped SparkContext
18/08/20 17:25:27 INFO util.ShutdownHookManager: Shutdown hook called
18/08/20 17:25:27 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-739175a5-4864-4fd0-8e1d-a22ff371f821
[user@Analytic ~]$
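The log above shows "Starting executor ID driver on host localhost" even though the job was submitted with --master yarn, and the failure is raised while instantiating org.apache.spark.sql.hive.HiveSessionStateBuilder even though .enableHiveSupport() is commented out. Spark's configuration documentation states that properties set directly in the application take precedence over spark-submit flags, which explains the first point. As a minimal diagnostic sketch, assuming Spark 2.x (the object SessionDiagnostics and its helper effectiveSettings are hypothetical names, not part of the original job), a job can print the master and catalog implementation it actually ended up with:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical diagnostic object; not part of the original job.
object SessionDiagnostics {

  /** Returns (effective master URL, catalog implementation) for a session. */
  def effectiveSettings(spark: SparkSession): (String, String) = {
    // The master the SparkContext was actually started with, e.g. "yarn" or "local[*]".
    val master = spark.sparkContext.master
    // "hive" makes Spark build a HiveSessionStateBuilder (Hive external catalog);
    // when the key is unset, Spark uses its in-memory catalog.
    val catalog = spark.sparkContext.getConf
      .get("spark.sql.catalogImplementation", "in-memory")
    (master, catalog)
  }

  def main(args: Array[String]): Unit = {
    // No .master(...) in code, so the --master flag given to spark-submit is used.
    val spark = SparkSession.builder().appName("SessionDiagnostics").getOrCreate()
    val (master, catalog) = effectiveSettings(spark)
    println(s"master=$master catalogImplementation=$catalog")
    spark.stop()
  }
}
```

Submitted with --master yarn and no master hard-coded, it should report master=yarn. If it reports catalogImplementation=hive while the code never calls .enableHiveSupport(), the setting is presumably coming from outside the code being looked at, for example spark-defaults.conf.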
Set the following config when building the Spark session:
.config("hive.metastore.uris", "placeyoururi")
Alternatively, pass hive-site.xml to spark-submit:
--files configpath/hive-site.xml
In both cases, also call .enableHiveSupport().
Hope this helps.
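A minimal Scala sketch of the first approach, assuming Spark 2.x with the spark-hive module on the classpath and a Hive metastore service reachable over Thrift. The object name HiveMetastoreSession and the URI thrift://metastore-host:9083 are placeholders, not values from the question; substitute your own metastore URI for "placeyoururi".

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name; "thrift://metastore-host:9083" is a placeholder URI.
object HiveMetastoreSession {
  val MetastoreUri = "thrift://metastore-host:9083"

  def build(metastoreUri: String): SparkSession =
    SparkSession
      .builder()
      .appName("SparkSession1")
      // Point the Hive client at an existing metastore service rather than
      // an embedded Derby metastore created on the driver.
      .config("hive.metastore.uris", metastoreUri)
      // Needed for both approaches in the answer; throws if the Hive
      // classes are not on the classpath.
      .enableHiveSupport()
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = build(MetastoreUri)
    // Listing databases forces the metastore connection to be opened.
    spark.sql("SHOW DATABASES").show()
    spark.stop()
  }
}
```

With the --files route the .config line can be dropped, but .enableHiveSupport() is still required. In client mode the driver runs on the submitting host, and Spark's documentation places hive-site.xml in the conf/ directory of the Spark installation there.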

Error initializing SparkContext; container logs: ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM End of LogType:stderr

When I start Spark on YARN with the command "spark-shell --master yarn-client", I get this error:
ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.NullPointerException
The full error I get when starting spark-shell on YARN is further below; the YARN container logs are here:
Container: container_1463670715317_0002_01_000001 on masternode_52694
============================================================================
LogType:stderr
Log Upload Time:Thu May 19 16:19:54 +0100 2016
LogLength:5748
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-hadoopadmin/nm-local-dir/usercache/hadoopadmin/filecache/13/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/05/19 16:19:44 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
16/05/19 16:19:45 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1463670715317_0002_000001
16/05/19 16:19:46 INFO spark.SecurityManager: Changing view acls to: hadoopadmin
16/05/19 16:19:46 INFO spark.SecurityManager: Changing modify acls to: hadoopadmin
16/05/19 16:19:46 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopadmin); users with modify permissions: Set(hadoopadmin)
16/05/19 16:19:46 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
16/05/19 16:19:46 INFO yarn.ApplicationMaster: Driver now available: 10.17.0.50:43771
16/05/19 16:19:47 INFO yarn.ApplicationMaster$AMEndpoint: Add WebUI Filter. AddWebUIFilter(org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,Map(PROXY_HOSTS -> masternode, PROXY_URI_BASES -> http://masternode:8088/proxy/application_1463670715317_0002),/proxy/application_1463670715317_0002)
16/05/19 16:19:47 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
16/05/19 16:19:47 INFO yarn.YarnRMClient: Registering the ApplicationMaster
16/05/19 16:19:47 INFO yarn.YarnAllocator: Will request 2 executor containers, each with 1 cores and 1408 MB memory including 384 MB overhead
16/05/19 16:19:47 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
16/05/19 16:19:47 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
16/05/19 16:19:47 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
16/05/19 16:19:47 INFO impl.AMRMClientImpl: Received new token for : masternode:52694
16/05/19 16:19:47 INFO yarn.YarnAllocator: Launching container container_1463670715317_0002_01_000002 for on host masternode
16/05/19 16:19:47 INFO yarn.YarnAllocator: Launching ExecutorRunnable. driverUrl : spark://CoarseGrainedScheduler@10.17.0.50:43771, executorHostname: masternode
16/05/19 16:19:47 INFO yarn.ExecutorRunnable: Starting Executor Container
16/05/19 16:19:47 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
16/05/19 16:19:47 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
16/05/19 16:19:47 INFO yarn.ExecutorRunnable: Setting up ContainerLaunchContext
16/05/19 16:19:47 INFO yarn.ExecutorRunnable: Preparing Local resources
16/05/19 16:19:47 INFO yarn.ExecutorRunnable: Prepared Local resources Map(_spark_.jar -> resource
{ scheme: "hdfs" host: "localhost" port: 9000 file: "/user/hadoopadmin/.sparkStaging/application_1463670715317_0002/spark-assembly-1.6.1-hadoop2.6.0.jar" }
size: 187698038 timestamp: 1463671182405 type: FILE visibility: PRIVATE)
16/05/19 16:19:48 INFO yarn.ExecutorRunnable:
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> PWD<CPS>PWD/_spark_.jar<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/
SPARK_LOG_URL_STDERR -> http://masternode:8042/node/containerlogs/container_1463670715317_0002_01_000002/hadoopadmin/stderr?start=-4096
SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1463670715317_0002
SPARK_YARN_CACHE_FILES_FILE_SIZES -> 187698038
SPARK_USER -> hadoopadmin
SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE
SPARK_YARN_MODE -> true
SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1463671182405
SPARK_LOG_URL_STDOUT -> http://masternode:8042/node/containerlogs/container_1463670715317_0002_01_000002/hadoopadmin/stdout?start=-4096
SPARK_YARN_CACHE_FILES -> hdfs://localhost:9000/user/hadoopadmin/.sparkStaging/application_1463670715317_0002/spark-assembly-1.6.1-hadoop2.6.0.jar#_spark_.jar
command:
JAVA_HOME/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms1024m -Xmx1024m -Djava.io.tmpdir=PWD/tmp '-Dspark.driver.port=43771' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.17.0.50:43771 --executor-id 1 --hostname masternode --cores 1 --app-id application_1463670715317_0002 --user-class-path file:$PWD/__app__.jar 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
===============================================================================
16/05/19 16:19:48 INFO impl.ContainerManagementProtocolProxy: Opening proxy : masternode:52694
16/05/19 16:19:48 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
16/05/19 16:19:48 INFO yarn.ApplicationMaster: Final app status: UNDEFINED, exit Code: 0, (reason: Shutdown hook called before final status was reported.)
16/05/19 16:19:48 INFO util.ShutdownHookManager: Shutdown hook called
End of LogType:stderr
LogType:stdout
Log Upload Time:Thu May 19 16:19:54 +0100 2016
LogLength:0
Log Contents:
End of LogType:stdout
Container: container_1463670715317_0002_02_000002 on masternode_52694
============================================================================
LogType:stderr
Log Upload Time:Thu May 19 16:19:54 +0100 2016
LogLength:737
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-hadoopadmin/nm-local-dir/usercache/hadoopadmin/filecache/13/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/05/19 16:19:54 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
16/05/19 16:19:54 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
End of LogType:stderr
LogType:stdout
Log Upload Time:Thu May 19 16:19:54 +0100 2016
LogLength:0
Log Contents:
End of LogType:stdout
hadoopadmin@master:~$
The full error shown when I try to start Spark with "spark-shell --master yarn-client":
hadoopadmin@master:~$ spark-shell --master yarn-client
16/05/19 16:19:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/19 16:19:33 INFO spark.SecurityManager: Changing view acls to: hadoopadmin
16/05/19 16:19:33 INFO spark.SecurityManager: Changing modify acls to: hadoopadmin
16/05/19 16:19:33 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopadmin); users with modify permissions: Set(hadoopadmin)
16/05/19 16:19:33 INFO spark.HttpServer: Starting HTTP Server
16/05/19 16:19:33 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/05/19 16:19:33 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:37052
16/05/19 16:19:33 INFO util.Utils: Successfully started service 'HTTP class server' on port 37052.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
16/05/19 16:19:37 INFO spark.SparkContext: Running Spark version 1.6.1
16/05/19 16:19:37 INFO spark.SecurityManager: Changing view acls to: hadoopadmin
16/05/19 16:19:37 INFO spark.SecurityManager: Changing modify acls to: hadoopadmin
16/05/19 16:19:37 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopadmin); users with modify permissions: Set(hadoopadmin)
16/05/19 16:19:38 INFO util.Utils: Successfully started service 'sparkDriver' on port 43771.
16/05/19 16:19:38 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/05/19 16:19:38 INFO Remoting: Starting remoting
16/05/19 16:19:38 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.17.0.50:57722]
16/05/19 16:19:38 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 57722.
16/05/19 16:19:38 INFO spark.SparkEnv: Registering MapOutputTracker
16/05/19 16:19:38 INFO spark.SparkEnv: Registering BlockManagerMaster
16/05/19 16:19:38 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-e8de3854-2526-4725-8c73-edb3fce2df33
16/05/19 16:19:38 INFO storage.MemoryStore: MemoryStore started with capacity 511.1 MB
16/05/19 16:19:38 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/05/19 16:19:39 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/05/19 16:19:39 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/05/19 16:19:39 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/05/19 16:19:39 INFO ui.SparkUI: Started SparkUI at http://10.17.0.50:4040
16/05/19 16:19:39 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/05/19 16:19:39 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/05/19 16:19:39 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/05/19 16:19:39 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/05/19 16:19:39 INFO yarn.Client: Setting up container launch context for our AM
16/05/19 16:19:39 INFO yarn.Client: Setting up the launch environment for our AM container
16/05/19 16:19:39 INFO yarn.Client: Preparing resources for our AM container
16/05/19 16:19:40 INFO yarn.Client: Uploading resource file:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar -> hdfs://localhost:9000/user/hadoopadmin/.sparkStaging/application_1463670715317_0002/spark-assembly-1.6.1-hadoop2.6.0.jar
16/05/19 16:19:42 INFO yarn.Client: Uploading resource file:/tmp/spark-942afe6a-95ca-4b8b-b06f-e9e3ac6aa751/__spark_conf__5009784131719458516.zip -> hdfs://localhost:9000/user/hadoopadmin/.sparkStaging/application_1463670715317_0002/__spark_conf__5009784131719458516.zip
16/05/19 16:19:42 INFO spark.SecurityManager: Changing view acls to: hadoopadmin
16/05/19 16:19:42 INFO spark.SecurityManager: Changing modify acls to: hadoopadmin
16/05/19 16:19:42 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopadmin); users with modify permissions: Set(hadoopadmin)
16/05/19 16:19:42 INFO yarn.Client: Submitting application 2 to ResourceManager
16/05/19 16:19:42 INFO impl.YarnClientImpl: Submitted application application_1463670715317_0002
16/05/19 16:19:43 INFO yarn.Client: Application report for application_1463670715317_0002 (state: ACCEPTED)
16/05/19 16:19:43 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1463671182634
final status: UNDEFINED
tracking URL: http://masternode:8088/proxy/application_1463670715317_0002/
user: hadoopadmin
16/05/19 16:19:44 INFO yarn.Client: Application report for application_1463670715317_0002 (state: ACCEPTED)
16/05/19 16:19:45 INFO yarn.Client: Application report for application_1463670715317_0002 (state: ACCEPTED)
16/05/19 16:19:46 INFO yarn.Client: Application report for application_1463670715317_0002 (state: ACCEPTED)
16/05/19 16:19:47 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/05/19 16:19:47 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> masternode, PROXY_URI_BASES -> http://masternode:8088/proxy/application_1463670715317_0002), /proxy/application_1463670715317_0002
16/05/19 16:19:47 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
16/05/19 16:19:47 INFO yarn.Client: Application report for application_1463670715317_0002 (state: RUNNING)
16/05/19 16:19:47 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.17.0.50
ApplicationMaster RPC port: 0
queue: default
start time: 1463671182634
final status: UNDEFINED
tracking URL: http://masternode:8088/proxy/application_1463670715317_0002/
user: hadoopadmin
16/05/19 16:19:47 INFO cluster.YarnClientSchedulerBackend: Application application_1463670715317_0002 has started running.
16/05/19 16:19:47 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 49183.
16/05/19 16:19:47 INFO netty.NettyBlockTransferService: Server created on 49183
16/05/19 16:19:47 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/05/19 16:19:47 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.17.0.50:49183 with 511.1 MB RAM, BlockManagerId(driver, 10.17.0.50, 49183)
16/05/19 16:19:47 INFO storage.BlockManagerMaster: Registered BlockManager
16/05/19 16:19:51 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/05/19 16:19:51 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> masternode, PROXY_URI_BASES -> http://masternode:8088/proxy/application_1463670715317_0002), /proxy/application_1463670715317_0002
16/05/19 16:19:51 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
16/05/19 16:19:54 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
16/05/19 16:19:54 INFO ui.SparkUI: Stopped Spark web UI at http://10.17.0.50:4040
16/05/19 16:19:54 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
16/05/19 16:19:54 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
16/05/19 16:19:54 INFO cluster.YarnClientSchedulerBackend: Stopped
16/05/19 16:19:54 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/05/19 16:19:54 INFO storage.MemoryStore: MemoryStore cleared
16/05/19 16:19:54 INFO storage.BlockManager: BlockManager stopped
16/05/19 16:19:54 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
16/05/19 16:19:54 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/05/19 16:19:54 INFO spark.SparkContext: Successfully stopped SparkContext
16/05/19 16:19:54 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/05/19 16:19:54 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/05/19 16:19:54 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
16/05/19 16:20:09 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
16/05/19 16:20:09 ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.NullPointerException
at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
at $line3.$read$$iwC$$iwC.<init>(<console>:15)
at $line3.$read$$iwC.<init>(<console>:24)
at $line3.$read.<init>(<console>:26)
at $line3.$read$.<init>(<console>:30)
at $line3.$read$.<clinit>(<console>)
at $line3.$eval$.<init>(<console>:7)
at $line3.$eval$.<clinit>(<console>)
at $line3.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:125)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/05/19 16:20:09 INFO spark.SparkContext: SparkContext already stopped.
java.lang.NullPointerException
at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
at $iwC$$iwC.<init>(<console>:15)
at $iwC.<init>(<console>:24)
at <init>(<console>:26)
at .<init>(<console>:30)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at ... org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
<console>:16: error: not found: value sqlContext
import sqlContext.implicits._
^
<console>:16: error: not found: value sqlContext
import sqlContext.sql
^
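Something exceeded its memory budget. There were no helpful errors, but that is what it turned out to be for me. Try increasing the relevant memory parameters, such as MaxPermSize and memoryOverhead. Note that the log above shows Java 1.8, which removed PermGen, so raising MaxPermSize has no effect on that JVM.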
https://mail-archives.apache.org/mod_mbox/spark-user/201507.mbox/%3c55A372C5.9050801@googlemail.com%3e
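As a sketch of what raising memoryOverhead can look like for this yarn-client setup, the command below passes larger YARN overheads for the ApplicationMaster and the executors. The 1g / 1024 MB values are illustrative assumptions, not tuned numbers. In Spark 1.6, spark.yarn.am.memory and spark.yarn.am.memoryOverhead apply to the ApplicationMaster in client mode, and spark.yarn.executor.memoryOverhead applies to each executor container; the overhead values are given in MB. Heap plus overhead must still fit under the per-container maximum YARN reports (8192 MB in the log above).

```shell
# Illustrative values only; adjust to your cluster.
# AM in yarn-client mode: heap (spark.yarn.am.memory) + overhead (MB).
# Executors: heap (--executor-memory) + overhead (MB).
spark-shell --master yarn-client \
  --conf spark.yarn.am.memory=1g \
  --conf spark.yarn.am.memoryOverhead=1024 \
  --executor-memory 1g \
  --conf spark.yarn.executor.memoryOverhead=1024
```

If the application still exits with state FINISHED right after the AM registers, the "Will allocate AM container, with ... MB memory including ... MB overhead" line in the client log is a quick way to confirm the new values were picked up.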

Resources