Error running Spark 3.1.1 as Hive 3.1.2's engine (java.lang.NoClassDefFoundError: org/apache/spark/unsafe/array/ByteArrayMethods) - apache-spark

I am running Spark on top of YARN on Ubuntu 20.04.
Cluster versions:
Hadoop 3.2.2
Hive 3.1.2
Spark 3.1.1
I have symlinked the following jars from Spark's jars directory into Hive's lib directory:
sudo ln -s $SPARK_HOME/jars/spark-network-common_2.12-3.1.1.jar $HIVE_HOME/lib/spark-network-common_2.12-3.1.1.jar
sudo ln -s $SPARK_HOME/jars/spark-core_2.12-3.1.1.jar $HIVE_HOME/lib/spark-core_2.12-3.1.1.jar
sudo ln -s $SPARK_HOME/jars/scala-library-2.12.10.jar $HIVE_HOME/lib/scala-library-2.12.10.jar
When I run Hive with Spark set as its execution engine, I get the following error:
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34)'
2021-05-31T12:31:58,949 ERROR [a69d446a-f1a0-45d9-8dbc-c0fccbf718b3 main] spark.SparkTask: Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34)'
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getHiveException(SparkSessionImpl.java:221)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:92)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:115)
at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:136)
at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:115)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/unsafe/array/ByteArrayMethods
at org.apache.spark.internal.config.package$.<init>(package.scala:1095)
at org.apache.spark.internal.config.package$.<clinit>(package.scala)
at org.apache.spark.SparkConf$.<init>(SparkConf.scala:654)
at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala)
at org.apache.spark.SparkConf.set(SparkConf.scala:94)
at org.apache.spark.SparkConf.set(SparkConf.scala:83)
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.generateSparkConf(HiveSparkClientFactory.java:265)
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:98)
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:76)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:87)
... 24 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.unsafe.array.ByteArrayMethods
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 34 more
2021-05-31T12:31:58,950 ERROR [a69d446a-f1a0-45d9-8dbc-c0fccbf718b3 main] spark.SparkTask: Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34)'
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getHiveException(SparkSessionImpl.java:221) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:92) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:115) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:136) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:115) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) ~[hive-cli-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) ~[hive-cli-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) ~[hive-cli-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) ~[hive-cli-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) ~[hive-cli-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683) ~[hive-cli-3.1.2.jar:3.1.2]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_292]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_292]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_292]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
at org.apache.hadoop.util.RunJar.run(RunJar.java:323) ~[hadoop-common-3.2.2.jar:?]
at org.apache.hadoop.util.RunJar.main(RunJar.java:236) ~[hadoop-common-3.2.2.jar:?]
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/unsafe/array/ByteArrayMethods
at org.apache.spark.internal.config.package$.<init>(package.scala:1095) ~[spark-core_2.12-3.1.1.jar:3.1.1]
at org.apache.spark.internal.config.package$.<clinit>(package.scala) ~[spark-core_2.12-3.1.1.jar:3.1.1]
at org.apache.spark.SparkConf$.<init>(SparkConf.scala:654) ~[spark-core_2.12-3.1.1.jar:3.1.1]
at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala) ~[spark-core_2.12-3.1.1.jar:3.1.1]
at org.apache.spark.SparkConf.set(SparkConf.scala:94) ~[spark-core_2.12-3.1.1.jar:3.1.1]
at org.apache.spark.SparkConf.set(SparkConf.scala:83) ~[spark-core_2.12-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.generateSparkConf(HiveSparkClientFactory.java:265) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:98) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:76) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:87) ~[hive-exec-3.1.2.jar:3.1.2]
... 24 more
I downloaded Spark as a prebuilt package for Hadoop 3.2.0 and later, whose jars directory contains Hive 2.3.0 jars, while my Hive installation is 3.1.2 and its lib directory contains the 3.1.2 jars.
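One way to check which jar (if any) on either classpath actually provides the missing class is a loop like the following (a rough sketch only; install paths are assumed from the setup above and unzip must be available):
# Rough diagnostic sketch: list every jar that bundles the missing class.
for j in "$SPARK_HOME"/jars/*.jar "$HIVE_HOME"/lib/*.jar; do
  unzip -l "$j" 2>/dev/null | grep -q 'org/apache/spark/unsafe/array/ByteArrayMethods.class' \
    && echo "$j"
done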

Hive on Spark is only tested with a specific version of Spark, so a given version of Hive is only guaranteed to work with that specific Spark version. Other Spark versions may work with a given Hive version, but that is not guaranteed. The list of Hive versions and their corresponding compatible Spark versions is in the Hive Spark Compatibility Chart.
Please use Spark 2.3.0 and take a look at Hive's pom.xml, which includes the dependency below:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-unsafe_2.11</artifactId>
    <version>2.3.0</version>
    <scope>compile</scope>
</dependency>
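That spark-unsafe artifact is the one that ships org.apache.spark.unsafe.array.ByteArrayMethods, the class missing in the stack trace above. As a hedged sketch only (jar names are assumed from a stock Spark 2.3.0 prebuilt and may differ on your download), the same symlink approach from the question would then look like:
# Hedged sketch, assuming a Spark 2.3.0 prebuilt unpacked at $SPARK_HOME;
# check the actual file names under $SPARK_HOME/jars before linking.
sudo ln -s $SPARK_HOME/jars/scala-library-2.11.8.jar $HIVE_HOME/lib/
sudo ln -s $SPARK_HOME/jars/spark-core_2.11-2.3.0.jar $HIVE_HOME/lib/
sudo ln -s $SPARK_HOME/jars/spark-network-common_2.11-2.3.0.jar $HIVE_HOME/lib/
sudo ln -s $SPARK_HOME/jars/spark-unsafe_2.11-2.3.0.jar $HIVE_HOME/lib/   # provides ByteArrayMethods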

Related

spark s3 shuffle service failing to fetch blocks

I am using the Spark S3 shuffle service from AWS on a Spark standalone cluster.
spark version = 3.3.0
java version = 1.8 corretto
The following two options have been added to my spark-submit:
spark.shuffle.sort.io.plugin.class=com.amazonaws.spark.shuffle.io.cloud.ChopperPlugin \
spark.shuffle.storage.path=s3a://<bucket>/shuffle.tmp \
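For context, a hedged sketch of a full invocation with those options (the master URL, application class, and jar are placeholders, not taken from the actual job):
spark-submit \
  --master spark://<master-host>:7077 \
  --conf spark.shuffle.sort.io.plugin.class=com.amazonaws.spark.shuffle.io.cloud.ChopperPlugin \
  --conf spark.shuffle.storage.path=s3a://<bucket>/shuffle.tmp \
  --class <main.Class> <application>.jar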
However, when running jobs, I see the following errors on my executors:
23/01/02 13:27:23 ERROR ShuffleBlockFetcherIterator: Failed to get block(s) from <IP>:<port>
java.lang.RuntimeException: java.nio.file.NoSuchFileException: /tmp/spark/spark-ccfc4d1a-ed5f-4510-8045-ab33bf7de2c1/executor-e867d208-4235-4d87-a05d-35abc431794f/blockmgr-a4c2d1c9-399d-482b-b1bb-0112caba7d88/32/shuffle_0_19_0.index
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.newByteChannel(Files.java:407)
at org.apache.spark.shuffle.IndexShuffleBlockResolver.getBlockData(IndexShuffleBlockResolver.scala:582)
at org.apache.spark.storage.BlockManager.getLocalBlockData(BlockManager.scala:694)
at org.apache.spark.network.netty.NettyBlockRpcServer.$anonfun$receive$8(NettyBlockRpcServer.scala:91)
at org.apache.spark.network.netty.NettyBlockRpcServer.$anonfun$receive$8$adapted(NettyBlockRpcServer.scala:89)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofInt.foreach(ArrayOps.scala:246)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.mutable.ArrayOps$ofInt.map(ArrayOps.scala:246)
at org.apache.spark.network.netty.NettyBlockRpcServer.$anonfun$receive$7(NettyBlockRpcServer.scala:89)
at org.apache.spark.network.netty.NettyBlockRpcServer.$anonfun$receive$7$adapted(NettyBlockRpcServer.scala:87)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:198)

NoClassDefFoundError: org/apache/spark/AccumulatorParam with spark execution engine on hive

Trying to use Spark as the Hive execution engine, I get the error below:
Job failed with java.lang.ClassNotFoundException: org.apache.spark.AccumulatorParam
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. java.lang.NoClassDefFoundError: org/apache/spark/AccumulatorParam
at org.apache.hive.spark.counter.SparkCounterGroup.createCounter(SparkCounterGroup.java:52)
at org.apache.hive.spark.counter.SparkCounters.createCounter(SparkCounters.java:71)
at org.apache.hive.spark.counter.SparkCounters.createCounter(SparkCounters.java:67)
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:337)
at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:358)
at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.AccumulatorParam
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 10 more
Spark version: 3.1.1
Hive version: 3.1.2 (also tried version 2.3.8 and got the same error)
Hadoop version: 3.3.0
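One note on diagnosing this: org.apache.spark.AccumulatorParam belongs to Spark's legacy accumulator API, which to my knowledge was removed in Spark 3.x, so Hive's Spark client cannot find it against a 3.1.1 install. A rough check (jar path assumed from a stock layout):
# Rough check: against spark-core 3.x this should print nothing, because the
# legacy AccumulatorParam class is gone; a Spark 2.x core jar would list it.
unzip -l $SPARK_HOME/jars/spark-core_2.12-3.1.1.jar | grep AccumulatorParam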

Spark Structured Streaming :: Unexpected error:: STATUS_INVALID_HANDLE with path=""

I have a Spark (2.4.4) Structured Streaming job on Hortonworks (2.6.4) that reads messages from a Kafka topic and, after schema validation, stores them into HBase and Hive.
After 6-7 hours of execution the job dies with STATUS_INVALID_HANDLE.
If I remove the Hive details, the exception does not occur.
The path mentioned below already exists in HDFS.
Any help resolving this issue?
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Unexpected error: STATUS_INVALID_HANDLE with path="/dev/projects/spark-checkpoint/hive/BLR_TOPIC_1-cash_blr_db_cash_streax_blr_table/offsets/.287.b53c5d5e-7f59-4aec-a6a7-015813d44b43.tmp", permission=666, clientname=DFSClient_NONMAPREDUCE_237312562_34
at org.apache.hadoop.ipc.Client.call(Client.java:1475)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy10.create(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296)
at sun.reflect.GeneratedMethodAccessor147.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy11.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1648)
at org.apache.hadoop.hdfs.DFSClient.primitiveCreate(DFSClient.java:1750)
at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:102)
at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:58)
at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:584)
at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:686)
at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:682)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
at org.apache.hadoop.fs.FileContext.create(FileContext.java:688)
at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createTempFile(CheckpointFileManager.scala:311)
at org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.<init>(CheckpointFileManager.scala:133)
at org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.<init>(CheckpointFileManager.scala:136)
at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.createAtomic(CheckpointFileManager.scala:318)
at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatchToFile(HDFSMetadataLog.scala:123)
at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply$mcZ$sp(HDFSMetadataLog.scala:112)
at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:110)
at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:110)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:110)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply$mcV$sp(MicroBatchExecution.scala:382)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply$mcZ$sp(MicroBatchExecution.scala:381)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:557)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch(MicroBatchExecution.scala:337)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:183)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
We tried running the same job on a couple of Spark versions: 2.2.0, 2.4.0, and 2.4.6.
Spark 2.4.6 serves my purpose; I am not seeing the STATUS_INVALID_HANDLE exception there.

Unable to instantiate SparkSession

I am trying to configure a Hadoop cluster with Spark and Hive.
I use Hadoop 3.1.3 with Hive 3.1.3 and a prebuilt Spark 3.0.0 with user-provided Hadoop.
After deploying everything, I cannot connect to the Hive metastore from Spark. So I run the following and get the error below:
./bin/run-example sql.hive.SparkHiveExample
Output:
Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:871)
at org.apache.spark.examples.sql.hive.SparkHiveExample$.main(SparkHiveExample.scala:47)
at org.apache.spark.examples.sql.hive.SparkHiveExample.main(SparkHiveExample.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/07/06 15:26:27 INFO ShutdownHookManager: Shutdown hook called
20/07/06 15:26:28 INFO ShutdownHookManager: Deleting directory /tmp/spark-246ae724-072f-45b1-9c89-cf122203a3df
I set the classpath as described here.
What should I do to successfully link Spark and Hive?
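The "Hive classes are not found" message from enableHiveSupport() means no spark-hive jars are on Spark's classpath; some prebuilt distributions (notably the user-provided-Hadoop ones) can ship without them. A hedged check, assuming the standard layout:
# Hedged check: if this lists no spark-hive jars, the distribution was built
# without Hive support and enableHiveSupport() fails exactly this way.
ls $SPARK_HOME/jars/ | grep -i 'spark-hive'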

Spark 1.1.0 on cdh5.1.3 does not work in yarn-cluster mode

I have CDH 5.1 (Hadoop 2.3.0-cdh5.1.3) installed on my cluster.
I have installed and configured a prebuilt Spark 1.1.0 (Apache version), built for Hadoop 2.3, on my cluster.
When I run the Pi example in client mode, it runs successfully, but it fails in yarn-cluster mode. The Spark job is submitted successfully, but fails after polling the application master for some time:
More Logs:
Application application_1415193640322_0016 failed 2 times due to Error launching appattempt_1415193640322_0016_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: java.io.EOFException
at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:710)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)
at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readUTF(DataInputStream.java:609)
at java.io.DataInputStream.readUTF(DataInputStream.java:564)
at org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:151)
at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:142)
at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:262)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:696)
... 10 more
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:99)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:118)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnException): java.io.EOFException
at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:710)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)
at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readUTF(DataInputStream.java:609)
at java.io.DataInputStream.readUTF(DataInputStream.java:564)
at org.apache.hadoop.yarn.security.ContainerTokenIdentifier.readFields(ContainerTokenIdentifier.java:151)
at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:142)
at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:262)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:696)
... 10 more
at org.apache.hadoop.ipc.Client.call(Client.java:1409)
at org.apache.hadoop.ipc.Client.call(Client.java:1362)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy69.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
... 5 more
. Failing the application.
When I look at the NodeManager logs:
Log Type: stderr
Log Length: 87
Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher
Can you please suggest a solution? Do you think I should compile the Spark code on my cluster, or should I use the Spark provided with CDH 5.1?
Any help will be appreciated!
spark-shell does not work with the yarn-cluster mode; you should use --master yarn-client instead.
Example:
path/to/pyspark --master yarn-client
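For completeness, only the interactive shells (spark-shell, pyspark) are restricted to client mode; a batch job such as the Pi example can still run in yarn-cluster mode through spark-submit. A hedged sketch (the examples jar path is illustrative for a 1.1.0 prebuilt and may differ in your distribution):
# Hedged sketch: submit the bundled Pi example in cluster mode on YARN.
spark-submit --master yarn-cluster \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/lib/spark-examples-1.1.0-hadoop2.3.0.jar 10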
