Spark error: java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/StreamWriteSupport - apache-spark

I am using Spark on Hortonworks. When I execute the code below, I get an exception. I also have a separate Spark instance running on my system, and the same code works fine there.
Do I need to do anything different on Hortonworks to resolve the error below? Kindly assist me.
[root@sandbox-hdp bin]# ./spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/08/31 11:36:44 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context Web UI available at http://172.17.0.2:4041
Spark context available as 'sc' (master = local[*], app id = local-1535715404685).
Spark session available as 'spark'.
Welcome to Spark version 2.2.0.2.6.3.0-235
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.
scala> :paste
// Entering paste mode (ctrl-D to finish)
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
.format("com.databricks.spark.csv")
.option("header", "true")
.option("inferSchema", "true")
.load("hdfs://sandbox-hdp.hortonworks.com:8020/city.csv")
df.show()
df.printSchema()
// Exiting paste mode, now interpreting.
warning: there was one deprecation warning; re-run with -deprecation for details
java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/StreamWriteSupport
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:533)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:89)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:89)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:304)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
... 53 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.sources.v2.StreamWriteSupport
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 86 more
All the CSV driver jars are properly copied to the jars folder inside Spark. I face the same exception when connecting to Excel, HBase, and Apache Phoenix.
I am facing this only on Hortonworks.

After adding the latest Spark code jar, it worked.

I had the same question. I found a mistake in my pom.xml: the Spark version was different from the one in my environment. I changed the Spark version to match the environment, and that solved the error.
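For reference, on Spark 2.x the CSV reader is built in, so the external com.databricks.spark.csv package (and its version-sensitive jar) is not needed at all. A minimal spark-shell sketch, reusing the HDFS path from the question:
// Spark 2.x ships a native CSV data source; no spark-csv jar required.
// The HDFS path is the one from the question and may differ in your cluster.
val df = spark.read
  .option("header", "true")       // first line holds column names
  .option("inferSchema", "true")  // let Spark infer column types
  .csv("hdfs://sandbox-hdp.hortonworks.com:8020/city.csv")
df.printSchema()
df.show()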

Related

spark-shell exception org.apache.spark.SparkException: Exception thrown in awaitResult

I am facing the error below while starting spark-shell with the YARN master. The shell works with the Spark local master.
admin@XXXXXX:~$ spark-shell --master yarn
21/11/03 15:51:51 WARN Utils: Your hostname, XXXXXX resolves to a loopback address: 127.0.1.1; using 192.168.29.57 instead (on interface wifi0)
21/11/03 15:51:51 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/11/03 15:52:01 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Spark context Web UI available at http://XX.XX.XX.XX:4040
Spark context available as 'sc' (master = yarn, app id = application_1635934709971_0001).
Spark session available as 'spark'.
Welcome to Spark version 2.4.5
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_301)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
scala> 21/11/03 15:52:35 ERROR YarnClientSchedulerBackend: YARN application has exited unexpectedly with state FAILED! Check the YARN application logs for more details.
21/11/03 15:52:35 ERROR YarnClientSchedulerBackend: Diagnostics message: Uncaught exception: org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:515)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:307)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:780)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:779)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:804)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:834)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.io.IOException: Failed to connect to /192.168.29.57:33333
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /192.168.29.57:33333
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:688)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514)
at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
21/11/03 15:52:35 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
Below is spark-defaults.conf
spark.driver.memory 512m
spark.yarn.am.memory 512m
spark.executor.memory 512m
spark.eventLog.enabled true
spark.eventLog.dir file:////home/admin/spark_event_temp
spark.history.fs.logDirectory hdfs://localhost:9000/spark-logs
spark.history.fs.update.interval 10s
spark.history.ui.port 18080
spark.sql.warehouse.dir=file:////home/admin/spark_warehouse
spark.shuffle.service.port 7337
spark.ui.port 4040
spark.blockManager.port 31111
spark.driver.blockManager.port 32222
spark.driver.port 33333
Spark version: spark-2.4.5-bin-hadoop2.7
Hadoop version: hadoop-2.8.5
I can provide more information if needed. I have configured everything on my local machine.
Adding these properties to spark-env.sh fixed the issue for me:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/mnt/d/soft/hadoop-2.8.5
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=$SPARK_HOME
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop/
export SPARK_MASTER_HOST=127.0.0.1
export SPARK_LOCAL_IP=127.0.0.1
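The root cause in the stack trace is the ApplicationMaster failing to connect back to the driver at 192.168.29.57:33333, so it is worth confirming what address and port the driver actually advertises. A small sketch that can be pasted into a working spark-shell session; spark.driver.host and spark.driver.port are standard Spark configuration keys:
// Inspect the driver endpoint that the YARN containers will try to reach.
println(sc.getConf.get("spark.driver.host"))              // must be reachable from the YARN nodes
println(sc.getConf.get("spark.driver.port", "(random)"))  // 33333 here, pinned in spark-defaults.conf
println(sc.getConf.getOption("spark.driver.bindAddress")) // usually None unless explicitly set
If the reported host is a loopback or otherwise unreachable address, pointing SPARK_LOCAL_IP (or spark.driver.host) at a reachable interface, as in the answer above, is the usual fix.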

Issue with spark-shell command: java.io.IOException: Could not create FileClient

I have some issues when executing the spark-shell command:
[mapr@node1 ~]$ /opt/mapr/spark/spark-2.1.0/bin/spark-shell --master local[2]
Error:
20/05/02 14:21:34 ERROR SparkContext: Error initializing SparkContext.
java.io.IOException: Could not create FileClient
at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:643)
at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:696)
at com.mapr.fs.MapRFileSystem.getMapRFileStatus(MapRFileSystem.java:1405)
at com.mapr.fs.MapRFileSystem.getFileStatus(MapRFileSystem.java:1080)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:93)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:531)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
at $line3.$read$$iw$$iw.<init>(<console>:15)
at $line3.$read$$iw.<init>(<console>:42)
at $line3.$read.<init>(<console>:44)
at $line3.$read$.<init>(<console>:48)
at $line3.$read$.<clinit>(<console>)
at $line3.$eval$.$print$lzycompute(<console>:7)
at $line3.$eval$.$print(<console>:6)
at $line3.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)
at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:105)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
at org.apache.spark.repl.Main$.doMain(Main.scala:68)
at org.apache.spark.repl.Main$.main(Main.scala:51)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:733)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:177)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:202)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:116)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Could not create FileClient
at com.mapr.fs.MapRClientImpl.<init>(MapRClientImpl.java:136)
at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:637)
... 58 more
java.io.IOException: Could not create FileClient
at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:643)
at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:696)
at com.mapr.fs.MapRFileSystem.getMapRFileStatus(MapRFileSystem.java:1405)
at com.mapr.fs.MapRFileSystem.getFileStatus(MapRFileSystem.java:1080)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:93)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:531)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
... 47 elided
Caused by: java.io.IOException: Could not create FileClient
at com.mapr.fs.MapRClientImpl.<init>(MapRClientImpl.java:136)
at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:637)
... 58 more
<console>:14: error: not found: value spark
import spark.implicits._
^
<console>:14: error: not found: value spark
import spark.sql
^
Welcome to Spark version 2.1.0-mapr-1710
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
Type in expressions to have them evaluated.
Type :help for more information.
scala> 2020-05-02 14:21:27,8616 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:2470 Thread: 30539 MoveToNextCldb: No CLDB entries, cannot run, sleeping 5 seconds!
2020-05-02 14:21:32,9268 ERROR Client fs/client/fileclient/cc/client.cc:1329 Thread: 30539 Failed to initialize client for cluster MyCluster, error Connection reset by peer(104)
I use Spark 2.1.0 and a MapR cluster with 3 nodes.
I also have the following conf files:
slaves, with a list of nodes:
# A Spark Worker will be started on each of the machines listed below.
node2
node3
I also added the following lines to $SPARK_HOME/conf/spark-env.sh:
export SPARK_MASTER_HOST=node1
export SPARK_MASTER_IP=172.17.0.2
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
Has anyone had the same issue, or does anyone know how to fix it?
This seems like a typical installation issue. On the MapR forums I found spark-on-yarn-errors; please have a look at it. They propose running the commands in the link as well as configure.sh.
The gist of it (source: MapR forums) is the following solution:
-pip uninstall toree
-pip install --pre toree
-jupyter toree install --interpreters=Scala,PySparK,SparK,SQL --spark_home=$SPARK_HOME
Also, the Spark docs say that for Spark 2.1.0 the compatible Scala version is 2.11.x:
Spark runs on Java 7+, Python 2.6+/3.4+ and R 3.1+. For the Scala API, Spark 2.1.0 uses Scala 2.11. You will need to use a compatible Scala version (2.11.x).
I hope you have not missed these version requirements.
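A quick way to double-check those versions on the box is sketched below; it uses only scala.util.Properties and the org.apache.spark.SPARK_VERSION constant, so it works even when SparkContext creation itself fails:
// Verify the Scala, Spark and JVM versions the shell is actually running with.
println(scala.util.Properties.versionString)  // e.g. "version 2.11.8"
println(org.apache.spark.SPARK_VERSION)       // e.g. "2.1.0-mapr-1710"
println(System.getProperty("java.version"))   // JVM running the shell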

unable to start ./spark-shell

I am new to Spark and am trying to install it on my Ubuntu machine.
I am facing the error below when I try to start spark-shell.
Please help.
Caused by: org.apache.derby.iapi.error.StandardException: Directory /home/amar/spark-2.2.0-bin-hadoop2.7/bin/metastore_db cannot be created.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.services.monitor.StorageFactoryService$10.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.impl.services.monitor.StorageFactoryService.createServiceRoot(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.createPersistentService(Unknown Source)
at org.apache.derby.impl.services.monitor.FileMonitor.createPersistentService(Unknown Source)
at org.apache.derby.iapi.services.monitor.Monitor.createPersistentService(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection$5.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.impl.jdbc.EmbedConnection.createPersistentService(Unknown Source)
... 149 more
<console>:14: error: not found: value spark
import spark.implicits._
^
<console>:14: error: not found: value spark
import spark.sql
^
Welcome to Spark version 2.2.0
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.
In my case the problem was permissions: I was trying to launch Spark as a non-admin user.
Once I started it as root, all went well. However, I am not recommending that; I am just pointing out that you should check the permissions and set them up accordingly. Running as root is bad practice.
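Derby creates the metastore_db directory (and derby.log) in whatever directory spark-shell is launched from, here $SPARK_HOME/bin. Before resorting to root, a tiny check like the sketch below (plain JDK calls, runnable in any Scala REPL) shows whether the current user can write to that location:
import java.nio.file.{Files, Paths}

// metastore_db is created in the current working directory, so it must be writable.
val cwd = Paths.get(System.getProperty("user.dir"))
println(s"Launching from:   $cwd")
println(s"Writable by user: ${Files.isWritable(cwd)}")
If it prints false, fix the ownership of that directory or launch spark-shell from a directory the user owns instead of running as root.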

cannot start the spark on datastax

I am trying to start Spark on DataStax (4.6).
When I execute ./dse spark, I get:
java.lang.SecurityException: Authentication failed! Credentials required
at com.sun.jmx.remote.security.JMXPluggableAuthenticator.authenticationFailure(JMXPluggableAuthenticator.java:211)
at com.sun.jmx.remote.security.JMXPluggableAuthenticator.authenticate(JMXPluggableAuthenticator.java:163)
at sun.management.jmxremote.ConnectorBootstrap$AccessFileCheckerAuthenticator.authenticate(ConnectorBootstrap.java:219)
at javax.management.remote.rmi.RMIServerImpl.doNewClient(RMIServerImpl.java:232)
at javax.management.remote.rmi.RMIServerImpl.newClient(RMIServerImpl.java:199)
at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at sun.rmi.transport.Transport$1.run(Transport.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(Unknown Source)
at sun.rmi.transport.StreamRemoteCall.executeCall(Unknown Source)
at sun.rmi.server.UnicastRef.invoke(Unknown Source)
at javax.management.remote.rmi.RMIServerImpl_Stub.newClient(Unknown Source)
at javax.management.remote.rmi.RMIConnector.getConnection(Unknown Source)
at javax.management.remote.rmi.RMIConnector.connect(Unknown Source)
at javax.management.remote.JMXConnectorFactory.connect(Unknown Source)
at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:151)
at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:107)
at com.datastax.bdp.tools.DseTool.<init>(DseTool.java:92)
at com.datastax.bdp.tools.DseTool.main(DseTool.java:1048)
Welcome to Spark version 1.1.0
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_21)
Type in expressions to have them evaluated.
Type :help for more information.
Creating SparkContext...
Initializing SparkContext with MASTER: spark://master:7077
Created spark context..
Spark context available as sc.
15/05/18 09:27:03 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@master:7077]. Address is now gated for 60000 ms, all messages to this address will be delivered to dead letters.
15/05/18 09:27:03 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@master:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@master:7077
HiveSQLContext available as hc.
CassandraSQLContext available as csc.
Type in expressions to have them evaluated.
Type :help for more information.
scala> 15/05/18 09:28:03 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/05/18 09:28:03 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
The strange things are:
1. I have disabled the password authorizer, but I still get java.lang.SecurityException: Authentication failed! Credentials required.
2. I have set SPARK_MASTER=my_hostname, which is the same host as the Cassandra node, but I still get "All masters are unresponsive".
Could anyone please help me with this? Thanks.
Make sure you have properly set the replication factor for your authentication keyspace (system_auth):
http://docs.datastax.com/en/datastax_enterprise/4.0/datastax_enterprise/sec/secConfSysAuthKeyspRepl.html
Procedure
Set the replication factor based on one of the following examples depending on your environment:
SimpleStrategy example:
ALTER KEYSPACE "system_auth"
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
NetworkTopologyStrategy example:
ALTER KEYSPACE "system_auth"
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};
To start Spark on DSE with authentication, you can also pass in credentials in this manner:
dse spark -Dcassandra.username=user -Dcassandra.password=pass
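If you prefer not to put credentials on the command line, the equivalent settings can usually go on the SparkConf that the Cassandra connector reads. This is only a sketch assuming the open-source Spark Cassandra Connector property names (spark.cassandra.auth.username / spark.cassandra.auth.password); check the DSE docs for your exact release:
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: property names follow the open-source Spark Cassandra Connector
// and may differ on older DSE releases.
val conf = new SparkConf()
  .setAppName("dse-auth-example")
  .set("spark.cassandra.connection.host", "master") // hypothetical host from the question
  .set("spark.cassandra.auth.username", "user")
  .set("spark.cassandra.auth.password", "pass")
val sc = new SparkContext(conf)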

Why am I unable to connect to kafka using pyspark? Kafka_2.12-2.3.0 and Spark_2.4.4 or 2.3.0 or 2.3.4

I cannot connect to kafka_2.12-2.3.0 from Spark 2.4.4 Structured Streaming using the code below in Python. My Scala version is 2.11.12 and OpenJDK is 1.8.0_222.
from pyspark.sql import SparkSession
spark = SparkSession\
.builder\
.appName("kafka-spark-structured-stream")\
.getOrCreate()
dsraw = spark\
.readStream\
.format("kafka")\
.option("kafka.bootstrap.servers", "**kafka-broker-ID**:9092")\
.option("subscribe", "test")\
.option("startingOffsets", "earliest")\
.load()
The following are the spark-submit commands I tried multiple times, varying versions (for example changing from 2.11 to 2.12), but they still failed:
$spark-submit --jars /opt/hadoop/spark/jars/spark-sql-kafka-0-10_2.11-2.4.4.jar,/opt/hadoop/spark/jars/kafka-clients-0.10.1.0.jar --master yarn --deploy-mode client /opt/hadoop/spark/spark-application/main/kafka-spark-structured-stream.py
$spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.4 --master yarn --deploy-mode client /opt/hadoop/spark/spark-application/main/kafka-spark-structured-stream.py
I keep getting the error below no matter how I vary the spark-submit command:
2019-10-23 15:40:37,096 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@cf7aac8{/SQL/execution,null,AVAILABLE,@Spark}
2019-10-23 15:40:37,096 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution/json.
2019-10-23 15:40:37,097 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c593907{/SQL/execution/json,null,AVAILABLE,@Spark}
2019-10-23 15:40:37,118 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /static/sql.
2019-10-23 15:40:37,120 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@38634422{/static/sql,null,AVAILABLE,@Spark}
2019-10-23 15:40:40,573 INFO state.StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
check_1======check_1======check_1======check_1======check_1======check_1======check_1======check_1======check_1======check_1======
Traceback (most recent call last):
File "/opt/hadoop/spark/spark-application/main/test.py", line 15, in <module>
.option("startingOffsets", "earliest").load()
File "/opt/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 172, in load
File "/opt/hadoop/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/hadoop/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/opt/hadoop/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o36.load.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.kafka010.KafkaSourceProvider could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: org.apache.spark.internal.Logging.$init$(Lorg/apache/spark/internal/Logging;)V
at org.apache.spark.sql.kafka010.KafkaSourceProvider.<init>(KafkaSourceProvider.scala:44)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
... 24 more
Running spark-submit --version also gives me the versions below:
(base) [hadoop@master ~]$ spark-submit --version
Welcome to Spark version 2.4.4
Using Scala version 2.11.12, OpenJDK 64-Bit Server VM, 1.8.0_222
Branch
Compiled by user on 2019-08-27T21:21:38Z
Revision
Url
Type --help for more information.
I finally solved it by downgrading to a specific version of Spark, 2.4.0. Here are the versions I used:
spark=2.4.0
kafka=2.12-2.3.0
scala=2.11.12
openJDK=1.8.0_222
Here is the spark-submit command:
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0,org.apache.kafka:kafka-clients:2.3.0 --master yarn --deploy-mode client /opt/hadoop/spark/spark-application/main/kafka-spark-structured-stream.py
This is probably due to your application dependencies; I think there is an incompatibility between the Kafka client and the Spark version you're using.
I got the same error using Scala, and I solved it by downgrading to Spark 2.3 instead of 2.4.
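The common thread in both answers is version alignment: the Scala suffix and version in org.apache.spark:spark-sql-kafka-0-10_<scala>:<spark> must match what spark-submit --version reports (here Scala 2.11 and the installed Spark). For completeness, a Scala sketch of the same read, with the matching --packages coordinate noted in a comment; the broker and topic are the placeholders from the question:
// Submit with a package whose Scala suffix and version match the installation, e.g.:
//   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 ...
// With --packages, kafka-clients is resolved transitively, so it does not need to be pinned by hand.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("kafka-spark-structured-stream")
  .getOrCreate()

val dsraw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "kafka-broker-ID:9092") // placeholder broker from the question
  .option("subscribe", "test")
  .option("startingOffsets", "earliest")
  .load()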
