Cannot start Spark on DataStax (Cassandra)

I am trying to start Spark on DataStax Enterprise 4.6.
When I execute ./dse spark, I get:
java.lang.SecurityException: Authentication failed! Credentials required
at com.sun.jmx.remote.security.JMXPluggableAuthenticator.authenticationFailure(JMXPluggableAuthenticator.java:211)
at com.sun.jmx.remote.security.JMXPluggableAuthenticator.authenticate(JMXPluggableAuthenticator.java:163)
at sun.management.jmxremote.ConnectorBootstrap$AccessFileCheckerAuthenticator.authenticate(ConnectorBootstrap.java:219)
at javax.management.remote.rmi.RMIServerImpl.doNewClient(RMIServerImpl.java:232)
at javax.management.remote.rmi.RMIServerImpl.newClient(RMIServerImpl.java:199)
at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at sun.rmi.transport.Transport$1.run(Transport.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(Unknown Source)
at sun.rmi.transport.StreamRemoteCall.executeCall(Unknown Source)
at sun.rmi.server.UnicastRef.invoke(Unknown Source)
at javax.management.remote.rmi.RMIServerImpl_Stub.newClient(Unknown Source)
at javax.management.remote.rmi.RMIConnector.getConnection(Unknown Source)
at javax.management.remote.rmi.RMIConnector.connect(Unknown Source)
at javax.management.remote.JMXConnectorFactory.connect(Unknown Source)
at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:151)
at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:107)
at com.datastax.bdp.tools.DseTool.<init>(DseTool.java:92)
at com.datastax.bdp.tools.DseTool.main(DseTool.java:1048)
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.1.0
/_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_21)
Type in expressions to have them evaluated.
Type :help for more information.
Creating SparkContext...
Initializing SparkContext with MASTER: spark://master:7077
Created spark context..
Spark context available as sc.
15/05/18 09:27:03 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@master:7077]. Address is now gated for 60000 ms, all messages to this address will be delivered to dead letters.
15/05/18 09:27:03 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@master:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@master:7077
HiveSQLContext available as hc.
CassandraSQLContext available as csc.
Type in expressions to have them evaluated.
Type :help for more information.
scala> 15/05/18 09:28:03 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/05/18 09:28:03 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
The strange thing is:
1. I have disabled the password authorizer, but I still get
java.lang.SecurityException: Authentication failed! Credentials required
2. I have set SPARK_MASTER=my_hostname, which is the same host as the Cassandra node, but I still get "All masters are unresponsive".
Could anyone please help me with this? Thanks.

Make sure you properly set the replication factor for your system_auth keyspace:
http://docs.datastax.com/en/datastax_enterprise/4.0/datastax_enterprise/sec/secConfSysAuthKeyspRepl.html
Procedure
Set the replication factor based on one of the following examples depending on your environment:
SimpleStrategy example:
ALTER KEYSPACE "system_auth"
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
NetworkTopologyStrategy example:
ALTER KEYSPACE "system_auth"
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};
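After changing the replication factor, the credentials data is not redistributed until the keyspace is repaired, so (as a rough sketch, assuming default nodetool settings) run on each node:
nodetool repair system_auth
and then retry dse spark.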

To start Spark on DSE with authentication, you can also pass in credentials like this:
dse spark -Dcassandra.username=user -Dcassandra.password=pass
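For example, with the default Cassandra superuser account (placeholder credentials only; substitute your own user):
dse spark -Dcassandra.username=cassandra -Dcassandra.password=cassandra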

Related

spark-shell exception org.apache.spark.SparkException: Exception thrown in awaitResult

I am facing the error below while starting spark-shell with the YARN master. The shell works with the Spark local master.
admin@XXXXXX:~$ spark-shell --master yarn
21/11/03 15:51:51 WARN Utils: Your hostname, XXXXXX resolves to a loopback address: 127.0.1.1; using 192.168.29.57 instead (on interface wifi0)
21/11/03 15:51:51 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/11/03 15:52:01 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Spark context Web UI available at http://XX.XX.XX.XX:4040
Spark context available as 'sc' (master = yarn, app id = application_1635934709971_0001).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.5
/_/
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_301)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
scala> 21/11/03 15:52:35 ERROR YarnClientSchedulerBackend: YARN application has exited unexpectedly with state FAILED! Check the YARN application logs for more details.
21/11/03 15:52:35 ERROR YarnClientSchedulerBackend: Diagnostics message: Uncaught exception: org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:515)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:307)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:780)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:779)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:804)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:834)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.io.IOException: Failed to connect to /192.168.29.57:33333
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /192.168.29.57:33333
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:688)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514)
at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
21/11/03 15:52:35 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
Below is my spark-defaults.conf:
spark.driver.memory 512m
spark.yarn.am.memory 512m
spark.executor.memory 512m
spark.eventLog.enabled true
spark.eventLog.dir file:////home/admin/spark_event_temp
spark.history.fs.logDirectory hdfs://localhost:9000/spark-logs
spark.history.fs.update.interval 10s
spark.history.ui.port 18080
spark.sql.warehouse.dir=file:////home/admin/spark_warehouse
spark.shuffle.service.port 7337
spark.ui.port 4040
spark.blockManager.port 31111
spark.driver.blockManager.port 32222
spark.driver.port 33333
Spark version: spark-2.4.5-bin-hadoop2.7
Hadoop version: hadoop-2.8.5
I can provide more information if needed. I have configured everything on the local machine.
Adding these properties in spark-env.sh fixed the issue for me.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/mnt/d/soft/hadoop-2.8.5
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=$SPARK_HOME
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop/
export SPARK_MASTER_HOST=127.0.0.1
export SPARK_LOCAL_IP=127.0.0.1
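For context, the stack trace shows the YARN ApplicationMaster failing to connect back to the driver at 192.168.29.57:33333 (the fixed spark.driver.port from spark-defaults.conf), so the fix above works by pinning the driver to an address it can actually be reached on. A roughly equivalent sketch using plain Spark properties instead of spark-env.sh (127.0.0.1 is an assumption for a single-machine setup):
spark-shell --master yarn --conf spark.driver.bindAddress=127.0.0.1 --conf spark.driver.host=127.0.0.1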

Issue with spark-shell command: java.io.IOException: Could not create FileClient

I have some issues when executing the spark-shell command:
[mapr@node1 ~]$ /opt/mapr/spark/spark-2.1.0/bin/spark-shell --master local[2]
Error:
20/05/02 14:21:34 ERROR SparkContext: Error initializing SparkContext.
java.io.IOException: Could not create FileClient
at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:643)
at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:696)
at com.mapr.fs.MapRFileSystem.getMapRFileStatus(MapRFileSystem.java:1405)
at com.mapr.fs.MapRFileSystem.getFileStatus(MapRFileSystem.java:1080)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:93)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:531)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
at $line3.$read$$iw$$iw.<init>(<console>:15)
at $line3.$read$$iw.<init>(<console>:42)
at $line3.$read.<init>(<console>:44)
at $line3.$read$.<init>(<console>:48)
at $line3.$read$.<clinit>(<console>)
at $line3.$eval$.$print$lzycompute(<console>:7)
at $line3.$eval$.$print(<console>:6)
at $line3.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)
at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:105)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
at org.apache.spark.repl.Main$.doMain(Main.scala:68)
at org.apache.spark.repl.Main$.main(Main.scala:51)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:733)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:177)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:202)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:116)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Could not create FileClient
at com.mapr.fs.MapRClientImpl.<init>(MapRClientImpl.java:136)
at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:637)
... 58 more
java.io.IOException: Could not create FileClient
at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:643)
at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:696)
at com.mapr.fs.MapRFileSystem.getMapRFileStatus(MapRFileSystem.java:1405)
at com.mapr.fs.MapRFileSystem.getFileStatus(MapRFileSystem.java:1080)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:93)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:531)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
... 47 elided
Caused by: java.io.IOException: Could not create FileClient
at com.mapr.fs.MapRClientImpl.<init>(MapRClientImpl.java:136)
at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:637)
... 58 more
<console>:14: error: not found: value spark
import spark.implicits._
^
<console>:14: error: not found: value spark
import spark.sql
^
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.1.0-mapr-1710
/_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
Type in expressions to have them evaluated.
Type :help for more information.
scala> 2020-05-02 14:21:27,8616 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:2470 Thread: 30539 MoveToNextCldb: No CLDB entries, cannot run, sleeping 5 seconds!
2020-05-02 14:21:32,9268 ERROR Client fs/client/fileclient/cc/client.cc:1329 Thread: 30539 Failed to initialize client for cluster MyCluster, error Connection reset by peer(104)
I use Spark 2.1.0 and a MapR cluster with 3 nodes.
I also have the following conf files:
The slaves file, with a list of nodes:
# A Spark Worker will be started on each of the machines listed below.
node2
node3
I also added the following lines to $SPARK_HOME/conf/spark-env.sh:
export SPARK_MASTER_HOST=node1
export SPARK_MASTER_IP=172.17.0.2
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
Has anyone had the same issue, or does anyone know how to fix it?
This seems like a typical installation issue. On the MapR forums I found spark-on-yarn-errors; they proposed running the commands from that link as well as configure.sh.
The gist of it is (source: MapR forums):
pip uninstall toree
pip install --pre toree
jupyter toree install --interpreters=Scala,PySpark,Spark,SQL --spark_home=$SPARK_HOME
Also, the Spark docs say the compatible Scala version for Spark 2.1.0 is 2.11.x:
Spark runs on Java 7+, Python 2.6+/3.4+ and R 3.1+. For the Scala API, Spark 2.1.0 uses Scala 2.11. You will need to use a compatible Scala version (2.11.x).
Hopefully you have not mixed up these versions.
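To double-check which Spark and Scala versions the MapR installation actually ships, and compare them with what the notebook/Toree setup expects, something like this should be enough (path taken from the question):
/opt/mapr/spark/spark-2.1.0/bin/spark-submit --version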

Spark error : java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/StreamWriteSupport

I am using Spark on Hortonworks. When I execute the code below, I get an exception. I also have a separate Spark instance running on my system, and the same code works fine there.
Do I need to do anything different on Hortonworks to resolve the error below? Kindly assist me.
[root@sandbox-hdp bin]# ./spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/08/31 11:36:44 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context Web UI available at http://172.17.0.2:4041
Spark context available as 'sc' (master = local[*], app id = local-1535715404685).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.2.0.2.6.3.0-235
/_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.
scala> :paste
// Entering paste mode (ctrl-D to finish)
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
.format("com.databricks.spark.csv")
.option("header", "true")
.option("inferSchema", "true")
.load("hdfs://sandbox-hdp.hortonworks.com:8020/city.csv")
df.show()
df.printSchema()
// Exiting paste mode, now interpreting.
warning: there was one deprecation warning; re-run with -deprecation for details
java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/StreamWriteSupport
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:533)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:89)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:89)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:304)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
... 53 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.sources.v2.StreamWriteSupport
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 86 more
All the CSV drivers are properly copied to the jars folder inside Spark. This is the same exception I face when connecting to Excel, HBase, and Apache Phoenix.
I am facing this only on Hortonworks.
After adding the latest Spark code JAR, it worked.
I had the same question. I found a mistake in my pom.xml: my Spark version was different from the one in my environment. So I changed the Spark version, kept the same environment, and that solved the error.
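For anyone hitting the same thing: StreamWriteSupport only exists in newer Spark releases (2.3+), so this NoClassDefFoundError usually means some jar on the classpath was built against a newer Spark than the 2.2.0 runtime shown in the banner, which matches the version-mismatch explanation above. A rough way to compare the two sides (the grep pattern is only an illustration of where the build-time version is declared):
spark-submit --version        # Spark/Scala version the platform actually runs
grep -A 2 spark-sql pom.xml   # Spark version the application jars were built against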

Unable to start ./spark-shell

I am new to Spark and am trying to install Spark on my Ubuntu machine, and I am facing the error below when I try to start spark-shell.
Please help.
Caused by: org.apache.derby.iapi.error.StandardException: Directory /home/amar/spark-2.2.0-bin-hadoop2.7/bin/metastore_db cannot be created.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.services.monitor.StorageFactoryService$10.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.impl.services.monitor.StorageFactoryService.createServiceRoot(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.createPersistentService(Unknown Source)
at org.apache.derby.impl.services.monitor.FileMonitor.createPersistentService(Unknown Source)
at org.apache.derby.iapi.services.monitor.Monitor.createPersistentService(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection$5.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.impl.jdbc.EmbedConnection.createPersistentService(Unknown Source)
... 149 more
<console>:14: error: not found: value spark
import spark.implicits._
^
<console>:14: error: not found: value spark
import spark.sql
^
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.2.0
/_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.
In my case the problem was permissions. I was trying to launch Spark as a non-admin user.
Once I started it as root, all went well. However, I'm not recommending that behavior; I'm just pointing out that you should check permissions and set them up accordingly. Running as root is bad practice.
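A concrete sketch of what to check (the path and user name are taken from the question's error message, so they are assumptions about the setup): the embedded Derby metastore_db is created in whatever directory spark-shell is started from, so either start the shell from a directory the user can write to, or fix the ownership of the Spark directory rather than running as root:
cd ~ && /home/amar/spark-2.2.0-bin-hadoop2.7/bin/spark-shell   # metastore_db is then created under $HOME
sudo chown -R amar /home/amar/spark-2.2.0-bin-hadoop2.7        # or make the Spark directory writable by that user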

Why does starting spark-shell fail with "we couldn't find any external IP address!" on Windows?

I am having trouble starting spark-shell on my Windows computer. The version of Spark I am using is 1.5.2, pre-built for Hadoop 2.4 or later. I think spark-shell.cmd should run directly without any configuration since it is pre-built, and I cannot figure out what is preventing me from starting Spark correctly.
Aside from the error message printed out, I can still execute some basic Scala commands on the command line, but apparently something is going wrong here.
Here is the error log from cmd:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.li
b.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more in
fo.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.propertie
s
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.5.2
/_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_25)
Type in expressions to have them evaluated.
Type :help for more information.
15/11/18 17:51:32 WARN MetricsSystem: Using default name DAGScheduler for source
because spark.app.id is not set.
Spark context available as sc.
15/11/18 17:51:39 WARN General: Plugin (Bundle) "org.datanucleus" is already reg
istered. Ensure you dont have multiple JAR versions of the same plugin in the cl
asspath. The URL "file:/C:/spark-1.5.2-bin-hadoop2.4/lib/datanucleus-core-3.2.10
.jar" is already registered, and you are trying to register an identical plugin
located at URL "file:/C:/spark-1.5.2-bin-hadoop2.4/bin/../lib/datanucleus-core-3
.2.10.jar."
15/11/18 17:51:39 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is
already registered. Ensure you dont have multiple JAR versions of the same plug
in in the classpath. The URL "file:/C:/spark-1.5.2-bin-hadoop2.4/lib/datanucleus
-rdbms-3.2.9.jar" is already registered, and you are trying to register an ident
ical plugin located at URL "file:/C:/spark-1.5.2-bin-hadoop2.4/bin/../lib/datanu
cleus-rdbms-3.2.9.jar."
15/11/18 17:51:39 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is alr
eady registered. Ensure you dont have multiple JAR versions of the same plugin i
n the classpath. The URL "file:/C:/spark-1.5.2-bin-hadoop2.4/bin/../lib/datanucl
eus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an
identical plugin located at URL "file:/C:/spark-1.5.2-bin-hadoop2.4/lib/datanucl
eus-api-jdo-3.2.6.jar."
15/11/18 17:51:39 WARN Connection: BoneCP specified but not present in CLASSPATH
(or one of dependencies)
15/11/18 17:51:40 WARN Connection: BoneCP specified but not present in CLASSPATH
(or one of dependencies)
15/11/18 17:51:46 WARN ObjectStore: Version information not found in metastore.
hive.metastore.schema.verification is not enabled so recording the schema versio
n 1.2.0
15/11/18 17:51:46 WARN ObjectStore: Failed to get database default, returning No
SuchObjectException
15/11/18 17:51:47 WARN : Your hostname, Lenovo-PC resolves to a loopback/non-reachab
le address: fe80:0:0:0:297a:e76d:828:59dc%wlan2, but we couldn't find any extern
al IP address!
java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.jav
a:522)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.s
cala:171)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveCo
ntext.scala:162)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala
:160)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:167)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstruct
orAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingC
onstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:10
28)
at $iwC$$iwC.<init>(<console>:9)
at $iwC.<init>(<console>:18)
at <init>(<console>:20)
at .<init>(<console>:24)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:
1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:
1340)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840
)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:8
57)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.sca
la:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply
(SparkILoopInit.scala:132)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply
(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoop
Init.scala:124)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$Spark
ILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.s
cala:159)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkIL
oopInit.scala:108)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:
64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$Spark
ILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$Spark
ILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$Spark
ILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClass
Loader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$pr
ocess(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSub
mit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:18
0)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:
650)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.
loadPermissionInfo(RawLocalFileSystem.java:559)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.
getPermission(RawLocalFileSystem.java:534)
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(Sess
ionState.java:599)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(Sess
ionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.jav
a:508)
... 56 more
<console>:10: error: not found: value sqlContext
import sqlContext.implicits._
^
<console>:10: error: not found: value sqlContext
import sqlContext.sql
^
There are a couple of issues. You're on Windows, and things are different on this OS compared to other POSIX-compliant OSes.
Start by reading the Problems running Hadoop on Windows document and see if "missing WINUTILS.EXE" is the issue. Make sure you run spark-shell in a console with admin rights.
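If missing winutils.exe is indeed the problem, the usual setup is roughly the following in cmd before launching spark-shell (C:\hadoop is only an example location; winutils.exe goes into its bin subdirectory):
set HADOOP_HOME=C:\hadoop
set PATH=%HADOOP_HOME%\bin;%PATH%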
You may also want to read the answers to a similar question Why does starting spark-shell fail with NullPointerException on Windows?
Also, you may have started spark-shell inside the bin subdirectory, hence errors like:
15/11/18 17:51:39 WARN General: Plugin (Bundle)
"org.datanucleus.api.jdo" is already registered. Ensure you dont have
multiple JAR versions of the same plugin in the classpath. The URL
"file:/C:/spark-1.5.2-bin-hadoop2.4/bin/../lib/datanucleus-api-jdo-3.2.6.jar"
is already registered, and you are trying to register an identical
plugin located at URL
"file:/C:/spark-1.5.2-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar."
And the last issue:
15/11/18 17:51:47 WARN : Your hostname, Lenovo-PC resolves to a
loopback/non-reachable address: fe80:0:0:0:297a:e76d:828:59dc%wlan2,
but we couldn't find any external IP address!
One workaround is to set SPARK_LOCAL_HOSTNAME to some resolvable host name and be done with it.
SPARK_LOCAL_HOSTNAME is the custom host name that overrides any other candidates for hostname when driver, master, workers, and executors are created.
In your case, using spark-shell, just execute the following:
SPARK_LOCAL_HOSTNAME=localhost ./bin/spark-shell
You can also use:
./bin/spark-shell -c spark.driver.host=localhost
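Since this is Windows, the cmd equivalent of the first form would presumably be:
set SPARK_LOCAL_HOSTNAME=localhost
bin\spark-shell.cmd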
Refer also to Environment Variables in the official documentation of Spark.
