DSE Spark Shell Authentication - cassandra

I have a DSE 4.5 installation with Spark running. I need help passing the Cassandra cluster's username and password from the Spark shell.
I have added these properties to the conf/spark-default.conf file:
spark.cassandra.auth.username=user
spark.cassandra..auth.password=pass
Then I start the Spark shell using:
dse spark
But I still see this error when I try sc.cassandraTable:
com.datastax.driver.core.exceptions.AuthenticationException: Authentication error on host /11.111.11.11:9042: Host /11.111.11.11:9042 requires authentication, but no authenticator found in Cluster configuration
at com.datastax.driver.core.AuthProvider$1.newAuthenticator(AuthProvider.java:38)
at com.datastax.driver.core.Connection.initializeTransport(Connection.java:138)
at com.datastax.driver.core.Connection.<init>(Connection.java:111)
at com.datastax.driver.core.Connection$Factory.open(Connection.java:432)
at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:216)
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:171)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1104)

It looks like you can pass the credentials on the command line:
dse spark -Dcassandra.username=user -Dcassandra.password=pass
Reference:
http://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/sec/secIntrnlAuth.html?scroll=secItrnlAuth__authentication-for-hadoop-tools

This worked for me:
dse -u cassandra -p cassandra spark
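Separately from how the credentials are passed, it may be worth checking the property names in the conf file itself. A property name is matched as an exact string, so a key with a doubled dot (as in the second property quoted in the question) is a different key and would not be picked up; whether that is the cause here is not certain. A minimal Python sketch of such a check, not part of the original thread: the helper name find_malformed_keys is hypothetical, and it assumes each line is either key=value or key followed by whitespace and a value.

```python
import re


def find_malformed_keys(conf_text):
    """Return property names that contain an empty dot-separated segment."""
    malformed = []
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Accept both "key=value" and "key value" styles (an assumption).
        key = re.split(r"[=\s]", line, maxsplit=1)[0]
        # A doubled, leading, or trailing dot produces an empty segment.
        if "" in key.split("."):
            malformed.append(key)
    return malformed


sample = """
spark.cassandra.auth.username=user
spark.cassandra..auth.password=pass
"""
print(find_malformed_keys(sample))  # ['spark.cassandra..auth.password']
```

The same function can be pointed at the real file by reading it with open(path).read() first.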

Related

Error when running spark application with zeppelin

When I run the above Spark application with Zeppelin on a YARN cluster in cluster mode, I get the following error:
Where might the problem be? Thanks

Unable to write data on hive using spark

I am using Spark 1.6. I create a HiveContext from the Spark context. When I save data into Hive, it gives an error. I am using the Cloudera VM: Hive is inside the VM and Spark is on my own system. I can access the VM by IP. I have started the Thrift server and HiveServer2 on the VM, and I have used the Thrift server URI for hive.metastore.uris:
val hiveContext = new HiveContext(sc)
hiveContext.setConf("hive.metastore.uris", "thrift://IP:9083")
............
............
df.write.mode(SaveMode.Append).insertInto("test")
I get the following error:
FAILED: SemanticException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
The hive-site.xml file is probably not available inside the Spark conf folder. Details are below.
To add hive-site.xml to the Spark configuration folder, create a symlink that points to the hive-site.xml in the Hive configuration folder:
sudo ln -s /usr/lib/hive/conf/hive-site.xml /usr/lib/spark/conf/hive-site.xml
After that, restarting spark-shell should help.
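To confirm that the file is actually visible where Spark looks for it, something like the following Python sketch could be used. It is not from the original answer: the helper name read_hive_property is hypothetical, the path is the symlink target from the command above, and it assumes the usual configuration/property/name/value layout of hive-site.xml.

```python
import os
import xml.etree.ElementTree as ET


def read_hive_property(hive_site_path, name):
    """Return the value of `name` from a hive-site.xml file, or None."""
    # isfile() follows symlinks, so a dangling symlink also returns None here.
    if not os.path.isfile(hive_site_path):
        return None
    root = ET.parse(hive_site_path).getroot()
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


# Prints None if the file is missing or the property is not set.
print(read_hive_property("/usr/lib/spark/conf/hive-site.xml", "hive.metastore.uris"))
```

If this prints None on the machine running spark-shell, the symlink or the property itself is the first thing to check.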

Datastax Spark worker is always looking for master at 127.0.0.1

I am trying to bring up datastax cassandra in analytics mode by using "dse cassandra -k -s". I am using DSE 5.0 sandbox on a single node setup.
I have configured the spark-env.sh with SPARK_MASTER_IP as well as SPARK_LOCAL_IP to point to my LAN IP.
export SPARK_LOCAL_IP="172.40.9.79"
export SPARK_MASTER_HOST="172.40.9.79"
export SPARK_WORKER_HOST="172.40.9.79"
export SPARK_MASTER_IP="172.40.9.79"
All of the above variables are set in spark-env.sh.
Despite this, the worker will not come up. It always looks for a master at 127.0.0.1. This is the error I am seeing in /var/log/cassandra/system.log:
WARN [worker-register-master-threadpool-8] 2016-10-04 08:02:45,832 SPARK-WORKER Logging.scala:91 - Failed to connect to master 127.0.0.1:7077
java.io.IOException: Failed to connect to /127.0.0.1:7077
The result from dse client-tool also shows 127.0.0.1:
$ dse client-tool -u cassandra -p cassandra spark master-address
spark://127.0.0.1:7077
However, I am able to access the Spark web UI from the LAN IP 172.40.9.79.
Any help is greatly appreciated
Try adding this parameter to the spark-defaults.conf file: spark.master local[*] or spark.master spark://172.40.9.79:7077 (a standalone master URL needs the spark:// scheme and the port). Maybe this solves your problem.
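Since the log shows the worker trying 127.0.0.1:7077 while the web UI answers on 172.40.9.79, it can help to check which address actually accepts connections on the master port. A minimal Python sketch using only the standard library, not from the original thread: can_connect is a hypothetical helper name, and 7077 is the port taken from the log above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    # Compare the loopback address with the LAN address from the question.
    for host in ("127.0.0.1", "172.40.9.79"):
        print(host, can_connect(host, 7077))
```

If only the LAN address answers, the master is bound there and the worker is being handed the wrong address, which would match the symptom described.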

new Spark StreamingContext fails with hdfs errors

I'm using DC/OS installed via Azure ACS, and I installed HDFS and Spark via the dcos tool with default options.
Creating a Spark StreamingContext gives:
16/07/22 01:51:04 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn1. Check your hdfs-site.xml file to ensure namenodes are configured properly.
16/07/22 01:51:04 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn2. Check your hdfs-site.xml file to ensure namenodes are configured properly.
Exception in thread "main" java.lang.IllegalArgumentException:
java.net.UnknownHostException: namenode1.hdfs.mesos
I expect I have to redeploy the Spark package with dcos package install with --options=, but I can't figure out what the hdfs.config-url should be. The docs at https://docs.mesosphere.com/1.7/usage/service-guides/spark/install/#hdfs seem out of date.
Yes, it is out of date. We'll fix that.
DC/OS HDFS now serves its config on http://hdfs.marathon.mesos:[port]/v1/connect
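The UnknownHostException in the question means the name namenode1.hdfs.mesos could not be resolved from where the driver runs. After pointing Spark at the new config URL, a quick way to confirm that the namenode names in the served config resolve is a check like this Python sketch. It is not from the original answer; resolves is a hypothetical helper name, and the hostname is the one from the error message.

```python
import socket


def resolves(hostname):
    """Return True if the local resolver can map hostname to an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Hostname taken from the exception in the question.
    print(resolves("namenode1.hdfs.mesos"))
```

This only tests name resolution on the machine where it runs, so it should be run on a node inside the cluster.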

SparkDeploySchedulerBackend Error: Application has been killed. All masters are unresponsive

When I start the Spark shell:
bin>./spark-shell
I get the following error:
Spark assembly has been built with Hive, including Data nucleus jars on classpath
Welcome to SPARK VERSION 1.3.0
Using Scala version 2.10.4 (Java HotSpot(TM) Server VM, Java 1.7.0_75)
Type in expressions to have them evaluated.
Type :help for more information.
15/05/10 12:12:21 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/05/10 12:12:21 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
I installed Spark by following this link: http://www.philchen.com/2015/02/16/how-to-install-apache-spark-and-cassandra-stack-on-ubuntu
You should supply your Spark cluster's master URL when you start spark-shell.
At the least:
bin/spark-shell --master spark://master-ip:7077
The full list of options is long; you can find the ones you need with:
bin/spark-shell --help
I am assuming that you are running this in standalone/local mode.
Run your Spark shell with the following line. It tells Spark to use all the available cores of the master, which here is the local machine.
bin/spark-shell --master local[*]
http://spark.apache.org/docs/1.2.1/submitting-applications.html#master-urls
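To make the master URL forms used in these answers concrete, here is a small Python sketch that classifies a few of them. It is an illustration only, not Spark's own parsing: describe_master is a hypothetical name, and it deliberately covers just local, local[N], local[*], and spark://HOST:PORT, leaving out the other forms listed on the page linked above.

```python
import re


def describe_master(url):
    """Classify a few common Spark master URL forms; None if unrecognized."""
    if url == "local":
        return "local, 1 thread"
    m = re.fullmatch(r"local\[(\*|\d+)\]", url)
    if m:
        n = m.group(1)
        return "local, all cores" if n == "*" else f"local, {n} threads"
    m = re.fullmatch(r"spark://([^:/\s]+):(\d+)", url)
    if m:
        return f"standalone master at {m.group(1)} port {m.group(2)}"
    return None


for candidate in ("local[*]", "spark://master-ip:7077", "master-ip"):
    print(candidate, "->", describe_master(candidate))
```

A bare host name or IP without the spark:// scheme and port falls through to None, which matches the point that a standalone master has to be given as a full URL.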
You also need to start the Spark master and slave before running the spark-submit command:
start-master.sh
start-slave.sh spark://spark:7077
then use
spark-submit --master spark://spark:7077
Look at your log files for "permission denied" errors. It may be that your client service doesn't have the proper permissions to access your master's folders.
