Datastax Spark worker is always looking for master at 127.0.0.1 - apache-spark

I am trying to bring up DataStax Cassandra in analytics mode using "dse cassandra -k -s". I am using the DSE 5.0 sandbox in a single-node setup.
I have configured spark-env.sh with SPARK_MASTER_IP as well as SPARK_LOCAL_IP pointing to my LAN IP.
export SPARK_LOCAL_IP="172.40.9.79"
export SPARK_MASTER_HOST="172.40.9.79"
export SPARK_WORKER_HOST="172.40.9.79"
export SPARK_MASTER_IP="172.40.9.79"
All of the above variables are set in spark-env.sh.
Despite this, the worker will not come up. It always looks for a master at 127.0.0.1. This is the error I am seeing in /var/log/cassandra/system.log:
WARN [worker-register-master-threadpool-8] 2016-10-04 08:02:45,832 SPARK-WORKER Logging.scala:91 - Failed to connect to master 127.0.0.1:7077
java.io.IOException: Failed to connect to /127.0.0.1:7077
The result from dse client-tool also shows 127.0.0.1:
$ dse client-tool -u cassandra -p cassandra spark master-address
spark://127.0.0.1:7077
However, I am able to access the Spark web UI at the LAN IP 172.40.9.79.
Any help is greatly appreciated

Try adding this parameter to the spark-defaults.conf file: spark.master local[*] or spark.master spark://172.40.9.79:7077. Maybe this solves your problem.
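Before changing more settings, it can help to confirm where the master port is actually reachable. The sketch below is a minimal diagnostic and not part of DSE or Spark: can_connect is a hypothetical helper name, and port 7077 and the two addresses are the ones that appear in the question's log output.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage (run on the DSE node; addresses are the ones from the question):
#   for host in ("127.0.0.1", "172.40.9.79"):
#       print(host, can_connect(host, 7077))
```

If only the LAN address answers, that would suggest the master is listening on 172.40.9.79 while the worker is still being handed 127.0.0.1 as the master address, which is consistent with what dse client-tool reports.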

Related

Start Spark master on the IP instead of Hostname

I'm trying to set up a remote Spark 2.4.5 cluster on Ubuntu 18. After I run ./sbin/start-master.sh, the web UI is available at <INSTANCE-IP>:8080, but it shows "Spark Master at spark://spark-master:7077", where spark-master is the hostname of the remote machine.
I'm able to start a worker only with ./sbin/start-slave.sh spark://spark-master:7077, but <INSTANCE-IP>:4040 doesn't work. When I try ./sbin/start-slave.sh spark://<INSTANCE-IP>:7077, I can see the process, but the worker is not visible in the web UI.
As a result, I cannot connect to the cluster from my local machine with spark-shell --master spark://<INSTANCE-IP>:7077. The error is:
StandaloneAppClient$ClientEndpoint: Failed to connect to master <INSTANCE-IP>:7077

Can't access SparkUI through YARN

I'm building a Docker image to run Zeppelin or spark-shell locally against a production Hadoop cluster with YARN. Edit: the environment was macOS.
I can run jobs or a spark-shell fine, but when I try to open the Tracking URL in YARN while the job is running, the YARN UI hangs for exactly 10 minutes. YARN itself keeps working, and if I connect via ssh I can run yarn commands.
If I don't access SparkUI (directly or through YARN), nothing happens: jobs run and the YARN UI does not hang.
More info:
Local, on Docker: Spark 2.1.2, Hadoop 2.6.0-cdh5.4.3
Production: Spark 2.1.0, Hadoop 2.6.0-cdh5.4.3
If I run it locally (--master local[*]), it works and I can connect to SparkUI on port 4040.
Spark config:
spark.driver.bindAddress 172.17.0.2 #docker_eth0_ip
spark.driver.host 192.168.XXX.XXX #local_ip
spark.driver.port 5001
spark.ui.port 4040
spark.blockManager.port 5003
Yes, the ApplicationMaster and the nodes can reach my local SparkUI and driver (checked with telnet).
As I said, I can run jobs, so Docker's port exposure and binding are working. Some logs proving it:
INFO ApplicationMaster: Driver now available: 192.168.XXX.XXX:5001
INFO TransportClientFactory: Successfully created connection to /192.168.XXX.XXX:5001 after 65 ms (0 ms spent in bootstraps)
INFO ApplicationMaster$AMEndpoint: Add WebUI Filter. AddWebUIFilter(org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,Map(PROXY_HOSTS -> jobtracker.hadoop, PROXY_URI_BASES -> http://jobtracker.hadoop:8088/proxy/application_000_000),/proxy/application_000_000)
Any ideas, or pointers on where I can look to see what's happening?
The problem was related to how Docker handles the source IP of incoming requests when it runs on macOS.
When the YARN code running inside the Docker container receives a request, it doesn't see the original source IP; it sees the internal Docker proxy IP (in my case 172.17.0.1).
When a request is sent to the SparkUI in my local container, it is automatically redirected to the Hadoop master (this is how YARN works), because the request does not appear to come from the Hadoop master and only requests from that source are accepted.
When the master receives the redirected request, it tries to send it to the Spark driver (my local Docker container), which redirects it to the Hadoop master again because the source IP it sees is the proxy IP, not the master.
This loop takes up all the threads reserved for the UI. Until those threads are released, the YARN UI hangs.
I "solved" it by changing the YARN configuration in the Docker container:
<property>
<name>yarn.web-proxy.address</name>
<value>172.17.0.1</value>
</property>
This allows SparkUI to handle any request made to the Docker container.
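The redirect loop described in this answer can be illustrated with a toy model. This is only a sketch of the reasoning, not YARN's actual filter code: follow_request and max_hops are hypothetical names, and 10.0.0.10 is a made-up address standing in for jobtracker.hadoop. The only assumption carried over from the answer is that every request arriving at the container appears to come from 172.17.0.1.

```python
def follow_request(source_ip_seen_by_driver, accepted_proxy_ips, max_hops=10):
    """Toy model of the redirect decision described in the answer.

    Returns ("served", hops) when the driver UI accepts the request, or
    ("loop", hops) when it keeps redirecting until max_hops is reached.
    """
    hops = 0
    while hops < max_hops:
        if source_ip_seen_by_driver in accepted_proxy_ips:
            return ("served", hops)
        # The driver UI redirects to the YARN proxy, which forwards the
        # request back. Docker on macOS hides the real source again, so the
        # driver still sees the same gateway IP on the next hop.
        hops += 1
    return ("loop", hops)


DOCKER_GATEWAY = "172.17.0.1"  # internal Docker proxy IP from the answer
HADOOP_MASTER = "10.0.0.10"    # hypothetical address for jobtracker.hadoop

before_fix = follow_request(DOCKER_GATEWAY, {HADOOP_MASTER})
after_fix = follow_request(DOCKER_GATEWAY, {DOCKER_GATEWAY})
print(before_fix)  # ('loop', 10)
print(after_fix)   # ('served', 0)
```

In this model, setting yarn.web-proxy.address to 172.17.0.1 corresponds to adding the Docker gateway to the accepted set, so the first request is served instead of bouncing between the driver and the master.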

Can't connect to Cassandra's default cluster (Test Cluster) using OpsCenter

My error is as below:
OpsCenter was not able to add the cluster: OpsCenter was unable to resolve the ip for Test_Cluster, please specify seed nodes using the rpc_address
My OS is CentOS 7.
I installed DSE 6.
I found that DataStax had blocked my IP.

YCSB is not working for HBase

I am using hadoop-2.7.1, hbase-1.0.1.1, and zookeeper-3.4.6 on my Linux server to compare HBase performance. Hadoop, HBase, and ZooKeeper are working fine, with the processes below:
19639 DataNode
19893 SecondaryNameNode
20116 ResourceManager
20530 QuorumPeerMain
20287 NodeManager
23767 Client
20838 HMaster
21015 HRegionServer
24620 Jps
19446 NameNode
In addition, YCSB is also working fine; I checked it with the basic DB binding using './bin/ycsb load basic -P workloads/workloada'. However, when I try to run it against HBase with the simplest command, './bin/ycsb load hbase -P workloads/workloada -p columnfamily=family', it does not respond at all. I don't know why I'm having this problem. Could you please help me with it? Thanks in advance.
The problem has been solved. conf/hbase-site.xml was the cause: it was not picking up the right ZooKeeper client port. The default, 2181, is the better choice.
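To check which ZooKeeper client port a given hbase-site.xml actually declares, a small parser is enough. The following is a minimal sketch: zookeeper_client_port is a hypothetical helper, the sample XML and its port 2222 are invented for illustration, and it assumes the port is set through the standard hbase.zookeeper.property.clientPort property, falling back to 2181 when the property is absent.

```python
import xml.etree.ElementTree as ET

CLIENT_PORT_KEY = "hbase.zookeeper.property.clientPort"


def zookeeper_client_port(hbase_site_xml: str, default: int = 2181) -> int:
    """Return the ZooKeeper client port declared in hbase-site.xml content."""
    root = ET.fromstring(hbase_site_xml)
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name == CLIENT_PORT_KEY and value:
            return int(value)
    # Property not set: fall back to the default client port.
    return default


# Invented sample content; a real file would be read from conf/hbase-site.xml.
sample = """
<configuration>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
  </property>
</configuration>
"""
print(zookeeper_client_port(sample))              # 2222
print(zookeeper_client_port("<configuration/>"))  # 2181
```

Comparing the returned value with the port ZooKeeper is really listening on shows quickly whether the client and the quorum disagree.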

DSE Spark Shell Authentication

I have a DSE 4.5 installation with Spark running. I need some help passing the username and password of the Cassandra cluster from the Spark shell.
I have added these properties to the conf/spark-default.conf file:
spark.cassandra.auth.username=user
spark.cassandra.auth.password=pass
And I start my Spark shell using
dse spark
But I still see this error when I try sc.cassandraTable:
com.datastax.driver.core.exceptions.AuthenticationException: Authentication error on host /11.111.11.11:9042: Host /11.111.11.11:9042 requires authentication, but no authenticator found in Cluster configuration
at com.datastax.driver.core.AuthProvider$1.newAuthenticator(AuthProvider.java:38)
at com.datastax.driver.core.Connection.initializeTransport(Connection.java:138)
at com.datastax.driver.core.Connection.<init>(Connection.java:111)
at com.datastax.driver.core.Connection$Factory.open(Connection.java:432)
at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:216)
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:171)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1104)
It looks like you can run this command:
dse spark -Dcassandra.username=user -Dcassandra.password=pass
ref:
http://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/sec/secIntrnlAuth.html?scroll=secItrnlAuth__authentication-for-hadoop-tools
This worked for me:
dse -u cassandra -p cassandra spark
