Connecting to Cassandra using DataStax's connector - apache-spark

I can't connect to the Cassandra seed node using the DataStax connector.
I have four Spark nodes: one master and three workers. This works well on its own. The same machines also have Cassandra installed, with the Spark master doubling as the seed node. This works on its own as well (I have successfully written to it and read from it).
Now, I'm trying to do
val info = spark_context.cassandraTable("files", "metainfo")
println( info.count )
Before that, I configure the Spark context as follows:
val confStandalone = new SparkConf()
.set("spark.cassandra.connection.host", "10.14.56.156")
.setMaster("spark://10.14.56.156:7077")
.setAppName("Test")
.set("spark.executor.memory", "1g")
.set("spark.eventLog.enabled", "true")
.set("spark.driver.host", "10.14.56.156")
.set("spark.broadcast.factory", "org.apache.spark.broadcast.HttpBroadcastFactory")
val spark_context = new SparkContext( confStandalone )
spark_context.addJar("SOME_PATH/spark-cassandra-connector_2.10-1.2.0-alpha1.jar")
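For reference, the jar added above corresponds to the following sbt dependency (a sketch, assuming an sbt build on Scala 2.10.x, as the jar's file name suggests):
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.2.0-alpha1"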
In the cassandra.yaml file I set the rpc_address to 10.14.56.156 and used the standard ports (9160, 9042). Now when I do
sbt run
I get the following error:
15/03/18 16:38:43 INFO LocalNodeFirstLoadBalancingPolicy: Adding host 127.0.0.1 (datacenter1)
15/03/18 16:38:43 INFO LocalNodeFirstLoadBalancingPolicy: Adding host 10.14.56.156 (datacenter1)
15/03/18 16:38:43 INFO LocalNodeFirstLoadBalancingPolicy: Adding host 127.0.0.1 (datacenter1)
15/03/18 16:38:43 ERROR Session: Error creating pool to /127.0.0.1:9042 com.datastax.driver.core.TransportException: [/127.0.0.1:9042] Cannot connect
at com.datastax.driver.core.Connection.<init>(Connection.java:106)
at com.datastax.driver.core.PooledConnection.<init>(PooledConnection.java:35)
at com.datastax.driver.core.Connection$Factory.open(Connection.java:528)
...
Caused by: java.net.ConnectException: Connection refused: /127.0.0.1:9042 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
...
Now, when I change the rpc_address to 0.0.0.0, as is sometimes advised, I get the same error but with 10.14.56.156 instead of 127.0.0.1, and only the line:
15/03/18 16:38:43 INFO LocalNodeFirstLoadBalancingPolicy: Adding host 10.14.56.156 (datacenter1)
remains; the two lines referring to 127.0.0.1 are gone.
I didn't set any firewall rules in iptables, so I don't think that would be an issue. Help appreciated!

Have you looked at what broadcast_rpc_address is set to? The java-driver derives the IP to connect to from the 'peer' column of system.peers. If rpc_address is set to 0.0.0.0, broadcast_rpc_address must be set.
My guess is that with your rpc_address set to 0.0.0.0, the driver is connecting to the broadcast_rpc_address; even though the error says [/10.14.56.156:9042] Cannot connect, you may see Connection refused: /127.0.0.1:9042 further down the stack trace.
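For example, the relevant cassandra.yaml lines might look like this (a sketch, reusing the node address from the question):
rpc_address: 0.0.0.0
broadcast_rpc_address: 10.14.56.156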

Related

datastax cassandra java NoHostAvailableException

I had a fresh installation of Apache Cassandra version 3.7.
The installation was successful and I could access cqlsh without any issues.
[dpmuser@LOGDPM01 bin]$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]
But when I try to connect via the DataStax Cassandra client, I am hitting a NoHostAvailableException.
Here is the sample code I tried:
import com.datastax.driver.core.*;

public class DBHitTEst {
    public static void main(String[] args) throws Exception {
        // TODO Auto-generated method stub
        Session session = null;
        Cluster cluster = Cluster.builder()
                .addContactPoints("127.0.0.1").withPort(9042)
                .build();
        System.out.println("Connecting to cluster ...");
        session = cluster.connect();
        System.out.print("Connected !" + session);
    }
}
The exception in the console is shown below.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Connecting to cluster ...
Aug 10, 2017 7:58:23 AM io.netty.channel.ChannelInitializer channelRegistered
WARNING: Failed to initialize a channel. Closing: [id: 0x7d2724c5]
java.lang.NoSuchMethodError: io.netty.util.AttributeKey.valueOf(Ljava/lang/String;)Lio/netty/util/AttributeKey;
at com.datastax.driver.core.Message.<clinit>(Message.java:39)
at com.datastax.driver.core.Connection$Initializer.initChannel(Connection.java:1425)
.
.
.
Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.TransportException: [/127.0.0.1:9042] Cannot connect))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:232)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.negotiateProtocolVersionAndConnect(Cluster.java:1600)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1518)
at com.datastax.driver.core.Cluster.init(Cluster.java:159)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:330)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:305)
at com.datastax.driver.core.Cluster.connect(Cluster.java:247)
at com.cassandra.db.executables.DBHitTEst.main(DBHitTEst.java:14)
Library references:
cassandra-driver-core-3.3.0.jar
guava-23.0.jar
log4j-api-2.8.2.jar
metrics-core-3.0.2.jar
netty-all-4.0.9.Final.jar
slf4j-api-1.7.25.jar
Important configs in cassandra.yaml:
start_rpc: false
start_native_transport: true
native_transport_port: 9042
rpc_address: localhost
rpc_port: 9160
listen_address: localhost
seeds: "127.0.0.1"
Other system particulars
[dpmuser@LOGDPM01 conf]$ hostname
LOGDPM01
[dpmuser@LOGDPM01 conf]$ more /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
[dpmuser@LOGDPM01 conf]$ ping LOGDPM01
PING LOGDPM01 (10.0.2.15) 56(84) bytes of data.
64 bytes from LOGDPM01 (10.0.2.15): icmp_seq=1 ttl=64 time=0.055 ms
I am running a CentOS VM. Kindly let me know where I am going wrong.
A library mismatch was throwing the warning I observed in the console:
WARNING: Failed to initialize a channel. Closing: [id: 0x7d2724c5] java.lang.NoSuchMethodError: io.netty.util.AttributeKey.valueOf(Ljava/lang/String;)Lio/netty/util/AttributeKey; at com.datastax.driver.core.Message.<clinit>(Message.java:39) at com.datastax.driver.core.Connection$Initializer.initChannel(Connection.java:1425) . . .
Choosing the correct libraries from the lib folder of the Cassandra installation directory sorted the issue.
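Alternatively, letting a build tool resolve the driver's transitive dependencies avoids hand-picking jars; a minimal sketch, assuming sbt:
libraryDependencies += "com.datastax.cassandra" % "cassandra-driver-core" % "3.3.0"
This pulls in a Netty version the driver was compiled against, where the netty-all-4.0.9.Final.jar in the list above appears to be too old for driver 3.3.0.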

spark submit "Service 'Driver' could not bind on port" error

I used the following command to run the Spark Java word count example:
time spark-submit --deploy-mode cluster --master spark://192.168.0.7:6066 --class org.apache.spark.examples.JavaWordCount /home/pi/Desktop/example/new/target/javaword.jar /books_50.txt
When I run it, the following is the output:
Running Spark using the REST application submission protocol.
16/07/18 03:55:41 INFO rest.RestSubmissionClient: Submitting a request to launch an application in spark://192.168.0.7:6066.
16/07/18 03:55:44 INFO rest.RestSubmissionClient: Submission successfully created as driver-20160718035543-0000. Polling submission state...
16/07/18 03:55:44 INFO rest.RestSubmissionClient: Submitting a request for the status of submission driver-20160718035543-0000 in spark://192.168.0.7:6066.
16/07/18 03:55:44 INFO rest.RestSubmissionClient: State of driver driver-20160718035543-0000 is now RUNNING.
16/07/18 03:55:44 INFO rest.RestSubmissionClient: Driver is running on worker worker-20160718041005-192.168.0.12-42405 at 192.168.0.12:42405.
16/07/18 03:55:44 INFO rest.RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
"action" : "CreateSubmissionResponse",
"message" : "Driver successfully submitted as driver-20160718035543-0000",
"serverSparkVersion" : "1.6.2",
"submissionId" : "driver-20160718035543-0000",
"success" : true
}
I checked the log of the particular worker (192.168.0.12) and it says:
Launch Command: "/usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/jre/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/lib/spark-assembly-1.6.2-hadoop2.6.0.jar:/opt/spark/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark/lib/datanucleus-core-3.2.10.jar:/opt/spark/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.supervise=false" "-Dspark.app.name=org.apache.spark.examples.JavaWordCount" "-Dspark.submit.deployMode=cluster" "-Dspark.jars=file:/home/pi/Desktop/example/new/target/javaword.jar" "-Dspark.master=spark://192.168.0.7:7077" "-Dspark.executor.memory=10M" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker#192.168.0.12:42405" "/opt/spark/work/driver-20160718035543-0000/javaword.jar" "org.apache.spark.examples.JavaWordCount" "/books_50.txt"
========================================
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/07/18 04:10:58 INFO SecurityManager: Changing view acls to: pi
16/07/18 04:10:58 INFO SecurityManager: Changing modify acls to: pi
16/07/18 04:10:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(pi); users with modify permissions: Set(pi)
16/07/18 04:11:00 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/07/18 04:11:00 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/07/18 04:11:00 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/07/18 04:11:00 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/07/18 04:11:00 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/07/18 04:11:00 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/07/18 04:11:00 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/07/18 04:11:00 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/07/18 04:11:00 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/07/18 04:11:00 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/07/18 04:11:00 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/07/18 04:11:00 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/07/18 04:11:00 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/07/18 04:11:00 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/07/18 04:11:00 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
16/07/18 04:11:00 WARN Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'Driver' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'Driver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
My spark-env.sh file (for the master) contains:
export SPARK_MASTER_WEBUI_PORT="8080"
export SPARK_MASTER_IP="192.168.0.7"
export SPARK_EXECUTOR_MEMORY="10M"
My spark-env.sh file (for the worker) contains:
export SPARK_WORKER_WEBUI_PORT="8080"
export SPARK_MASTER_IP="192.168.0.7"
export SPARK_EXECUTOR_MEMORY="10M"
Please help...!!
I had the same issue when trying to run the shell, and was able to get this working by setting the SPARK_LOCAL_IP environment variable. You can assign this from the command line when running the shell:
SPARK_LOCAL_IP=127.0.0.1 ./bin/spark-shell
For a more permanent solution, create a spark-env.sh file in the conf directory of your Spark root. Add the following line:
SPARK_LOCAL_IP=127.0.0.1
Give execute permissions to the script using chmod +x ./conf/spark-env.sh, and this will set this environment variable by default.
I am using Maven/SBT to manage dependencies and the Spark core is contained in a jar file.
You can override the SPARK_LOCAL_IP at runtime by setting the "spark.driver.bindAddress" (here in Scala):
val config = new SparkConf()
config.setMaster("local[*]")
config.setAppName("Test App")
config.set("spark.driver.bindAddress", "127.0.0.1")
val sc = new SparkContext(config)
I also had this issue.
The reason (for me) was that the IP of my local system was not reachable from my local system.
I know that statement makes no sense, but please read the following.
My system name (uname -n) shows that my system is named "sparkmaster".
In my /etc/hosts file, I have assigned a fixed IP address for the sparkmaster system as "192.168.1.70". There were additional fixed IP addresses for sparknode01 and sparknode02 at ...1.71 & ...1.72 respectively.
Due to some other problems I had, I needed to change all of my network adapters to DHCP. This meant that they were getting addresses like 192.168.90.123.
The DHCP addresses were not in the same network as the ...1.70 range and there was no route configured.
When Spark starts, it seems to want to connect to the host named in uname (i.e. sparkmaster in my case). This resolved to the IP 192.168.1.70, but there was no way to connect to that because the address was on an unreachable network.
My solution was to change one of my Ethernet adapters back to a fixed static address (i.e. 192.168.1.70) and voila - problem solved.
So the issues seems to be that when spark starts in "local mode" it attempts to connect to a system named after your system's name (rather than local host).
I guess this makes sense if you are wanting to setup a cluster (Like I did) but it can result in the above confusing message.
Possibly putting your system's host name on the 127.0.0.1 entry in /etc/hosts may also solve this problem, but I did not try it.
You need to enter the hostname in your /etc/hosts file.
Something like:
127.0.0.1 localhost your-hostname
This is possibly a duplicate of Spark 1.2.1 standalone cluster mode spark-submit is not working
I have tried the same steps but was able to run the job. Kindly post the full spark-env.sh and spark-defaults if possible.
I had this problem, and it was caused by replacing the real IP with my own IP in /etc/hosts.
This issue is related to the IP address alone. The error messages in the log file are not informative.
Check the following 3 steps:
1. Check your IP address - it can be checked with the ifconfig or ip commands. If your service is not a public service, IP addresses starting with 192.168 should be good enough. 127.0.0.1 cannot be used if you are planning a cluster.
2. Check your environment variable SPARK_MASTER_HOST - make sure there are no typos in the name of the variable or the actual IP address:
env | grep SPARK_
3. Check that the port you are planning to use for the Spark master is free, using netstat. Do not use a port below 1024. For example:
netstat -an | grep 9123
After your Spark master starts running, if you are not able to see the web UI from a different machine, open the web UI port with iptables.
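For example, a sketch assuming the default web UI port 8080 used elsewhere in this thread:
iptables -A INPUT -p tcp --dport 8080 -j ACCEPT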
Use the following with DataFrames:
val spark = SparkSession.builder
  .appName("BinarizerExample")
  .master("local[*]")
  .config("spark.driver.bindAddress", "127.0.0.1")
  .getOrCreate()
First option:
The following steps might help:
Get your hostname by using the "hostname" command.
xxxxxx.ssssss (e) base ~ hostname
xxxxxx.ssssss.net
Make an entry in the /etc/hosts file for your hostname, if not present, as follows:
127.0.0.1 xxxxxx.ssssss.net
Second option:
You can set the spark.driver.bindAddress value in your spark-defaults.conf file:
spark.driver.bindAddress=127.0.0.1
Thanks!!
I solved this problem by modifying the slaves file at spark-2.4.0-bin-hadoop2.7/conf/slaves.
Please check your configuration.
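The slaves file simply lists worker hostnames, one per line; a sketch with hypothetical names:
node1
node2
node3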

Can we connect to Spark cluster from remote host via java program?

I am trying to connect to a Spark cluster from a remote system. My Java code is shown below.
JAVA Code
new SparkConf()
    .setAppName("Java API demo")
    .setMaster("spark://192.168.XX.XX:7077")
    .set("spark.driver.host", "192.168.XX.XX")
    .set("spark.driver.port", "9929");
It gives me this error:
Error Message
16/04/14 16:08:42 ERROR NettyTransport: failed to bind to /192.168.XX.XX:9929, shutting down Netty transport
16/04/14 16:08:42 WARN Utils: Service 'sparkDriver' could not bind on port 9929. Attempting port 9930.
16/04/14 16:08:42 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
First of all, is this possible in Spark? I am using Spark version 1.4. Thanks in advance.

Spark BlockManager running on localhost

I have a simple script file I am trying to execute in the spark-shell that mimics the tutorial here
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
sc.stop();
val conf = new SparkConf()
  .setAppName("MyApp")
  .setMaster("mesos://zk://172.24.51.171:2181/mesos")
  .set("spark.executor.uri", "hdfs://172.24.51.171:8020/spark-1.3.0-bin-hadoop2.4.tgz")
  .set("spark.driver.host", "172.24.51.142")
val sc2 = new SparkContext(conf)
val file = sc2.textFile("hdfs://172.24.51.171:8020/input/pg4300.txt")
val errors = file.filter(line => line.contains("ERROR"))
errors.count()
My namenode and Mesos master are on 172.24.51.171; my IP address is 172.24.51.142. I have these lines saved to a file, which I then launch using the command:
/opt/spark-1.3.0-bin-hadoop2.4/bin/spark-shell -i WordCount.scala
My remote executors are all dying with errors similar to the following:
15/04/08 14:30:39 ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to localhost/127.0.0.1:48554
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:87)
at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:89)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:594)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:592)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:592)
at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:586)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.org$apache$spark$broadcast$TorrentBroadcast$$anonfun$$getRemote$1(TorrentBroadcast.scala:126)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$1.apply(TorrentBroadcast.scala:136)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$1.apply(TorrentBroadcast.scala:136)
at scala.Option.orElse(Option.scala:257)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1152)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: localhost/127.0.0.1:48554
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
... 1 more
This failure happens after I run the errors.count() command. Earlier in my shell, after I create the new SparkContext I see the lines:
15/04/08 14:31:18 INFO NettyBlockTransferService: Server created on 48554
15/04/08 14:31:18 INFO BlockManagerMaster: Trying to register BlockManager
15/04/08 14:31:18 INFO BlockManagerMasterActor: Registering block manager localhost:48554 with 265.4 MB RAM, BlockManagerId(<driver>, localhost, 48554)
15/04/08 14:31:18 INFO BlockManagerMaster: Registered BlockManager
I guess what's happening is that Spark is recording the address of the BlockManager as localhost:48554, which is then sent to all the executors, who try to talk to their own localhost:48554 instead of the driver's IP address at port 48554. Why is Spark using localhost as the address of the BlockManager and not spark.driver.host?
Additional Information
In the Spark configuration there is a spark.blockManager.port but no spark.blockManager.host; there is only spark.driver.host, which you can see I set in my SparkConf.
Possibly related to this JIRA Ticket although that seemed like a network issue. My network is configured with DNS just fine.
Can you try providing the Spark master address using the --master parameter when invoking the spark-shell (or adding it in spark-defaults.conf)? I had a similar issue (see my post Spark Shell Listens on localhost instead of configured IP address), and it looks like the BlockManager listens on localhost when the context is dynamically created in the shell.
Logs:
When original context is used (listens on hostname)
BlockManagerInfo: Added broadcast_1_piece0 in memory on ubuntu64server2:33301
When new context is created (listens on localhost)
BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:40235
I had to connect to a Cassandra cluster and was able to query it by providing spark.cassandra.connection.host in spark-defaults.conf and importing packages com.datastax.spark.connector._ in the spark shell.
Try setting SPARK_LOCAL_IP (on the command line) or spark.local.ip through the SparkConf object.
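For example, reusing the launch command and the driver's IP from the question:
SPARK_LOCAL_IP=172.24.51.142 /opt/spark-1.3.0-bin-hadoop2.4/bin/spark-shell -i WordCount.scala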

Spark 1.2.1 standalone cluster mode spark-submit is not working

I have a 3-node Spark cluster: node1, node2 and node3.
I am running the command below on node 1 to deploy the driver:
/usr/local/spark-1.2.1-bin-hadoop2.4/bin/spark-submit --class com.fst.firststep.aggregator.FirstStepMessageProcessor --master spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077 --deploy-mode cluster --supervise file:///home/xyz/sparkstreaming-0.0.1-SNAPSHOT.jar /home/xyz/config.properties
The driver gets launched on node 2 in the cluster, but I am getting an exception on node 2 that it is trying to bind to node 1's IP:
2015-02-26 08:47:32 DEBUG AkkaUtils:63 - In createActorSystem, requireCookie is: off
2015-02-26 08:47:32 INFO Slf4jLogger:80 - Slf4jLogger started
2015-02-26 08:47:33 ERROR NettyTransport:65 - failed to bind to ec2-xx.xx.xx.xx.compute-1.amazonaws.com/xx.xx.xx.xx:0, shutting down Netty transport
2015-02-26 08:47:33 WARN Utils:71 - Service 'Driver' could not bind on port 0. Attempting port 1.
2015-02-26 08:47:33 DEBUG AkkaUtils:63 - In createActorSystem, requireCookie is: off
2015-02-26 08:47:33 ERROR Remoting:65 - Remoting error: [Startup failed] [
akka.remote.RemoteTransportException: Startup failed
at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:136)
at akka.remote.Remoting.start(Remoting.scala:201)
at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:618)
at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:615)
at akka.actor.ActorSystemImpl._start(ActorSystem.scala:615)
at akka.actor.ActorSystemImpl.start(ActorSystem.scala:632)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1765)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1756)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:33)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: ec2-xx-xx-xx.compute-1.amazonaws.com/xx.xx.xx.xx:0
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
kindly suggest
Thanks
After spending a lot more time, I got the answer. I made the changes below:
remove the entries for SPARK_LOCAL_IP and SPARK_MASTER_IP
add the host name and private IP address of each of the other nodes to /etc/hosts (sketched below)
use --deploy-mode cluster --supervise
That's all, and it works perfectly with fully HA components (Master, Slaves and Driver).
Thanks
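To illustrate the /etc/hosts change above (a sketch; the node names come from the question, the private addresses are hypothetical):
10.0.0.1 node1
10.0.0.2 node2
10.0.0.3 node3
Each node should carry entries for the other nodes so that hostnames resolve to routable private addresses rather than loopback.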
Cluster deploy mode is not supported on EC2 in Spark 1.2, where it creates a standalone cluster. Hence you can try removing
--deploy-mode cluster --supervise
