Spark High Availability - apache-spark

I'm using Spark 1.2.1 on three nodes that run three workers with a slave configuration, and I run daily jobs using:
./spark-1.2.1/sbin/start-all.sh
//crontab configuration:
./spark-1.2.1/bin/spark-submit --master spark://11.11.11.11:7077 --driver-class-path home/ubuntu/spark-cassandra-connector-java-assembly-1.2.1-FAT.jar --class "$class" "$jar"
I want to keep the Spark master and the slave workers available at all times, so that even if one of them fails it is restarted like a service (the way Cassandra is).
Is there any way to do that?
EDIT:
I looked into the start-all.sh script and it only contains the setup for the start-master.sh and start-slaves.sh scripts.
I tried to create a supervisor configuration file for it, but I only get the errors below:
11.11.11.11: ssh: connect to host 11.11.11.12 port 22: No route to host
11.11.11.13: org.apache.spark.deploy.worker.Worker running as process 14627. Stop it first.
11.11.11.11: ssh: connect to host 11.11.11.12 port 22: No route to host
11.11.11.12: ssh: connect to host 11.11.11.13 port 22: No route to host
11.11.11.11: org.apache.spark.deploy.worker.Worker running as process 14627. Stop it first.
11.11.11.12: ssh: connect to host 11.11.11.12 port 22: No route to host
11.11.11.13: ssh: connect to host 11.11.11.13 port 22: No route to host
11.11.11.11: org.apache.spark.deploy.worker.Worker running as process 14627. Stop it first.

There are tools like monit and supervisor (or even systemd) that can monitor and restart failed processes.
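Note that start-all.sh is not a good target for a supervisor: it daemonizes and then tries to ssh into the other nodes, which is exactly what the errors above show. Instead, run the master and the worker in the foreground on each node and let the supervisor restart them individually. A minimal sketch of supervisord configs, assuming Spark lives under /home/ubuntu/spark-1.2.1, runs as the ubuntu user, and uses the master address from the question:
; /etc/supervisor/conf.d/spark-master.conf (master node only)
[program:spark-master]
command=/home/ubuntu/spark-1.2.1/bin/spark-class org.apache.spark.deploy.master.Master --host 11.11.11.11
user=ubuntu
autostart=true
autorestart=true
; /etc/supervisor/conf.d/spark-worker.conf (each worker node)
[program:spark-worker]
command=/home/ubuntu/spark-1.2.1/bin/spark-class org.apache.spark.deploy.worker.Worker spark://11.11.11.11:7077
user=ubuntu
autostart=true
autorestart=true
The same idea works with a systemd unit (Restart=on-failure) or monit; the key point is that each node supervises its own foreground process instead of going through the ssh-based start-all.sh script.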

Related

Spark uses Random Port even after defining executor port

I have a small cluster set up for development purposes: 3 VMs with Spark 2.3 installed on all of them. I started the master on VM1 and the slaves, pointing at the master's IP address, on the other 2 VMs. We have a firewall up on all 3 VMs and have opened the port range 38001:38113 in the firewall.
Before starting the VMs we made the following configuration changes.
On the master, worker 1 and worker 2 nodes
The spark-defaults.conf file was given the following properties:
spark.blockManager.port 38001
spark.broadcast.port 38018
spark.driver.port 38035
spark.executor.port 38052
spark.fileserver.port 38069
spark.replClassServer.port 38086
spark.shuffle.service.port 38103
On the worker 1 and worker 2 nodes
The spark-env.sh file was given the following property:
SPARK_WORKER_PORT=38112 -- for worker-1
SPARK_WORKER_PORT=38113 -- for worker-2
When we start spark-shell and execute a sample CSV file read, the executor launched on the worker uses a random port for the Spark driver.
E.g.:
Spark Executor Command: "/usr/java/jdk1.8.0_171-amd64/jre/bin/java" "-cp" "/opt/spark/2.3.0/conf/:/opt/spark/2.3.0/jars/*" "-Xmx1024M" "-Dspark.driver.port=34573" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@293.72.254.89:34573" "--executor-id" "1" "--hostname" "293.72.146.384" "--cores" "4" "--app-id" "app-20180706072052-0000" "--worker-url" "spark://Worker@293.72.146.384:38112"
As you can see in the above command, the executor started with spark.driver.port set to 34573, and this port is always chosen randomly. Because of this my program fails, as it is unable to communicate with the driver.
Can anyone help me with a configuration that can be used in a locked-down network environment where all other ports are blocked?
Thanks in advance.
Start worker:
./start-slave.sh spark://hostname:port -p [Worker Port]
Options:
-c CORES, --cores CORES Number of cores to use
-m MEM, --memory MEM Amount of memory to use (e.g. 1000M, 2G)
-d DIR, --work-dir DIR Directory to run apps in (default: SPARK_HOME/work)
-i HOST, --ip IP Hostname to listen on (deprecated, please use --host or -h)
-h HOST, --host HOST Hostname to listen on
-p PORT, --port PORT Port to listen on (default: random)
--webui-port PORT Port for web UI (default: 8081)
--properties-file FILE Path to a custom Spark properties file.
Default is conf/spark-defaults.conf.
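The driver reads its configuration on the machine where spark-shell is launched, so spark.driver.port set only in the workers' spark-defaults.conf will not affect it. As a sketch (the master host name below is a placeholder), you can pin the worker port on the command line and pass the driver-side ports explicitly when starting spark-shell so they are not chosen at random:
./start-slave.sh spark://master-host:7077 --port 38112
spark-shell --master spark://master-host:7077 --conf spark.driver.port=38035 --conf spark.blockManager.port=38001
With the driver port fixed to a value inside the opened 38001:38113 range, the executors should be able to reach the driver through the firewall.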

Spark running at port 249

sbin$ start-all.sh
I entered this command and then got the message below.
I use port 249, not 22; port 22 is blocked.
I'm connecting to the server using PuTTY.
How can I fix this problem?
org.apache.spark.deploy.master.Master running as process 6211. Stop it first.
localhost: ssh: connect to host localhost port 22: Connection refused
I had the same problem and have solved it.
Following these steps may solve your problem:
vi conf/spark-env.sh
add the line export SPARK_SSH_OPTS="-p 249"
rerun start-all.sh

How to configure cassandra for remote connection

I am trying to configure Cassandra DataStax Community Edition for remote connections on Windows.
The Cassandra server is installed on a Windows 7 PC, and with the local cqlsh it connects perfectly to the local server.
But when I try to connect with cqlsh from another PC on the same network, I get this error message:
Connection error: ('Unable to connect to any servers', {'MYHOST':
error(10061, "Tried connecting to [('HOST_IP', 9042)]. Last error: No
connection could be made because the target machine actively refused
it")})
So I am wondering how to configure the Cassandra server correctly (what changes should I make in the cassandra.yaml config file) to allow remote connections.
Thank you in advance!
How about this:
Make these changes in the cassandra.yaml config file:
start_rpc: true
rpc_address: 0.0.0.0
broadcast_rpc_address: [node-ip]
listen_address: [node-ip]
seed_provider:
- class_name: ...
- seeds: "[node-ip]"
reference: https://gist.github.com/andykuszyk/7644f334586e8ce29eaf8b93ec6418c4
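After editing cassandra.yaml, restart the Cassandra service and verify the change from the other PC, e.g. (9042 is the native protocol port; substitute your server's IP for HOST_IP):
cqlsh HOST_IP 9042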
In Cassandra 2.0.x, remote access is via the Thrift port. The default cqlsh listen port is 9160, defined in cassandra.yaml by the rpc_port parameter. Cassandra 2.0.x and earlier enable Thrift by default by setting start_rpc to true in the cassandra.yaml file.
In Cassandra 2.1, the cqlsh utility uses the native protocol (via the DataStax Python driver), and the default cqlsh listen port is 9042.
The Cassandra node should be bound to the IP address of your server's network card - not 127.0.0.1 or localhost, which is the loopback interface; binding to that will prevent direct remote access. To configure the bound address, use the rpc_address parameter in cassandra.yaml. Setting it to 0.0.0.0 will listen on all network interfaces.
Have you checked that the remote machine can connect to the Cassandra node? Is there a firewall between the machines? You can try these steps to test this out:
1) Ensure you can connect to that IP from the server you are on:
$ ssh user@xxx.xxx.xx.xx
2) Check the node's status and also confirm it shows the same IP:
$ nodetool status
3) Run the command to connect with the IP (only specify the port if you are not using the default):
$ cqlsh xxx.xxx.xx.xx
An alternate solution to Kat's; this worked on Ubuntu 16.04.
ssh into the server: server_user@**.**.**.**
Stop Cassandra if it is running:
Check whether it is running with ps aux | grep cassandra
If it is running, this will output a large block of commands/flags, e.g.
ubuntu 14018 4.6 70.1 2335692 712080 pts/2 Sl+ 04:15 0:11 java -Xloggc:./../logs/gc.log ........
Note: 14018 in the example is the process id
Stop it with kill <process_id> (in this case 14018)
Edit the cassandra.yaml file so it contains the following:
rpc_address: 0.0.0.0
broadcast_rpc_address: **.**.**.** <- your server's IP (cannot be set to 0.0.0.0)
Restart Cassandra with ./bin/cassandra -f (from within the Cassandra root directory)
Open another terminal on your local machine and connect via cqlsh **.**.**.** (your server's IP) to test.
./bin/nodetool status reported my localhost IP (127.0.0.1), but remote cqlsh still worked despite that.
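If the connection is still refused after these changes, it is also worth checking that the native protocol port is open in the server's firewall. A sketch for Ubuntu's ufw (skip if you use a different firewall):
sudo ufw status
sudo ufw allow 9042/tcp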

Unsuccessful connection on ssh to the subordinate nodes of a cluster

I start the services on the subordinate (slave) nodes of the cluster with the following command:
hadoop@one:/export/hadoop-1.0.1/bin$ ./start-all.sh
and, not for the first time, I get this result:
starting namenode, logging to /export/hadoop-1.0.1/libexec/../logs/hadoop--namenode-one.out
192.168.1.10: starting datanode, logging to /export/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-datanode-myhost2.out
192.168.1.11: ssh: connect to host 192.168.1.11 port 22: Connection timed out
192.168.1.5: starting secondarynamenode, logging to /export/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-secondarynamenode-one.out
starting jobtracker, logging to /export/hadoop-1.0.1/libexec/../logs/hadoop--jobtracker-one.out
192.168.1.10: starting tasktracker, logging to /export/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-myhost2.out
192.168.1.11: ssh: connect to host 192.168.1.11 port 22: Connection timed out
How do I fix the error below:
ssh: connect to host port 22: Connection timed out
hadoop@one:/export/hadoop-1.0.1/bin$ ssh -vvv 192.168.1.10
Sun_SSH_1.5, SSH protocols 1.5/2.0, OpenSSL 0x1000004f
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Rhosts Authentication disabled, originating port will not be trusted.
debug1: ssh_connect: needpriv 0
debug1: Connecting to 192.168.1.10 [192.168.1.10] port 22
debug1: connect to address 192.168.1.10 port 22: Connection timed out
ssh: connect to host 192.168.1.10 port 22: Connection timed out
What should I do to correct this error?
Do you have the sshd daemon running on your machine? Your OS might come with the ssh client, but in order to accept incoming connections you need ssh installed completely. By complete I mean:
ssh: the command we use to connect to remote machines - the client.
sshd: the daemon that runs on the server and allows clients to connect to this server.
Also, make sure there is no issue with port 22. If you still face issues, try ssh with the -v switch to get the complete trace.
ssh -v myhost2
You can go here for a detailed explanation of ssh.
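If sshd turns out not to be installed or running on the slave node, a typical fix on a Debian/Ubuntu machine (an assumption about your distribution) is:
sudo apt-get install openssh-server
sudo service ssh start
sudo service ssh status
Also note that "Connection timed out" (rather than "Connection refused") often points to a firewall or network problem on port 22 rather than a missing daemon, so check both.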

Cassandra nodetool in standalone mode

I've got Cassandra 0.7 running in standalone mode and I'm trying to run nodetool, but I'm getting JMX exceptions. Isn't JMX configuration only required when accessing a remote server? I'm accessing my local machine.
Also, why is nodetool looking for 63.251.179.13?
[rav#ubix bin]$ ./nodetool -h 127.0.0.1 flush
Error connection to remote JMX agent!
java.rmi.ConnectException: Connection refused to host: 63.251.179.13; nested exception is:
java.net.ConnectException: Connection refused
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:128)
at javax.management.remote.rmi.RMIServerImpl_Stub.newClient(Unknown Source)
at javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2343)
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:296)
at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:267)
at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:144)
at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:114)
at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:621)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
at java.net.Socket.connect(Socket.java:546)
at java.net.Socket.connect(Socket.java:495)
at java.net.Socket.<init>(Socket.java:392)
at java.net.Socket.<init>(Socket.java:206)
at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:146)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)
... 10 more
Thanks,
Try nodetool with -h or --host and -p or --port as per the instructions:
-h,--host <arg> node hostname or ip address
-p,--port <arg> remote jmx agent port number
When Cassandra is offline, check the ports in use to see if another process is using the default port that Cassandra binds to. You can find the default in conf/cassandra-env.sh
Once you know the port, you can see if another process is bound to it with netstat -an
If nothing is running on the port, and you start up cassandra, verify that it is running on the correct port and try to connect again with the -p or --port arguments. More information can be found here: http://wiki.apache.org/cassandra/GettingStarted
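As a concrete sketch (the variable name and the 8080 default below apply to a typical Cassandra 0.7-era install; later versions use 7199), you can look up the configured JMX port, check whether anything is listening on it, and then point nodetool at it explicitly:
grep JMX_PORT conf/cassandra-env.sh
netstat -an | grep 8080
./nodetool -h 127.0.0.1 -p 8080 flush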
Is the machine Unix or Windows? Do you have a bad entry in /etc/hosts indicating that 127.0.0.1 maps to another hostname or IP address, namely 63.251.179.13?
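A quick way to check this on a Unix machine is to see what the local hostname resolves to, since the RMI stub that nodetool connects to advertises whatever address the hostname maps to:
hostname
grep "$(hostname)" /etc/hosts
If the hostname maps to 63.251.179.13 there (or via DNS), that would explain the address in the exception.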
I had a similar issue running nodetool against an instance of Cassandra running locally on my machine. When I tried to run nodetool -h 127.0.0.1, nodetool threw an exception relating to JMX that looked like this (with an IP address unknown to me):
Error connecting to remote JMX agent!
java.rmi.ConnectIOException: Exception creating connection to: ; nested exception is:
java.net.SocketException: Host is down
Douglas Muth posted a similar issue here, and from this I found out that Cassandra seems to record the hostname at startup. Unfortunately, by the time I ran nodetool the hostname had become stale (my IP address is allocated dynamically).
My solution, then, was to restart Cassandra, which updated the IP, and rerun nodetool. No more JMX errors, no more strange IP address. This worked a treat for me since I'm running a local instance of Cassandra on localhost and don't mind the restart, but it's not a very satisfactory solution.
