How to check the status of Spark (Standalone) services on cloudera-quickstart-vm?

I am trying to get the status of the spark-master and spark-slave services of the Spark (Standalone) service running on my local VM.
However, running sudo service spark-master status does not work.
Can anybody provide some hints on how to check the status of the Spark services?

I use jps -lm as the tool to get the status of any JVMs on a box, Spark's included. Consult the jps documentation for more details beyond the -lm command-line options.
If you want to filter out only the JVM processes that really belong to Spark, pipe the output through OS-specific tools like grep.
➜ spark git:(master) ✗ jps -lm
999 org.apache.spark.deploy.master.Master --ip japila.local --port 7077 --webui-port 8080
397
669 org.jetbrains.idea.maven.server.RemoteMavenServer
1198 sun.tools.jps.Jps -lm
➜ spark git:(master) ✗ jps -lm | grep -i spark
999 org.apache.spark.deploy.master.Master --ip japila.local --port 7077 --webui-port 8080
You can also check out ./sbin/spark-daemon.sh status, but my limited experience with the tool keeps me from recommending it.
When you start Spark Standalone using the scripts under sbin, the PIDs are stored in the /tmp directory by default. ./sbin/spark-daemon.sh status can read them and do the "boilerplate" for you, i.e. report the status of a PID.
➜ spark git:(master) ✗ jps -lm | grep -i spark
999 org.apache.spark.deploy.master.Master --ip japila.local --port 7077 --webui-port 8080
➜ spark git:(master) ✗ ls /tmp/spark-*.pid
/tmp/spark-jacek-org.apache.spark.deploy.master.Master-1.pid
➜ spark git:(master) ✗ ./sbin/spark-daemon.sh status org.apache.spark.deploy.master.Master 1
org.apache.spark.deploy.master.Master is running.

ps -ef | grep spark also works and shows the details of all Spark PIDs.
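A minimal sketch of the same idea with pgrep, which prints only the matching PIDs (the class names below are the standard Spark Standalone daemon classes; adjust the patterns to whatever actually runs on your box):

# list PIDs of the standalone master and workers, if any are running
pgrep -f org.apache.spark.deploy.master.Master
pgrep -f org.apache.spark.deploy.worker.Worker

# the exit status is non-zero when nothing matches, so it also works in scripts
pgrep -f org.apache.spark.deploy.master.Master > /dev/null && echo "master is running" || echo "master is down"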

Related

Docker volume not storing entire Data

I am running ActiveMQ in Docker and persisting data with the following command, but the data is not completely persisted to the mounted directory.
[root@kubernetes-4 activemq]# docker run -it --rm -v /MyPath:/data webcenter/activemq:5.14.3
2019-08-24 03:56:42,388 CRIT Supervisor running as root (no user in config file)
2019-08-24 03:56:42,388 WARN Included extra file "/etc/supervisor/conf.d/activemq.conf" during parsing
2019-08-24 03:56:42,388 WARN Included extra file "/etc/supervisor/conf.d/cron.conf" during parsing
2019-08-24 03:56:42,400 INFO RPC interface 'supervisor' initialized
2019-08-24 03:56:42,400 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2019-08-24 03:56:42,400 INFO supervisord started with pid 1
2019-08-24 03:56:43,403 INFO spawned: 'cron' with pid 16
2019-08-24 03:56:43,405 INFO spawned: 'activemq' with pid 17
2019-08-24 03:56:44,969 INFO success: cron entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-08-24 03:56:44,969 INFO success: activemq entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Data in My Path
cd /MyPath/
[root@kubernetes-4 MyPath]# ls
activemq
[root@kubernetes-4 MyPath]# cd activemq/
[root@kubernetes-4 activemq]# ls
[root@kubernetes-4 activemq]#
Data In Container
[root@kubernetes-4 activemq]# docker exec -it 630a3fe743e2 bash
root@630a3fe743e2:/opt/activemq# ls
LICENSE README.txt bin conf.tmp docs lib webapps
NOTICE activemq-all-5.14.3.jar conf data examples tmp webapps-demo
root@630a3fe743e2:/opt/activemq# cd data
root@630a3fe743e2:/opt/activemq/data# ls
activemq.log
root@630a3fe743e2:/opt/activemq/data# cd ..
root@630a3fe743e2:/opt/activemq# cd /data/
root@630a3fe743e2:/data# ls
activemq
root@630a3fe743e2:/data# cd activemq/
root@630a3fe743e2:/data/activemq# ls
activemq.pid kahadb localhost
root@630a3fe743e2:/data/activemq#
Mount the exact activemq directory and you will see everything working fine.
docker run -dit --rm -v /MyPath:/data/activemq webcenter/activemq:5.14.3
Second, you are not mounting /opt/activemq/data, so it will not be accessible on the host.
For logs, you can add -v /var/log/activemq:/var/log/activemq.
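Putting the pieces together, a sketch of a run that persists both the broker state and the logs on the host (the host-side paths are just the ones used above; adjust them to your setup):

docker run -dit --rm \
  -v /MyPath:/data/activemq \
  -v /var/log/activemq:/var/log/activemq \
  webcenter/activemq:5.14.3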

Spark uses Random Port even after defining executor port

I have a small cluster set up for development, which contains 3 VMs with Spark 2.3 installed on all of them. I started the master on VM1 and the slaves, pointing at the master's IP address, on the other 2 VMs. We have the firewall up on all 3 VMs and have opened the port range 38001:38113 in it.
Before starting the VMs we applied the following configuration.
In Master, Worker 1 & Worker 2 Nodes
The Spark-default.conf file was updated with the following properties:
spark.blockManager.port 38001
spark.broadcast.port 38018
spark.driver.port 38035
spark.executor.port 38052
spark.fileserver.port 38069
spark.replClassServer.port 38086
spark.shuffle.service.port 38103
In Worker 1 & Worker 2 Nodes
The Spark-env.sh file was updated with the following property:
SPARK_WORKER_PORT=38112 -- for worker-1
SPARK_WORKER_PORT=38113 -- for worker-2
When we start spark-shell and run a sample CSV file read, the executor launched on the worker starts with a random port for the Spark driver.
E.g:
Spark Executor Command: "/usr/java/jdk1.8.0_171-amd64/jre/bin/java" "-cp" "/opt/spark/2.3.0/conf/:/opt/spark/2.3.0/jars/*" "-Xmx1024M" "-Dspark.driver.port=34573" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@293.72.254.89:34573" "--executor-id" "1" "--hostname" "293.72.146.384" "--cores" "4" "--app-id" "app-20180706072052-0000" "--worker-url" "spark://Worker@293.72.146.384:38112"
As you can see in the command above, the executor started with spark.driver.port set to 34573, and that port is always chosen randomly. Because of this my program fails, as it is unable to communicate with the driver.
Can anyone help me with a configuration that works in a network-restricted environment where all other ports are blocked?
Thanks in advance.
Start worker:
./start-slave.sh spark://hostname:port -p [Worker Port]
Options:
-c CORES, --cores CORES Number of cores to use
-m MEM, --memory MEM Amount of memory to use (e.g. 1000M, 2G)
-d DIR, --work-dir DIR Directory to run apps in (default: SPARK_HOME/work)
-i HOST, --ip IP Hostname to listen on (deprecated, please use --host or -h)
-h HOST, --host HOST Hostname to listen on
-p PORT, --port PORT Port to listen on (default: random)
--webui-port PORT Port for web UI (default: 8081)
--properties-file FILE Path to a custom Spark properties file.
Default is conf/spark-defaults.conf.
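One more thing worth noting, as an assumption about the setup rather than something stated above: spark.driver.port only takes effect on the machine where the driver (here, spark-shell) runs, so if the shell is launched from a host whose spark-defaults.conf does not contain the setting, the driver falls back to a random port. A minimal sketch of passing the ports explicitly when starting the shell, using the range from the question ("master-host" is a placeholder; spark.port.maxRetries only controls how many consecutive ports Spark tries if the first is busy):

./bin/spark-shell --master spark://master-host:7077 \
  --conf spark.driver.port=38035 \
  --conf spark.blockManager.port=38001 \
  --conf spark.port.maxRetries=16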

Failed to connect to containerd: failed to dial

I just installed Docker CE following the official instructions, using the repository, on Ubuntu 14.04.
The installation went successfully and the daemon is running:
$ ps aux | grep docker
[...] /usr/bin/dockerd --raw-logs [...]
My user is in the docker group:
$ groups
[...] docker
The CLI can't seem to communicate (same with sudo):
$ docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock.
Is the docker daemon running?
The socket seems to have the correct permissions:
$ ls -l /var/run/docker.sock
srw-rw---- 1 root docker 0 Feb 4 16:21 /var/run/docker.sock
The log seems to complain about some issues, though:
$ sudo tail -f /var/log/upstart/docker.log
Failed to connect to containerd: failed to dial "/var/run/docker/containerd/docker-containerd.sock": dial unix:///var/run/docker/containerd/docker-containerd.sock: timeout
/var/run/docker.sock is up
time="2018-02-04T16:22:21.031459040+01:00" level=info msg="libcontainerd: started new docker-containerd process" pid=17147
INFO[0000] starting containerd module=containerd revision=89623f28b87a6004d4b785663257362d1658a729 version=v1.0.0
INFO[0000] setting subreaper... module=containerd
containerd: invalid argument
time="2018-02-04T16:22:21.056685023+01:00" level=error msg="containerd did not exit successfully" error="exit status 1" module=libcontainerd
Any advice to make this work?
Re-logging in and restarting Docker have already been done, of course.
As @bobbear suggested, and as is actually mentioned in the official docs, one of the prerequisites is:
Version 3.10 or higher of the Linux kernel. The latest version of the kernel available for your platform is recommended.
After having checked my Kernel version:
$ uname -a
Linux [...] 3.2.[...]-generic [...]-Ubuntu [...] x86_64
I searched for candidates:
$ apt-cache search linux-image
And installed my new_kernel:
$ sudo apt-get install \
linux-image-new_kernel \
linux-headers-new_kernel \
linux-image-extra-new_kernel
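After installing the new kernel packages, you would typically reboot into the new kernel and then confirm the version and the daemon, roughly along these lines (a sketch; the exact service command depends on whether the host uses upstart or systemd):

$ sudo reboot
# once the machine is back up
$ uname -r                    # should now report 3.10 or higher
$ sudo service docker restart
$ docker ps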
The same situation happened to me. It is because your Linux kernel version is too low! Check it with uname -r: if the version is below 3.10 (for example, Debian 7 Wheezy defaults to 3.2), then even if you install docker-ce successfully, you still will not be able to start the Docker daemon. That's why! Most answers on the web tell you to 'restart' this or that, but they do not consider this problem.

memsql-deploy leaf node consistently failed

On the same host as the master, memsql-deploy of a leaf node always fails with the same error. Repeating the operation on new machines produces the same failure.
Here are the steps to deploy the master role:
# memsql-ops memsql-deploy -a Af53bfb -r master -P 3306 --community-edition
2017-03-24 16:15:54: Je5725b [INFO] Deploying MemSQL to 172.17.0.3:3306
2017-03-24 16:15:59: Je5725b [INFO] Installing MemSQL
2017-03-24 16:16:02: Je5725b [INFO] Finishing MemSQL Install
Waiting for MemSQL to start...
MemSQL successfully started
Here are the steps, run immediately afterwards, to add the leaf node:
# memsql-ops memsql-deploy -a Af53bfb -r leaf -P 3308
2017-03-24 16:16:43: J32c71f [INFO] Deploying MemSQL to 172.17.0.3:3308
2017-03-24 16:16:43: J32c71f [INFO] Installing MemSQL
2017-03-24 16:16:46: J32c71f [INFO] Finishing MemSQL Install
Waiting for MemSQL to start...
MemSQL failed to start: Failed to start MemSQL:
set_mempolicy: Operation not permitted
setting membind: Operation not permitted
What could be the possible reasons behind these error messages, and how can I find the root cause or a fix?
After a day of searching on Google, I believe I have finally located the root cause of this error. I find it strange that no one has asked about it before, because it should happen to more people than just me.
The real cause of this issue is that I installed the numactl package, per MemSQL's best-practice suggestion, on a non-NUMA machine. This effectively makes every MemSQL node other than the first try to run the numactl sub-command set_mempolicy to bind individual MemSQL nodes to CPUs, and that command eventually fails. Starting the node through the memsql-start or memsql-deploy sub-commands of memsql-ops then fails as well.
The workaround is very simple: just remove the numactl package and everything will be fine. This workaround particularly applies to some virtualization-based MemSQL deployments, such as Docker.
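A minimal sketch of the workaround, assuming a Debian/Ubuntu-style box (use the equivalent yum/rpm commands on RHEL-like systems):

# check whether the machine actually has more than one NUMA node
numactl --hardware            # "available: 1 nodes (0)" means non-NUMA

# remove the package so the deploy no longer tries to bind nodes to CPUs
sudo apt-get remove numactl

# then retry the leaf deployment
memsql-ops memsql-deploy -a Af53bfb -r leaf -P 3308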
Can you try on the master:
memsql-ops start
memsql-ops memsql-deploy --role master -P 3306 --community-edition
On the agent:
memsql-ops start
memsql-ops follow -h <host of primary agent> -P <port of primary agent if configured to use one>
memsql-ops memsql-deploy --role leaf -P 3308 --community-edition

Is it possible for a Spark worker to require more resources than the cluster?

I have a standalone cluster running on one node:
root 23053 1 0 Apr01 ? 00:25:00 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip ES01 --port 7077 --webui-port 8080
root 23182 1 0 Apr01 ? 00:19:30 java -cp /opt/spark-1.6.0-bin-hadoop2.6/conf/:/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/opt/hadoop-2.6.2/etc/hadoop/ -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://ES01:7077
From the ps -ef output we can see that the memory for the master and the worker daemons is small: -Xms1g -Xmx1g -XX:MaxPermSize=256m.
But when submitting a program to the cluster, you can specify that the driver uses 4G and the worker (executor) uses 8G. Why can a program running on the cluster acquire more memory than the cluster?
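For reference, a submission along the lines the question describes would look roughly like this (the application class and jar names are placeholders):

./bin/spark-submit --master spark://ES01:7077 \
  --driver-memory 4G \
  --executor-memory 8G \
  --class com.example.MyApp my-app.jar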
