How to restart an Erlang node? - linux

I have pushed a sample to GitHub.
It is an example application from the book Programming Erlang;
you can build it by following the README.md.
My question: once the sellaprime application has started, I run
./bin/sp restart
Why does this take the node down without restarting it?
The Erlang documentation says:
The system is restarted inside the running Erlang node, which means that the emulator is not restarted. All applications are taken down smoothly, all code is unloaded, and all ports are closed before the system is booted again in the same way as initially started. The same BootArgs are used again.
What does "emulator is not restarted" mean?
If I want to restart a node, what is the right way to do it?
By the way, is there an API that reports the current release version, similar to
application:which_applications()

It looks like your sp init script, which uses the nodetool script, should call init:restart() for you. If it does, but your node shuts down instead, check your logs for errors (perhaps one of your applications cannot handle a restart?).
Using init:restart() is the way to do it though. Here's an example: start an Erlang node with a name (in this case, test):
$ erl -sname test
Erlang/OTP 18 [erts-7.0] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V7.0 (abort with ^G)
(test@host)1> hello.
hello
(test@host)2>
Temporarily start another node that makes an RPC call to the first node:
$ erl -sname other -noinput -noshell -eval "rpc:call('test@host', init, restart, [])" -s init stop
$
Observe the original node being restarted:
(test@host)2> Erlang/OTP 18 [erts-7.0] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V7.0 (abort with ^G)
(test@host)1>

RIAK Node does not Start after changing IP

I am in the process of setting up a Riak Cluster on Raspberry Pis.
Unfortunately I get the following error message after changing the IP address.
Versions I used:
Debian Jessie (Raspberry PI)
riak (Github Clone Mar2017)
riak-cs2.1.1
stanchion-2.1.1
Using this guide I tried to change the IP addresses in the various .conf files.
https://docs.riak.com/riak/kv/latest/using/cluster-operations/changing-cluster-info/index.html
Works on 127.0.0.1:
$ ~/riak/rel/riak/bin/riak-admin test
Successfully completed 1 read/write cycle to 'riak@127.0.0.1'
Error Message (after changing IP:192.168.178.61):
sudo ./riak console
config is OK
-config /home/pi/neu/riak/rel/riak/data/generated.configs/app.2020.01.02.23.37.52.config -args_file /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -vm_args /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args
Exec: /home/pi/neu/riak/rel/riak/bin/../erts-5.10.3/bin/erlexec -boot /home/pi/neu/riak/rel/riak/bin/../releases/2.2.3/riak -config /home/pi/neu/riak/rel/riak/data/generated.configs/app.2020.01.02.23.37.52.config -args_file /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -vm_args /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -pa /home/pi/neu/riak/rel/riak/bin/../lib/basho-patches -- console
Root: /home/pi/neu/riak/rel/riak/bin/..
Erlang R16B02_basho10 (erts-5.10.3) [source] [smp:4:4] [async-threads:64] [hipe] [kernel-poll:true] [frame-pointer]
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
{"Kernel pid terminated",application_controller,"{application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,['riak@192.168.178.61',[]],[{file,\"orddict.erl\"},{line,72}]},{riak_core_capability,renegotiate_capabilities,1,[{file,\"src/riak_core_capability.erl\"},{line,441}]},{riak_core_capability,handle_call,3,[{file,\"src/riak_core_capability.erl\"},{line,213}]},{gen_server,handle_msg,5,[{file,\"gen_server.erl\"},{line,585}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]},{gen_server,call,[riak_core_capability,{register,{riak_core,vnode_routing},{capability,[proxy,legacy],legacy,{riak_core,legacy_vnode_routing,[{true,legacy},{false,proxy}]}}},infinity]}}}}}}"}
Crash dump was written to: ./log/erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,['riak@192.168.178.61',[
https://github.com/basho/riak/issues/999
martinsumner commented 3 days ago:
I might expect to see this if you hadn't done the step of either renaming (or deleting the contents of) the ring directory. Did you do this?
Also can you confirm if you're in the single-node or multi-node renaming scenario?
Ei3rb0mb3r commented 1 minute ago:
Many thanks for the quick feedback!
The error has been solved after I deleted the ring directory files.
rm -rf ../riak/rel/riak/data/ring/*
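Deleting the ring files cannot be undone, so a more cautious variant is to move them aside first. The following is only a sketch under assumptions: the node is stopped, the ring directory path is passed in by the caller, and the function name backup_ring_dir and the ".bak.<timestamp>" naming are made up for illustration (they are not part of Riak).

```shell
#!/bin/sh
# Sketch: move the ring files into a timestamped backup directory
# instead of deleting them. Run only while the Riak node is stopped.
# backup_ring_dir and the backup naming scheme are hypothetical.
backup_ring_dir() {
    ring_dir="$1"
    # Strip a trailing slash, then append a timestamped suffix.
    backup_dir="${ring_dir%/}.bak.$(date +%Y%m%d%H%M%S)"

    [ -d "$ring_dir" ] || { echo "no such directory: $ring_dir" >&2; return 1; }
    mkdir -p "$backup_dir" || return 1

    for f in "$ring_dir"/*; do
        [ -e "$f" ] || continue      # ring directory was already empty
        mv "$f" "$backup_dir"/ || return 1
    done

    # Print where the files went so the caller can restore them if needed.
    echo "$backup_dir"
}
```

Calling backup_ring_dir ../riak/rel/riak/data/ring leaves the ring directory empty, as the rm above does, but keeps the old files next to it.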

Spark master and worker seem to run on different JVM version

In standalone mode, the master process uses /usr/bin/java, which resolves to JVM 1.8, and the worker process uses /usr/lib/jvm/java/bin/java, which resolves to 1.7. My Spark application uses some APIs introduced in 1.8.
One line that comes up in the stack trace is: Caused by: java.lang.NoClassDefFoundError: Could not initialize class SomeClassDefinedByMe. That class internally creates an instance from java.time, which I believe exists only in JDK 1.8 and later.
How do I force the worker to use JVM 1.8?
Update:
For now I renamed /usr/lib/jvm/java/bin/java and created a link that points to /usr/bin/java. This solved the problem, but I would still like to know why the two processes use different binary locations and where this is set.
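On each worker node, edit ${SPARK_HOME}/conf/spark-env.sh and define the appropriate $JAVA_HOME. It must be the Java installation directory (Spark runs ${JAVA_HOME}/bin/java), not the java binary itself, e.g.
export JAVA_HOME=/usr/java/jdk1.8.0_92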
That file is sourced by ${SPARK_HOME}/bin/load-spark-env.sh which is invoked by each and every Spark command-line utility:
${SPARK_HOME}/bin/spark-shell via ${SPARK_HOME}/bin/spark-class
${SPARK_HOME}/bin/spark-submit via ${SPARK_HOME}/bin/spark-class
...
${SPARK_HOME}/sbin/start-slave.sh
...
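To see why two daemons can end up on different binaries, it helps to spell out the selection rule. The sketch below is a simplified paraphrase of what the launcher scripts do, as far as I know: JAVA_HOME wins if it is set, otherwise whatever java is first on PATH is used. resolve_java_runner is a hypothetical helper name, not part of Spark.

```shell
#!/bin/sh
# Sketch of the JVM selection rule (simplified, not Spark's actual code):
# prefer ${JAVA_HOME}/bin/java, otherwise fall back to java on PATH.
resolve_java_runner() {
    if [ -n "${JAVA_HOME:-}" ]; then
        echo "${JAVA_HOME}/bin/java"
    elif command -v java >/dev/null 2>&1; then
        command -v java
    else
        echo "JAVA_HOME is not set and no java found on PATH" >&2
        return 1
    fi
}
```

Running resolve_java_runner in the environment that starts the master and again in the one that starts the worker shows which binary each would pick. If one environment has JAVA_HOME set and the other does not, the two results can differ.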
Side note: the Linux alternatives are the standard way to define which JVM is on top of your PATH...
Typical setup with a "fixed" setting, not relying on the priority set by the OpenJDK RPM install:
$ ls -AFl $(which java)
lrwxrwxrwx. 1 root root 22 Feb 15 16:06 /usr/bin/java -> /etc/alternatives/java*
$ alternatives --display java | grep -v slave
java - status is manual.
link currently points to /usr/java/jdk1.8.0_92/jre/bin/java
/usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java - priority 18091
/usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java - priority 16000
/usr/java/jdk1.8.0_92/jre/bin/java - priority 18092
Current `best' version is /usr/java/jdk1.8.0_92/jre/bin/java.
...provided that $PATH is defined properly for the Linux account that launches the Spark slaves!

I am unable to start ejabberd on ubuntu

The error message is
Slogan: Kernel pid terminated (application_controller) ({application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,net_kernel,{'EXIT',nodistribution}}}}},{k
System version: Erlang R16B03 (erts-5.10.4) [source] [64-bit] [async-threads:10] [kernel-poll:false]
Please help me solve this.
Thanks in advance.
Try reading this issue. It helped me in a similar case.

abrtd: Node Process was killed by signal 6 (SIGABRT)

I am running a Node program that does a long-running data migration job. After an hour of running, the Node process is terminated and the abrt daemon creates a core dump.
Looking into the reason I see this:
node process was killed by signal 6 (SIGABRT)
Any ideas why Node process is killed and how to deal with it?
It turned out to be a memory leak in the strong-oracle module I am using. I have increased the Node.js process memory limit to 4 GB, and it is working fine now.
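The answer does not say how the limit was raised. One common way, assumed here, is V8's --max-old-space-size flag, which takes a value in megabytes. The sketch below only builds the command line; node_cmd_with_heap_gb and migrate.js are hypothetical names.

```shell
#!/bin/sh
# Sketch: build a node command line with the V8 old-space heap limit
# given in gigabytes (--max-old-space-size expects megabytes).
# node_cmd_with_heap_gb and the script name passed to it are hypothetical.
node_cmd_with_heap_gb() {
    heap_gb="$1"
    shift
    echo "node --max-old-space-size=$((heap_gb * 1024)) $*"
}
```

For example, node_cmd_with_heap_gb 4 migrate.js yields a command with --max-old-space-size=4096. Note that a larger heap only delays the abort if the leak itself is still there.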

PHP exec(myexe) fails in PHP App, but not CLI. Fails Running Under User "apache"

I have a custom program (e.g. myexe) being executed by a web app using PHP's exec() function. It does not fail when run using the PHP CLI nor does myexe fail when run from the command line with me as a user. I have built myexe so that there are no memory issues when profiled using valgrind. myexe is about 26MB in size.
To simplify the situation, I have run myexe on the command line under the user 'apache' and reproduced the failure.
su -s /bin/sh apache -c "/usr/local/bin/myexe parm1 parm2..."
==> Segmentation fault (core dumped)
BUT when I change the user to myself and run the same command above, it works.
su -s /bin/sh mike -c "/usr/local/bin/myexe parm1 parm2..."
==> WORKS
Here's the error from the system log file:
Jul 9 18:26:15 DEVSTN-1 kernel: myexe[27352]: segfault at 7fffa2bf9ff8 ip 0000000000410324 sp 00007fffa2bfa000 error 6 in myexe[400000+5ae000]
Jul 9 18:26:16 DEVSTN-1 abrt[27353]: Saved core dump of pid 27352 (/usr/local/bin/myexe) to /var/spool/abrt/ccpp-2015-07-09-18:26:15-27352 (13631488 bytes)
Jul 9 18:26:16 DEVSTN-1 abrtd: Directory 'ccpp-2015-07-09-18:26:15-27352' creation detected
Jul 9 18:26:17 DEVSTN-1 abrtd: Executable '/usr/local/bin/myexe' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Jul 9 18:26:17 DEVSTN-1 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2015-07-09-18:26:15-27352' exited with 1
Jul 9 18:26:17 DEVSTN-1 abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2015-07-09-18:26:15-27352'
My configuration:
CentOS6 2.6.32-504.23.4.el6.x86_64
Apache/2.2.15 (CentOS)
PHP Version 5.3.3
Am I correct with assuming that PHP has nothing to do with the error?
What should I do next?
Correct; PHP has nothing to do with the error. This is a segmentation fault caused by invalid memory access (either overflowing a buffer, or accessing already-freed memory) in myexe. It seems to have saved a core dump to /var/spool/abrt/ccpp-2015-07-09-18:26:15-27352, so, try debugging with GDB:
gdb /usr/local/bin/myexe -c /var/spool/abrt/ccpp-2015-07-09-18:26:15-27352
(gdb) bt
And try to see where the executable is failing. To get useful output, it will need to be compiled with debugging symbols. If it doesn't fail running as root or a different user, or running in an interactive terminal, I'd look for bugs that could be triggered by being unable to open a file, unable to read an expected environment variable, etc. to help isolate your problem.
Running the executable under strace might help figure out what's going on as well.
Found the problem by entering a bash shell as user apache and running the program under gdb.
It turns out myexe was trying to create a directory under the user's home directory (/home/apache), which doesn't exist.
What helped me was knowing how to start a shell under a different user and using gdb.
Here's the command to start a shell under another user (apache):
su -s /bin/bash apache
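A quick check for this kind of failure is to compare the home directory recorded for the account with what exists on disk. This is a minimal sketch that assumes local accounts listed in a passwd-format file (/etc/passwd by default); user_home and check_user_home are hypothetical helper names.

```shell
#!/bin/sh
# Print the home directory recorded for a user in a passwd-format file.
# Second argument is optional and defaults to /etc/passwd.
user_home() {
    awk -F: -v u="$1" '$1 == u { print $6; exit }' "${2:-/etc/passwd}"
}

# Report whether that home directory actually exists on disk.
# Returns 0 if it exists, 1 if it is missing, 2 if the user is unknown.
check_user_home() {
    home=$(user_home "$1" "${2:-/etc/passwd}")
    if [ -z "$home" ]; then
        echo "unknown user: $1"
        return 2
    elif [ -d "$home" ]; then
        echo "ok: $home"
    else
        echo "missing: $home"
        return 1
    fi
}
```

Running check_user_home apache before debugging would flag a missing home directory right away. Accounts that come from LDAP or another directory service do not appear in /etc/passwd, so this sketch would report them as unknown.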
