RIAK Node does not Start after changing IP - linux

I am in the process of setting up a Riak Cluster on Raspberry Pis.
Unfortunately I get the following error message after changing the IP address.
Versions I used:
Debian Jessie (Raspberry PI)
riak (Github Clone Mar2017)
riak-cs2.1.1
stanchion-2.1.1
Using this guide I tried to change the IP addresses in the various .conf files.
https://docs.riak.com/riak/kv/latest/using/cluster-operations/changing-cluster-info/index.html
Works on 127.0.0.1:
$ ~/riak/rel/riak/bin/riak-admin test
Successfully completed 1 read/write cycle to 'riak@127.0.0.1'
Error message (after changing the IP to 192.168.178.61):
sudo ./riak console
config is OK
-config /home/pi/neu/riak/rel/riak/data/generated.configs/app.2020.01.02.23.37.52.config -args_file /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -vm_args /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args
Exec: /home/pi/neu/riak/rel/riak/bin/../erts-5.10.3/bin/erlexec -boot /home/pi/neu/riak/rel/riak/bin/../releases/2.2.3/riak -config /home/pi/neu/riak/rel/riak/data/generated.configs/app.2020.01.02.23.37.52.config -args_file /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -vm_args /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -pa /home/pi/neu/riak/rel/riak/bin/../lib/basho-patches -- console
Root: /home/pi/neu/riak/rel/riak/bin/..
Erlang R16B02_basho10 (erts-5.10.3) [source] [smp:4:4] [async-threads:64] [hipe] [kernel-poll:true] [frame-pointer]
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
{"Kernel pid terminated",application_controller,"{application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,['riak@192.168.178.61',[]],[{file,\"orddict.erl\"},{line,72}]},{riak_core_capability,renegotiate_capabilities,1,[{file,\"src/riak_core_capability.erl\"},{line,441}]},{riak_core_capability,handle_call,3,[{file,\"src/riak_core_capability.erl\"},{line,213}]},{gen_server,handle_msg,5,[{file,\"gen_server.erl\"},{line,585}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]},{gen_server,call,[riak_core_capability,{register,{riak_core,vnode_routing},{capability,[proxy,legacy],legacy,{riak_core,legacy_vnode_routing,[{true,legacy},{false,proxy}]}}},infinity]}}}}}}"}
Crash dump was written to: ./log/erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,['riak@192.168.178.61',[

https://github.com/basho/riak/issues/999
martinsumner commented 3 days ago:
I might expect to see this if you hadn't done the step of either renaming (or deleting the contents of) the ring directory. Did you do this?
Also can you confirm if you're in the single-node or multi-node renaming scenario?
Ei3rb0mb3r commented 1 minute ago:
Many thanks for the quick feedback!
The error has been solved after I deleted the ring directory files.
rm -rf ../riak/rel/riak/data/ring/*
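A slightly more cautious variant of this fix moves the ring files aside instead of deleting them, so they can be restored if something goes wrong. This is only a sketch: the function name `clear_ring_dir` and the backup directory naming are hypothetical, and it assumes the Riak node has been stopped before it runs.

```shell
#!/bin/sh
# Hypothetical helper: move everything in a Riak ring directory into a
# timestamped backup directory next to it, leaving the ring directory empty.
# Assumes the Riak node has already been stopped.
clear_ring_dir() {
    ring_dir=$1
    [ -d "$ring_dir" ] || { echo "no such directory: $ring_dir" >&2; return 1; }
    backup_dir="${ring_dir%/}.bak.$(date +%Y%m%d%H%M%S)"
    mkdir -p "$backup_dir" || return 1
    # Move each entry; an already empty ring directory is not an error.
    for entry in "$ring_dir"/*; do
        [ -e "$entry" ] || continue
        mv "$entry" "$backup_dir"/ || return 1
    done
    # Print the backup location so the caller can restore from it.
    echo "$backup_dir"
}

# Example (path taken from the question above):
# clear_ring_dir ../riak/rel/riak/data/ring
```

The function prints the backup directory, so the old ring files can be moved back if the node needs its previous ring state.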

Related

Submitting first job to pacemaker

I followed this guide:
https://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/
I stayed with the Active/Passive DRBD file system sharing. I had to reboot my cluster and now I am getting the following error:
Current DC: rbx-1 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Tue Nov 28 17:01:14 2017
Last change: Tue Nov 28 16:40:09 2017 by root via cibadmin on rbx-1
2 nodes configured
5 resources configured
Node rbx-2: UNCLEAN (offline)
Online: [ rbx-1 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started rbx-1
WebSite (ocf::heartbeat:apache): Stopped
Master/Slave Set: WebDataClone [WebData]
WebData (ocf::linbit:drbd): FAILED rbx-1 (blocked)
Stopped: [ rbx-2 ]
WebFS (ocf::heartbeat:Filesystem): Stopped
Failed Actions:
* WebData_stop_0 on rbx-1 'invalid parameter' (2): call=20, status=complete, exitreason='none',
last-rc-change='Tue Nov 28 16:27:58 2017', queued=0ms, exec=3ms
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Any ideas?
Also does anyone have any recommended guides for submitting jobs?
This post is relatively old at this point but I'll leave this here for others to find if they stumble upon the same issue.
This problem has to do with an issue with the DRBD integration script that pacemaker uses. If it's broken, missing, has incorrect permissions, etc. you can get an error like this. In CentOS 7 that script is located at /usr/lib/ocf/resource.d/drbd
Note: This is specifically for the guide mentioned by OP but may help you:
Section 7.1 has a big "IMPORTANT" block that talks about replacing the Pacemaker integration script due to a bug. If you run the command it gives there, you actually replace the script with a 404 error page, which obviously doesn't work and causes this error. You can fix this by replacing the script with the original, either by reinstalling DRBD...
yum remove -y kmod-drbd84 drbd84-utils
yum install -y kmod-drbd84 drbd84-utils
...or finding just the drbd script elsewhere and adding/replacing it to /usr/lib/ocf/resource.d/drbd. Make sure its permissions are correct and that it is set as executable.
Hope that helps!
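To tell whether the script was overwritten by an HTML error page, a quick sanity check along these lines may help. It is a sketch, not part of Pacemaker or DRBD: the function name `looks_like_script` is hypothetical, and it only verifies that the file exists, is executable, and starts with a `#!` line.

```shell
#!/bin/sh
# Hypothetical check: does the file look like an executable script rather
# than, for example, a downloaded HTML error page?
looks_like_script() {
    f=$1
    [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
    [ -x "$f" ] || { echo "not executable: $f" >&2; return 1; }
    # Read only the first line; a script starts with a "#!" interpreter line.
    first_line=
    IFS= read -r first_line < "$f" || true
    case $first_line in
        '#!'*) echo "ok: $f" ;;
        *) echo "no shebang, possibly an HTML page: $f" >&2; return 1 ;;
    esac
}

# Example (path as given in the answer above):
# looks_like_script /usr/lib/ocf/resource.d/drbd
```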

Software watchdog causing system reboot if started at bootup

On my device I enabled the software watchdog to monitor a file that is updated every 5 seconds by an application. I have configured the software watchdog as follows:
file = /data/file_name_to_watch
change = 10
The watchdog is started at boot with the following command:
/usr/sbin/watchdog.sh -f -v -c watchdog.conf
The application responsible for updating the file (file_name_to_watch) is started after the watchdog daemon during boot, and it updates the monitored file every 5 seconds.
The problem is that the watchdog reboots the system if it is started at boot. The problem does not occur when the watchdog is started manually after the application has launched.
dmesg shows "Watchdog did not stop"
Also, changing the watchdog configuration file to the following didn't help:
file = /data/file_name_to_watch
change = 20
I have checked that the file is updated within 10 seconds of the watchdog being launched at boot.
Any pointers to debug this problem will be appreciated.
Code which I am using for watchdog: https://layers.openembedded.org/layerindex/recipe/122/
I debugged this and found the problem to be time(NULL) returning a huge number in src/file_stat.c.
This happened because the date had not yet been set that early during boot.
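One possible workaround, given that root cause, is to delay starting the watchdog until the system clock reports a plausible time. The sketch below is an assumption-laden illustration: the function name `clock_in_range` and the bounds are hypothetical, and it relies on `date +%s`, which GNU, BSD, and BusyBox `date` support but POSIX does not require.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the current Unix time lies between
# a lower and an upper bound, i.e. the clock has plausibly been set.
clock_in_range() {
    min_epoch=$1
    max_epoch=$2
    now=${3:-$(date +%s)}   # optional third argument lets callers inject a time
    [ "$now" -ge "$min_epoch" ] && [ "$now" -le "$max_epoch" ]
}

# Example: retry for up to 30 seconds until the clock is between
# 2017-01-01 and 2100-01-01 (UTC), then start the watchdog with the
# command from the question.
# tries=0
# until clock_in_range 1483228800 4102444800 || [ "$tries" -ge 30 ]; do
#     sleep 1
#     tries=$((tries + 1))
# done
# /usr/sbin/watchdog.sh -f -v -c watchdog.conf
```

Which bounds count as plausible depends on the device, so the values above are placeholders rather than recommendations.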

Error in VACaMobil for INET

I am starting to use VACaMobil, a module for OMNET++ which allows, while evaluating
ITS solutions, to have a constant number of cars during a simulation period.
After making some changes to the code, I tried to run VACaMobil, with the configuration
flows 2. The simulation aborted and I got the following messages from sumo-launchd.py:
jcmh@juanca-freya:~$ python Proyectos-OMNeT++/inet/etc/sumo-launchd.py -vv -c sumo-gui
Logging to /tmp/sumo-launchd.log
Listening on port 9999
Connection from 127.0.0.1 on port 49847
Handling connection from 127.0.0.1 on port 49847
Got TraCI message of length 2
Got TraCI command of length 1
Got TraCI command 0x0
Got CMD_GETVERSION
Got TraCI message of length 248
Got TraCI command of length 247
Got TraCI command 0x75
Got CMD_FILE_SEND for "sumo-launchd.launch.xml"
Got CMD_FILE_SEND with data "<launch>
<copy file="downtown.mapa.xml"/>
<copy file="downtown.routes.xml"/>
<copy file="downtown.sumo.cfg" type="config"/>
<basedir path="../examples/VACaMobil/flows/Milan/"/>
<seed value="2"/>
</launch>
"
Creating temporary directory...
Temporary dir is /tmp/sumo-launchd-tmp-dsJ7kR
Base dir is ../examples/VACaMobil/flows/Milan/
Seed is 2
Finding free port number...
Claiming lock on port
...found port 56172
Releasing lock on port
Cleaning up
Result: "None"
Aborting on error: file "../examples/VACaMobil/flows/Milan/downtown.mapa.xml" does not exist
Closing connection from 127.0.0.1 on port 49847
A user posted a message on the blog of Sergio Tornell, one of the VACaMobil developers, asking for help
because he had a problem similar to mine. The answer was: "It seems like you are using windows. You
have to modify all the routes, in windows you use "\" instead of "/"". He certainly was using Windows,
but I am on GNU/Linux.
What could the problem be? I don't think it is caused by backslashes, since I am on GNU/Linux.
I am using Elementary OS 3 (based on Ubuntu 14.04), OMNeT++ 4.6, VACaMobil for the INET framework and SUMO 0.18.0.
Thanks in advance.
The way to solve this problem was simply:
Go to the folder that contains sumo-launchd.py:
jcmh@juanca-freya:~$ cd Proyectos-OMNeT++/inet/etc
Run sumo-launchd.py directly from there:
jcmh@juanca-freya:~/Proyectos-OMNeT++/inet/etc$ python sumo-launchd.py -vv -c sumo-gui
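The log suggests why this works: the base dir "../examples/VACaMobil/flows/Milan/" is a relative path, so it appears to be resolved against the directory the script was started from. The following sketch wraps that idea in a small function. The name `run_from_script_dir` is hypothetical, and the explanation is an inference from the log, not something confirmed by the VACaMobil documentation.

```shell
#!/bin/sh
# Hypothetical helper: run a script from the directory that contains it,
# so relative paths used by the script resolve from there.
# Usage: run_from_script_dir INTERPRETER SCRIPT [ARGS...]
run_from_script_dir() {
    interpreter=$1
    script=$2
    shift 2
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$(dirname "$script")" && "$interpreter" "./$(basename "$script")" "$@" )
}

# Example (paths and options from the question above):
# run_from_script_dir python ~/Proyectos-OMNeT++/inet/etc/sumo-launchd.py -vv -c sumo-gui
```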

riak won't start after changing nodename

I've just installed riak on my virtual machine (Ubuntu 14.04).
The problem is that after I edited /etc/riak/riak.conf from:
nodename = riak@127.0.0.1
to
nodename = riak@10.20.0.110
which is the IP of my virtual machine, the service won't start. I get the following error:
root@ubuntu-amd64:/home/ubuntu# riak console
config is OK
-config /var/lib/riak/generated.configs/app.2015.07.20.09.57.02.config -args_file /var/lib/riak/generated.configs/vm.2015.07.20.09.57.02.args -vm_args /var/lib/riak/generated.configs/vm.2015.07.20.09.57.02.args
Exec: /usr/lib/riak/erts-5.10.3/bin/erlexec -boot /usr/lib/riak/releases/2.1.0/riak -config /var/lib/riak/generated.configs/app.2015.07.20.09.57.02.config -args_file /var/lib/riak/generated.configs/vm.2015.07.20.09.57.02.args -vm_args /var/lib/riak/generated.configs/vm.2015.07.20.09.57.02.args -pa /usr/lib/riak/lib/basho-patches -- console
Root: /usr/lib/riak
Erlang R16B02_basho8 (erts-5.10.3) [source] [64-bit] [smp:1:1] [async-threads:64] [kernel-poll:true] [frame-pointer]
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
{"Kernel pid terminated",application_controller,"{application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,[riak@localhost,[]],[{file,\"orddict.erl\"},{line,72}]},{riak_core_capability,renegotiate_capabilities,1,[{file,\"src/riak_core_capability.erl\"},{line,441}]},{riak_core_capability,handle_call,3,[{file,\"src/riak_core_capability.erl\"},{line,213}]},{gen_server,handle_msg,5,[{file,\"gen_server.erl\"},{line,585}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]},{gen_server,call,[riak_core_capability,{register,{riak_core,vnode_routing},{capability,[proxy,legacy],legacy,{riak_core,legacy_vnode_routing,[{true,legacy},{false,proxy}]}}},infinity]}}}}}}"}
Crash dump was written to: /var/log/riak/erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,[riak@localhost,[]],[{fi
I've noticed that the nodename setting is very sensitive; the node won't start even with this configuration:
nodename = riak@localhost
Of course I've changed
listener.http.internal = 10.20.0.110:8098
listener.protobuf.internal = 10.20.0.110:8087
as well.
/var/log/riak/erl_crash.dump
I found the solution in this post.
sudo rm -rf /var/lib/riak/ring/* # delete the riak ring

how to secure a MyCloud? Is it already too late?

My laptop is running Ubuntu 14.04 LTS. I have a WD MyCloud that I back up to with rsync. The rsync calls usually end with an error.
Some things I have observed. The MyCloud machine has a REST API and I see that someone has tried to hack that. As far as I can tell, the attempt did not succeed.
My backup does this from my laptop, to the MyCloud:
/usr/bin/rsync -a -z -v /home/me/ root@192.168.1.82:/shares/me
Usually I get:
rsync: connection unexpectedly closed (93239 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(226) [sender=3.1.0]
and the rsync process is returning 12.
Oddly, if I use a second -v, the process usually succeeds. Why would that be? Is it too fast without the second -v?
Something I saw worries me. Again, this is the output on my laptop, running rsync to copy to the MyCloud:
*** stack smashing detected ***: <unknown> terminated
rsync: connection unexpectedly closed (11387 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(226) [sender=3.1.0]
and it returns 12. Ouch!
So, does this mean my laptop is infected with something, presumably Heartbleed or something like it?
On my laptop:
$ rsync -h
rsync version 3.1.0 protocol version 31
...
On the MyCloud:
# rsync -h
rsync version 3.0.9 protocol version 30
...
Am I just screwed? Do I just need to update the software on the MyCloud? I have updated my Ubuntu laptop several times. Did that not prevent an infection on this machine?
Open to any suggestions.
