arangodb starter mode does not start - arangodb

I have d/l'd arangodb3-linux-3.9.2 from GIT on Centos 7. I created a database dir and ran the README instructions for a standalone start. The first time it runs, I get 100 failures, the key INFO log lines seem to be
... [INFO] server started component=arangodb pid=49827 type=single
... [INFO] Wait on 49827 returned component=arangodb exit-status=1 trap-cause=-1
It creates the log file, setup.json and a single8529 dir in the database dir I sped'd. Is it just taking too long to start? The whole 100 fails take about 1 or 2 seconds.
If I try to run it again with the same README instructions, the next time I get this error:
... [FATAL] Failed to run service error="open /.../single8529/data/ENGINE: no such file"
I have also tried with --starter.host 127.0.0.1 -- to simplify
Also I and can confirm that port 8529 is open

I couldn't get arangodb 'starter' according to their README to work, but this does start the server:
arangod --database.directory MYDIR --rocksdb.max-background-jobs 4

Related

How to get a basic direvent watcher working?

I have read through the direvent documentation and am trying to get a simple watch working. Since I am having so much trouble with it, I am wondering if the issue has to do with the fact that the system I am using is nixos.
Here is the simple watcher file, watcher, I've created:
watcher {
path ./dir;
command "echo $file";
}
I run it in the foreground, so I can see the output, with direvent --foreground watcher. Once it's running, I create a file in dir, thus creating an event for it to respond to. However, it fails with the following output:
$ direvent --foreground watcher
direvent: [INFO] direvent 5.2 started
direvent: [ERROR] process 8552 failed with status 127
direvent: [ERROR] process 8555 failed with status 127
direvent: [ERROR] process 8557 failed with status 127
Since 127 usually means 'command not found', I tried specifying the path to echo, i.e. running this watcher instead:
watcher {
path ./dir;
command "/run/current-system/sw/bin/echo $file";
}
Then the output still gives an error, albeit a different one:
$ direvent --foreground watcher
direvent: [INFO] direvent 5.2 started
direvent: [ERROR] process 8645 failed with status 1
direvent: [ERROR] process 8651 failed with status 1
direvent: [ERROR] process 8652 failed with status 1
So the failure is now with status 1. I am not sure what to try next. I'm wondering if this issue is due to the fact that I am running nixos. Anyone know what I might try next to get direvent working?
direvent has two other flag that may be useful for you.
--debug(-d) to give extra information.
There's also --lint(t) that check the configuration file for errors, but I suspect this isn't your issue if direvent is running.
Source: https://www.gnu.org.ua/software/direvent/manual/direvent.html

RIAK Node does not Start after changing IP

I am in the process of setting up a Riak Cluster on Raspberry Pis.
Unfortunately I get the following error message after changing the IP address.
Versions I used:
Debian Jessie (Raspberry PI)
riak (Github Clone Mar2017)
riak-cs2.1.1
stanchion-2.1.1
Using this guide I tried to change the IP addresses in the various .conf files.
https://docs.riak.com/riak/kv/latest/using/cluster-operations/changing-cluster-info/index.html
Works on 127.0.0.1:
$ ~/riak/rel/riak/bin/riak-admin test
Successfully completed 1 read/write cycle to 'riak#127.0.0.1'
Error Message (after changing IP:192.168.178.61):
sudo ./riak console
config is OK
-config /home/pi/neu/riak/rel/riak/data/generated.configs/app.2020.01.02.23.37.52.config -args_file /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -vm_args /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args
Exec: /home/pi/neu/riak/rel/riak/bin/../erts-5.10.3/bin/erlexec -boot /home/pi/neu/riak/rel/riak/bin/../releases/2.2.3/riak -config /home/pi/neu/riak/rel/riak/data/generated.configs/app.2020.01.02.23.37.52.config -args_file /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -vm_args /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -pa /home/pi/neu/riak/rel/riak/bin/../lib/basho-patches -- console
Root: /home/pi/neu/riak/rel/riak/bin/..
Erlang R16B02_basho10 (erts-5.10.3) [source] [smp:4:4] [async-threads:64] [hipe] [kernel-poll:true] [frame-pointer]
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
{"Kernel pid terminated",application_controller,"{application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,['riak#192.168.178.61',[]],[{file,\"orddict.erl\"},{line,72}]},{riak_core_capability,renegotiate_capabilities,1,[{file,\"src/riak_core_capability.erl\"},{line,441}]},{riak_core_capability,handle_call,3,[{file,\"src/riak_core_capability.erl\"},{line,213}]},{gen_server,handle_msg,5,[{file,\"gen_server.erl\"},{line,585}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]},{gen_server,call,[riak_core_capability,{register,{riak_core,vnode_routing},{capability,[proxy,legacy],legacy,{riak_core,legacy_vnode_routing,[{true,legacy},{false,proxy}]}}},infinity]}}}}}}"}
Crash dump was written to: ./log/erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,['riak#192.168.178.61',[
https://github.com/basho/riak/issues/999
martinsumner commented 3 days ago:
I might expect to see this if you hadn't done the step of either renaming (or deleting the contents of) the ring directory. Did you do this?
Also can you confirm if you're in the single-node or multi-node renaming scenario?
Ei3rb0mb3r commented 1 minute ago:
Many thanks for the quick feedback!
The error has been solved after I deleted the ring directory files.
../riak/rel/riak/data/ring/ rm -rf *

(bdutil) Unable to get hadoop/spark cluster working with a fresh install

I'm setting up a tiny cluster in GCE to play around with it but although instances are created some failures prevent to get it working. I'm following the steps in https://cloud.google.com/hadoop/downloads
So far I'm using (as of now) lastest versions of gcloud (143.0.0) and bdutil (1.3.5), freshly installed.
./bdutil deploy -e extensions/spark/spark_env.sh
using debian-8 as image (as bdutil still uses debian-7-backports).
At some point I got
Fri Feb 10 16:19:34 CET 2017: Command failed: wait ${SUBPROC} on line 326.
Fri Feb 10 16:19:34 CET 2017: Exit code of failed command: 1
full debug output is in https://gist.github.com/jlorper/4299a816fc0b140575ed70fe0da1f272
(project id and bucket names changed)
Instances are created, but spark not even installed. Digging a bit I've managed to run spark installation and start hadoop commands in the master after after ssh. But it fails badly when starting the spark-shell:
17/02/10 15:53:20 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.4.5-hadoop1
17/02/10 15:53:20 INFO gcsio.FileSystemBackedDirectoryListCache: Creating '/hadoop_gcs_connector_metadata_cache' with createDirectories()...
java.lang.RuntimeException: java.lang.RuntimeException: java.nio.file.AccessDeniedException: /hadoop_gcs_connector_metadata_cache
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
and not able to import sparkSQL. For what I've read everything should be started automatically.
Up to this point I'm a bit lost and don't know what else to do.
Am I missing any step? Is any of the commands faulty? Thanks in advance.
Update: solved
As pointed out in accepted solution I cloned the repo and cluster was created without issues. When trying to start the spark-shell though it gave
java.lang.RuntimeException: java.io.IOException: GoogleHadoopFileSystem has been closed or not initialized.`
That sounded to me like connectors were not initialized properly, so after running
./bdutil --env_var_files extensions/spark/spark_env.sh,bigquery_env.sh run_command_group install_connectors
it worked as expected.
The last version of bdutil on https://cloud.google.com/hadoop/downloads is a bit stale and I'd instead recommend using the version of bdutil at head on github: https://github.com/GoogleCloudPlatform/bdutil.

Nodejs to Couchbase Server timed out

My app is actually running fine when I started it and keep an eye on it for few hours.
BUT later time(I'm not sure what exact time of inactivity), It does show "Server timed out" (I crop some logs below)
[ERROR] (server - L:463) <node.mycouchbase.server:11210> (SRV=0x2111c60,IX=4) Server timed out. Some commands have failed
[INFO] (confmon - L:166) Not applying configuration received via CCCP. No changes detected. A.rev=152466, B.rev=152466
[INFO] (cccp - L:110) Re-Issuing CCCP Command on server struct 0x2116980
[ERROR] (cccp - L:133) <NOHOST:NOPORT> Got I/O Error=0x17
[INFO] (cccp - L:110) Re-Issuing CCCP Command on server struct 0x2185e30
All just working fine again when I restart my Node.js app (Expressjs app).
This problem seem regularly happening
Please give me some suggestion what could be actually problems behind.
Thanks

Freeswitch pauses on check_ip at boot on centos 7.1

During an investigation into a different problem (Inconsistent systemd startup of freeswitch) I discovered that both the latest freeswitch 1.6 and 1.7 paused for several minutes at a time (between 4 and 14) during boot up on centos 7.1. Whilst it was intermittent, it was as often as one time in 3 or 4.
Running this from the command line :
/usr/bin/freeswitch -nonat -db /dev/shm -log /usr/local/freeswitch/log -conf /usr/local/freeswitch/conf -run /usr/local/freeswitch/run
caused the following output (note the time difference between the Add task 2 and the line after it) :
2015-10-23 15:40:14.160101 [INFO] switch_event.c:685 Activate Eventing Engine.
2015-10-23 15:40:14.170805 [WARNING] switch_event.c:656 Create additional event dispatch thread 0
2015-10-23 15:40:14.272850 [INFO] switch_core_sqldb.c:3381 Opening DB
2015-10-23 15:40:14.282317 [INFO] switch_core_sqldb.c:1693 CORE Starting SQL thread.
2015-10-23 15:40:14.285266 [NOTICE] switch_scheduler.c:183 Starting task thread
2015-10-23 15:40:14.293743 [DEBUG] switch_scheduler.c:249 Added task 1 heartbeat (core) to run at 1445611214
2015-10-23 15:40:14.293837 [DEBUG] switch_scheduler.c:249 Added task 2 check_ip (core) to run at 1445611214
2015-10-23 15:49:47.883158 [NOTICE] switch_core.c:1386 Created ip list rfc6598.auto default (deny)
When I ran it from 1.6 on centos6.7 using the same command line as above I got this - note the delay is a more reasonable 14 seconds :
2015-10-23 10:31:00.274533 [INFO] switch_event.c:685 Activate Eventing Engine.
2015-10-23 10:31:00.285807 [WARNING] switch_event.c:656 Create additional event dispatch thread 0
2015-10-23 10:31:00.434780 [INFO] switch_core_sqldb.c:3381 Opening DB
2015-10-23 10:31:00.465158 [INFO] switch_core_sqldb.c:1693 CORE Starting SQL thread.
2015-10-23 10:31:00.481306 [DEBUG] switch_scheduler.c:249 Added task 1 heartbeat (core) to run at 1445610660
2015-10-23 10:31:00.481446 [DEBUG] switch_scheduler.c:249 Added task 2 check_ip (core) to run at 1445610660
2015-10-23 10:31:00.481723 [NOTICE] switch_scheduler.c:183 Starting task thread
2015-10-23 10:31:14.286702 [NOTICE] switch_core.c:1386 Created ip list rfc6598.auto default (deny)
It's the same on FS 1.7 as well.
This suggests heavily that centos 7.1 & FS have an issue together. Can anyone help me diagnose further or shine some more light on this, please?
This all came to light as I tried to understand why FS would not accept the cli connection for several minutes after I thought it had booted up (using -nc from systemd service).
Thanks to the FS userlist and ultimately Anthony Minessale, the issue was to do with RNG entropy.
This is a good explanation -
https://www.digitalocean.com/community/tutorials/how-to-setup-additional-entropy-for-cloud-servers-using-haveged
Here are some extracts :
There are two general random devices on Linux: /dev/random and
/dev/urandom. The best randomness comes from /dev/random, since it's a
blocking device, and will wait until sufficient entropy is available
to continue providing output.
The key here is that it's a blocking device, so any program waiting for a random number from /dev/random will pause until sufficient entropy is available for a "safe" random number.
This is a headless server, so the usual sources of entropy such as mouse/keyboard activity (and many others) do not apply. Thus the delays,
The fix is this :
Based on the HAVEGE principle, and previously based on its associated
library, haveged allows generating randomness based on variations in
code execution time on a processor......(google the rest!)
Install like this :
yum install haveged
and start it up like this :
haveged -w 1024
making sure it restarts on reboot :
chkconfig haveged on
Hope this helps someone.

Resources