DRBL cluster with Open MPI 1.8.4 - linux

When a virtual/diskless node is used on a DRBL cluster running Open MPI 1.8.4, the following error occurs:
Error: unknown option "--hnp-topo-sig"
I guess it is something to do with the topology signature, and it looks new. Any suggestions?
Typical command:
mpirun --machinefile machines -np 4 mpi_hello
machinefile: node1 slots=4
Thank you in advance

This suggests that you are running different Open MPI versions on the nodes. You can confirm whether this is the case by SSH'ing into each node and running 'mpirun --version'.
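For example, a quick way to compare the versions across the nodes listed in the machinefile (assuming passwordless SSH and that the hostname is the first column of each line) would be something like:
for h in $(awk '{print $1}' machines); do
    echo "== $h =="
    ssh "$h" 'mpirun --version | head -1'
done
Every node should report the same version (1.8.4 here); any node that prints a different version, or cannot find mpirun, is the one picking up the wrong installation.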

Related

RIAK Node does not Start after changing IP

I am in the process of setting up a Riak Cluster on Raspberry Pis.
Unfortunately I get the following error message after changing the IP address.
Versions I used:
Debian Jessie (Raspberry PI)
riak (Github Clone Mar2017)
riak-cs2.1.1
stanchion-2.1.1
Using this guide I tried to change the IP addresses in the various .conf files.
https://docs.riak.com/riak/kv/latest/using/cluster-operations/changing-cluster-info/index.html
Works on 127.0.0.1:
$ ~/riak/rel/riak/bin/riak-admin test
Successfully completed 1 read/write cycle to 'riak#127.0.0.1'
Error message (after changing the IP to 192.168.178.61):
sudo ./riak console
config is OK
-config /home/pi/neu/riak/rel/riak/data/generated.configs/app.2020.01.02.23.37.52.config -args_file /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -vm_args /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args
Exec: /home/pi/neu/riak/rel/riak/bin/../erts-5.10.3/bin/erlexec -boot /home/pi/neu/riak/rel/riak/bin/../releases/2.2.3/riak -config /home/pi/neu/riak/rel/riak/data/generated.configs/app.2020.01.02.23.37.52.config -args_file /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -vm_args /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -pa /home/pi/neu/riak/rel/riak/bin/../lib/basho-patches -- console
Root: /home/pi/neu/riak/rel/riak/bin/..
Erlang R16B02_basho10 (erts-5.10.3) [source] [smp:4:4] [async-threads:64] [hipe] [kernel-poll:true] [frame-pointer]
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
{"Kernel pid terminated",application_controller,"{application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,['riak#192.168.178.61',[]],[{file,\"orddict.erl\"},{line,72}]},{riak_core_capability,renegotiate_capabilities,1,[{file,\"src/riak_core_capability.erl\"},{line,441}]},{riak_core_capability,handle_call,3,[{file,\"src/riak_core_capability.erl\"},{line,213}]},{gen_server,handle_msg,5,[{file,\"gen_server.erl\"},{line,585}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]},{gen_server,call,[riak_core_capability,{register,{riak_core,vnode_routing},{capability,[proxy,legacy],legacy,{riak_core,legacy_vnode_routing,[{true,legacy},{false,proxy}]}}},infinity]}}}}}}"}
Crash dump was written to: ./log/erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,['riak#192.168.178.61',[
https://github.com/basho/riak/issues/999
martinsumner commented 3 days ago:
I might expect to see this if you hadn't done the step of either renaming (or deleting the contents of) the ring directory. Did you do this?
Also can you confirm if you're in the single-node or multi-node renaming scenario?
Ei3rb0mb3r commented 1 minute ago:
Many thanks for the quick feedback!
The error was resolved after I deleted the ring directory files:
cd ../riak/rel/riak/data/ring/ && rm -rf *
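For anyone hitting the same error, a slightly safer variant of the same reset (stopping the node first and moving the ring files aside instead of deleting them, as suggested above), assuming the default rel/riak layout used in this question:
cd ~/riak/rel/riak
./bin/riak stop
mv data/ring data/ring.bak && mkdir data/ring
./bin/riak start
./bin/riak-admin test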

Spark master and worker seem to run on different JVM version

In standalone mode, the master process uses /usr/bin/java, which resolves to JVM 1.8, while the worker process uses /usr/lib/jvm/java/bin/java, which resolves to 1.7. In my Spark application I'm using some APIs introduced in 1.8.
Looking at the stack trace, one line that comes up is: Caused by: java.lang.NoClassDefFoundError: Could not initialize class SomeClassDefinedByMe, which internally creates an instance from java.time, which I believe is only available in JDK 1.8.
How do I force the worker to use JVM 1.8?
Update:
For now I renamed /usr/lib/jvm/java/bin/java and created a link that points to /usr/bin/java. This solved the problem, but I would still like to know why the two processes use different binary locations and where this is set.
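For reference, that workaround amounts to something like this on each worker (paths as in the question; treat this as a stop-gap rather than a proper fix):
sudo mv /usr/lib/jvm/java/bin/java /usr/lib/jvm/java/bin/java.orig
sudo ln -s /usr/bin/java /usr/lib/jvm/java/bin/java
/usr/lib/jvm/java/bin/java -version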
On each worker node, edit ${SPARK_HOME}/conf/spark-env.sh and define the appropriate $JAVA_HOME (it should point at the root directory of a 1.8 JDK installation, not at the java binary itself), e.g.
export JAVA_HOME=/usr/java/jdk1.8.0_92
That file is sourced by ${SPARK_HOME}/bin/load-spark-env.sh which is invoked by each and every Spark command-line utility:
${SPARK_HOME}/bin/spark-shell via ${SPARK_HOME}/bin/spark-class
${SPARK_HOME}/bin/spark-submit via ${SPARK_HOME}/bin/spark-class
...
${SPARK_HOME}/sbin/start-slave.sh
...
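As a rough sketch of the worker-side change (the JDK path is only an example, point it at wherever your 1.8 JDK actually lives, and the master URL below is a placeholder):
# ${SPARK_HOME}/conf/spark-env.sh on each worker
export JAVA_HOME=/usr/java/jdk1.8.0_92
# restart the worker so the new environment is picked up
${SPARK_HOME}/sbin/stop-slave.sh
${SPARK_HOME}/sbin/start-slave.sh spark://<master-host>:7077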
Side note: the Linux alternatives mechanism is the standard way to define which JVM sits at the top of your PATH...
Typical setup with a "fixed" setting, not relying on the priority set by the OpenJDK RPM install:
$ ls -AFl $(which java)
lrwxrwxrwx. 1 root root 22 Feb 15 16:06 /usr/bin/java -> /etc/alternatives/java*
$ alternatives --display java | grep -v slave
java - status is manual.
link currently points to /usr/java/jdk1.8.0_92/jre/bin/java
/usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java - priority 18091
/usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java - priority 16000
/usr/java/jdk1.8.0_92/jre/bin/java - priority 18092
Current `best' version is /usr/java/jdk1.8.0_92/jre/bin/java.
...provided that $PATH is defined properly for the Linux account that launches the Spark slaves!
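If the worker's default java really should be 1.8 system-wide, the interactive way to repoint the alternatives link (on RHEL/CentOS-style systems; Debian/Ubuntu use update-alternatives instead) is roughly:
$ sudo alternatives --config java
$ java -version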

(bdutil) Unable to get hadoop/spark cluster working with a fresh install

I'm setting up a tiny cluster in GCE to play around with it, but although the instances are created, some failures prevent it from working. I'm following the steps in https://cloud.google.com/hadoop/downloads
So far I'm using the latest (as of now) versions of gcloud (143.0.0) and bdutil (1.3.5), freshly installed.
./bdutil deploy -e extensions/spark/spark_env.sh
using debian-8 as image (as bdutil still uses debian-7-backports).
At some point I got
Fri Feb 10 16:19:34 CET 2017: Command failed: wait ${SUBPROC} on line 326.
Fri Feb 10 16:19:34 CET 2017: Exit code of failed command: 1
full debug output is in https://gist.github.com/jlorper/4299a816fc0b140575ed70fe0da1f272
(project id and bucket names changed)
Instances are created, but Spark is not even installed. Digging a bit, I've managed to run the Spark installation and start Hadoop commands on the master after SSH'ing in. But it fails badly when starting the spark-shell:
17/02/10 15:53:20 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.4.5-hadoop1
17/02/10 15:53:20 INFO gcsio.FileSystemBackedDirectoryListCache: Creating '/hadoop_gcs_connector_metadata_cache' with createDirectories()...
java.lang.RuntimeException: java.lang.RuntimeException: java.nio.file.AccessDeniedException: /hadoop_gcs_connector_metadata_cache
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
and I'm not able to import sparkSQL. From what I've read, everything should be started automatically.
Up to this point I'm a bit lost and don't know what else to do.
Am I missing any step? Are any of the commands faulty? Thanks in advance.
Update: solved
As pointed out in the accepted solution, I cloned the repo and the cluster was created without issues. When trying to start the spark-shell, though, it gave
java.lang.RuntimeException: java.io.IOException: GoogleHadoopFileSystem has been closed or not initialized.
That sounded to me like the connectors were not initialized properly, so after running
./bdutil --env_var_files extensions/spark/spark_env.sh,bigquery_env.sh run_command_group install_connectors
it worked as expected.
The latest version of bdutil on https://cloud.google.com/hadoop/downloads is a bit stale; I'd instead recommend using the version of bdutil at head on GitHub: https://github.com/GoogleCloudPlatform/bdutil.
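In other words, something along these lines (same deploy command as in the question, just run from a fresh clone of the GitHub repo):
git clone https://github.com/GoogleCloudPlatform/bdutil.git
cd bdutil
./bdutil deploy -e extensions/spark/spark_env.sh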

cassandra installation failed

I have been trying to install DataStax Cassandra (DSE) and I'm getting stuck at the line below. It doesn't go any further after this line. May I know what the issue could be?
INFO [main] 2016-02-01 11:09:01,032 CassandraDaemon.java:205 - JVM Arguments: [-Ddse.system_memory_in_mb=991, -Dcassandra.config.loader=com.datastax.bdp.config.DseConfigurationLoader, -Ddse.system_memory_in_mb=991, -Dcassandra.config.loader=com.datastax.bdp.config.DseConfigurationLoader, -ea, -javaagent:/usr/share/dse/cassandra/lib/jamm-0.3.0.jar, -XX:+UseThreadPriorities, -XX:ThreadPriorityPolicy=42, -Xms495M, -Xmx495M, -XX:+HeapDumpOnOutOfMemoryError, -Xss256k, -XX:+AlwaysPreTouch, -XX:-UseBiasedLocking, -XX:StringTableSize=1000003, -XX:+UseTLAB, -XX:+ResizeTLAB, -XX:CompileCommandFile=/etc/dse/cassandra/hotspot_compiler, -XX:+UseG1GC, -XX:G1RSetUpdatingPauseTimePercent=5, -XX:MaxGCPauseMillis=500, -Djava.net.preferIPv4Stack=true, -Dcassandra.jmx.local.port=7199, -XX:+DisableExplicitGC, -Dlogback.configurationFile=logback.xml, -Dcassandra.logdir=/var/log/cassandra, -Dcassandra.storagedir=, -Dcassandra-pidfile=/var/run/dse/dse.pid, -Dsearch-service=true, -Dcatalina.home=/usr/share/dse/tomcat, -Dcatalina.base=/usr/share/dse/tomcat, -Djava.util.logging.config.file=/usr/share/dse/tomcat/conf/logging.properties, -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager, -Dtomcat.logs=/var/log/tomcat, -XX:HeapDumpPath=/var/lib/cassandra/java_1454342934.hprof, -XX:ErrorFile=/var/lib/cassandra/hs_err_1454342934.log, -Djava.library.path=:/usr/share/dse/hadoop/native/Error:_JAVA_HOME_is_not_set./lib:/usr/share/dse/hadoop/native/Error:_JAVA_HOME_is_not_set./lib, -Dsolr.solr.home=solr/, -Ddse.system_memory_in_mb=991, -Dcassandra.config.loader=com.datastax.bdp.config.DseConfigurationLoader, -Ddse.system_memory_in_mb=991, -Dcassandra.config.loader=com.datastax.bdp.config.DseConfigurationLoader]
I see exactly the same issue when starting DSE in a Vagrant VM (CentOS 7) that does not have enough RAM allocated - are you running in Vagrant / a VM, or on hardware with limited memory?
If you set the RAM to 2096 MB or higher, you should see DSE start up successfully.
DataStax is pretty resource intensive, though it's unfortunate the error messages aren't more helpful here!
(The tell-tale symptom here is Error:_JAVA_HOME_is_not_set in the command line)
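A quick pre-flight check on the node before retrying, based on the two symptoms above, might look like:
free -m            # confirm roughly 2 GB of RAM is actually available to the VM
echo "$JAVA_HOME"  # empty output matches the Error:_JAVA_HOME_is_not_set symptom
java -version      # confirm a JDK/JRE is on the PATH at all
If JAVA_HOME turns out to be empty, set it in the environment that DSE is started from before trying again.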

How can I create a local multi-node Cassandra cluster on Windows 7 64 bit?

I am looking for a set of instructions to create a local multi-node Cassandra 2.x cluster on a Window 7 64 bit PC.
It should preferably use CCM “Cassandra Cluster Manager” and allow management using DataStax OpsCenter
I followed the instructions in “Getting Started with Apache Cassandra on Windows the Easy Way” but they are for a single node cluster.
EDIT: I got stuck on deploying OpsCenter agents on each node using CCM; any ideas?
Articles used for this tutorial:
CCM 2.0 and Windows
Cassandra Wiki - Windows Development
Setting up a multi-node Cassandra cluster on a single Windows machine
See also:
Getting Started with Apache Cassandra on Windows the Easy Way
Cassandra DevCenter (free registration required)
Prerequisites:
The following tools are assumed to be already installed:
JDK 7 or newer
ANT build tool
Step 1: Install Python
Download and install the latest version of Python 2.x, e.g.:
https://www.python.org/ftp/python/2.7.11/python-2.7.11.amd64.msi
Note: This will also install “pip” tool
The following directories need to be added to the PATH
<PYTHON_INSTALL_DIR>\Python
<PYTHON_INSTALL_DIR>\Python\Scripts
<ANT_INSTALL_DIR>\bin
Step 2: Install CCM “Cassandra Cluster Manager”
In a new Command Prompt/PowerShell window (login as yourself)
type “pip install ccm” – which will automatically download and install ccm
> pip install ccm
Collecting ccm
Downloading ccm-2.0.6.tar.gz (56kB)
100% |################################| 57kB 1.8MB/s
Collecting pyYaml (from ccm)
Downloading PyYAML-3.11.tar.gz (248kB)
100% |################################| 249kB 1.7MB/s
Collecting six>=1.4.1 (from ccm)
Downloading six-1.10.0-py2.py3-none-any.whl
Installing collected packages: pyYaml, six, ccm
Running setup.py install for pyYaml
Running setup.py install for ccm
Successfully installed ccm-2.0.6 pyYaml-3.11 six-1.10.0
Step 3: Install “psutil (python system and process utilities)”
In the same window as for Step 2:
type “pip install psutil” – which will automatically download and install psutil
> pip install psutil
Collecting psutil
Downloading psutil-3.3.0-cp27-none-win_amd64.whl (92kB)
100% |################################| 94kB 1.4MB/s
Installing collected packages: psutil
Successfully installed psutil-3.3.0
Note: This window can now be closed
Step 4: Set-ExecutionPolicy Unrestricted
In a new PowerShell window (login as local admin), type “Set-ExecutionPolicy Unrestricted”
Note: You must set the execution policy of Windows PowerShell to allow CCM to launch instances of Cassandra. An unrestricted execution policy will also allow CCM to run on the regular command prompt (cmd) as well as Windows PowerShell
PS C:\Windows\system32> Set-ExecutionPolicy Unrestricted
Execution Policy Change
The execution policy helps protect you from scripts that you do not trust. Changing the execution policy might expose
you to the security risks described in the about_Execution_Policies help topic. Do you want to change the execution
policy?
[Y] Yes [N] No [S] Suspend [?] Help (default is "Y"): Y
Step 5: Register PY extension
Note: Add .PY extension to environment variable $PATHEXT, to allow ccm to be executed from any location (run on PowerShell as administrator):
In the same window as for Step 4 type
[Environment]::SetEnvironmentVariable("PATHEXT", "$env:PATHEXT;.PY", "MACHINE")
Note: This window can now be closed
Step 6: Check if CCM is up and running
In a new Command Prompt window (login as yourself) type:
>ccm status
No currently active cluster (use ccm cluster switch)
Step 7: Update hosts file
Open Notepad as Administrator and add the following lines to the C:\Windows\System32\drivers\etc\hosts file:
#cassandra nodes
127.0.0.1 127.0.0.2
127.0.0.1 127.0.0.3
127.0.0.1 127.0.0.4
127.0.0.1 127.0.0.5
127.0.0.1 127.0.0.6
Step 8: Create and populate a 3 node cluster using Cassandra v2.1.2
Note: This will download version 2.1.2 of Cassandra and then use it to create a new CCM cluster called “mytestcluster”.
Cassandra installation path: %USERPROFILE%\.ccm\repository\2.1.2
“mytestcluster” cluster path: %USERPROFILE%\.ccm\mytestcluster
C:\Users\myusername>ccm create mytestcluster -v 2.1.2
Downloading http://archive.apache.org/dist/cassandra/2.1.2/apache-cassandra-2.1.2-bin.tar.gz to c:\users\myusername\appdata\local\temp\ccm-qwauvs.tar.gz (21.735MB)
22790390 [100.00%]
Extracting c:\users\myusername\appdata\local\temp\ccm-qwauvs.tar.gz as version 2.1.2 ...
Current cluster is now: mytestcluster
C:\Users\myusername>ccm status
Cluster: 'mytestcluster'
------------------------
No node in this cluster yet
C:\Users\myusername>ccm populate -n 3
C:\Users\myusername>ccm status
Cluster: 'mytestcluster'
------------------------
node1: DOWN (Not initialized)
node3: DOWN (Not initialized)
node2: DOWN (Not initialized)
C:\Users\myusername>ccm start
Started: node1 with pid: 17432
Started: node3 with pid: 6308
Started: node2 with pid: 22484
C:\Users\myusername>ccm status
Cluster: 'mytestcluster'
------------------------
node1: UP
node3: UP
node2: UP
C:\Users\myusername>ccm jconsole
C:\Users\myusername>ccm node1 show
node1: UP
cluster=mytestcluster
auto_bootstrap=False
thrift=('127.0.0.1', 9160)
binary=('127.0.0.1', 9042)
storage=('127.0.0.1', 7000)
jmx_port=7100
remote_debug_port=0
initial_token=-9223372036854775808
pid=17432
C:\Users\myusername>ccm node2 show
node2: UP
cluster=mytestcluster
auto_bootstrap=False
thrift=('127.0.0.2', 9160)
binary=('127.0.0.2', 9042)
storage=('127.0.0.2', 7000)
jmx_port=7200
remote_debug_port=0
initial_token=-3074457345618258603
pid=22484
C:\Users\myusername>ccm node3 show
node3: UP
cluster=mytestcluster
auto_bootstrap=False
thrift=('127.0.0.3', 9160)
binary=('127.0.0.3', 9042)
storage=('127.0.0.3', 7000)
jmx_port=7300
remote_debug_port=0
initial_token=3074457345618258602
pid=6308
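With all three nodes up, you can sanity-check the ring and connect to an individual node through CCM, e.g. (these are standard per-node ccm subcommands; substitute whichever node you want to inspect):
C:\Users\myusername>ccm node1 nodetool status
C:\Users\myusername>ccm node1 cqlsh
Any other nodetool or cqlsh invocation can be routed through ccm <nodename> in the same way.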

Resources