OpenMPI: ORTE was unable to reliably start one or more daemons

I have been at this for days without solving it.
I am running:
mpiexec -hostfile ~/machines -nolocal -pernode mkdir -p $dstpath
where $dstpath points to the current directory and "machines" is a file containing:
node01
node02
node03
node04
This is the error output:
Failed to parse XML input with the minimalistic parser. If it was not
generated by hwloc, try enabling full XML support with libxml2.
[node01:06177] [[6421,0],0] ORTE_ERROR_LOG: Error in file base/plm_base_launch_support.c at line 891
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------
[node01:06177] 1 more process has sent help message help-errmgr-base.txt / failed-daemon-launch
[node01:06177] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Failed to parse XML input with the minimalistic parser. If it was not
generated by hwloc, try enabling full XML support with libxml2.
[node01:06181] [[6417,0],0] ORTE_ERROR_LOG: Error in file base/plm_base_launch_support.c at line 891
I have 4 machines, node01 to node04. To log in to these 4 nodes, I first have to log in to node00. I am trying to run some distributed graph functions. The graph software is installed on node01 and is supposed to be synchronised to the other nodes using mpiexec.
What I've done:
Made sure passwordless login is set up: every machine can ssh to every other machine without issues.
Put a hostfile in the home directory.
echo $PATH gives /home/myhome/bin:/home/myhome/.local/bin:/usr/include/openmpi:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
echo $LD_LIBRARY_PATH gives
/usr/lib/openmpi/lib
This worked before, but it suddenly started giving these errors. I got my administrator to reinstall the machines from scratch, but the errors persisted. Running on one node at a time gives the same errors. I'm not very familiar with the command line, so any suggestions are welcome. I've tried installing OpenMPI both from source and with sudo apt-get install openmpi-bin. I'm on Ubuntu 16.04 LTS.

You should focus on fixing:
Failed to parse XML input with the minimalistic parser. If it was not
generated by hwloc, try enabling full XML support with libxml2.
[node01:06177] [[6421,0],0] ORTE_ERROR_LOG: Error in file base/plm_base_launch_support.c at line 891
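One common cause of this hwloc message, as far as I know, is that the nodes exchange topology data produced by different hwloc or Open MPI builds. That would fit a setup where both a source build and the apt package have been installed, but it is an assumption about this cluster, not something the output proves. A minimal sketch for comparing what each node actually resolves (report_mismatch is a hypothetical helper name, and the node names are taken from the hostfile in the question):

```bash
#!/usr/bin/env bash
# report_mismatch is a hypothetical helper, not part of Open MPI.
# It reads "host version-string" lines on stdin and reports whether
# every host printed the same version string.
report_mismatch() {
    local distinct
    # Drop the host field, keep the rest of the line, count distinct values.
    distinct=$(awk '{ $1 = ""; sub(/^ +/, ""); print }' | sort -u | wc -l | tr -d ' ')
    if [ "$distinct" -le 1 ]; then
        echo "consistent"
    else
        echo "mismatch"
        return 1
    fi
}

# Usage (not run here; assumes passwordless ssh as described in the question):
# for h in node01 node02 node03 node04; do
#     printf '%s %s\n' "$h" "$(ssh "$h" 'mpiexec --version 2>&1 | head -n 1')"
# done | report_mismatch
```

Because ssh runs a non-interactive shell, this also shows which mpiexec (if any) is found with the PATH that the remote daemons would get, which can differ from the PATH printed in a login session.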

Opensips-cli -x command not working in opensips 3.3

I recently upgraded OpenSIPS manually from 2.2 to 3.3.
The upgrade itself is done. In the old version (2.2) I could list registered SIP users with the opensipsctl ul show command, but in 3.3 opensipsctl appears to be deprecated (I am not sure).
So I am trying to get the same details with opensips-cli, but I have not found the right commands to show registrations and to dump the user location list. I tried following the link below without success.
https://www.opensips.org/Documentation/Interface-CoreMI-3-0
Also, the opensips-cli -x command does not work and gives the error below (the mi_fifo module is loaded correctly).
# opensips-cli -o output_type=yaml -x mi uptime
ERROR: cannot access fifo file /tmp/opensips_fifo: [Errno 13] Permission denied: '/tmp/opensips_fifo'
ERROR: starting with Linux kernel 4.19, processes can no longer read from FIFO files
ERROR: that are saved in directories with sticky bits (such as /tmp)
ERROR: and are not owned by the same user the process runs with.
ERROR: To fix this, either store the file in a non-sticky bit directory (such as /var/run/opensips),
ERROR: or disable fifo file protection using 'sysctl fs.protected_fifos=0' (NOT RECOMMENDED)
The /tmp/opensips_fifo file is also created correctly:
# ls -l /tmp/opensips_fifo
prw-rw-rw- 1 opensips opensips 0 Dec 29 06:52 /tmp/opensips_fifo
With opensips-cli I am able to create the database and add tables, but I cannot run any -x command.
Can anyone help me find the commands to show registrations and dump the user location list, and suggest why -x is not working in opensips-cli?

I had a similar error and found the following:
if opensips-cli.cfg sets fifo_file to /tmp/opensips_fifo, it produces this error. Try changing the setting to /var/run/opensips/opensips_fifo. The same path has to be set for the mi_fifo module (its fifo_name parameter) in opensips.cfg, so that both sides use the same file.
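The error text in the question gives the reason: on newer kernels a FIFO that sits in a sticky-bit directory such as /tmp cannot be read by a process running as a different user than the FIFO's owner. A minimal sketch for checking whether a configured FIFO path is affected (fifo_dir_is_sticky is a hypothetical helper name):

```bash
#!/usr/bin/env bash
# fifo_dir_is_sticky is a hypothetical helper: it succeeds when the
# directory that holds the given FIFO path has the sticky bit set.
fifo_dir_is_sticky() {
    local dir
    dir=$(dirname -- "$1")
    [ -k "$dir" ]
}

# Example with the path from the question.
fifo=/tmp/opensips_fifo
if fifo_dir_is_sticky "$fifo"; then
    echo "$fifo lives in a sticky directory; consider /var/run/opensips instead"
fi
```

This only inspects the directory. Whether the restriction actually applies also depends on the fs.protected_fifos sysctl mentioned in the error message and on which user runs opensips-cli.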

django.db.utils.DatabaseError: Error while trying to retrieve text for error ORA-01804

Q1. What versions are we using?
Ans.
Python 3.6.12
OS : CentOS 7 64-bit
DB : Oracle 18c
Django 2.2
cx_Oracle : 8.1.0
Q2. Describe the problem
Ans. When running the server with "python3 manage.py runserver",
the application can reach the Oracle DB, shows the Django administration page, and login works.
But when we access the application through the Apache (HTTPD) URL over the secure SSL port, we see the Django page and the admin page, but logging in to the admin page fails with an Internal Server Error.
In the logs, we see
"django.db.utils.DatabaseError: Error while trying to retrieve text for error ORA-01804"
cx_Oracle is otherwise able to connect to the database properly; another application uses the same database behind the same httpd proxy and works fine.
Q3. Show the directory listing where your Oracle Client libraries are installed (e.g. the Instant Client directory). Is it 64-bit or 32-bit?
Ans. 64-bit
Q4. Show what the PATH environment variable (on Windows) or LD_LIBRARY_PATH (on Linux) is set to?
LD_LIBRARY_PATH=/srv/vol/db/oracle/product/18.0.0/dbhome_1/lib:/lib:/usr/lib
PATH=$ORACLE_HOME/bin:/srv/vol/db/oracle/product/18.0.0/dbhome_1/lib:$PATH
Q5. Show any Oracle environment variables set (e.g. ORACLE_HOME, ORACLE_BASE).
ORACLE_HOME=/srv/vol/db/oracle/product/18.0.0/dbhome_1
TNS_ADMIN=$ORACLE_HOME/network/admin
NLS_LANG=AMERICAN_AMERICA.AL32UTF8
ORACLE_BASE=/srv/vol/db/oracle
CLASSPATH=$ORACLE_HOME/jlib:$ORACLE_HOME/rdbms/jlib:$ORACLE_HOME/lib
Any suggestions/help is highly appreciated.
Thank you

I found the problem.
First I removed all the variable declarations from /etc/sysconfig/httpd and checked: the application could still access the lib files, so those declarations were redundant.
Then I undid all the variable declarations made earlier in the .localsh and .localrc files for the OS users, to start from scratch and go step by step to see where it breaks.
At this point cx_Oracle was looking for the lib files in the wrong directory,
$ORACLE_HOME/client_1/lib
instead of
$ORACLE_HOME/lib
DPI-1047: Cannot locate a 64-bit Oracle Client library: "$ORACLE_HOME/client_1/lib/libclntsh.so: cannot open shared object file: No such file or directory". See https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html for help
I did not have any subfolder named "client_1" inside dbhome_1,
so I created a symlink client_1 that points to dbhome_1 (I am still unsure about this, but at least it works :) ).
That error was gone, but ORA-01804 came back. 😑
I had read somewhere that this error can be fixed by adding "libociei.so". I did not have one on my instance, so I generated it using these commands:
mkdir -p $ORACLE_HOME/rdbms/install/instantclient/light
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk igenliboci
Then I moved the generated libociei.so file from
$ORACLE_HOME/instantclient to $ORACLE_HOME/lib
Now there was a new error (so... progress 😉):
ORA-12546 - TNS Permission Denied.
This was easy to solve 😀
I used this command to address it:
setsebool -P httpd_can_network_connect on
And...... That was all! It worked.
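For anyone retracing the DPI-1047 step, a minimal sketch for checking which candidate directory actually contains libclntsh.so before resorting to a symlink. find_client_lib is a hypothetical helper name, and the two directories are the ones mentioned above:

```bash
#!/usr/bin/env bash
# find_client_lib is a hypothetical helper: it prints the first directory
# among its arguments that contains libclntsh.so, or returns 1 if none does.
find_client_lib() {
    local dir
    for dir in "$@"; do
        if [ -e "$dir/libclntsh.so" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Example with the two locations from this answer. /nonexistent is only a
# placeholder used when ORACLE_HOME is not set in the current shell.
find_client_lib "${ORACLE_HOME:-/nonexistent}/client_1/lib" "${ORACLE_HOME:-/nonexistent}/lib" \
    || echo "libclntsh.so not found in either directory"
```

The result is only meaningful when run with the environment of the account that httpd runs as, since the interactive shell and the web server may see different variables.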

Error opening zip file or JAR manifest missing : jrebel.jar

When configuring JRebel on my remote server (JBoss on Linux), I set the JVM args as
-javaagent:/home/user/jrebel.jar -Drebel.remoting_plugin=true
The jrebel.jar is absolutely definitely in that location, yet the server fails to start with the error:
Error opening zip file or JAR manifest missing : /home/user/jrebel.jar
Error occurred during initialization of VM
agent library failed to init: instrument
So the arg is obviously being passed to the JVM correctly, but for the life of me I can't work out why it can't find the jar. I've been through every ZeroTurnaround article I can find and looked at the solutions that resolved it for other people, but no luck. Any ideas?

It turned out to be a permissions problem: the JBoss user didn't have permission to access the directory I had placed jrebel.jar in.
It would have been nice to get a more meaningful error, e.g. 'permission denied'. It shows my lack of Linux knowledge though, I guess.
After the jar was moved to a directory within the JBoss installation, its owner was changed to the JBoss user, and read/write/execute permissions were added, all is well.
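Since the JVM reports this as a missing or broken jar rather than a permissions error, it can help to check every directory on the way to the file. A minimal sketch, where first_blocked is a hypothetical helper name and the account name in the usage comment is an assumption:

```bash
#!/usr/bin/env bash
# first_blocked is a hypothetical helper. It walks from / down to the given
# absolute path and prints the first component the current user cannot
# traverse (directories need execute permission) or cannot read (the file
# itself). It prints nothing when the whole path is accessible.
# Assumes an absolute path without glob characters.
first_blocked() {
    local target=$1 path="" part
    local IFS=/
    for part in ${target#/}; do
        path="$path/$part"
        if [ -d "$path" ]; then
            [ -x "$path" ] || { echo "$path"; return 1; }
        else
            [ -r "$path" ] || { echo "$path"; return 1; }
        fi
    done
}

# Example with the path from the question.
first_blocked /home/user/jrebel.jar || echo "blocked at the path printed above"

# To test as the server's account (the name "jboss" is an assumption):
# sudo -u jboss bash -c "$(declare -f first_blocked); first_blocked /home/user/jrebel.jar"
```

A missing file is reported the same way as an unreadable one, so the output is a starting point for ls -ld on the printed path rather than a final diagnosis.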

Yes, permissions were also the reason I got this error when I tried to open PHPSTORM. The error was:
Error opening zip file or JAR manifest missing : ${JetbrainsIdesCrackPath}
Error occurred during initialization of VM
agent library failed to init: instrument
So before running PHPSTORM I had to run sudo -i to get root permissions for the program.

Error on neo4j server start on arch linux

I have an Arch Linux setup and installed neo4j through the Arch User Repository (yaourt -S neo4j). I'm able to run the web console fine (sudo neo4j console, with seemingly normal output and full functionality). However, when trying to start the server (sudo neo4j start), I get the following error message:
/usr/share/neo4j/bin/utils: line 345: [: -lt: unary operator expected
Using additional JVM arguments: -server -XX:+DisableExplicitGC -Dorg.neo4j.server.properties=/etc/neo4j/neo4j-server.properties -Djava.util.logging.config.file=/etc/neo4j/logging.properties -Dlog4j.configuration=file:/etc/neo4j/log4j.properties -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
Starting Neo4j Server...cat: /run/neo4j/neo4j-service.pid: No such file or directory
process []... waiting for server to be ready. Failed to start within 120 seconds.
Neo4j Server may have failed to start, please check the logs.
rm: cannot remove ‘/run/neo4j/neo4j-service.pid’: No such file or directory
There's no delay before the error message is printed, so it seems to be something other than the timeout. I'm quite new to neo4j (I worked through a fair bit of the user manual using the web console, but no development or server config experience), so I'm not really sure what else might be relevant. I tried looking through the utils script and the error appears to be where it attempts to su neo4j, but it also seems to proceed to attempt to start the server. I also tried changing the port it's starting on as in this question, but no change. The only log I can find just has this over and over (with appropriate timestamps):
Oct 15, 2014 1:33:49 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Any help at all would be appreciated!
EDIT:
The line 345 that it's failing on is the end of this snippet:
if [ $UID == 0 ] ; then
    OPEN_FILES=`su $NEO4J_USER -c "ulimit -n"`
else
    OPEN_FILES=`ulimit -n`
fi
if [ $OPEN_FILES -lt 40000 ]; then
From doing some echo debugging, it seems that su $NEO4J_USER is failing, probably because $NEO4J_USER is set to neo4j, a user that does not exist on my system. I tried setting that to root in one of the config files, but evidently that's not working properly. Arch is a continual learning experience for me, but I've not had to add a new user before to get software working.

The interesting line here is:
/usr/share/neo4j/bin/utils: line 345: [: -lt: unary operator expected
I assume it is caused by a wrong default shell for the neo4j user. What default shell is currently set for the neo4j system user? Try switching it to bash; the startup scripts should work with bash.
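The message itself is what [ prints when an unquoted variable expands to nothing, so [ $OPEN_FILES -lt 40000 ] becomes [ -lt 40000 ]. That happens whenever the su call produces no output, whether because of the login shell or because the user does not exist. A minimal sketch of a more defensive comparison (open_files_too_low is a hypothetical name, not part of the Neo4j scripts):

```bash
#!/usr/bin/env bash
# open_files_too_low is a hypothetical helper, not part of Neo4j's utils.
# Exit status: 0 if the limit is below the threshold, 1 if it is not,
# 2 if the limit is empty or not a plain number and cannot be compared.
open_files_too_low() {
    local limit=$1 threshold=${2:-40000}
    case $limit in
        ''|*[!0-9]*) return 2 ;;   # empty, or e.g. "unlimited"
    esac
    [ "$limit" -lt "$threshold" ]
}

# Example using the current user's own limit.
limit=$(ulimit -n)
if open_files_too_low "$limit"; then
    echo "open file limit $limit is below 40000"
fi
```

Running getent passwd neo4j shows whether the account exists and which login shell it has, which covers both explanations.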

Problems with Exec PeopleCode from PeopleSoft Application Engine

On a Unix server, I am running an application engine via the process scheduler.
In it, I am attempting to run the Unix "zip" command from within the PeopleCode "Exec" function.
However, I only get the error
PS_Exec(P): Error executing batch command with reason: No such file or directory (2)
I have tried it several ways. The most logical approach I thought was to change directory back to the root, then change to the specified directory so that I could easily use the zip command, such as the following...
Exec("cd / && cd /opt/psfin/pt850/dat/PSFIN1/PYMNT && zip INVREND INVREND.XML");
1643 12.20.34 0.000048 72: Exec("cd /opt/psfin/pt850/dat/PSFIN1/PYMNT");
1644 12.20.34 0.001343 PS_Exec(P): Error executing batch command with reason: No such file or directory (2)
I've even tried the following....just to see if anything works from within an Exec...
Exec("ls");
Sure enough, it gave the same error.
Now, some of you may be wondering, does the account that is associated with the process scheduler actually have authority on this particular directory path on the server ? Well, I was able to create the xml file given in the previous command with no problems.
I just cannot seem to be able to modify it with the Exec issuance of Unix commands.
I'm wondering if this is an error of rights and permissions from the unix server with regards to the operator id that the process scheduler is running from. However, given that it can create and write to a file there, I cannot understand why the Exec command would be met with any resistance....Just my gut shot in the dark...
Any help would be GREATLY appreciated!!!
Thanks,
Flynn

Not sure if you're still having an issue, but in your Exec code, adding the optional %FilePath_Absolute constant should help. When that constant is left off, PS automatically prefixes all commands with <PS_HOME>. You'll have to specify absolute paths with this flag on though. I've changed the command to something that should work.
Exec("zip /opt/psfin/pt850/dat/PSFIN1/PYMNT/INVREND /opt/psfin/pt850/dat/PSFIN1/PYMNT/INVREND.XML", %FilePath_Absolute);
The documentation at PeopleBooks is a little confusing sometimes, but it explains it fairly well in this case.
You can always store the absolute location in a variable and prefix that to your commands so you don't have to keep typing out /opt/psfin/pt850/dat/PSFIN1/PYMNT/.
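The variable-prefix idea, sketched in shell rather than PeopleCode since only the resulting command string matters here. build_zip_cmd is a hypothetical helper, and the string it prints is what would be passed to Exec together with %FilePath_Absolute:

```bash
#!/usr/bin/env bash
# build_zip_cmd is a hypothetical helper: it builds a zip command in which
# both the archive and the input file carry the same absolute directory.
build_zip_cmd() {
    local base=${1%/} archive=$2 input=$3   # strip one trailing slash from base
    printf 'zip %s/%s %s/%s\n' "$base" "$archive" "$base" "$input"
}

# Example with the directory and file names from the question.
build_zip_cmd /opt/psfin/pt850/dat/PSFIN1/PYMNT/ INVREND INVREND.XML
```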
