Invalid munge credential when creating SLURM multi-cluster setup

I am building a SLURM multi-cluster setup, with a slurmdbd hosted on-premises and a slurmctld node in Oracle Cloud. The slurmctld is able to connect to the slurmdbd, but I get this error message whenever I try to query the database in any way:
sacct: error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to <IP_ADDRESS>: Failed to unpack SLURM_PERSIST_INIT message
sacct: error: slurmdbd: Sending PersistInit msg: No error
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
sacct: error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to <IP_ADDRESS>: Failed to unpack SLURM_PERSIST_INIT message
sacct: error: slurmdbd: Sending PersistInit msg: No error
sacct: error: slurmdbd: DBD_GET_JOBS_COND failure: Unspecified error
Looking at the /var/log/slurm/slurmdbd.log file on my slurmdbd cluster, it records this error:
[2022-03-11T08:29:47.541] error: Munge decode failed: Invalid credential
[2022-03-11T08:29:47.541] auth/munge: _print_cred: ENCODED: Wed Dec 31 19:00:00 1969
[2022-03-11T08:29:47.541] auth/munge: _print_cred: DECODED: Wed Dec 31 19:00:00 1969
[2022-03-11T08:29:47.541] error: slurm_unpack_received_msg: auth_g_verify: REQUEST_PERSIST_INIT has authentication error: Unspecified error
[2022-03-11T08:29:47.541] error: slurm_unpack_received_msg: Protocol authentication error
[2022-03-11T08:29:47.551] error: CONN:10 Failed to unpack SLURM_PERSIST_INIT message
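A note on the two 1969 timestamps in this log: they match Unix time 0 rendered in a UTC-5 zone, which suggests the encode and decode times were simply never filled in because decoding failed, rather than pointing at a real clock problem. The sketch below only demonstrates that arithmetic. The UTC-5 offset is an assumption (it is not stated in the log), and `render_epoch` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the slurmdbd host logs local time in a UTC-5 zone.
ASSUMED_LOG_ZONE = timezone(timedelta(hours=-5))


def render_epoch(seconds: int) -> str:
    """Format a Unix timestamp in the same layout as the _print_cred lines."""
    moment = datetime.fromtimestamp(seconds, tz=ASSUMED_LOG_ZONE)
    return moment.strftime("%a %b %d %H:%M:%S %Y")


if __name__ == "__main__":
    # Unix time 0 under the assumed offset
    print(render_epoch(0))
```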
To make sure my credentials are valid, I copied the slurmdbd's MUNGE key to the slurmctld via SCP, made sure the UID and GID of the slurm and munge users are identical on all nodes, and made sure the clocks are in sync. When I run munge and unmunge locally on either server, the message decodes successfully. However, when I try to validate a credential from one server on the other with echo foo | ssh user@server munge | unmunge, I get unmunge: error: invalid credential. What could I be doing wrong to still get this response, and what should I do to make sure my credential is valid?
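Since a local munge/unmunge round trip works on each host while the cross-host test fails, one thing that may be worth ruling out is that the two key files differ after the copy. A minimal sketch, assuming the key lives at /etc/munge/munge.key (the usual default, but an assumption about this setup) and that the script runs with enough privileges to read it; `key_fingerprint` is a hypothetical helper name. Run it on both hosts and compare the printed digests.

```python
import hashlib
import sys
from pathlib import Path

# Assumption: default MUNGE key location; adjust if the install differs.
DEFAULT_KEY_PATH = "/etc/munge/munge.key"


def key_fingerprint(path: str) -> str:
    """Return the SHA-256 hex digest of the key file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


if __name__ == "__main__":
    key_path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_KEY_PATH
    print(key_fingerprint(key_path))
```

If the digests match, it may also be worth confirming that munged was restarted on both hosts after the key was replaced, since the daemon reads the key when it starts.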

Related

How to fix the [ERROR] Helper - Failed to get registered user: Jim with error: TypeError: client.loadFromConfig is not a function

I am trying to fix an error I get when running the Hyperledger Fabric balance-transfer sample with these commands from the bash scripts:
./runApp.sh
./tesAPIs.sh
I then get this error:
[2019-04-04 16:20:42.432] [ERROR] Helper - Failed to get registered user: Jim with error: TypeError: client.loadFromConfig is not a function
[2019-04-04 16:20:42.432] [ERROR] Helper - Failed to get registered user: Jim with error: TypeError: client.loadFromConfig is not a function
I tried hyperledger-fabric versions 1.2.0 and 1.4.0 and got the same error both times when running ./runApp.sh and ./tesAPIs.sh from the balance-transfer sample. How can I get rid of this error?
I've managed to fix this bug.
In the file helper.js in the app folder, an await is missing on line 44.
This is the line:
let client = hfc.loadFromConfig(hfc.getConfigSetting('network'+config));
If you log client, you will see that it is a promise.
So you just need to add await before hfc.loadFromConfig, and client will be a valid object with a loadFromConfig function.
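The failure mode described here is not specific to JavaScript. As a loose analogy only (this is Python's asyncio, not the fabric-client API, and both `Client` and `load_from_config` are hypothetical stand-ins), calling an async function without awaiting it hands back a pending awaitable that has none of the real object's methods:

```python
import asyncio


class Client:
    """Hypothetical stand-in for a configured client object."""

    def load_from_config(self, name):
        return f"loaded {name}"


async def load_from_config(name):
    """Hypothetical async factory, analogous to a promise-returning loader."""
    await asyncio.sleep(0)
    return Client()


async def main():
    pending = load_from_config("network")  # missing await: a coroutine object
    has_method = hasattr(pending, "load_from_config")
    pending.close()  # discard the coroutine that was never awaited
    client = await load_from_config("network")  # with await: a real Client
    return has_method, client.load_from_config("org1")


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The first element of the result reports whether the un-awaited value exposes the method; the second is what the awaited client returns.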

Puppet Agent Could not retrieve catalog

I installed the Maven module on the master machine using this command:
puppet module install maestrodev-maven --version 1.4.0
It installed successfully in /etc/puppet/modules/
Afterwards I added the following code to the file /etc/puppet/manifests/site.pp on the master machine:
node 'test02.edureka.com' {
  include maven
}
Now, when I run the command below on the Puppet agent machine:
puppet agent -t
It gives this error:
root@test02:~# puppet agent -t
Warning: Unable to fetch my node definition, but the agent run will continue:
Warning: execution expired
Info: Retrieving pluginfacts
Error: /File[/var/lib/puppet/facts.d]: Failed to generate additional resources using 'eval_generate': execution expired
Error: /File[/var/lib/puppet/facts.d]: Could not evaluate: Could not retrieve file metadata for puppet://test01.edureka.com/pluginfacts: execution expired
Info: Retrieving plugin
Error: /File[/var/lib/puppet/lib]: Failed to generate additional resources using 'eval_generate': execution expired
Error: /File[/var/lib/puppet/lib]: Could not evaluate: Could not retrieve file metadata for puppet://test01.edureka.com/plugins: execution expired
Info: Loading facts
Error: JAVA_HOME is not defined correctly.
We cannot execute
Could not retrieve fact='maven_version', resolution='': undefined method `split' for nil:NilClass
Error: Could not retrieve catalog from remote server: execution expired
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
Error: Could not send report: execution expired
root@test02:~#

Chef-server-ctl reconfigure/ Creating Admin User on chef server

I am fairly new to Linux (and brand new to Chef) and I have run into an issue when setting up my Chef server. I am trying to create an admin user with the command
sudo chef-server-ctl user-create admin Admin Ladmin admin@example.com examplepass -f admin.pem
but I keep getting this error:
ERROR: Connection refused connecting...
ERROR: Connection refused connecting to https://127.0.0.1/users/, retry 5/5
ERROR: Network Error: Connection refused - Connection refused connecting to https://..., giving up
Check your knife configuration and network settings
I also noticed that when I ran chef-server-ctl I got this output:
[2016-12-21T13:24:59-05:00] ERROR: Running exception handlers
Running handlers complete
[2016-12-21T13:24:59-05:00] ERROR: Exception handlers complete
Chef Client failed. 0 resources updated in 01 seconds
[2016-12-21T13:24:59-05:00] FATAL: Stacktrace dumped to /var/opt/opscode/local-mode-cache/chef-stacktrace.out
[2016-12-21T13:24:59-05:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2016-12-21T13:24:59-05:00] FATAL: Chef::Exceptions::CannotDetermineNodeName: Unable to determine node name: configure node_name or configure the system's hostname and fqdn
I read that this error is due to a prerequisite mistake but I'm uncertain as to what it means or how to fix it. So any input would be greatly appreciated.
Your server does not have a valid FQDN (aka full host name). You'll have to fix this before installing Chef server.
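To see what name the system currently reports, a quick check along these lines may help. It is only a sketch: `looks_like_fqdn` is a hypothetical helper, and the "contains a dot and is not a localhost name" rule is a rough heuristic assumed here, not the test Chef itself performs.

```python
import socket


def looks_like_fqdn(name: str) -> bool:
    """Rough heuristic: a dotted name that is not a localhost alias."""
    name = name.strip().rstrip(".").lower()
    if not name or "." not in name:
        return False
    return not name.startswith("localhost")


if __name__ == "__main__":
    fqdn = socket.getfqdn()  # fully qualified name the resolver reports for this host
    verdict = "looks like an FQDN" if looks_like_fqdn(fqdn) else "does not look like an FQDN"
    print(fqdn, verdict)
```

If the check fails, the system hostname and the /etc/hosts entries are the usual places to look.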

Puppet-Master / Puppet-Agent Deployment fails due to the puppet modules metadata.json file permissions

I am using a Puppet Master / Puppet Agent deployment on CentOS 6.5 (64-bit).
My problem occurs when the Puppet Agent makes its request to the master to start installing the Puppet modules. When I execute the following command on the Puppet Agent:
puppet agent --server <internal-puppet-server-hostname> --test
The output is:
Info: Retrieving pluginfacts
Error: /File[/var/lib/puppet/facts.d]: Failed to generate additional resources using 'eval_generate': Error 400 on SERVER: Permission denied - /etc/puppet/modules/yum/metadata.json
Error: /File[/var/lib/puppet/facts.d]: Could not evaluate: Could not retrieve file metadata for puppet://<internal-puppet-server-hostname>/pluginfacts: Error 400 on SERVER: Permission denied - /etc/puppet/modules/yum/metadata.json
Info: Retrieving plugin
Error: /File[/var/lib/puppet/lib]: Failed to generate additional resources using 'eval_generate': Error 400 on SERVER: Permission denied - /etc/puppet/modules/yum/metadata.json
Error: /File[/var/lib/puppet/lib]: Could not evaluate: Could not retrieve file metadata for puppet://<internal-puppet-server-hostname>/plugins: Error 400 on SERVER: Permission denied - /etc/puppet/modules/yum/metadata.json
Info: Caching catalog for <internal-puppet-agent-hostname>
Info: Applying configuration version '1440623253'
The response suggests that the problem is caused by the permissions on the metadata.json file, so I checked them:
-r--------. 1 root root 539 2015-08-18 14:19 metadata.json
Any ideas why this error occurs? Thanks.
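The listing shows mode 0400 with owner root, so only root can read the file. If the Puppet master process runs as a non-root account (commonly `puppet`, which is an assumption about this setup), that alone would be consistent with the Permission denied error. The sketch below, using the hypothetical helpers `readable_by_others` and `unreadable_files`, walks a module directory and lists files that lack the world-read bit; the default /etc/puppet/modules path is taken from the error output.

```python
import os
import stat
import sys


def readable_by_others(mode: int) -> bool:
    """True if the 'other' read bit is set in a st_mode value."""
    return bool(mode & stat.S_IROTH)


def unreadable_files(root: str) -> list:
    """Collect files under root whose mode lacks the other-read bit."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not readable_by_others(os.lstat(path).st_mode):
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    modules_dir = sys.argv[1] if len(sys.argv) > 1 else "/etc/puppet/modules"
    for path in unreadable_files(modules_dir):
        print(path)
```

The trailing dot in the listing indicates that the file carries an SELinux context, so that may be worth checking as well if fixing the mode bits is not enough.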

Sending message via AMQP using qpid-proton-c fails with error SSL Failure: Unknown error

I am trying to get the sender example from the azure-service-bus-samples repository working on a Linux-based system (https://github.com/genieplus/azure-service-bus-samples/blob/master/proton-c-queues-and-topics/sender.c). However, when I try to execute it, it always fails with the following traces:
Sending messages to amqps://testRule:***@MY_DOMAIN/ServiceBusDefaultNamespace/q1
CALL pn_messenger_set_outgoing_window... RETURNED 0
CALL pn_messenger_set_blocking... RETURNED 0
CALL pn_messenger_start... RETURNED 0
CALL pn_messenger_put... RETURNED 0
CALL pn_messenger_send... recv: Connection reset by peer
[0x1449a80]:ERROR amqp:connection:framing-error SSL Failure: Unknown error.
CONNECTION ERROR connection aborted (remote)
RETURNED 0
I tried to add the auto-generated certificate as a trusted certificate via
pn_messenger_set_trusted_certificates(messenger, "/ca/trusted/");
and got a different error message:
Sent BytesMessage with id
c99a2261-843e-4ebf-846c8fe799e8fc0b
Message status PN_STATUS_ABORTED
Final send status is: failed, never sent on network
CALL pn_messenger_settle... RETURNED 0
CALL pn_messenger_put... RETURNED 0
CALL pn_messenger_send... [0x18aaad0]:ERROR amqp:connection:framing-error SSL Failure: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
RETURNED 0
Any idea how to fix this?
You have to download the server certificate and copy it to the path given in the code. On Linux, you then have to run the c_rehash command on that directory to create the hash links.
Ref link: https://www.openssl.org/docs/manmaster/ssl/SSL_CTX_load_verify_locations.html
