Submitting first job to pacemaker - linux

I followed this guide:
https://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/
I stuck with the Active/Passive DRBD file-system sharing setup. I had to reboot my cluster, and now I am getting the following error:
Current DC: rbx-1 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Tue Nov 28 17:01:14 2017
Last change: Tue Nov 28 16:40:09 2017 by root via cibadmin on rbx-1
2 nodes configured
5 resources configured
Node rbx-2: UNCLEAN (offline)
Online: [ rbx-1 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started rbx-1
WebSite (ocf::heartbeat:apache): Stopped
Master/Slave Set: WebDataClone [WebData]
WebData (ocf::linbit:drbd): FAILED rbx-1 (blocked)
Stopped: [ rbx-2 ]
WebFS (ocf::heartbeat:Filesystem): Stopped
Failed Actions:
* WebData_stop_0 on rbx-1 'invalid parameter' (2): call=20, status=complete, exitreason='none',
last-rc-change='Tue Nov 28 16:27:58 2017', queued=0ms, exec=3ms
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
Any ideas?
Also, does anyone have any recommended guides for submitting jobs?

This post is relatively old at this point, but I'll leave this here for others who stumble upon the same issue.
This problem has to do with the DRBD integration script (OCF resource agent) that Pacemaker uses. If it's broken, missing, has incorrect permissions, etc., you can get an error like this. On CentOS 7 that script lives under /usr/lib/ocf/resource.d/ (for the ocf::linbit:drbd resource used here, the agent is resource.d/linbit/drbd).
Note: this is specific to the guide mentioned by the OP, but it may help you:
Section 7.1 has a big "IMPORTANT" block about replacing the Pacemaker integration script due to a bug. If you run the command it gives you there, you actually replace the script with a 404 error page, which obviously doesn't work and causes the error above. You can fix this by restoring the original script, either by reinstalling DRBD...
yum remove -y kmod-drbd84 drbd84-utils
yum install -y kmod-drbd84 drbd84-utils
...or by finding just the drbd script elsewhere and putting it back at that path. Make sure its permissions are correct and that it is marked executable.
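For reference, a rough sketch of checking things once the script is back in place (the path is the one mentioned above and may differ by package; WebData is the resource name from the status output in the question):
ls -l /usr/lib/ocf/resource.d/linbit/drbd     # confirm the agent exists and is a real script, not a 404 page
chmod 755 /usr/lib/ocf/resource.d/linbit/drbd # make sure it is executable
pcs resource cleanup WebData                  # clear the failed WebData_stop_0 action so Pacemaker retries the resource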
Hope that helps!

Related

OPC Publisher module does not start on my Ubuntu VM as an edge module

The OPC Publisher marketplace image runs successfully as a standalone container (albeit with server connection problems), but I am not able to deploy it as an edge module, especially after changing the container create options.
Background: on my host laptop I was never able to get the module up, so I created an Ubuntu VM. When I deployed the edge module in the VM with the default container create options, the module did show up in the iotedge module list as "running". I then wanted to set the "--op" option to control the publishing rate, so I changed it in the create options using the portal's "Set modules" tab. Since there is no update button, I used the create button to "recreate" the modules. After this the OPC Publisher module no longer shows up on the edge VM. I am following the Microsoft tutorial.
Here is the command for the standalone run:
sudo docker run -v /iiotedge:/appdata mcr.microsoft.com/iotedge/opc-publisher:latest --aa --pf=/appdata/publishednodes.json --c="HostName=<iot hub name>.azure-devices.net;DeviceId=iothubowner;SharedAccessKey=<hub primary key>" --dc="HostName=<edge device id/name>.azure-devices.net;DeviceId=<edge device id/name>;SharedAccessKey=<edge primary key>" --op=10000
Container create options:
{
  "Hostname": "opcpublisher",
  "Cmd": [
    "--pf=/appdata/publishednodes.json",
    "--aa",
    "--op=10000"
  ],
  "HostConfig": {
    "Binds": [
      "/iiotedge:/appdata"
    ]
  }
}
I have not specified the connection strings explicitly in the create options, since the Microsoft documentation says the runtime passes them to modules automatically.
The relevant journalctl logs from iotedged are below.
Oct 06 19:36:05 shreesha-VirtualBox iotedged[9622]: 2021-10-06T14:06:05Z [INFO] - Pulling image mcr.microsoft.com/iotedge/opc-publisher:latest...
Oct 06 19:36:08 shreesha-VirtualBox iotedged[9622]: 2021-10-06T14:06:08Z [INFO] - Successfully pulled image mcr.microsoft.com/iotedge/opc-publisher:latest
Oct 06 19:36:08 shreesha-VirtualBox iotedged[9622]: 2021-10-06T14:06:08Z [INFO] - Creating module OPCPublisher...
Oct 06 19:36:08 shreesha-VirtualBox iotedged[9622]: 2021-10-06T14:06:08Z [INFO] - Starting new listener for module OPCPublisher
Oct 06 19:36:08 shreesha-VirtualBox iotedged[9622]: 2021-10-06T14:06:08Z [ERR!] - Internal server error: Could not create module OPCPublisher
Oct 06 19:36:08 shreesha-VirtualBox iotedged[9622]: caused by: Could not get module OPCPublisher
The logs from iotedge itself are not very useful, but here they are anyway.
~$ iotedge logs OPCPublisher
A module runtime error occurred
I have also tried docker container prune just to be sure, but it did not help.
Also, strangely, when I try to restart the module from the troubleshoot page in the Azure portal, it throws the error "module not found in the current environment".
Can someone please help me out in troubleshooting this problem? I will be glad to share more details if required.
I raised a support query in the Azure portal. After sending support bundles and trying various suggestions (removing the DNS configuration, changing the bind path to a non-sudo location, etc.), the team zeroed in on an edge runtime version mismatch.
After re-reading the documentation I uninstalled the earlier iotedge package and installed aziot-edge instead, and the problem was solved!
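For anyone hitting the same thing, the switch looked roughly like this on Ubuntu (a sketch assuming the Microsoft package repository is already configured; package and command names are the ones documented for IoT Edge 1.2+):
sudo apt-get remove iotedge       # remove the older 1.1-era runtime package
sudo apt-get update
sudo apt-get install aziot-edge   # install the newer runtime
sudo iotedge config import        # optionally migrate settings from the old /etc/iotedge/config.yaml
sudo iotedge config apply         # apply the configuration in /etc/aziot/config.toml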
The team has raised a github issue for public tracking here:
https://github.com/Azure/Industrial-IoT/issues/1425
@asergaz also pointed in the right direction, but I did not notice since that answer came a bit later.

Unable to load redis module, RedisJSON

After following the instructions specified here to compile the source code of RedisJSON, I got the rejson.so file at project_root/target/release. I then ran sudo redis-server --loadmodule /home/username/RedisJSON/target/release/rejson.so to load the module, but I got this error message.
Server initialized
7666:M 14 Sep 2021 13:27:38.795 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
7666:M 14 Sep 2021 13:27:38.795 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
7666:M 14 Sep 2021 13:27:38.862 * <ReJSON> Exported RedisJSON_V1 API
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/redis-module-0.23.0/src/raw.rs:580:42
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
Aborted
How can I get this fixed, please?
RedisJSON requires Redis 6+; it seems like you're running an older version of Redis.
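A quick way to confirm which Redis version you are actually loading the module into (a sketch; the .so path is the one from the question):
redis-server --version                        # version of the binary you are starting
redis-cli INFO server | grep redis_version    # version of an already-running instance, if any
# if this reports something older than 6.x, upgrade Redis before retrying --loadmodule rejson.so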

RIAK Node does not Start after changing IP

I am in the process of setting up a Riak Cluster on Raspberry Pis.
Unfortunately I get the following error message after changing the IP address.
Versions I used:
Debian Jessie (Raspberry Pi)
riak (GitHub clone, Mar 2017)
riak-cs 2.1.1
stanchion-2.1.1
Using this guide I tried to change the IP addresses in the various .conf files.
https://docs.riak.com/riak/kv/latest/using/cluster-operations/changing-cluster-info/index.html
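For reference, the riak.conf settings that carry the node address typically look something like this (key names as in Riak 2.x cuttlefish config; the IP is the one from the question):
nodename = riak@192.168.178.61
listener.http.internal = 192.168.178.61:8098
listener.protobuf.internal = 192.168.178.61:8087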
Works on 127.0.0.1:
$ ~/riak/rel/riak/bin/riak-admin test
Successfully completed 1 read/write cycle to 'riak@127.0.0.1'
Error message (after changing the IP to 192.168.178.61):
sudo ./riak console
config is OK
-config /home/pi/neu/riak/rel/riak/data/generated.configs/app.2020.01.02.23.37.52.config -args_file /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -vm_args /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args
Exec: /home/pi/neu/riak/rel/riak/bin/../erts-5.10.3/bin/erlexec -boot /home/pi/neu/riak/rel/riak/bin/../releases/2.2.3/riak -config /home/pi/neu/riak/rel/riak/data/generated.configs/app.2020.01.02.23.37.52.config -args_file /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -vm_args /home/pi/neu/riak/rel/riak/data/generated.configs/vm.2020.01.02.23.37.52.args -pa /home/pi/neu/riak/rel/riak/bin/../lib/basho-patches -- console
Root: /home/pi/neu/riak/rel/riak/bin/..
Erlang R16B02_basho10 (erts-5.10.3) [source] [smp:4:4] [async-threads:64] [hipe] [kernel-poll:true] [frame-pointer]
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
{"Kernel pid terminated",application_controller,"{application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,['riak#192.168.178.61',[]],[{file,\"orddict.erl\"},{line,72}]},{riak_core_capability,renegotiate_capabilities,1,[{file,\"src/riak_core_capability.erl\"},{line,441}]},{riak_core_capability,handle_call,3,[{file,\"src/riak_core_capability.erl\"},{line,213}]},{gen_server,handle_msg,5,[{file,\"gen_server.erl\"},{line,585}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]},{gen_server,call,[riak_core_capability,{register,{riak_core,vnode_routing},{capability,[proxy,legacy],legacy,{riak_core,legacy_vnode_routing,[{true,legacy},{false,proxy}]}}},infinity]}}}}}}"}
Crash dump was written to: ./log/erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,['riak@192.168.178.61',[
https://github.com/basho/riak/issues/999
martinsumner commented 3 days ago:
I might expect to see this if you hadn't done the step of either renaming (or deleting the contents of) the ring directory. Did you do this?
Also can you confirm if you're in the single-node or multi-node renaming scenario?
Ei3rb0mb3r commented 1 minute ago:
Many thanks for the quick feedback!
The error was solved after I deleted the ring directory files:
cd ../riak/rel/riak/data/ring/ && rm -rf *
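In other words, the single-node rename procedure from the linked guide boils down to something like this (a sketch; paths are the ones used in the question, and moving the ring directory aside is a safer alternative to deleting its contents):
~/riak/rel/riak/bin/riak stop
mv ~/riak/rel/riak/data/ring ~/riak/rel/riak/data/ring.bak   # or delete its contents, as above
~/riak/rel/riak/bin/riak start
~/riak/rel/riak/bin/riak-admin test                          # confirm a read/write cycle against the new node name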

gitlab couldn't find remote ref

GitLab was updated from 8.17 to 9.0. The latest GitLab version added the current user in front of the URL, like http://user.name@gitlab_server/test.git. When I pull from an IDE (like Android Studio or Eclipse), an error message appears:
fatal: couldn't find remote ref
If I read the logs file I found this error:
Started GET "/android/test.git/HEAD" for (ip) at 2017-03-27
06:30:08 +0000 Processing by ApplicationController#route_not_found as
/ Parameters: {"unmatched_route"=>"android/test.git/HEAD"} Completed 401 Unauthorized in 1ms (ActiveRecord: 0.3ms)
Could the problem be related to the daylight saving time (DST) change?
What is the problem? How can I solve this issue?
Thanks so much.
I found the answer here
After updating to GitLab 9.0 you should reconfigure the system with these commands:
gitlab-ctl reconfigure
followed by a
gitlab-ctl restart
This solved my problem.
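If an IDE still fails after that, it may also be worth double-checking the remote URL in the local clone, since the question mentions the user name being prepended to it (a sketch; the server name and project path are placeholders based on the question and logs):
git remote -v                                                      # see which URL the IDE is actually pulling from
git remote set-url origin http://gitlab_server/android/test.git    # reset it without the user.name@ prefix if needed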

(bdutil) Unable to get hadoop/spark cluster working with a fresh install

I'm setting up a tiny cluster in GCE to play around with, but although the instances are created, some failures prevent it from working. I'm following the steps in https://cloud.google.com/hadoop/downloads
So far I'm using the (as of now) latest versions of gcloud (143.0.0) and bdutil (1.3.5), freshly installed.
./bdutil deploy -e extensions/spark/spark_env.sh
using debian-8 as the image (since bdutil still defaults to debian-7-backports).
At some point I got
Fri Feb 10 16:19:34 CET 2017: Command failed: wait ${SUBPROC} on line 326.
Fri Feb 10 16:19:34 CET 2017: Exit code of failed command: 1
full debug output is in https://gist.github.com/jlorper/4299a816fc0b140575ed70fe0da1f272
(project id and bucket names changed)
Instances are created, but Spark is not even installed. Digging a bit, I managed to run the Spark installation and start Hadoop commands on the master after ssh'ing in. But it fails badly when starting the spark-shell:
17/02/10 15:53:20 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.4.5-hadoop1
17/02/10 15:53:20 INFO gcsio.FileSystemBackedDirectoryListCache: Creating '/hadoop_gcs_connector_metadata_cache' with createDirectories()...
java.lang.RuntimeException: java.lang.RuntimeException: java.nio.file.AccessDeniedException: /hadoop_gcs_connector_metadata_cache
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
and I am not able to import sparkSQL. From what I've read, everything should be started automatically.
Up to this point I'm a bit lost and don't know what else to do.
Am I missing any step? Are any of the commands faulty? Thanks in advance.
Update: solved
As pointed out in the accepted solution, I cloned the repo and the cluster was created without issues. When trying to start the spark-shell, though, it gave
java.lang.RuntimeException: java.io.IOException: GoogleHadoopFileSystem has been closed or not initialized.
That sounded to me like the connectors were not initialized properly, so after running
./bdutil --env_var_files extensions/spark/spark_env.sh,bigquery_env.sh run_command_group install_connectors
it worked as expected.
The last version of bdutil on https://cloud.google.com/hadoop/downloads is a bit stale and I'd instead recommend using the version of bdutil at head on github: https://github.com/GoogleCloudPlatform/bdutil.
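Roughly, that means deploying from a clone of the repository instead of the downloaded tarball (a sketch reusing the same deploy command as in the question):
git clone https://github.com/GoogleCloudPlatform/bdutil.git
cd bdutil
./bdutil deploy -e extensions/spark/spark_env.sh   # same spark extension env file as before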
