How to sync mirrors? - linux

I'm using RabbitMQ for a project, and there's a problem :(
This is the mirrored queue I've been using.
(screenshot of the queue from the RabbitMQ management UI)
As you can see in the picture, the master node (rabbit@HSDRABPAP01) became unsynchronised at some point and the mirror node (rabbit@HSDRABPAP03) was promoted to the new master.
However, rabbit@HSDRABPAP01, the node that recovered and rejoined, is still in an unsynchronised state.
How can I sync rabbit@HSDRABPAP01?
I'd appreciate it if anybody could help me.
I've tried to sync rabbit@HSDRABPAP01 in 2 ways:
click Synchronise button in management UI
rabbitmqctl sync_queue <queue_name>
But neither works.
The Synchronise button still appears.
The RabbitMQ log says the queue is already synced:
2023-01-26 22:48:25.586 [info] <0.6790.26> Mirrored queue 'M10QM-Q-LM-11' in vhost '/': Synchronising: 0 messages to synchronise
2023-01-26 22:48:25.586 [info] <0.6790.26> Mirrored queue 'M10QM-Q-LM-11' in vhost '/': Synchronising: batch size: 4096
2023-01-26 22:48:25.586 [info] <0.18067.1237> Mirrored queue 'M10QM-Q-LM-11' in vhost '/': Synchronising: all mirrors already synced
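For reference, the mirror and sync state can also be inspected and driven from the CLI. This is only a sketch: the vhost '/' and the queue name come from the log above, while the policy name ha-sync is a placeholder, and if a policy already matches this queue you would edit that one instead of adding a new one.
# list mirrors and which of them are reported as synchronised
rabbitmqctl list_queues -p / name pid slave_pids synchronised_slave_pids
# show the HA policies currently applied in the vhost
rabbitmqctl list_policies -p /
# with "ha-sync-mode":"automatic", a rejoining mirror is synchronised without the manual button or sync_queue
rabbitmqctl set_policy -p / ha-sync '^M10QM-Q-LM-11$' '{"ha-mode":"all","ha-sync-mode":"automatic"}'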

Related

Spark application in incomplete section of spark-history even when completed

In my Spark history some applications have been "incomplete" for a week now. I've tried killing them, closing the sparkContext(), and killing the main .py process, but nothing helped.
For example,
yarn application -status <id>
shows:
...
State: FINISHED
Final-State: SUCCEEDED
...
Log Aggregation Status: TIME_OUT
...
But in Spark History I still see it in the incomplete section of my applications. If I open the application there, I can see 1 Active job with 1 Alive executor, but they have been doing nothing all week. This seems like a logging bug, but as far as I know the problem is only on my side; my coworkers don't have it.
This thread didn't help me, because I don't have access to start-history-server.sh.
I suspect this problem is caused by
Log Aggregation Status: TIME_OUT
because my "completed" applications have
Log Aggregation Status: SUCCEEDED
What can I do to fix this? Right now I have 90+ incomplete applications.
I've found a clear description of my problem with the same situation (YARN, Spark, etc.), but there is no solution: What is 'Active Jobs' in Spark History Server Spark UI Jobs section
From Spark Monitoring and Instrumentation:
...
3. Applications which exited without registering themselves as completed will be listed as incomplete -- even though they are no longer running. This can happen if an application crashes.
...
Meaning:
History Server's UI shows only those Spark applications whose event logs it can find in its spark.eventLog.dir directory (a config typically set to /user/spark/applicationHistory in Hadoop). If a log doesn't end with the special ApplicationEnd event:
{"Event":"SparkListenerApplicationEnd","Timestamp":1667223930402}
...the application is considered incomplete (even if it is no longer running) and will be displayed on the Incomplete Applications page.
For your question this means that "moving" the application to the Completed Apps page won't be trivial: it would require manually editing the event log and re-uploading it to the SHS directory in Hadoop. Moreover, it won't really solve anything, since most likely your application keeps crashing before it can write that final event, and its next run will end up on the same Incomplete page again.
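As a rough illustration of what that manual edit would involve (a sketch only: the directory is the spark.eventLog.dir mentioned above, in-progress logs normally carry an .inprogress suffix, and the exact file name will differ if compression or multiple attempts are involved):
# find the unfinished event log for the application
hdfs dfs -ls /user/spark/applicationHistory | grep inprogress
# pull it down, append the final ApplicationEnd event, and push it back without the suffix
hdfs dfs -get /user/spark/applicationHistory/application_<appId>.inprogress .
echo '{"Event":"SparkListenerApplicationEnd","Timestamp":<epoch_millis>}' >> application_<appId>.inprogress
hdfs dfs -put application_<appId>.inprogress /user/spark/applicationHistory/application_<appId>
hdfs dfs -rm /user/spark/applicationHistory/application_<appId>.inprogress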
To diagnose the reason why it fails, perhaps you can look at the application driver logs for any clues -- errors or exception messages. Graceful shutdown looks different depending on what kind of resource manager and what deploy mode your app is using. For deploy-mode=cluster and YARN RM, it would look something like this:
22/10/31 11:11:11 INFO spark.SparkContext: Successfully stopped SparkContext
22/10/31 11:11:11 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
22/10/31 11:11:11 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
22/10/31 11:11:11 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
22/10/31 11:11:11 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://.../.../.sparkStaging/application_<appId>
22/10/31 11:11:11 INFO util.ShutdownHookManager: Shutdown hook called
22/10/31 11:11:11 INFO util.ShutdownHookManager: Deleting directory /.../.../appcache/application_<appId>/spark-<guid>
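If the application runs on YARN in cluster mode and log aggregation completed for that run, the driver (ApplicationMaster) logs can be pulled with the standard YARN CLI and scanned for errors, for example:
# fetch the aggregated container logs and look for failures near the end of the run
yarn logs -applicationId application_<appId> > app.log
grep -iE 'error|exception' app.log | tail -n 50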

Client stuck at 'Authenticating' phase

I've set up AzerothCore on a VPS following this tutorial. I've created an account, but when I try to log in my client gets stuck on the authenticating phase.
I followed the tutorial completely except I had to download the data files from the recommended link in the AzerothCore Wiki, because my worldserver did not recognize the files provided in the tutorial.
I've checked the config files and database and everything seems ok. The address is what it should be (my VPS address) and ports seem to be ok, too. I've tried redownloading the client (WoWmane WotLK 3.3.5a client, with WoD models), checking my firewall (exceptions added for the WoW client) and checking the realmlist.wtf file and config file, to no avail. My folder is not read-only and I'm really lost now.
EDIT: I've now managed to get the 'malformed packet' error again. I started the auth server, then the world server, then tried to log in, then shut down both servers after the client got stuck again. I'll paste the relevant portion of the server log file:
2019-08-25 03:03:49 ERROR: WORLD: World initialized in 0 minutes 13 seconds
2019-08-25 03:03:49
2019-08-25 03:03:49 worldserver process priority class set to -15
2019-08-25 03:03:49 Max allowed socket connections 1024
2019-08-25 03:03:49 Starting up Auction House Listing thread...
2019-08-25 03:03:49 AzerothCore rev. 2f74802d03d5 2019-08-23 22:22:26 +0200 (master branch) (Unix, Release) (worldserver-daemon) ready...
2019-08-25 03:04:16 ERROR: WorldSocket::handle_input_header(): client (account: 0, char [GUID: 0, name: <none>]) sent malformed packet (size: 8, cmd: 1867972643)
2019-08-25 03:04:42 Auction House Listing thread exiting without problems.
2019-08-25 03:04:42 Halting process...
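Not a full answer, but two things worth double-checking: a hang at 'Authenticating' together with malformed packets on the world socket often means the realm address/port handed out by the auth server doesn't match where the worldserver is actually listening. The commands below are only a sketch that assumes a default AzerothCore install (database user acore, auth database acore_auth, Ubuntu with ufw); adjust to your setup.
# 'address' must be the VPS public IP and 'port' must match WorldServerPort in worldserver.conf (default 8085)
mysql -u acore -p acore_auth -e "SELECT id, name, address, localAddress, port FROM realmlist;"
# both the auth port (3724) and the world port (8085) must be reachable from outside
sudo ufw allow 3724/tcp
sudo ufw allow 8085/tcp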

Kafka Zookeeper Security Authentication & Authorization (JAAS) Using SASL

Regarding Kafka-ZooKeeper security using DIGEST-MD5 authentication, I am trying to rotate/change the credentials/password in both the server (ZooKeeper) and client (Kafka) JAAS config files.
We have a cluster of 3 ZooKeeper nodes and 3 Kafka broker nodes, with the JAAS configuration files below.
kafka.conf
Client {
org.apache.zookeeper.server.auth.DigestLoginModule required
username="super"
password="password";
};
zookeeper.conf
Server {
org.apache.zookeeper.server.auth.DigestLoginModule required
user_super="password";
};
To rotate, we do a rolling restart of the server (ZooKeeper) instances after updating the credential (password). Then, while rolling-restarting the clients (Kafka instances) one at a time after updating the same credential/password for the super user, we notice the following in the server logs:
[2019-06-15 17:17:38,929] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2019-06-15 17:17:38,929] INFO [ZooKeeperClient] Connected. (kafka.zookeeper.ZooKeeperClient)
These INFO-level messages eventually result in an unclean shutdown and restart of the broker, which impacts writes and reads for longer than expected. I have tried commenting out requireClientAuthScheme=sasl in the ZooKeeper zoo.cfg (https://cwiki.apache.org/confluence/display/ZOOKEEPER/Client-Server+mutual+authentication) to allow any client to authenticate to ZooKeeper, but with no success.
Also, as an alternative approach, I tried to update the credential/password in the JAAS config file dynamically using sasl.jaas.config, and I get the same exception documented in this JIRA (reference: https://issues.apache.org/jira/browse/KAFKA-8010).
Does anyone have any suggestions? Thanks in advance.
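One approach worth sketching (an assumption on my part, not something verified against this exact cluster): DigestLoginModule cannot hold two passwords for the same user, so during the roll you can keep an extra user entry in the ZooKeeper Server section, point the Kafka Client section at it, and drop the old entry only after every broker has been restarted. The user name super_rotation below is purely illustrative, and since znode ACLs are tied to the SASL principal, this only works cleanly if the new principal has the same access (for example, is also configured as a super user).
zookeeper.conf during the rotation window:
Server {
org.apache.zookeeper.server.auth.DigestLoginModule required
user_super="oldpassword"
user_super_rotation="newpassword";
};
kafka.conf, rolled out one broker at a time:
Client {
org.apache.zookeeper.server.auth.DigestLoginModule required
username="super_rotation"
password="newpassword";
};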

Openshift 3 App Deployment Failed: Took longer than 600 seconds to become ready

I have a problem with my OpenShift 3 setup, based on Node.js + MongoDB (Persistent) https://github.com/openshift/nodejs-ex.git
Latest App Deployment: nodejs-mongo-persistent-7: Failed
--> Scaling nodejs-mongo-persistent-7 to 1
--> Waiting up to 10m0s for pods in rc nodejs-mongo-persistent-7 to become ready
error: update acceptor rejected nodejs-mongo-persistent-7: pods for rc "nodejs-mongo-persistent-7" took longer than 600 seconds to become ready
Latest Build: Complete
Pushing image 172.30.254.23:5000/husk/nodejs-mongo-persistent:latest ...
Pushed 5/6 layers, 84% complete
Pushed 6/6 layers, 100% complete
Push successful
I have no idea how to debug this. Can you help, please?
Check what went wrong in the console: oc get events
Failed to pull the image? Make sure you included a proper secret.
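A few more standard oc commands usually narrow down why pods in a rollout never become ready (the pod name below is a placeholder):
# recent cluster events, most recent last
oc get events --sort-by='.lastTimestamp'
# is the pod Pending, ImagePullBackOff, CrashLoopBackOff, ...?
oc get pods
# scheduling, image pull and readiness-probe details for the failing pod
oc describe pod <nodejs-mongo-persistent-pod>
# application output; readiness probes often fail because the app itself never starts
oc logs <nodejs-mongo-persistent-pod>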

Can't backup to S3 with OpsCenter 5.2.1

I upgraded OpsCenter from 5.1.3 to 5.2.0 (and then to 5.2.1). I had a scheduled backup to the local server and to an S3 location configured before the upgrade, which worked fine with OpsCenter 5.1.3. I made no changes to the scheduled backup during or after the upgrade.
The day after the upgrade, the S3 backup failed. In opscenterd.log, I see these errors:
2015-09-28 17:00:00+0000 [local] INFO: Instructing agents to start backups at Mon, 28 Sep 2015 17:00:00 +0000
2015-09-28 17:00:00+0000 [local] INFO: Scheduled job 458459d6-d038-41b4-9094-7d450e4bac6f finished
2015-09-28 17:00:00+0000 [local] INFO: Snapshots started on all nodes
2015-09-28 17:00:08+0000 [] WARN: Marking request d960ad7b-2ccd-40a4-be7e-8351ac038c53 as failed: {'sstables': {u'solr_admin': {u'solr_resources': {'total_size': 155313, 'total_files': 12, 'done_files': 0, 'errors': [u'{:type :opsagent.backups.destinations/destination-not-found, :message "Destination missing: 62f5a26abce7463bad9deb7380979c4a"}', u'{:type :opsagent.backups.destinations/destination-not-found, :message "Destination missing: 62f5a26abce7463bad9deb7380979c4a"}', u'{:type :opsagent.backups.destinations/destination-not-found, :message "Destination missing: 62f5a26abce7463bad9deb7380979c4a"}', shortened for brevity.
The S3 location no longer appears in OpsCenter when I edit the scheduled backup job. When I try to re-add the S3 location, using the same bucket and credentials as before, I get the following error:
Location validation error: Call to /local/backups/destination_validate timed out.
Also, I don't know if this is related, but for completeness, I see some of these errors in the opscenterd.log as well:
WARN: No http agent exists for definition file update. This is likely due to SSL import failure.
I get this behavior with either DataStax Enterprise 4.5.1 or 4.7.3.
I have been having the exact same problem since updating to OpsCenter 5.2.x and was only just able to get it working properly.
I removed all the settings suggested in the previous answer and then created new buckets in us-west-1, us-west-2 and us-standard. After this I was able to successfully add all of those as destinations quickly and easily.
It appears to me that OpsCenter may be trying to list the objects in the bucket you configure initially; in my case the 2 existing buckets we were using held 11 TB and 19 GB of data respectively.
This could explain why increasing the timeout worked for some and not for others.
Hope this helps.
Try adding the remote_backup_region property to the cluster configuration file under the [agents] heading in "cluster-name".conf. Valid values are: us-standard, us-west-1, us-west-2, eu-west-1, ap-northeast-1, ap-southeast-1
Does that help?
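For example, the setting would look something like this (a sketch; the path is the usual location for a package install of OpsCenter, and the region is whichever one holds your bucket):
# /etc/opscenter/clusters/<cluster-name>.conf
[agents]
remote_backup_region = us-west-2
Restarting opscenterd after editing the file is usually needed for the change to take effect.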
The problem was resolved by a combination of 2 things.
Delete the entire contents of the existing S3 bucket (or create a new bucket as previously suggested by @kaveh-nowroozi).
Edit /etc/datastax-agent/datastax-agent-env.sh and increase the heap size to 512M as suggested by a DataStax engineer. The default was set at 128M and I kept doubling it until backups became successful.
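For reference, the change amounts to bumping the -Xmx value in the agent's env file and restarting the agent (a sketch; the exact default line in datastax-agent-env.sh may differ between versions):
# /etc/datastax-agent/datastax-agent-env.sh
# before: JVM_OPTS="$JVM_OPTS -Xmx128M"
JVM_OPTS="$JVM_OPTS -Xmx512M"
# restart the agent so it picks up the new heap size
sudo service datastax-agent restart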
