Percona XtraDB cluster first start time wait - percona

I am trying to start the cluster on three clean centos machines.
I tried to keep this post short: I am not attaching config files because I used this guide and the config files follow it:
https://www.percona.com/doc/percona-xtradb-cluster/5.7/add-node.html#add-node
Starting first node ok.
Starting second node error.
Here is the log on second node
2017-09-28T15:05:09.367856Z 0 [Note] WSREP: Initiating SST/IST transfer on JOINER side (wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.14.104' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '5490' '' )
2017-09-28T15:05:09.368984Z 0 [ERROR] WSREP: Failed to read 'ready ' from: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.14.104' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '5490' ''
Read: '(null)'
2017-09-28T15:05:09.369064Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.14.104' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '5490' '' : 2 (No such file or directory)
2017-09-28T15:05:09.370161Z 2 [ERROR] WSREP: Failed to prepare for 'xtrabackup-v2' SST. Unrecoverable.
2017-09-28T15:05:09.370192Z 2 [ERROR] Aborting

The second node's startup is failing because it is unable to perform an SST (full state transfer) from the donor node.
This is failing because xtrabackup-v2 is failing. You need to check the logs on the donor node to get a better idea as to why, but possible reasons include:
Insufficient memory on the donor node
Syntax error in my.cnf on the donor node (xtrabackup is pickier about syntax than normal mysql -- check for duplicate lines, which mysql accepts but xtrabackup doesn't)
File permissions
xtrabackup installed incorrectly, not installed, or the wrong version
Mismatch in wsrep configuration between nodes
Invalid credentials for wsrep authentication

There are several reasons the SST could have failed, and you need to examine the logs on the first node too. It could be blocked ports, no SST user created, a wrong SST password, missing xtrabackup software, etc. It is impossible to tell from only what you provided.
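For the credentials item in particular, a quick sanity check (the user name and password below are placeholders, not values from the question): wsrep_sst_auth in my.cnf has to match an account that actually exists on the donor with backup privileges.
# my.cnf on every node (placeholder credentials)
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth="sstuser:passw0rd"
# on the donor node, create the matching account
CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 'passw0rd';
GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost';
FLUSH PRIVILEGES;
Ports 4444 (SST), 4567 (group communication) and 4568 (IST) also need to be reachable between the nodes, which is easy to miss on a clean CentOS install with firewalld enabled.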

Related

arangodb starter mode does not start

I downloaded arangodb3-linux-3.9.2 from GIT on CentOS 7. I created a database dir and ran the README instructions for a standalone start. The first time it runs, I get 100 failures; the key INFO log lines seem to be:
... [INFO] server started component=arangodb pid=49827 type=single
... [INFO] Wait on 49827 returned component=arangodb exit-status=1 trap-cause=-1
It creates the log file, setup.json and a single8529 dir in the database dir I spec'd. Is it just taking too long to start? The whole 100 failures take about 1 or 2 seconds.
If I try to run it again with the same README instructions, the next time I get this error:
... [FATAL] Failed to run service error="open /.../single8529/data/ENGINE: no such file"
I have also tried with --starter.host 127.0.0.1 to simplify.
Also I can confirm that port 8529 is open.
I couldn't get the arangodb 'starter' to work according to their README, but this does start the server:
arangod --database.directory MYDIR --rocksdb.max-background-jobs 4
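If the starter itself is still wanted, a single-server invocation along these lines (using the arangodb starter binary's documented flags; MYDIR is the same placeholder as above) is roughly what the README's standalone mode boils down to, ideally after removing the half-initialized single8529 directory left over from the failed runs:
./arangodb --starter.mode single --starter.data-dir ./MYDIR --starter.host 127.0.0.1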

Startup IAM Services failed

C:\domino-iam-service>npm start
> domino-iam-service@2.2.0 start
> cross-env NODE_ENV=production node iam-server.js
WARNING: NODE_ENV value of 'production' did not match any deployment config file names.
WARNING: See https://github.com/lorenwest/node-config/wiki/Strict-Mode
[11:52:41][info][master][master]: IAM version: 2.2.0
Start to unlock config:
? Enter current IAM server password: ********
Config is unlocked.
[11:53:43][info][master][master]: Starts as cluster mode.
[11:53:43][info][stats][master]: IAM StatsClient enabled: false
[11:53:43][info][cluster][master]: Worker 1 is started
[11:53:43][info][cluster][master]: Worker 2 is started
WARNING: NODE_ENV value of 'production' did not match any deployment config file names.
WARNING: See https://github.com/lorenwest/node-config/wiki/Strict-Mode
WARNING: NODE_ENV value of 'production' did not match any deployment config file names.
WARNING: See https://github.com/lorenwest/node-config/wiki/Strict-Mode
[11:53:49][info][worker][worker-1]: Worker 1 starts to provide service, which process id is: 3752
[11:53:49][info][initServices][worker-1]: Start IAM service on allAddress:9443
[11:53:49][info][worker][worker-2]: Worker 2 starts to provide service, which process id is: 2772
[11:53:49][info][stats][worker-1]: IAM StatsClient enabled: false
[11:53:49][info][initServices][worker-2]: Start IAM service on allAddress:9443
[11:53:50][warn][DBConnector][worker-1]: dbConfig.dominoConfig.credential.CLIENT_KEY_PASSPHRASE setting is empty, it is NOT SECURE.
[11:53:50][info][stats][worker-2]: IAM StatsClient enabled: false
[11:53:50][warn][DBConnector][worker-1]: Please use openssl tool to add passphrase for your client key file.
[11:53:50][warn][DBConnector][worker-2]: dbConfig.dominoConfig.credential.CLIENT_KEY_PASSPHRASE setting is empty, it is NOT SECURE.
[11:53:50][warn][DBConnector][worker-2]: Please use openssl tool to add passphrase for your client key file.
[11:53:50][error][ClusterCache][worker-2]: Error occurred when constructing ClusterCache with error: timeout
[11:53:50][error][ClusterCache][worker-1]: Error occurred when constructing ClusterCache with error: timeout
[11:53:50][info][DBConnector][worker-2]: Domino isn't connected, retry after 30s
[11:53:50][info][DBConnector][worker-1]: Domino isn't connected, retry after 30s
The Domino server log shows only one error message:
[0554:0002-0594] 2022/07/14 12:06:17 PM AMgr: Error executing agent 'DeleteExpiredDocs' in 'iam-store.nsf'. Agent signer 'Domino Template Development/Domino': You are not authorized to perform that operation
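As a side note, the repeated node-config WARNING lines only mean that no deployment config file named after NODE_ENV=production was found; they are unrelated to the ClusterCache timeout and the Domino authorization error, which are the actual failures here. If the warnings are unwanted, an (assumed) empty override file in the service's config directory is enough to silence them:
# config/production.json (path assumed relative to the domino-iam-service install directory)
{}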

Unable to spin up 3 nodes via yb-master in yugabytedb

I am unable to start a 3-node universe with yb-master. I am following the docs here:
https://docs.yugabyte.com/latest/deploy/manual-deployment/start-masters/#verify-health
I created 3 master.conf files for 3 separate ips.
For 10.0.0.185:
--master_addresses=10.0.0.185:7100,10.0.0.141:7100,10.0.0.119:7100
--rpc_bind_addresses=10.0.0.185:7100
--fs_data_dirs=/home/mark/yuga/y1
For 10.0.0.141:
--master_addresses=10.0.0.141:7100,10.0.0.185:7100,10.0.0.119:7100
--rpc_bind_addresses=10.0.0.141:7100
--fs_data_dirs=/home/mark/yuga/y1
For 10.0.0.119:
--master_addresses=10.0.0.119:7100,10.0.0.141:7100,10.0.0.185:7100
--rpc_bind_addresses=10.0.0.119:7100
--fs_data_dirs=/home/mark/yuga/y1
I started each node up with the command ./bin/yb-master --flagfile master.conf >& ./y1/yb-master.out &
What seems to happen is that the first two nodes start up fine, but as soon as I try to spin up the third node, the first node crashes and I end up with an error.
At first I thought maybe this had to do with the servers I've got, so I changed the order in which I spin up the yb-masters, but it's always the one I spin up first that dies.
Looking at yb-master.INFO for each IP (cat y1/yb-data/master/logs/yb-master.INFO | grep master) I see:
The one that crashes:
This master's current role is: FOLLOWER
And the other two show:
I0110 00:02:56.565732 3292 client-internal.cc:2384] New master addresses: [10.0.0.141:7100,10.0.0.185:7100,10.0.0.119:7100, 10.0.0.141:7100, 10.0.0.185:7100, 10.0.0.119:7100]
E0110 00:02:58.069311 3162 async_initializer.cc:99] Failed to initialize client: Timed out (yb/rpc/rpc.cc:224): Could not locate the leader master: GetLeaderMasterRpc(addrs: [10.0.0.141:7100, 10.0.0.185:7100, 10.0.0.119:7100, 10.0.0.141:7100, 10.0.0.185:7100, 10.0.0.119:7100], num_attempts: 46) passed its deadline 1101.945s (passed: 1.504s): Network error (yb/util/net/socket.cc:551): recvmsg error: Connection refused (system error 111)
I0110 00:02:59.071501 3293 client-internal.cc:2355] Reinitialize master addresses from file: master.conf
I0110 00:02:59.071782 3293 client-internal.cc:2384] New master addresses: [10.0.0.141:7100,10.0.0.185:7100,10.0.0.119:7100, 10.0.0.141:7100, 10.0.0.185:7100, 10.0.0.119:7100]
and
I0110 00:02:57.610631 2128 master_service.cc:531] Patching role from leader to follower because of: Leader not ready to serve requests (yb/master/scoped_leader_shared_lock.cc:123): Leader not yet ready to serve requests: leader_ready_term_ = -1; cstate.current_term = 1 [suppressed 77 similar messages]
I0110 00:02:58.072002 2144 client-internal.cc:2355] Reinitialize master addresses from file: master.conf
I0110 00:02:58.072276 2144 client-internal.cc:2384] New master addresses: [10.0.0.119:7100,10.0.0.141:7100,10.0.0.185:7100, 10.0.0.119:7100, 10.0.0.141:7100, 10.0.0.185:7100]
I'm not sure why I'm seeing those errors. Am I missing something while attempting to start up the 3 yb-masters?
I should also mention that I've ensured all 3 nodes have the correct system configurations, as mentioned here: https://docs.yugabyte.com/latest/deploy/manual-deployment/system-config/#setting-system-wide-ulimits
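One thing that stands out in the log lines above is that each 'New master addresses' list contains every address twice, which suggests the master list is being registered twice; it may be worth confirming that master_addresses appears only once in each flagfile. Once at least two masters are up, something like yb-admin (run from the same install tree; addresses taken from the question) shows what the masters currently think their membership and leader are:
./bin/yb-admin --master_addresses 10.0.0.185:7100,10.0.0.141:7100,10.0.0.119:7100 list_all_masters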

cassandra service (3.11.5) stops automatically after it starts/restarts on AWS Linux

I have a fresh installation of Cassandra on a new AWS Linux instance (t3.xlarge). When I run
sudo service cassandra start
or
sudo service cassandra restart
the service stops automatically after 1 or 2 seconds. I looked into the logs and found the entries below.
I am not sure why; I haven't changed any configs related to the snitch and it has always been SimpleSnitch. I don't have multiple Cassandra nodes, just a single one on EC2.
Logs
INFO [main] 2020-02-12 17:40:50,833 ColumnFamilyStore.java:426 - Initializing system.schema_aggregates
INFO [main] 2020-02-12 17:40:50,836 ViewManager.java:137 - Not submitting build tasks for views in keyspace system as storage service is not initialized
INFO [main] 2020-02-12 17:40:51,094 ApproximateTime.java:44 - Scheduling approximate time-check task with a precision of 10 milliseconds
ERROR [main] 2020-02-12 17:40:51,137 CassandraDaemon.java:759 - Cannot start node if snitch's data center (datacenter1) differs from previous data center (dc1). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
Installation steps
sudo curl -OL https://www.apache.org/dist/cassandra/redhat/311x/cassandra-3.11.5-1.noarch.rpm
sudo rpm -i cassandra-3.11.5-1.noarch.rpm
sudo pip install cassandra-driver
export CQLSH_NO_BUNDLED=true
sudo chkconfig --levels 3 cassandra on
The issue is in your log file:
ERROR [main] 2020-02-12 17:40:51,137 CassandraDaemon.java:759 - Cannot start node if snitch's data center (datacenter1) differs from previous data center (dc1). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
It seems that you started the cluster, stopped it and renamed the datacenter from dc1 to datacenter1.
In order to fix:
If no data is stored, delete the data directories
If data is stored, rename the datacenter back to dc1 in the config
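As a rough sketch of those two options on a fresh package install (default /var/lib/cassandra and /etc/cassandra paths assumed, and no data worth keeping):
# option 1: wipe the node's state so it re-bootstraps with the snitch's current datacenter name
sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
sudo service cassandra start
# option 2: only if the datacenter change is intentional, pass the flag from the error message,
# e.g. by adding this line to /etc/cassandra/conf/jvm.options
-Dcassandra.ignore_dc=true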
I had the same problem, where the cassandra service stopped immediately after it was started.
In the cassandra configuration file located at /etc/cassandra/cassandra.yaml, change the cluster_name back to the previous one, like this:
...
# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'dc1'
# This defines the number of tokens randomly assigned to this node on the ring
# The more tokens, relative to other nodes, the larger the proportion of data
...

Cassandra Cluster configuration

I am trying to configure two Windows servers in my network as a Cassandra cluster.
I did some reading on various sites and changed the settings below in cassandra.yaml.
After changing the default value of 127.0.0.1 to the actual IP, the Cassandra service is not starting.
I also added a mapping of the actual IP to localhost in the (Windows) hosts file.
After making the above change, the service comes up when I start it but stops immediately.
The reason I am changing this IP is to make this a cluster with a two-node setup.
Please let me know if I am missing something.
Version: DataStax Community version of Cassandra
Server: Windows
Thx
Muthu
Message from Cassandra.txt in logs dir:
ERROR [main] 2014-09-18 11:43:12,155 DatabaseDescriptor.java (line 116) Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: Invalid yaml Caused by: Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=seed_provider for JavaBean=org.apache.cassandra.config.Config#34e5190a; No suitable constructor with 2 arguments found for class org.apache.cassandra.config.SeedProviderDef in 'reader', line 8, column 1: cluster_name: 'Test Cluster'
If you want to create a Cassandra cluster you must have at least two nodes and configure /etc/cassandra/cassandra.yaml.
cassandra.yaml
cluster_name: 'Some Cluster Name'
listen_address: [Current IP]
rpc_address: [Current IP]
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "[Current IP], [Remote IP]"
Note: seeds must contain at least two IPs, and they must be reachable from each other
Clean and start Cassandra instance
sudo rm -rf /var/lib/cassandra/* /var/log/cassandra/*
Note: Cassandra instance must be killed before cleaning those folders.
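Once both nodes start cleanly, a quick way to confirm they actually joined the same ring is nodetool from either node; both IPs should be listed as UN (Up/Normal) in the same datacenter:
nodetool status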
