We have an XtraDB cluster with three nodes. One node was not stopped properly and now won't start. The other two nodes are working correctly and responding. The only thing in the logs is this:
-- Unit mysql.service has begun starting up.
Aug 25 04:40:45 percona-prod-perconaxtradb-vm-0 /etc/init.d/mysql[2503]: MySQL PID not found, pid_file detected/guessed: /var/run/mysql
Aug 25 04:40:52 percona-prod-perconaxtradb-vm-0 mysql[2462]: Starting MySQL (Percona XtraDB Cluster) database server: mysqld . . . . .
Aug 25 04:40:52 percona-prod-perconaxtradb-vm-0 mysql[2462]: failed!
Aug 25 04:40:52 percona-prod-perconaxtradb-vm-0 systemd[1]: mysql.service: control process exited, code=exited status=1
Aug 25 04:40:52 percona-prod-perconaxtradb-vm-0 systemd[1]: Failed to start LSB: Start and stop the mysql (Percona XtraDB Cluster) daem
-- Subject: Unit mysql.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
In /var/lib/mysql/wsrep_recovery.qEEkjd we found this:
2018-08-25T05:49:31.055887Z 0 [ERROR] Found 20 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions.
2018-08-25T05:49:31.055892Z 0 [ERROR] Aborting
2018-08-25T05:49:31.055901Z 0 [Note] Binlog end
We would like to completely drop these 20 prepared transactions.
The other two nodes are consistent and working, so it would be enough to tell this node to ignore its own state and resync from the others.
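For completeness, the switch named in the error does exist and takes a disposition value. A hedged sketch, run once with the service stopped; the --wsrep-provider=none part is an assumption to keep the node from trying to rejoin the cluster while it recovers:
$ sudo -u mysql mysqld --wsrep-provider=none --tc-heuristic-recover=ROLLBACK
The server rolls back the pending XA transactions (use COMMIT instead to commit them) and exits; a normal start afterwards should no longer find prepared transactions.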
In the end we removed the /data folder on the dead node and restarted it. The node then started SST replication, which takes a long time, and the only visible progress is the growing size of the folder. But then it worked.
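For anyone taking the same route, a rough sketch of the procedure, with /var/lib/mysql standing in for whatever datadir your my.cnf points at (ours was /data):
$ sudo systemctl stop mysql
$ sudo mv /var/lib/mysql /var/lib/mysql.bak      # keep a copy until the node has rejoined
$ sudo mkdir /var/lib/mysql && sudo chown mysql:mysql /var/lib/mysql
$ sudo systemctl start mysql                     # the node joins and pulls a full SST from a donor
$ watch du -sh /var/lib/mysql                    # rough progress: the directory grows toward the donor's size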
I just installed Filebeat, Logstash, Kibana, and Elasticsearch, all running smoothly, to trial this product for additional monthly reports/monitoring. I noticed that every time I change the /etc/elasticsearch/elasticsearch.yml config file to allow remote web access, the service crashes.
Just to say I'm new to the forum and this product; my end goal for this question is to figure out how to allow remote connections to Elasticsearch so I can experiment and test without crashing it.
For reference, here is the error output when I run sudo systemctl status elasticsearch:
Dec 30 07:27:37 ubuntu systemd[1]: Starting Elasticsearch...
Dec 30 07:27:52 ubuntu systemd-entrypoint[4067]: ERROR: [1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch.
Dec 30 07:27:52 ubuntu systemd-entrypoint[4067]: bootstrap check failure [1] of [1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.se>
Dec 30 07:27:52 ubuntu systemd-entrypoint[4067]: ERROR: Elasticsearch did not exit normally - check the logs at /var/log/elasticsearch/elasticsearch.log
Dec 30 07:27:53 ubuntu systemd[1]: elasticsearch.service: Main process exited, code=exited, status=78/CONFIG
Dec 30 07:27:53 ubuntu systemd[1]: elasticsearch.service: Failed with result 'exit-code'.
Dec 30 07:27:53 ubuntu systemd[1]: Failed to start Elasticsearch.
Any help on this is greatly appreciated!
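For what it's worth, the bootstrap-check line in the log names the fix itself: once network.host is set to a non-loopback address, Elasticsearch switches to production mode and requires at least one discovery setting. A minimal sketch for a single trial node (the 0.0.0.0 bind is for testing only, and single-node discovery assumes you run exactly one node):
# /etc/elasticsearch/elasticsearch.yml
network.host: 0.0.0.0
discovery.type: single-node
$ sudo systemctl restart elasticsearch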
After I installed CouchDB, I can get the welcome message:
$ curl localhost:5984
{"couchdb":"Welcome","version":"2.1.2","features":["scheduler"],"vendor":{"name":"The Apache Software Foundation"}}
But I can't check its status with systemctl:
$ systemctl status couchdb.service
● couchdb.service
Loaded: not-found (Reason: No such file or directory)
Active: failed (Result: start-limit-hit) since Mon 2018-12-03 14:52:14 CST; 6min ago
Main PID: 30946 (code=killed, signal=USR2)
Dec 03 14:52:14 gpuhuawei systemd[1]: couchdb.service: Unit entered failed state.
Dec 03 14:52:14 gpuhuawei systemd[1]: couchdb.service: Failed with result 'signal'.
Dec 03 14:52:14 gpuhuawei systemd[1]: couchdb.service: Service hold-off time over, scheduling restart.
Dec 03 14:52:14 gpuhuawei systemd[1]: Stopped Apache CouchDB.
Dec 03 14:52:14 gpuhuawei systemd[1]: couchdb.service: Start request repeated too quickly.
Dec 03 14:52:14 gpuhuawei systemd[1]: Failed to start Apache CouchDB.
Dec 03 14:52:14 gpuhuawei systemd[1]: couchdb.service: Unit entered failed state.
Dec 03 14:52:14 gpuhuawei systemd[1]: couchdb.service: Failed with result 'start-limit-hit'.
Dec 03 14:53:53 gpuhuawei systemd[1]: Stopped Apache CouchDB.
Dec 03 14:53:53 gpuhuawei systemd[1]: Stopped Apache CouchDB.
When I run couchdb from the command line, I get:
$ couchdb
{"init terminating in do_boot",{{badmatch,{error,{bad_return,{{couch_app,start,[normal,["/etc/couchdb/default.ini","/etc/couchdb/local.ini"]]},{'EXIT',{{badmatch,{error,{error,eacces}}},[{couch_server_sup,start_server,1,[{file,"couch_server_sup.erl"},{line,56}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,273}]}]}}}}}},[{couch,start,0,[{file,"couch.erl"},{line,18}]},{init,start_it,1,[]},{init,start_em,1,[]}]}}
[1] 2288 user-defined signal 2 couchdb
My work environment:
$ uname -a
Linux gpuhuawei 4.15.0-34-generic #37~16.04.1-Ubuntu SMP Tue Aug 28 10:44:06 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
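Before digging into systemd, note the {error,eacces} buried in that crash dump: it is a plain permission-denied error from the OS when CouchDB opens one of its directories. A hedged first check, with the paths guessed from a typical package install:
$ sudo ls -ld /var/lib/couchdb /var/log/couchdb /etc/couchdb
$ sudo chown -R couchdb:couchdb /var/lib/couchdb /var/log/couchdb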
This is a bit late, but the "start-limit-hit" message is a red herring. I have seen something very similar with a Moodle installation using MySQL: it actually means that you (or the service start process) tried to restart the database too many times, or too soon after a failed start attempt. Basically, the start-limit-hit message is saying "stop trying the same thing and expecting different results".
The actual issue will be further up in the syslog. Unhelpfully, service status does not return enough lines of the error messages to show what is actually wrong. Try a service start, then go and look in the actual syslog; you will see the series of start attempts, and a line just above each one will hopefully tell you the real issue. In my case the problem was that the mount point containing the database was missing - thanks, Azure. For one service start attempt, systemd tries to start the unit five times in quick succession, failing each time because the data directory was not mounted, and on the sixth it fails with start-limit-hit.
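As a sketch of that, journalctl shows the full start attempts that systemctl status truncates:
$ journalctl -u couchdb.service -n 200 --no-pager
$ journalctl -u couchdb.service --since "10 min ago"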
Always back up your data/ and etc/ directories prior to upgrading CouchDB.
We recommend that you overwrite your etc/default.ini file with the version provided by the new release. New defaults sometimes contain mandatory changes to enable default functionality. Always place your customizations in etc/local.ini or any etc/local.d/*.ini file.
(I followed this and it worked.)
https://docs.couchdb.org/en/3.0.0/install/upgrading.html
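A sketch of the layout that advice implies, with paths assumed from a default CouchDB 3.x install under /opt/couchdb:
/opt/couchdb/etc/default.ini     # shipped by the release; overwritten on upgrade, never edit
/opt/couchdb/etc/local.ini       # your overrides live here and survive upgrades
/opt/couchdb/etc/local.d/*.ini   # or here, split into per-topic files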
On my Ubuntu machine, when I run curl -X GET 'http://localhost:9200' to test the connection, it shows the following message.
curl: (7) Failed to connect to localhost port 9200: Connection refused
When I check the server status with sudo systemctl status elasticsearch, it shows the following.
● elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2016-11-20 16:32:30 BDT; 44s ago
Docs: http://www.elastic.co
Process: 8653 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet -Edefault.path.logs=${LOG_DIR} -Edefa
Process: 8649 ExecStartPre=/usr/share/elasticsearch/bin/elasticsearch-systemd-pre-exec (code=exited, status=0/SUCCESS)
Main PID: 8653 (code=exited, status=1/FAILURE)
Nov 20 16:32:29 bahar elasticsearch[8653]: 2016-11-20 16:32:25,579 main ERROR Null object returned for RollingFile in Appenders.
Nov 20 16:32:29 bahar elasticsearch[8653]: 2016-11-20 16:32:25,579 main ERROR Null object returned for RollingFile in Appenders.
Nov 20 16:32:29 bahar elasticsearch[8653]: 2016-11-20 16:32:25,580 main ERROR Unable to locate appender "rolling" for logger config "root"
Nov 20 16:32:29 bahar elasticsearch[8653]: 2016-11-20 16:32:25,580 main ERROR Unable to locate appender "index_indexing_slowlog_rolling" for logge
Nov 20 16:32:29 bahar elasticsearch[8653]: 2016-11-20 16:32:25,581 main ERROR Unable to locate appender "index_search_slowlog_rolling" for logger
Nov 20 16:32:29 bahar elasticsearch[8653]: 2016-11-20 16:32:25,581 main ERROR Unable to locate appender "deprecation_rolling" for logger config "o
Nov 20 16:32:29 bahar elasticsearch[8653]: [2016-11-20T16:32:25,592][WARN ][o.e.c.l.LogConfigurator ] ignoring unsupported logging configuration
Nov 20 16:32:30 bahar systemd[1]: elasticsearch.service: Main process exited, code=exited, status=1/FAILURE
Nov 20 16:32:30 bahar systemd[1]: elasticsearch.service: Unit entered failed state.
Nov 20 16:32:30 bahar systemd[1]: elasticsearch.service: Failed with result 'exit-code'.
This error comes from the path settings for the logs in elasticsearch.yml (/etc/elasticsearch/elasticsearch.yml).
Comment out those path lines and the error will go away.
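A sketch of the lines in question, assuming the stock Debian/Ubuntu layout; commented out, Elasticsearch falls back to the packaged defaults:
# /etc/elasticsearch/elasticsearch.yml
# path.data: /var/lib/elasticsearch
# path.logs: /var/log/elasticsearch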
That means Elasticsearch is not running, and from what I can see there is a problem starting it. Check your Elasticsearch configuration.
To check whether Elasticsearch is running, run the following command:
$ ps aux|grep elasticsearch
If Elasticsearch has not started, check your Java environment, then download a fresh Elasticsearch package and install it again:
1. Check that Java is correctly installed:
$ java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
If your Java version is lower than 1.7, install a newer one.
2. Download the Elasticsearch install package and unpack it:
$ tar -zxvf elasticsearch-2.3.3.tar.gz
3. Run Elasticsearch:
$ cd elasticsearch-2.3.3
$ ./bin/elasticsearch
Usually it's a write-permission issue with the log directory (default: /var/log/elasticsearch). Use ls -l to check the permissions, and change the mode of the log directory and files to 777 if necessary.
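A sketch of that check; restoring ownership to the elasticsearch user is a narrower alternative to chmod 777:
$ ls -ld /var/log/elasticsearch
$ sudo chown -R elasticsearch:elasticsearch /var/log/elasticsearch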
Long story short: a system reboot might fix it.
It has been a while since the question was asked. Anyway, I ran into a similar problem recently.
The elasticsearch service on one of my nodes died, with errors similar to those posted in the question when restarting the service. It said the log folder it wanted to write to was on a read-only file system. But those files and directories were owned by user elasticsearch (version 5.5, deployed on CentOS 6.5), so there should not have been a read-only problem.
I checked and didn't find a clue, so I just rebooted the system. After rebooting everything was fine without any further tuning: the elasticsearch service started on boot as configured, found the cluster and all the other nodes, and the cluster health status turned green after a little while.
I guess the root cause might be some hardware failure in my case. All data and logs managed by the elasticsearch cluster are stored on a 2 TB SSD drive mounted on each node, and our hardware team had just recovered from an external storage failure; all the nodes restarted during that recovery. Chances are some lingering issue from that caused the problem.
It seems like SGE tries to start before Lustre is mounted when the server boots, so it fails to start automatically on reboot.
Can somebody tell me how to change the order at boot, so SGE starts after Lustre is mounted?
Error message from the log:
Aug 12 11:46:21 dragen1 systemd: Configuration file /usr/lib/systemd/system/sge_execd.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Aug 12 11:46:40 dragen1 sge_execd: error: SGE_ROOT directory "/cm/shared/apps/sge/2011.11p1" doesn't exist
Aug 12 11:46:40 dragen1 systemd: sge_execd.service: control process exited, code=exited status=1
Aug 12 11:46:40 dragen1 systemd: Unit sge_execd.service entered failed state.
Aug 12 11:46:40 dragen1 systemd: sge_execd.service failed
I added the following under [Unit] in the SGE service file:
RequiresMountsFor=(Mount Point)
This fixed the problem.
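For the record, a sketch of what that looks like as a drop-in, using /cm/shared (the parent of the SGE_ROOT path in the error above) as the assumed Lustre mount point:
# /etc/systemd/system/sge_execd.service.d/lustre.conf
[Unit]
RequiresMountsFor=/cm/shared
Then reload so systemd picks up the drop-in:
$ sudo systemctl daemon-reload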
I have created an RPM package for my daemon which, on installation, creates a service init script under /etc/init.d/.
Now I want to port this RPM package to CentOS 7.1 and use the systemd framework, so the service starts at boot and can also be started/stopped by an admin.
I couldn't find any tutorial. Please help.
EDIT:
I tried the suggestion given by msuchy; here is my observation.
On the SysV-based framework (CentOS 6.5):
[root@adil work]# /etc/init.d/daemon_script status
Service daemon: Stopped
[root@adil work]# /etc/init.d/daemon_script start
Starting daemon: Initializing daemon... [ OK ]
[root@adil work]#
[root@adil work]# /etc/init.d/daemon_script status
Service daemon: Running
[root@adil work]#
[root@adil work]# /etc/init.d/daemon_script stop
Shutting down parent daemon: [ OK ]
[root@adil work]# /etc/init.d/daemon_script status
Service daemon: Stopped
[root@adil work]#
===========
On the systemd-based framework, I installed the same RPM on CentOS 7.1:
[root@localhost x86_64]# /etc/init.d/daemon_script
Usage: /etc/init.d/daemon_script {start|stop|restart|status}
[root@localhost x86_64]# /etc/init.d/daemon_script start
Starting daemon_script (via systemctl): Warning: Unit file of daemon_script.service changed on disk, 'systemctl daemon-reload' recommended.
Job for daemon_script.service failed. See 'systemctl status daemon_script.service' and 'journalctl -xn' for details.
[FAILED]
[root@localhost x86_64]# systemctl daemon-reload
[root@localhost x86_64]# systemctl status daemon_script.service
daemon_script.service - SYSV: start and stop Test daemon service.
Loaded: loaded (/etc/rc.d/init.d/daemon_script)
Active: failed (Result: exit-code) since Fri 2015-09-11 15:30:44 IST; 32s ago
Sep 11 15:30:44 localhost.localdomain systemd[1]: Starting SYSV: start and st...
Sep 11 15:30:44 localhost.localdomain systemd[1]: daemon_script.service: cont...
Sep 11 15:30:44 localhost.localdomain systemd[1]: Failed to start SYSV: start...
Sep 11 15:30:44 localhost.localdomain systemd[1]: Unit daemon_script.service ...
Hint: Some lines were ellipsized, use -l to show in full.
[root@localhost x86_64]# systemctl status daemon_script.service -l
daemon_script.service - SYSV: start and stop Test daemon service.
Loaded: loaded (/etc/rc.d/init.d/daemon_script)
Active: failed (Result: exit-code) since Fri 2015-09-11 15:30:44 IST; 46s ago
Sep 11 15:30:44 localhost.localdomain systemd[1]: Starting SYSV: start and stop Test daemon service....
Sep 11 15:30:44 localhost.localdomain systemd[1]: daemon_script.service: control process exited, code=exited status=203
Sep 11 15:30:44 localhost.localdomain systemd[1]: Failed to start SYSV: start and stop Test daemon service..
Sep 11 15:30:44 localhost.localdomain systemd[1]: Unit daemon_script.service entered failed state.
[root@localhost x86_64]#
Output of journalctl -xn:
-- Logs begin at Fri 2015-09-11 14:50:35 IST, end at Fri 2015-09-11 15:40:01 IST. --
Sep 11 15:31:03 localhost.localdomain systemd[1]: [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 11 15:39:33 localhost.localdomain systemd[1]: Starting SYSV: start and stop Test daemon service....
-- Subject: Unit daemon_script.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit daemon_script.service has begun starting up.
Sep 11 15:39:33 localhost.localdomain systemd[8509]: Failed at step EXEC spawning /etc/rc.d/init.d/daemon_script: Exec format error
-- Subject: Process /etc/rc.d/init.d/daemon_script could not be executed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The process /etc/rc.d/init.d/daemon_script could not be executed and failed.
--
-- The error number returned while executing this process is 8.
Sep 11 15:39:33 localhost.localdomain systemd[1]: daemon_script.service: control process exited, code=exited status=203
Sep 11 15:39:33 localhost.localdomain systemd[1]: Failed to start SYSV: start and stop Test daemon service..
-- Subject: Unit daemon_script.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit daemon_script.service has failed.
--
-- The result is failed.
Sep 11 15:39:33 localhost.localdomain systemd[1]: Unit daemon_script.service entered failed state.
Sep 11 15:40:01 localhost.localdomain systemd[1]: Created slice user-0.slice.
-- Subject: Unit user-0.slice has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit user-0.slice has finished starting up.
--
-- The start-up result is done.
Sep 11 15:40:01 localhost.localdomain systemd[1]: Starting Session 7 of user root.
-- Subject: Unit session-7.scope has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit session-7.scope has begun starting up.
Sep 11 15:40:01 localhost.localdomain systemd[1]: Started Session 7 of user root.
-- Subject: Unit session-7.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit session-7.scope has finished starting up.
--
-- The start-up result is done.
Sep 11 15:40:01 localhost.localdomain CROND[8528]: (root) CMD (/usr/lib64/sa/sa1 1 1)
[root@localhost x86_64]#
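One detail worth checking before anything else: status=203 together with "Failed at step EXEC ... Exec format error" (error number 8, ENOEXEC) usually means the kernel did not recognize the script as an executable, most often because the shebang line is missing or malformed. A quick sanity check on the path from the log:
$ head -1 /etc/rc.d/init.d/daemon_script    # should print an interpreter line such as #!/bin/sh
$ file /etc/rc.d/init.d/daemon_script       # should report a shell script, not data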
You do not need to migrate your scripts.
Migrating is probably better, since you can use some interesting systemd features and your files will be smaller. But you do not need to: systemd works correctly with SysV init files too.
You cannot find any SysV init files on a CentOS 7 installation because the Red Hat packagers (whose work CentOS rebuilds) put the effort into migrating all SysV files to unit files. But you do not need to do the same.
There is only one task you need to do: once you place a new SysV file, you must reload the systemd manager configuration using
# systemctl daemon-reload
That is all.
Here is a small example:
# cat /etc/init.d/foo
#!/usr/bin/sh
echo ahoy
# chmod a+x /etc/init.d/foo
# systemctl start foo
Failed to start foo.service: Unit foo.service failed to load: No such file or directory.
# systemctl daemon-reload
# systemctl start foo
# journalctl -xn
-- Subject: Unit foo.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit foo.service has finished starting up.
--
-- The start-up result is done.
# service foo start
ahoy
So you can use all the commands (chkconfig, service) you are used to from CentOS 6. And when you have time, you can study man systemd.unit(5) and a bunch of other man pages (see the "SEE ALSO" section of that man page).