ArangoDB stops and won't restart after dev-xvdb times out

I have ArangoDB 3.1.16 installed on an AWS C4 instance, with a Foxx service running in production. It receives an average of 10 packets of 200 octets per second and returns 20 packets of 200 octets per second.
Each time I start my process, the Foxx service runs with consistent performance for about an hour and then suddenly stops. I can no longer reach my Foxx API: all requests fail with connection timeout errors and nothing appears in the Foxx logs. I can no longer reach the web interface either: the page just doesn't load.
After a minute or so, the Foxx logs show an error message: 'ArangoError 18: lock timeout'
After another minute, the logs show that requests that are usually fast took a very long time (WARNING {queries} slow query: took: 1770.862498)
Using "journalctl -xe", I learned that after a foreign IP tried to connect, I got: "Job dev-xvdb.device/start timed out"
I managed to restart arango using :
ps -eaf |grep arangod
sudo kill #
sudo apt-get --reinstall install arangodb3=3.1.16
How can I solve this recurring issue?
"journalctl -xe" gives me:
Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Failed with result 'exit-code'.
-- Subject: Unit arangodb3.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit arangodb3.service has begun starting up.
Apr 04 15:03:10 my-ip arangodb3[11481]: * Starting arango database server arangod
Apr 04 15:03:10 my-ip arangodb3[11481]: * database version check failed, maybe you need to run 'upgrade'?
Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Control process exited, code=exited status=1
Apr 04 15:03:10 my-ip systemd[1]: Failed to start LSB: arangodb.
-- Subject: Unit arangodb3.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit arangodb3.service has failed.
--
-- The result is failed.
Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Unit entered failed state.
Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Failed with result 'exit-code'.
Apr 04 15:03:10 my-ip sudo[11346]: pam_unix(sudo:session): session closed for user root
Apr 04 15:03:17 my-ip sshd[11502]: Did not receive identification string from UNKNOWN IP 1
Apr 04 15:03:21 my-ip sshd[11503]: Connection closed by UNKNOWN IP 2 port 54736 [preauth]
Apr 04 15:03:21 my-ip sshd[11507]: Did not receive identification string from UNKNOWN IP 2
Apr 04 15:03:21 my-ip sshd[11506]: fatal: Unable to negotiate with UNKNOWN IP 2 port 54730: no matching host key type found. Their offer: ssh-dss [preauth]
Apr 04 15:03:21 my-ip sshd[11504]: Connection closed by UNKNOWN IP 2 port 54732 [preauth]
Apr 04 15:03:22 my-ip sshd[11505]: Connection closed by UNKNOWN IP 2 port 54734 [preauth]
Apr 04 15:03:40 my-ip systemd[1]: dev-xvdb.device: Job dev-xvdb.device/start timed out.
Apr 04 15:03:40 my-ip systemd[1]: Timed out waiting for device dev-xvdb.device.
-- Subject: Unit dev-xvdb.device has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit dev-xvdb.device has failed.
--
-- The result is timeout.
Apr 04 15:03:40 my-ip systemd[1]: Dependency failed for File System Check on /dev/xvdb.
-- Subject: Unit systemd-fsck@dev-xvdb.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit systemd-fsck@dev-xvdb.service has failed.
--
-- The result is dependency.
Apr 04 15:03:40 my-ip systemd[1]: Dependency failed for /mnt.
-- Subject: Unit mnt.mount has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mnt.mount has failed.
--
-- The result is dependency.
Apr 04 15:03:40 my-ip systemd[1]: mnt.mount: Job mnt.mount/start failed with result 'dependency'.
Apr 04 15:03:40 my-ip systemd[1]: systemd-fsck@dev-xvdb.service: Job systemd-fsck@dev-xvdb.service/start failed with result 'dependency'.
Apr 04 15:03:40 my-ip systemd[1]: dev-xvdb.device: Job dev-xvdb.device/start failed with result 'timeout'.
I tried :
sudo curl --dump - -X GET http://127.0.0.1:8529/_api/version && echo
It gives me :
HTTP/1.1 401 Unauthorized
Www-Authenticate: Bearer token_type="JWT", realm="ArangoDB"
Server: ArangoDB
Connection: Keep-Alive
Content-Type: text/plain; charset=utf-8
Content-Length: 0
I tried :
ps auxw | fgrep arangod
It gives me :
root 10439 0.0 0.1 82772 8664 ? Ss 10:09 0:00 /usr/sbin/arangod --uid arangodb --gid arangodb --pid-file /var/run/arangodb/arangod.pid --temp.path /var/tmp/arangod --log.foreground-tty false --supervisor
arangodb 10440 5.7 94.5 12901776 7242340 ? Sl 10:09 16:36 /usr/sbin/arangod --uid arangodb --gid arangodb --pid-file /var/run/arangodb/arangod.pid --temp.path /var/tmp/arangod --log.foreground-tty false --supervisor
ubuntu 11339 0.0 0.0 12916 1000 pts/0 R+ 14:59 0:00 grep -F --color=auto arangod
Running arangod restart gives me:
2017-04-04T15:01:16Z [11344] INFO ArangoDB 3.1.16 [linux] 64bit, using VPack 0.1.30, ICU 54.1, V8 5.0.71.39, OpenSSL 1.0.2g 1 Mar 2016
2017-04-04T15:01:16Z [11344] INFO using SSL options: SSL_OP_CIPHER_SERVER_PREFERENCE, SSL_OP_TLS_ROLLBACK_BUG
2017-04-04T15:01:16Z [11344] FATAL could not open shutdown file '/var/log/arangodb3/restart/SHUTDOWN': internal error
'service arangodb3 restart' gives me (after a short wait):
Job for arangodb3.service failed because the control process exited with error code. See "systemctl status arangodb3.service" and "journalctl -xe" for details.
'systemctl status arangodb3.service' gives me :
arangodb3.service - LSB: arangodb
Loaded: loaded (/etc/init.d/arangodb3; bad; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2017-04-04 15:03:10 UTC; 34s ago
Docs: man:systemd-sysv-generator(8)
Process: 11352 ExecStop=/etc/init.d/arangodb3 stop (code=exited, status=0/SUCCESS)
Process: 11481 ExecStart=/etc/init.d/arangodb3 start (code=exited, status=1/FAILURE)
Tasks: 83
Memory: 6.5G
CPU: 73ms
CGroup: /system.slice/arangodb3.service
├─10439 /usr/sbin/arangod --uid arangodb --gid arangodb --pid-file /var/run/arangodb/arangod.pid --temp.path /var/tmp/arangod --log.foreground-tty false --supervisor
└─10440 /usr/sbin/arangod --uid arangodb --gid arangodb --pid-file /var/run/arangodb/arangod.pid --temp.path /var/tmp/arangod --log.foreground-tty false --supervisor
Apr 04 15:03:10 my-ip systemd[1]: Starting LSB: arangodb...
Apr 04 15:03:10 my-ip arangodb3[11481]: * Starting arango database server arangod
Apr 04 15:03:10 my-ip arangodb3[11481]: * database version check failed, maybe you need to run 'upgrade'?
Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Control process exited, code=exited status=1
Apr 04 15:03:10 my-ip systemd[1]: Failed to start LSB: arangodb.
Apr 04 15:03:10 my-ip systemd[1]: arangodb3.service: Unit entered failed state.

From your log output it seems that the mounted disk volume goes away.
If the storage goes away underneath any kind of database, there is no reasonable way for it to continue working.
The effect you see is that ArangoDB can no longer work with its data: from its perspective the data is simply not there anymore.
One effect observed by others is that I/O credits on AWS run out, which could also be the reason for what you see above.
https://aws.amazon.com/blogs/aws/new-burst-balance-metric-for-ec2s-general-purpose-ssd-gp2-volumes/
If I understand it correctly, you get more credits if you choose a bigger volume size. If that doesn't help, you either need to reduce the load of your test scenario or choose a different hosting approach that doesn't limit I/O operations.
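To make the burst-credit explanation concrete, here is a small sketch of the arithmetic in shell. It relies on the figures AWS publishes for gp2 volumes (a baseline of 3 IOPS per GiB with a floor of 100, bursts up to 3000 IOPS, and a bucket of 5.4 million I/O credits). The function names are made up for illustration, the volume type and size used in the question are not stated, and no upper cap on the baseline is applied here.

```shell
#!/bin/sh
# Rough gp2 burst-bucket arithmetic (illustrative helper names, not an AWS tool).
# Assumed figures: baseline = 3 IOPS per GiB (minimum 100),
# burst ceiling = 3000 IOPS, full bucket = 5,400,000 I/O credits.

# gp2_baseline_iops SIZE_GIB -> baseline IOPS for a volume of that size
gp2_baseline_iops() {
    size_gib=$1
    iops=$((size_gib * 3))
    if [ "$iops" -lt 100 ]; then
        iops=100
    fi
    echo "$iops"
}

# gp2_burst_seconds SIZE_GIB -> seconds a full bucket lasts at a sustained 3000 IOPS
gp2_burst_seconds() {
    baseline=$(gp2_baseline_iops "$1")
    if [ "$baseline" -ge 3000 ]; then
        echo "unlimited"   # the baseline already meets the burst ceiling
        return
    fi
    # Credits drain at (3000 - baseline) per second while bursting flat out.
    echo $((5400000 / (3000 - baseline)))
}
```

For example, gp2_burst_seconds 500 prints 3600: a 500 GiB gp2 volume driven at a constant 3000 IOPS would empty a full bucket in about an hour. The BurstBalance metric described in the linked post shows the actual balance of a volume.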

Related

How to add MACs and KEX algorithms in /etc/ssh/sshd_config on Ubuntu 18.04 on GCP

I added the following MACs to /etc/ssh/sshd_config on an Ubuntu 18.04 compute instance on GCP. After updating the file, ssh no longer restarts, and journalctl -xe shows /etc/ssh/sshd_config line 130: Bad SSH2 mac spec.
MACs hmac-sha1-512-etm@openssh.com,hmac-sha1-512-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com
I see the following error when I try to restart ssh:
$ sudo systemctl restart ssh
Job for ssh.service failed because the control process exited with error code.
See "systemctl status ssh.service" and "journalctl -xe" for details.
$ journalctl -xe
--
-- Unit ssh.service has begun starting up.
Aug 02 11:37:17 ubuntu1804 sshd[23779]: /etc/ssh/sshd_config line 130: Bad SSH2 mac spec 'hmac-sha1-512-etm@openssh.com,hmac-sha1-512-etm@openssh.com,umac-128-etm@open
Aug 02 11:37:17 ubuntu1804 systemd[1]: ssh.service: Control process exited, code=exited status=255
Aug 02 11:37:17 ubuntu1804 systemd[1]: ssh.service: Failed with result 'exit-code'.
Aug 02 11:37:17 ubuntu1804 systemd[1]: Failed to start OpenBSD Secure Shell server.
-- Subject: Unit ssh.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit ssh.service has failed.
--
-- The result is RESULT.
Aug 02 11:37:17 ubuntu1804 systemd[1]: ssh.service: Service hold-off time over, scheduling restart.
Aug 02 11:37:17 ubuntu1804 systemd[1]: ssh.service: Scheduled restart job, restart counter is at 5.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Automatic restarting of the unit ssh.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
Aug 02 11:37:17 ubuntu1804 systemd[1]: Stopped OpenBSD Secure Shell server.
-- Subject: Unit ssh.service has finished shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit ssh.service has finished shutting down.
Aug 02 11:37:17 ubuntu1804 systemd[1]: ssh.service: Start request repeated too quickly.
Aug 02 11:37:17 ubuntu1804 systemd[1]: ssh.service: Failed with result 'exit-code'.
Aug 02 11:37:17 ubuntu1804 systemd[1]: Failed to start OpenBSD Secure Shell server.
-- Subject: Unit ssh.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit ssh.service has failed.
--
-- The result is RESULT.
The following is the error I receive when I try to connect after logging off from the existing ssh session.
ubuntu1804> gcloud compute ssh ubuntu1804 --zone us-east1-b
ssh: connect to host 35.237.57.183 port 22: Connection refused
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
I did not find a single clue about this in the Google Cloud documentation. I can fix the server, but I would like to know the right way to add such configuration to sshd_config on Ubuntu Linux on GCP.
Verify the acceptable values for MACs with ssh -Q mac. I'd assume hmac-sha1-512-etm@openssh.com, which appears twice in your list, won't be there.
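The comparison against ssh -Q mac can be scripted. The following is a minimal sketch, assuming a POSIX shell; the function name unsupported_macs is hypothetical, and the optional second argument exists only so the check can be tried against a fixed list instead of the local ssh binary.

```shell
#!/bin/sh
# unsupported_macs CANDIDATES [SUPPORTED]
# CANDIDATES: comma-separated list as it would follow "MACs" in sshd_config.
# SUPPORTED:  newline-separated names; defaults to the output of `ssh -Q mac`.
# Prints every candidate that is not in SUPPORTED, one per line.
unsupported_macs() {
    candidates=$1
    supported=${2-$(ssh -Q mac)}
    echo "$candidates" | tr ',' '\n' | while IFS= read -r mac; do
        [ -n "$mac" ] || continue
        # -x: match the whole line, -F: fixed string, -q: no output
        if ! printf '%s\n' "$supported" | grep -qxF -- "$mac"; then
            echo "$mac"
        fi
    done
}
```

Calling unsupported_macs with the list from the question prints the names the local OpenSSH build does not know. Independently of this helper, running sudo sshd -t before restarting the service checks the whole configuration file and reports errors without dropping the running daemon.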

(Job for apache2.service failed because the control process exited with error code) occurred after trying to activate the WebDAV module

I tried to start my Apache web server, but I can't. Every time I type in:
service apache2 start
I get the error:
Job for apache2.service failed because the control process exited with error code.
I first got the error after I tried to activate the WebDAV module for apache2. I have already deactivated it again and rebooted the server, but to no effect.
I'm running Apache on my second PC and access it via SSH.
Here's my log file:
--
-- A start job for unit phpsessionclean.service has begun execution.
--
-- The job identifier is 1448.
Jul 28 18:39:31 Server-MS-7B28 systemd[1]: phpsessionclean.service: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit phpsessionclean.service has successfully entered the 'dead' state.
Jul 28 18:39:31 Server-MS-7B28 systemd[1]: Finished Clean php session files.
-- Subject: A start job for unit phpsessionclean.service has finished successfully
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit phpsessionclean.service has finished successfully.
--
-- The job identifier is 1448.
Jul 28 18:40:25 Server-MS-7B28 sshd[2785]: Received disconnect from 222.186.31.166 port 58094:11: [preauth]
Jul 28 18:40:25 Server-MS-7B28 sshd[2785]: Disconnected from 222.186.31.166 port 58094 [preauth]
Jul 28 18:40:37 Server-MS-7B28 sshd[2787]: Received disconnect from 112.85.42.104 port 12119:11: [preauth]
Jul 28 18:40:37 Server-MS-7B28 sshd[2787]: Disconnected from 112.85.42.104 port 12119 [preauth]
Jul 28 18:41:43 Server-MS-7B28 sudo[2793]: pam_unix(sudo:auth): Couldn't open /etc/securetty: No such file or directory
Jul 28 18:41:46 Server-MS-7B28 sudo[2793]: pam_unix(sudo:auth): Couldn't open /etc/securetty: No such file or directory
Jul 28 18:41:46 Server-MS-7B28 sudo[2793]: elias-server : TTY=pts/0 ; PWD=/home/elias-server ; USER=root ; COMMAND=/bin/bash
Jul 28 18:41:46 Server-MS-7B28 sudo[2793]: pam_unix(sudo:session): session opened for user root by elias-server(uid=0)
Jul 28 18:41:53 Server-MS-7B28 audit[2808]: AVC apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="/usr/lib/snapd/snap-confine" pid=2808 comm="apparmor_parser"
Jul 28 18:41:53 Server-MS-7B28 audit[2808]: AVC apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=28>
Jul 28 18:41:53 Server-MS-7B28 kernel: audit: type=1400 audit(1595954513.320:3245): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="/usr/lib/snapd/snap-confine" pi>
Jul 28 18:41:53 Server-MS-7B28 kernel: audit: type=1400 audit(1595954513.320:3246): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="/usr/lib/snapd/snap-confine//mo>
Jul 28 18:42:09 Server-MS-7B28 systemd[1]: Starting The Apache HTTP Server...
-- Subject: A start job for unit apache2.service has begun execution
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit apache2.service has begun execution.
--
-- The job identifier is 1515.
Jul 28 18:42:09 Server-MS-7B28 apachectl[2837]: AH00526: Syntax error on line 32 of /etc/apache2/sites-enabled/000-default.conf:
Jul 28 18:42:09 Server-MS-7B28 apachectl[2837]: Invalid command 'DAV', perhaps misspelled or defined by a module not included in the server configuration
Jul 28 18:42:09 Server-MS-7B28 apachectl[2817]: Action 'start' failed.
Jul 28 18:42:09 Server-MS-7B28 apachectl[2817]: The Apache error log may have more information.
Jul 28 18:42:09 Server-MS-7B28 systemd[1]: apache2.service: Control process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- An ExecStart= process belonging to unit apache2.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 1.
Jul 28 18:42:09 Server-MS-7B28 systemd[1]: apache2.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit apache2.service has entered the 'failed' state with result 'exit-code'.
Jul 28 18:42:09 Server-MS-7B28 systemd[1]: Failed to start The Apache HTTP Server.
-- Subject: A start job for unit apache2.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit apache2.service has finished with a failure.
--
-- The job identifier is 1515 and the job result is failed.
Thank you for your help,
Elias
The problem is that some configuration files have been deleted, so you have to reinstall Apache.
REINSTALL APACHE2:
To replace configuration files that have been deleted, without purging the package, you can do:
sudo apt-get -o DPkg::Options::="--force-confmiss" --reinstall install apache2
To fully remove the apache2 config files, you should:
sudo apt-get purge apache2
which will then let you reinstall it in the usual way with:
sudo apt-get install apache2
This can also happen if port 80 is already in use.
You can use this to check whether something is listening on that port:
netstat -plant | grep ':80 '
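The journal excerpt in the question contains an AH00526 syntax-error message that names the file and line Apache rejected, which is often the quickest place to start looking. As a small sketch (the function name is hypothetical), the location can be pulled out of such output like this:

```shell
#!/bin/sh
# config_error_location: reads apachectl or journal output on stdin and prints
# "FILE:LINE" for the first AH00526 syntax-error message it finds.
config_error_location() {
    sed -n 's/.*AH00526: Syntax error on line \([0-9][0-9]*\) of \(.*\):$/\2:\1/p' | head -n 1
}
```

It could be fed with journalctl -u apache2 | config_error_location. Running apache2ctl configtest reports the same kind of error without trying to start the server. If the DAV directive on that line is meant to stay, the module would have to be enabled again (a2enmod dav on Debian and Ubuntu); otherwise the directive would have to be removed. Which of the two the asker wants is an assumption.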

PostgreSQL 12.3 won't start on boot, systemd

I installed Postgres 12.3 from source code with these steps:
./configure --with-openssl --with-systemd
make
sudo make install
If I start it with pg_ctl as the postgres user, everything works fine:
pg_ctl -D $PGDATA -l /path/to/logfile start
Then I tried to create a systemd service.
Steps:
Create file /etc/systemd/system/postgresql.service with content:
[Unit]
Description=PostgreSQL database server
Documentation=man:postgres(1)
[Service]
Type=notify
User=postgres
ExecStart=/usr/local/pgsql/bin/postgres -D /path/to/pgdata
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
KillSignal=SIGINT
TimeoutSec=0
[Install]
WantedBy=multi-user.target
sudo systemctl enable postgresql.service
Then I reboot my machine.
After the reboot, Postgres is unavailable. Some logs:
sudo systemctl status postgresql.service
postgresql.service - PostgreSQL database server
Loaded: loaded (/etc/systemd/system/postgresql.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-06-05 03:23:32 MSK; 37s ago
Docs: man:postgres(1)
Process: 724 ExecStart=/usr/local/pgsql/bin/postgres -D /path/to/pgdata (code=exited, status=1/FAILURE)
Main PID: 724 (code=exited, status=1/FAILURE)
Jun 05 03:23:31 ctsvc systemd[1]: Starting PostgreSQL database server...
Jun 05 03:23:32 ctsvc systemd[1]: postgresql.service: Main process exited, code=exited, status=1/FAILURE
Jun 05 03:23:32 ctsvc systemd[1]: Failed to start PostgreSQL database server.
Jun 05 03:23:32 ctsvc systemd[1]: postgresql.service: Unit entered failed state.
Jun 05 03:23:32 ctsvc systemd[1]: postgresql.service: Failed with result 'exit-code'.
journalctl -xe | grep postgres
-- Subject: Unit postgresql.service has begun start-up
-- Unit postgresql.service has begun starting up.
Jun 05 03:23:32 ctsvc postgres[724]: 2020-06-05 03:23:32.209 MSK [724] LOG: starting PostgreSQL 12.3 on armv7l-unknown-linux-gnueabihf, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 32-bit
Jun 05 03:23:32 ctsvc postgres[724]: 2020-06-05 03:23:32.211 MSK [724] LOG: could not bind IPv4 address "172.17.17.42": Cannot assign requested address
Jun 05 03:23:32 ctsvc postgres[724]: 2020-06-05 03:23:32.211 MSK [724] HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
Jun 05 03:23:32 ctsvc postgres[724]: 2020-06-05 03:23:32.211 MSK [724] WARNING: could not create listen socket for "172.17.17.42"
Jun 05 03:23:32 ctsvc postgres[724]: 2020-06-05 03:23:32.211 MSK [724] FATAL: could not create any TCP/IP sockets
Jun 05 03:23:32 ctsvc postgres[724]: 2020-06-05 03:23:32.212 MSK [724] LOG: database system is shut down
Jun 05 03:23:32 ctsvc systemd[1]: postgresql.service: Main process exited, code=exited, status=1/FAILURE
-- Subject: Unit postgresql.service has failed
-- Unit postgresql.service has failed.
Jun 05 03:23:32 ctsvc systemd[1]: postgresql.service: Unit entered failed state.
Jun 05 03:23:32 ctsvc systemd[1]: postgresql.service: Failed with result 'exit-code'.
Jun 05 03:24:09 ctsvc sudo[1602]: user1 : TTY=pts/0 ; PWD=/home/user1 ; USER=root ; COMMAND=/bin/systemctl status postgresql.service
netstat -tnl | grep "5432" shows nothing.
After that, I can start this service manually:
sudo systemctl status postgresql.service
● postgresql.service - PostgreSQL database server
Loaded: loaded (/etc/systemd/system/postgresql.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2020-06-05 03:30:57 MSK; 8s ago
Docs: man:postgres(1)
Main PID: 1681 (postgres)
Tasks: 8 (limit: 4915)
CGroup: /system.slice/postgresql.service
├─1681 /usr/local/pgsql/bin/postgres -D /path/to/pgdata
├─1683 postgres: checkpointer
├─1684 postgres: background writer
├─1685 postgres: walwriter
├─1686 postgres: autovacuum launcher
├─1687 postgres: stats collector
├─1688 postgres: logical replication launcher
└─1693 postgres: postgres postgres 172.17.17.40(53600) idle
Jun 05 03:30:56 ctsvc systemd[1]: Starting PostgreSQL database server...
Jun 05 03:30:57 ctsvc postgres[1681]: 2020-06-05 03:30:57.006 MSK [1681] LOG: starting PostgreSQL 12.3 on armv7l-unknown-linux-gnueabihf, compiled b
Jun 05 03:30:57 ctsvc postgres[1681]: 2020-06-05 03:30:57.007 MSK [1681] LOG: listening on IPv4 address "172.17.17.42", port 5432
Jun 05 03:30:57 ctsvc postgres[1681]: 2020-06-05 03:30:57.032 MSK [1681] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
Jun 05 03:30:57 ctsvc postgres[1681]: 2020-06-05 03:30:57.424 MSK [1682] LOG: database system was shut down at 2020-06-05 02:59:03 MSK
Jun 05 03:30:57 ctsvc postgres[1681]: 2020-06-05 03:30:57.725 MSK [1681] LOG: database system is ready to accept connections
Jun 05 03:30:57 ctsvc systemd[1]: Started PostgreSQL database server.
netstat -tnl | grep '5432'
tcp 0 0 172.17.17.42:5432 0.0.0.0:* LISTEN
In my postgresql.conf I have following:
# - Connection Settings -
listen_addresses = '172.17.17.42'
port = 5432
max_connections = 100
If it helps: Postgres runs on Cubietruck with Armbian.
uname -a
Linux ctsvc 4.19.62-sunxi #5.92 SMP Wed Jul 31 22:07:23 CEST 2019 armv7l GNU/Linux
No other process on my system tries to bind this port at boot time. As far as I understand, everything is fine with the service itself and with PostgreSQL. However, something strange happens during boot, and I can't figure out how to find the cause of this behavior.
Thanks in advance.
Finally my file /etc/systemd/system/postgresql.service looks like this:
[Unit]
Description=PostgreSQL database server
Documentation=man:postgres(1)
Wants=network-online.target
After=network.target network-online.target
[Service]
Type=notify
User=postgres
ExecStart=/usr/local/pgsql/bin/postgres -D /path/to/pgdata
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
KillSignal=SIGINT
TimeoutSec=0
[Install]
WantedBy=multi-user.target
Thanks to Laurenz Albe's comment, I added the following to the Unit section:
Wants=network-online.target
After=network.target network-online.target
to make sure that the network is fully operational before PostgreSQL starts. After this, PostgreSQL runs correctly after a reboot.
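The "could not bind IPv4 address ... Cannot assign requested address" message in the boot log means the address was not yet configured on any interface when postgres started. As a rough way to confirm this kind of race, the sketch below checks whether an address appears in the output of ip -o -4 addr show. The function name is hypothetical, and the optional second argument only exists so the check can be tried against captured output.

```shell
#!/bin/sh
# address_is_assigned ADDR [IP_OUTPUT]
# Succeeds (exit status 0) if ADDR is configured on some interface.
# IP_OUTPUT defaults to the output of `ip -o -4 addr show`.
address_is_assigned() {
    addr=$1
    listing=${2-$(ip -o -4 addr show)}
    # Each line looks like "2: eth0    inet 172.17.17.42/24 brd ...", so a
    # fixed-string match on " inet ADDR/" avoids matching longer addresses.
    printf '%s\n' "$listing" | grep -qF -- " inet $addr/"
}
```

A call such as address_is_assigned 172.17.17.42 && echo ready could be run early in the boot to see whether the address is there yet. Note that network-online.target only delays startup if the system's network manager provides a wait-online service (for example systemd-networkd-wait-online.service or NetworkManager-wait-online.service) and that service is enabled.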

Docker fails to start on CentOS 7

The Docker service on my CentOS 7 machine fails to start. I have some Docker images that I want to keep at any cost. I have searched several online docs, and they all say to delete the /var/lib/docker/ directory, which I don't want to do because all the images and containers are stored there. Can someone please tell me how to get Docker back up and running without losing any data?
Log:
[root@BuyPandGDev01 /]# systemctl status docker.service -l
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Sun 2018-04-22 00:05:23 UTC; 19min ago
Docs: http://docs.docker.com
Process: 1539 ExecStart=/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_NETWORK_OPTIONS $ADD_REGISTRY $BLOCK_REGISTRY $INSECURE_REGISTRY $REGISTRIES (code=exited, status=1/FAILURE)
Main PID: 1539 (code=exited, status=1/FAILURE)
Apr 22 00:05:22 BuyPandGDev01 systemd[1]: Starting Docker Application Container Engine...
Apr 22 00:05:22 BuyPandGDev01 dockerd-current[1539]: time="2018-04-22T00:05:22.068920976Z" level=info msg="libcontainerd: new containerd process, pid: 1550"
Apr 22 00:05:23 BuyPandGDev01 dockerd-current[1539]: time="2018-04-22T00:05:23.101036303Z" level=warning msg="devmapper: Usage of loopback devices is strongly discouraged for production use. Please use `--storage-opt dm.thinpooldev` or use `man docker` to refer to dm.thinpooldev section."
Apr 22 00:05:23 BuyPandGDev01 dockerd-current[1539]: time="2018-04-22T00:05:23.155223108Z" level=error msg="[graphdriver] prior storage driver \"devicemapper\" failed: devmapper: Base Device UUID and Filesystem verification failed: devicemapper: Error running deviceCreate (ActivateDevice) dm_task_run failed"
Apr 22 00:05:23 BuyPandGDev01 dockerd-current[1539]: time="2018-04-22T00:05:23.155708413Z" level=fatal msg="Error starting daemon: error initializing graphdriver: devmapper: Base Device UUID and Filesystem verification failed: devicemapper: Error running deviceCreate (ActivateDevice) dm_task_run failed"
Apr 22 00:05:23 BuyPandGDev01 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Apr 22 00:05:23 BuyPandGDev01 systemd[1]: Failed to start Docker Application Container Engine.
Apr 22 00:05:23 BuyPandGDev01 systemd[1]: Unit docker.service entered failed state.
Apr 22 00:05:23 BuyPandGDev01 systemd[1]: docker.service failed.
journalctl -xe:
[root@BuyPandGDev01 /]# journalctl -xe
-- Unit docker-storage-setup.service has begun starting up.
Apr 22 00:25:58 BuyPandGDev01 container-storage-setup[2111]: INFO: Volume group backing root filesystem could not be determined
Apr 22 00:25:58 BuyPandGDev01 container-storage-setup[2111]: ERROR: No valid volume group found. Exiting.
Apr 22 00:25:58 BuyPandGDev01 systemd[1]: docker-storage-setup.service: main process exited, code=exited, status=1/FAILURE
Apr 22 00:25:58 BuyPandGDev01 systemd[1]: Failed to start Docker Storage Setup.
-- Subject: Unit docker-storage-setup.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit docker-storage-setup.service has failed.
--
-- The result is failed.
Apr 22 00:25:58 BuyPandGDev01 systemd[1]: Unit docker-storage-setup.service entered failed state.
Apr 22 00:25:58 BuyPandGDev01 systemd[1]: docker-storage-setup.service failed.
Apr 22 00:25:58 BuyPandGDev01 systemd[1]: Starting Docker Application Container Engine...
-- Subject: Unit docker.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit docker.service has begun starting up.
Apr 22 00:25:58 BuyPandGDev01 dockerd-current[2140]: time="2018-04-22T00:25:58.731142431Z" level=info msg="libcontainerd: new containe
Apr 22 00:25:59 BuyPandGDev01 dockerd-current[2140]: time="2018-04-22T00:25:59.767061431Z" level=warning msg="devmapper: Usage of loop
Apr 22 00:25:59 BuyPandGDev01 kernel: device-mapper: table: 253:1: thin: Couldn't open thin internal device
Apr 22 00:25:59 BuyPandGDev01 kernel: device-mapper: ioctl: error adding target to table
Apr 22 00:25:59 BuyPandGDev01 dockerd-current[2140]: time="2018-04-22T00:25:59.835261589Z" level=error msg="[graphdriver] prior storag
Apr 22 00:25:59 BuyPandGDev01 dockerd-current[2140]: time="2018-04-22T00:25:59.835697590Z" level=fatal msg="Error starting daemon: err
Apr 22 00:25:59 BuyPandGDev01 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Apr 22 00:25:59 BuyPandGDev01 systemd[1]: Failed to start Docker Application Container Engine.
-- Subject: Unit docker.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit docker.service has failed.
--
-- The result is failed.
Apr 22 00:25:59 BuyPandGDev01 systemd[1]: Unit docker.service entered failed state.
Apr 22 00:25:59 BuyPandGDev01 systemd[1]: docker.service failed.
Apr 22 00:25:59 BuyPandGDev01 polkitd[703]: Unregistered Authentication Agent for unix-process:2105:147803 (system bus name :1.43, obj
Any response would be helpful and appreciated.
Thx,
kumar
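This error occurred for me when I was upgrading Docker. The solution that worked for me was to remove the legacy Docker files in /var/lib/docker/ and restart the Docker service. Be aware that this deletes all local images, containers, and volumes, so it is only an option if you can afford to lose them. Here are the steps: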
# Remove docker files
$ rm -rf /var/lib/docker/
# Restart docker via service or via systemctl
$ service docker restart
$ service docker status
$ systemctl start docker.service
$ systemctl status docker.service
I had this error also starting the docker service:
kernel: device-mapper: table: 253:1: thin: Couldn't open thin internal device
I fixed it by creating a soft link from /var/lib/docker to another location on the machine which had more disk space.
cd /var/lib/
mv docker docker.old
ln -s /path/to/big/disk/docker/ docker
Restart the service:
systemctl restart docker
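Since this answer attributes the failure to a lack of disk space, it may be worth checking how full the filesystem under /var/lib/docker is before moving anything. A minimal sketch, with a hypothetical helper name:

```shell
#!/bin/sh
# used_percent PATH -> integer percentage of space used on the filesystem
# that holds PATH, taken from the "Capacity" column of POSIX-format df output.
used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

used_percent /var/lib/docker prints a bare number that can be compared in a script, for example [ "$(used_percent /var/lib/docker)" -ge 95 ] && echo "almost full". Whether low disk space is really the cause in the question is not confirmed by the logs shown there.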

Can not start keystone service

I installed packstack on my fresh installation of Fedora 21 with all updates. When I run
packstack --allinone I received this error:
ERROR : Error appeared during Puppet run: 192.168.1.*_keystone.pp Error:
Could not start Service[keystone]: Execution of '/sbin/service openstack-keystone
start' returned 1: Redirecting to /bin/systemctl start openstack-keystone.service
You will find full trace in log /var/tmp/packstack/20141223-022613-whLvTs/manifests
/192.168.1.*_keystone.pp.log
And this is the log:
Notice: /Stage[main]/Cinder::Keystone::Auth/Keystone_user_role[cinder@services]:
Dependency Service[keystone] has failures: true
Warning: /Stage[main]/Cinder::Keystone::Auth/Keystone_user_role[cinder@services]:
Skipping because of failed dependencies
Notice: Finished catalog run in 13.02 seconds
systemctl status openstack-keystone.service gives me this:
openstack-keystone.service - OpenStack Identity Service (code-named Keystone)
Loaded: loaded (/usr/lib/systemd/system/openstack-keystone.service; disabled)
Active: failed (Result: start-limit) since Tue 2014-12-23 19:47:36 EET; 1min 59s ago
Process: 22526 ExecStart=/usr/bin/keystone-all (code=exited, status=1/FAILURE)
Main PID: 22526 (code=exited, status=1/FAILURE)
Dec 23 19:47:35 localhost.localdomain systemd[1]: Failed to start OpenStack...
Dec 23 19:47:35 localhost.localdomain systemd[1]: Unit openstack-keystone.s...
Dec 23 19:47:35 localhost.localdomain systemd[1]: openstack-keystone.servic...
Dec 23 19:47:36 localhost.localdomain systemd[1]: start request repeated to...
Dec 23 19:47:36 localhost.localdomain systemd[1]: Failed to start OpenStack...
Dec 23 19:47:36 localhost.localdomain systemd[1]: Unit openstack-keystone.s...
Dec 23 19:47:36 localhost.localdomain systemd[1]: openstack-keystone.servic...
This can happen due to an SELinux AVC denial caused by a missing policy.
You can try to put SELinux to permissive mode:
# setenforce 0
A similar bug

Resources