restore from pg_basebackup - linux

I made daily backups of a postgresql DB using the command
/usr/bin/pg_basebackup -D $outdir -Ft -x -z -w -R -v
Now I want to restore this DB on another server. I used the description on https://www.postgresql.org/docs/9.5/static/continuous-archiving.html#BACKUP-PITR-RECOVERY.
The recovery.conf file included in the backup has the following contents:
standby_mode = 'on'
primary_conninfo = 'user=postgres port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
The next step (8.) in the documentation says to start postgresql. This results in a failure due to a timeout:
3783 postgres: startup process waiting for 0000000100000024000000B
On the original server I don't have this file. Is it possible to restore only the state of the pg_basebackup without using any WAL files? What should then be in the recovery.conf file?
Following the suggestion by @JosMac I moved the recovery.conf away, with this result:
shaun2:/var/lib/pgsql/data # service postgresql start
Job for postgresql.service failed because the control process exited with error code. See "systemctl status postgresql.service" and "journalctl -xe" for details.
shaun2:/var/lib/pgsql/data # service postgresql status
● postgresql.service - PostgreSQL database server
Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2018-06-18 12:02:53 CEST; 12s ago
Process: 1340 ExecStop=/usr/lib/postgresql-init stop (code=exited, status=0/SUCCESS)
Process: 9355 ExecStart=/usr/lib/postgresql-init start (code=exited, status=1/FAILURE)
Main PID: 1060 (code=exited, status=0/SUCCESS)
Jun 18 12:02:52 shaun2 postgres[9369]: [3-1] 2018-06-18 12:02:52 CEST LOG: invalid checkpoint record
Jun 18 12:02:52 shaun2 postgres[9369]: [4-1] 2018-06-18 12:02:52 CEST FATAL: could not locate required checkpoint record
Jun 18 12:02:52 shaun2 postgres[9369]: [4-2] 2018-06-18 12:02:52 CEST HINT: If you are not restoring from a backup, try removing the file "/var/lib/pgsql/data/backup_label".
Jun 18 12:02:52 shaun2 postgres[9367]: [2-1] 2018-06-18 12:02:52 CEST LOG: startup process (PID 9369) exited with exit code 1
Jun 18 12:02:52 shaun2 postgres[9367]: [3-1] 2018-06-18 12:02:52 CEST LOG: aborting startup due to startup process failure
Jun 18 12:02:53 shaun2 postgresql-init[9355]: pg_ctl: could not start server
Jun 18 12:02:53 shaun2 systemd[1]: postgresql.service: Control process exited, code=exited status=1
Jun 18 12:02:53 shaun2 systemd[1]: Failed to start PostgreSQL database server.
Jun 18 12:02:53 shaun2 systemd[1]: postgresql.service: Unit entered failed state.
Jun 18 12:02:53 shaun2 systemd[1]: postgresql.service: Failed with result 'exit-code'.
I suppose that PostgreSQL is still looking for the missing WAL file because of the contents of backup_label:
shaun2:/var/lib/pgsql/data # cat backup_label
START WAL LOCATION: 24/B0000028 (file 0000000100000024000000B0)
CHECKPOINT LOCATION: 24/B0000028
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2018-06-14 02:55:08 CEST
LABEL: pg_basebackup base backup
Result after moving backup_label away:
shaun2:/var/lib/pgsql/data # service postgresql status
● postgresql.service - PostgreSQL database server
Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2018-06-18 12:17:54 CEST; 4s ago
Process: 1340 ExecStop=/usr/lib/postgresql-init stop (code=exited, status=0/SUCCESS)
Process: 10401 ExecStart=/usr/lib/postgresql-init start (code=exited, status=1/FAILURE)
Main PID: 1060 (code=exited, status=0/SUCCESS)
Jun 18 12:17:53 shaun2 postgres[10414]: [4-1] 2018-06-18 12:17:53 CEST LOG: invalid secondary checkpoint record
Jun 18 12:17:53 shaun2 postgres[10414]: [5-1] 2018-06-18 12:17:53 CEST PANIC: could not locate a valid checkpoint record
Jun 18 12:17:54 shaun2 postgres[10412]: [2-1] 2018-06-18 12:17:54 CEST LOG: startup process (PID 10414) was terminated by signal 6: Aborted
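The backup_label shown above names the WAL segment that recovery asks for first. A minimal sketch for checking whether that segment was restored along with the data files, assuming a 9.x layout where WAL lives in pg_xlog under the data directory; the function names are invented for illustration:

```shell
# Print the WAL segment file name recorded in a backup_label file.
required_wal_segment() {
    sed -n 's/^START WAL LOCATION: .*(file \([0-9A-F]*\))$/\1/p' "$1"
}

# Report whether that segment exists in the WAL directory.
# Arguments: data directory, WAL subdirectory (pg_xlog assumed by default).
check_required_wal() {
    datadir=$1
    waldir=${2:-pg_xlog}
    segment=$(required_wal_segment "$datadir/backup_label")
    if [ -z "$segment" ]; then
        echo "no START WAL LOCATION line found in $datadir/backup_label" >&2
        return 2
    fi
    if [ -f "$datadir/$waldir/$segment" ]; then
        echo "present: $segment"
    else
        echo "missing: $segment"
        return 1
    fi
}
```

Running `check_required_wal /var/lib/pgsql/data` before starting the server shows whether the segment the startup process waits for is in place.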

We use pg_basebackup for our backups and have done several restores from them, so in general it works very well.
However, I would recommend using -X stream instead of -x (which is equivalent to -X fetch). With this option, pg_basebackup captures the WAL segments created while the backup is running and stores them together with the data files. These WAL segments are written to a separate pg_xlog.tar or pg_wal.tar file (depending on the PostgreSQL version).
A full description of the restore procedure can be found here: pg_basebackup / pg-barman – restore tar backup

The -R option generates a recovery.conf file, which is useful if the backup will be used to set up replica servers: it puts the server in standby_mode and includes the primary_conninfo needed to pull data from the primary.
So, if you just want to make and restore backups, I wouldn't use -R. In case it helps, I used these options: -v -P -x -F tar -z.
To restore the backup, extract it into the proper directory (e.g. /var/lib/postgresql/$VERSION/main), create an empty recovery.conf file there (or empty the one you have, though it is better not to use -R in the first place), and start the server.
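The restore steps described above can be sketched roughly as follows. This assumes a compressed tar-format backup (a base.tar.gz file) and a target data directory that is empty or does not exist yet; the function name is made up for illustration, and the script only prepares the directory without starting the server:

```shell
# Extract a compressed tar-format base backup into an empty data directory
# and leave an empty recovery.conf there, as the answer suggests.
# Arguments: path to base.tar.gz, target data directory.
restore_base_backup() {
    archive=$1
    datadir=$2
    mkdir -p "$datadir" || return 1
    # Refuse to unpack over existing files.
    if [ -n "$(ls -A "$datadir")" ]; then
        echo "refusing to restore: $datadir is not empty" >&2
        return 1
    fi
    tar -xzf "$archive" -C "$datadir" || return 1
    chmod 700 "$datadir" || return 1   # PostgreSQL expects a private data directory
    : > "$datadir/recovery.conf"       # create the file, or truncate it to empty
}
```

It would typically be run as the operating-system user that owns the PostgreSQL data directory, for example `restore_base_backup "$outdir/base.tar.gz" /var/lib/pgsql/data`, followed by the usual service start.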

Related

Failed global initialization: FileRenameFailed: Could not rename preexisting log file

I get this error while trying to launch the mongod daemon:
CONTROL [main] Failed global initialization: FileRenameFailed: Could
not rename preexisting log file
"/var/lib/mongodb/log/mongod.log" to
"/var/lib/mongodb/log/mongod.log.2021-12-02T14-32-24"; run
with --logappend or manually remove file: Permission denied
config
storage:
  dbPath: "/var/lib/mondodb/data"
systemLog:
  destination: file
  path: "/var/lib/mongodb/log/mongod.log"
mongodb has ownership of /var/lib/mongodb and subdirs. Permissions are supposed to be fine.
mondodb dir
drwxr-xr-x 2 mongodb mongodb 4096 Dec 2 15:42 config
drwxr-xr-x 2 mongodb mongodb 4096 Dec 2 15:41 data
drwxr-xr-x 2 mongodb mongodb 4096 Dec 2 15:42 log
The service itself won't run either
> sudo service mongod status
● mongod.service - MongoDB Database Server
Loaded: loaded (/lib/systemd/system/mongod.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2021-12-06 17:09:38 GMT; 1s ago
Docs: https://docs.mongodb.org/manual
Process: 24234 ExecStart=/usr/bin/mongod --config /etc/mongod.conf (code=exited, status=100)
Main PID: 24234 (code=exited, status=100)
Dec 06 17:09:37 GEL-R90VQK84 systemd[1]: Started MongoDB Database Server.
Dec 06 17:09:38 GEL-R90VQK84 systemd[1]: mongod.service: Main process exited, code=exited, status=100/n/a
Dec 06 17:09:38 GEL-R90VQK84 systemd[1]: mongod.service: Failed with result 'exit-code'.
Are you running the daemon as root?
Check the ownership of the file
/var/lib/mongodb/log/mongod.log
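One way to run that ownership check from a shell, as a small sketch. It assumes GNU stat (the `-c` option, as found on Linux), and the helper names are invented here:

```shell
# Print "user:group" for a path (GNU stat syntax).
owner_of() {
    stat -c '%U:%G' "$1"
}

# Succeed only if the path is owned by the expected user.
owned_by() {
    [ "$(stat -c '%U' "$1")" = "$2" ]
}
```

For example, `owner_of /var/lib/mongodb/log/mongod.log` prints the current owner, and `owned_by /var/lib/mongodb/log/mongod.log mongodb || echo "log file is not owned by mongodb"` flags a mismatch. A log file owned by root would be consistent with an earlier start of the daemon as root.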

Installing Apache/2.4.37 on centos getting Error

I have been trying to install the Apache/2.4.37 web server on a CentOS 8 machine and keep getting the following error:
Job for httpd.service failed because the control process exited with error code.
See "systemctl status httpd.service" and "journalctl -xe" for details.
[root@localhost draj]# systemctl status httpd.service
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/httpd.service.d
└─php-fpm.conf
Active: failed (Result: exit-code) since Mon 2020-04-13 17:07:09 +0545; 4min 10s ago
Docs: man:httpd.service(8)
Process: 14004 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=1/FAILURE)
Main PID: 14004 (code=exited, status=1/FAILURE)
Status: "Reading configuration..."
Apr 13 17:07:09 localhost.localdomain systemd[1]: Starting The Apache HTTP Server...
Apr 13 17:07:09 localhost.localdomain httpd[14004]: AH00526: Syntax error on line 122 of /etc/httpd/conf/httpd.conf:
Apr 13 17:07:09 localhost.localdomain httpd[14004]: DocumentRoot '/var/www/html' is not a directory, or is not readable
Apr 13 17:07:09 localhost.localdomain systemd[1]: httpd.service: Main process exited, code=exited, status=1/FAILURE
Apr 13 17:07:09 localhost.localdomain systemd[1]: httpd.service: Failed with result 'exit-code'.
Apr 13 17:07:09 localhost.localdomain systemd[1]: Failed to start The Apache HTTP Server.
Could somebody kindly shed some light on how to fix this problem? Thanks!
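The log line above names the condition httpd complains about: the DocumentRoot is either not a directory or not readable. A minimal sketch that tells those cases apart, with a made-up function name. Note that it tests access for the user running the script, which may differ from the user httpd runs as:

```shell
# Report why a DocumentRoot candidate would be rejected, if at all.
check_docroot() {
    dir=$1
    if [ ! -e "$dir" ]; then
        echo "missing: $dir"
        return 1
    elif [ ! -d "$dir" ]; then
        echo "not a directory: $dir"
        return 1
    elif [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
        echo "not readable: $dir"
        return 1
    fi
    echo "ok: $dir"
}
```

Calling `check_docroot /var/www/html` narrows down which of the possibilities applies on the affected machine.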

systemctl start httpd command not failing with error code

I am having a problem running this command.
# systemctl start httpd
Job for httpd.service failed because the control process exited with error code. See "systemctl status httpd.service" and "journalctl -xe" for details.
# systemctl status httpd.service
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2018-01-17 17:59:46 UTC; 20s ago
Docs: man:httpd(8)
man:apachectl(8)
Process: 2188 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=1/FAILURE)
Process: 2187 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=1/FAILURE)
Main PID: 2187 (code=exited, status=1/FAILURE)
Jan 17 17:59:45 hackdays httpd[2187]: (98)Address already in use: AH00072...0
Jan 17 17:59:45 hackdays httpd[2187]: (98)Address already in use: AH00072...0
Jan 17 17:59:45 hackdays httpd[2187]: no listening sockets available, shu...n
Jan 17 17:59:45 hackdays httpd[2187]: AH00015: Unable to open logs
Jan 17 17:59:46 hackdays systemd[1]: httpd.service: main process exited, ...E
Jan 17 17:59:46 hackdays kill[2188]: kill: cannot find process ""
Jan 17 17:59:46 hackdays systemd[1]: httpd.service: control process exite...1
Jan 17 17:59:46 hackdays systemd[1]: Failed to start The Apache HTTP Server.
Jan 17 17:59:46 hackdays systemd[1]: Unit httpd.service entered failed state.
Jan 17 17:59:46 hackdays systemd[1]: httpd.service failed.
In the log output I can see the following:
Address already in use
Can you check whether the port is already being used by some other process?
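A sketch of one way to check for a listener on a port, assuming a Linux system where /proc/net/tcp (and, if present, /proc/net/tcp6) is readable. It reads the kernel's socket table directly so it has no dependency on extra tools; the function name is hypothetical:

```shell
# Succeed if some TCP socket is in LISTEN state on the given port.
# Arguments: port number, then optional socket table files
# (defaults to the kernel tables under /proc/net).
port_is_listening() {
    port=$1
    shift
    if [ "$#" -eq 0 ]; then
        set -- /proc/net/tcp
        if [ -r /proc/net/tcp6 ]; then
            set -- "$@" /proc/net/tcp6
        fi
    fi
    # The tables store the port as four uppercase hex digits; state 0A is LISTEN.
    hex=$(printf '%04X' "$port")
    awk -v hex="$hex" '
        FNR > 1 {
            n = split($2, addr, ":")
            if (toupper(addr[n]) == hex && $4 == "0A") found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$@"
}
```

For example, `port_is_listening 80 && echo "port 80 is already taken"`. Where it is installed, `ss -ltnp` run as root additionally shows which process owns the listening socket.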

Apache/HTTPD service not working

I am trying to start my Apache/HTTPD on my CentOS 7.3-1611 dedicated server.
When I start the service, I receive the following error:
[root@ns3033129 ~]# service httpd start
Redirecting to /bin/systemctl start httpd.service
Job for httpd.service failed because the control process exited with error code. See "systemctl status httpd.service" and "journalctl -xe" for details.
[root@ns3033129 ~]# systemctl status httpd.service
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2017-06-27 23:14:39 CEST; 9s ago
Process: 18137 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=1/FAILURE)
Process: 18134 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=1/FAILURE)
Main PID: 18134 (code=exited, status=1/FAILURE)
Jun 27 23:14:39 ns3033129.ip-149-202-89.eu systemd[1]: Starting The Apache HTTP Server...
Jun 27 23:14:39 ns3033129.ip-149-202-89.eu httpd[18134]: [Tue Jun 27 23:14:39.351580 2017] [so:warn] [pid 18134] AH01574: module ruid2_module is already loaded, skipping
Jun 27 23:14:39 ns3033129.ip-149-202-89.eu httpd[18134]: httpd: Syntax error on line 58 of /etc/httpd/conf/httpd.conf: Syntax error on line 2 of /etc/httpd/conf.d/vesta.conf: Could not open configuration file /home/admi...le or directory
Jun 27 23:14:39 ns3033129.ip-149-202-89.eu systemd[1]: httpd.service: main process exited, code=exited, status=1/FAILURE
Jun 27 23:14:39 ns3033129.ip-149-202-89.eu kill[18137]: kill: cannot find process ""
Jun 27 23:14:39 ns3033129.ip-149-202-89.eu systemd[1]: httpd.service: control process exited, code=exited status=1
Jun 27 23:14:39 ns3033129.ip-149-202-89.eu systemd[1]: Failed to start The Apache HTTP Server.
Jun 27 23:14:39 ns3033129.ip-149-202-89.eu systemd[1]: Unit httpd.service entered failed state.
Jun 27 23:14:39 ns3033129.ip-149-202-89.eu systemd[1]: httpd.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
The output of systemctl status httpd clearly shows that there is a configuration error in your vesta.conf:
Jun 27 23:14:39 ns3033129.ip-149-202-89.eu httpd[18134]: httpd: Syntax error on line 58 of /etc/httpd/conf/httpd.conf: Syntax error on line 2 of /etc/httpd/conf.d/vesta.conf: Could not open configuration file /home/admi...le or directory
Run "httpd -t" to check your httpd configuration before starting the httpd server.
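That advice can be turned into a small guard that starts the service only when the configuration check passes. This is a sketch: HTTPD_CHECK and HTTPD_START are hypothetical override variables introduced here so the function can be tried without a real httpd installation; by default it runs `httpd -t` and `systemctl start httpd`:

```shell
# Start httpd only if its configuration check succeeds.
# HTTPD_CHECK and HTTPD_START are illustrative override hooks, not httpd settings.
start_httpd_checked() {
    if ${HTTPD_CHECK:-httpd -t}; then
        ${HTTPD_START:-systemctl start httpd}
    else
        echo "configuration check failed; not starting httpd" >&2
        return 1
    fi
}
```

When the check fails, `httpd -t` itself prints the file and line of the syntax error, which is the same information that appears in the status output above.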

calico-node rkt returns stage1-fly.aci.asc: no such file or directory

I have a CoreOS beta (1185.2.0) installed.
I have the following systemd service file to start calico-node:
[Unit]
Description=Calico per-host agent
Requires=network-online.target
After=network-online.target
[Service]
Slice=machine.slice
PermissionsStartOnly=true
Environment=ETCD_CA_CERT_FILE=/etc/ssl/etcd/ca.pem
Environment=ETCD_CERT_FILE=/etc/ssl/etcd/etcd1.pem
Environment=ETCD_KEY_FILE=/etc/ssl/etcd/etcd1-key.pem
Environment=CALICO_DISABLE_FILE_LOGGING=true
Environment=HOSTNAME=10.79.218.2
Environment=IP=10.79.218.2
Environment=FELIX_FELIXHOSTNAME=10.79.218.2
Environment=CALICO_NETWORKING=true
Environment=NO_DEFAULT_POOLS=true
Environment=ETCD_ENDPOINTS=https://coreos-2.tux-in.com:2379,https://coreos-3.tux-in.com:2379
ExecStartPre=/bin/mkdir /var/run/calico
ExecStart=/usr/bin/rkt run --inherit-env --stage1-from-dir=stage1-fly.aci --volume=var-run-calico,kind=host,source=/var/run/calico --volume=modules,kind=host,source=/lib/modules,readOnly=false --mount=volume=modules,target=/lib/modules --volume=dns,kind=host,source=/etc/resolv.conf,readOnly=true --volume=etcd-tls-certs,kind=host,source=/etc/ssl/etcd,readOnly=true --mount=volume=dns,target=/etc/resolv.conf --mount=volume=etcd-tls-certs,target=/etc/ssl/etcd --mount=volume=var-run-calico,target=/var/run/calico --trust-keys-from-https quay.io/calico/node:v0.22.0
KillMode=mixed
Restart=always
TimeoutStartSec=0
[Install]
WantedBy=multi-user.target
Well, the systemd unit fails with:
● calico-node.service - Calico per-host agent
Loaded: loaded (/etc/systemd/system/calico-node.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit-hit) since Tue 2016-10-25 04:51:15 UTC; 9min ago
Process: 1970 ExecStart=/usr/bin/rkt run --inherit-env --stage1-from-dir=stage1-fly.aci --volume=var-run-calico,kind=host,source=/var/
Process: 4307 ExecStartPre=/bin/mkdir /var/run/calico (code=exited, status=1/FAILURE)
Main PID: 1970 (code=exited, status=1/FAILURE)
Oct 25 04:51:15 coreos-2.tux-in.com systemd[1]: Failed to start Calico per-host agent.
Oct 25 04:51:15 coreos-2.tux-in.com systemd[1]: calico-node.service: Unit entered failed state.
Oct 25 04:51:15 coreos-2.tux-in.com systemd[1]: calico-node.service: Failed with result 'exit-code'.
Oct 25 04:51:15 coreos-2.tux-in.com systemd[1]: calico-node.service: Service hold-off time over, scheduling restart.
Oct 25 04:51:15 coreos-2.tux-in.com systemd[1]: Stopped Calico per-host agent.
Oct 25 04:51:15 coreos-2.tux-in.com systemd[1]: calico-node.service: Start request repeated too quickly.
Oct 25 04:51:15 coreos-2.tux-in.com systemd[1]: Failed to start Calico per-host agent.
Oct 25 04:51:15 coreos-2.tux-in.com systemd[1]: calico-node.service: Unit entered failed state.
Oct 25 04:51:15 coreos-2.tux-in.com systemd[1]: calico-node.service: Failed with result 'start-limit-hit'.
I tried setting the environment variables in a terminal and running the rkt command myself, and I got this error message:
image: using image from file /usr/lib/rkt/stage1-images/stage1-fly.aci
run: open /usr/lib/rkt/stage1-images/stage1-fly.aci.asc: no such file or directory
I think that error may relate to the following configuration file at /etc/rkt/paths.d/paths.json
{
  "rktKind": "paths",
  "rktVersion": "v1",
  "stage1-images": "/usr/lib/rkt/stage1-images"
}
I need the paths configuration file later on for kubernetes.
Any ideas? The .asc file really doesn't exist there.
/usr/lib is a symbolic link to /usr/lib64. rkt is configured not to look for signature (.asc) files for container images under /usr/lib64, but that exemption does not apply to the /usr/lib path.
The default configuration already seems to be set properly, so simply removing the file /etc/rkt/paths.d/paths.json resolves the issue.
Full answer at https://github.com/coreos/rkt/issues/3320
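Since the question mentions that the paths file is needed again later for Kubernetes, moving it aside may be preferable to deleting it. A minimal sketch, with a made-up function name and a caller-chosen backup directory:

```shell
# Move a configuration file into a backup directory instead of deleting it.
# Arguments: path to the configuration file, backup directory.
set_aside_config() {
    conf=$1
    backup_dir=$2
    if [ ! -f "$conf" ]; then
        echo "no such file: $conf" >&2
        return 1
    fi
    mkdir -p "$backup_dir" || return 1
    mv "$conf" "$backup_dir/$(basename "$conf")"
}
```

For example, `set_aside_config /etc/rkt/paths.d/paths.json /root/rkt-config-backup` (the backup directory is an arbitrary choice here), and `readlink -f /usr/lib` confirms where /usr/lib actually points on the host.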
