X-Git-Url: https://git.adam-barratt.org.uk/?a=blobdiff_plain;f=input%2Fhowto%2Fpostgres-backup.creole;h=5e21e0500d44ef9866c9ba560da1e4f786baf64b;hb=35d4c16b68e908fbf383a9011df41192ac692288;hp=9c07596a13ffe36e4b2d19aae08266b2eaedf2aa;hpb=d706e82902eb51bc2b34d5c4c861d26739eab188;p=mirror%2Fdsa-wiki.git

diff --git a/input/howto/postgres-backup.creole b/input/howto/postgres-backup.creole
index 9c07596..5e21e05 100644
--- a/input/howto/postgres-backup.creole
+++ b/input/howto/postgres-backup.creole
@@ -30,13 +30,12 @@ Add a {{{postgres::backup_cluster}}} stanza to get it backed up.
 
 === Multiple clusters/compatibility mode ===
 
-If there is potentially more than one cluster, we cannot use the puppet
-{{{postgresql::server}}} class. We also use this for clusters that were
-initially set up without puppet.
+Since we often have more than one cluster, we cannot use the puppet
+{{{postgresql::server}}} class for most things.
 
-* Add the server to the postgresql_server role in puppet's
-  hieradata/common.yaml. This will cause some scripts to be installed on the
-  host, as well as an ssh key to be created for the postgres user.
+* Add the server to the roles::postgresql::server class role in hiera.
+  This will cause some scripts to be installed on the host, as well as an ssh
+  key to be created for the postgres user.
 * Add these to {{{/etc/postgresql/9.6/main/postgresql.conf}}} or equivalent
 {{{
@@ -45,30 +44,26 @@ initially set up without puppet.
   wal_level = archive
   max_wal_senders = 3
   archive_timeout = 1h
-  archive_command = '/usr/local/bin/pg-backup-file main WAL %p'
+  archive_command = '/usr/local/bin/pg-backup-file mXXXXXX-CLUSTERNAMEHERE-XXXXain WAL %p'
 }}}
 * Run puppet on the postgresql server.
 
-==== ssh authkeys ====
-* If you need extra options in the {{{debbackup-ssh-wrap}}} call on the backup server
-  (for instance if the host should be allowed to fetch files), manually copy
-  {{{~postgres/.ssh/id_rsa.pub}}} to
-  {{{puppet:modules/postgres/templates/backup_server/sshkeys-manual.erb}}}.
-* Otherwise, add the host to the postgres::backup_server::register_backup_clienthost line
-  in {{{puppet:modules/postgres/manifests/backup_source.pp}}}.
+* If the server is a replication receiver, it needs read access to the sender's WALs
+  on the backup host (to recover from situations where the source might no longer
+  have the WALs). This can be configured via hiera as well. Example:
+{{{
+[git|master] weasel@orinoco:~/projects/debian/d-a/dsa-puppet$ cat data/nodes/snapshotdb-manda-01.debian.org.yaml
+classes:
+  - roles::snapshot_db
+  - roles::postgresql::server
+
+postgres::backup_server::register_backup_clienthost::allow_read_hosts: ['sallinen']
+}}}
 
 ==== base backup config ====
-* Register each cluster in puppet's
-  {{{puppet:modules/postgres/manifests/backup_source.pp}}}.
-  This takes care of adding the replication user to pgpass on the backup servers,
-  and the firewall rule, and adds the cluster to {{{make-base-backups}}}.
-  (The module can also create the postgres role and modify the hba file, but we
-  do not do this when we don't configure the entire cluster via puppet.)
-* Historically, we also have clusters hardcoded in
-  {{{puppet:modules/postgres/templates/backup_server/postgres-make-base-backups.erb}}}.
-* Run puppet on the backup hosts (storace and backuphost as of 2018).
+* Run puppet on the backup hosts (storace and backuphost as of 2019).
 * On the db server, create a role.
   Find the password to use on the backup host in {{{~debbackup/.pgpass}}}:\\
   {{{sudo -u postgres createuser -D -E -P -R -S debian-backup}}}
@@ -100,3 +95,60 @@ to see the port for the cluster, and run
 sudo -u debbackup /usr/local/bin/postgres-make-base-backups <host>:<port>
 }}}
 probably best to do that in a screen as it might take a while.
+
+== MISSING-BASE ==
+
+e.g.:
+{{{
+sudo -u debbackup /usr/lib/nagios/plugins/dsa-check-backuppg | grep BASE
+[fasolo, dak] MISSING-BASE: dak.BASE.backuphost.debian.org-20180211-012002-fasolo.debian.org-dak-9.6-backup.tar.gz
+}}}
+
+This means that we started doing a base backup (as witnessed by a .backup file
+next to a WAL), but for some reason we don't have the corresponding base file.
+{{{
+root@backuphost:/srv/backups/pg/fasolo# ls -l *backup*
+-rw------- 1 debbackup debbackup 9201093916 Jan 14 06:18 dak.BASE.backuphost.debian.org-20180114-012001-fasolo.debian.org-dak-9.6-backup.tar.gz
+-rw------- 1 debbackup debbackup 9227651542 Jan 21 06:25 dak.BASE.backuphost.debian.org-20180121-012001-fasolo.debian.org-dak-9.6-backup.tar.gz
+-rw------- 1 debbackup debbackup 9266306750 Jan 28 07:59 dak.BASE.backuphost.debian.org-20180128-012001-fasolo.debian.org-dak-9.6-backup.tar.gz
+-rw------- 1 debbackup debbackup 9312602089 Feb 5 11:00 dak.BASE.backuphost.debian.org-20180204-012001-fasolo.debian.org-dak-9.6-backup.tar.gz
+-rw------- 1 debbackup debbackup 9346830509 Feb 12 10:25 dak.BASE.backuphost.debian.org-20180212-094930-fasolo.debian.org-dak-9.6-backup.tar.gz
+-rw------- 1 debbackup debbackup 353 Jan 14 06:18 dak.WAL.0000000100000033000000A6.00000028.backup
+-rw------- 1 debbackup debbackup 350 Jan 20 11:20 dak.WAL.00000001000000350000008C.00000028.backup
+-rw------- 1 debbackup debbackup 353 Jan 21 06:25 dak.WAL.000000010000003600000068.00000028.backup
+-rw------- 1 debbackup debbackup 353 Jan 28 07:59 dak.WAL.0000000100000038000000E3.00000028.backup
+-rw------- 1 debbackup debbackup 353 Feb 5 11:00 dak.WAL.000000010000003B00000090.00000028.backup
+-rw------- 1 debbackup debbackup 350 Feb 5 15:49 dak.WAL.000000010000003B0000009B.00000108.backup
+-rw------- 1 debbackup debbackup 353 Feb 11 10:09 dak.WAL.000000010000003D000000AC.00000028.backup
+-rw------- 1 debbackup debbackup 353 Feb 12 10:25 dak.WAL.000000010000003E00000027.00000178.backup
+}}}
+
+{{{.backup}}} files are created on the postgres server and shipped to the
+backup hosts whenever a base backup is initiated. We do some labelling, so
+we know which backup host the corresponding tarball should end up with.
+
+e.g.:
+{{{
+root@backuphost:/srv/backups/pg/fasolo# cat dak.WAL.000000010000003B00000090.00000028.backup
+START WAL LOCATION: 3B/90000028 (file 000000010000003B00000090)
+STOP WAL LOCATION: 3B/97CF2138 (file 000000010000003B00000097)
+CHECKPOINT LOCATION: 3B/90000098
+BACKUP METHOD: streamed
+BACKUP FROM: master
+START TIME: 2018-02-05 10:25:28 UTC
+LABEL: backuphost.debian.org-20180204-012001-fasolo.debian.org-dak-9.6-backup
+STOP TIME: 2018-02-05 10:59:50 UTC
+}}}
+
+To fix this, verify that we have a later base tarball (or that we are fine for
+some other reason), and remove the corresponding .backup file. In the case
+above, we would remove {{{dak.WAL.000000010000003D000000AC.00000028.backup}}}.
+
+== WAL-MISSING-AFTER ==
+
+e.g.:
+{{{
+[bmdb1, main] WAL-MISSING-AFTER: bmdb1/main.WAL.0000000100001340000000DB
+}}}
+
+If it's just one WAL file missing, it can be recovered from the other backup
+host. If more logs are missing, check the server's logs for archive errors.
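The MISSING-BASE fix above hinges on matching a {{{.backup}}} file to the base tarball it announces. A minimal shell sketch of that mapping, assuming the {{{<cluster>.BASE.<label>.tar.gz}}} naming convention inferred from the listings above (this is not DSA tooling; the sample {{{.backup}}} content is copied from the example in this howto):

```shell
# Sketch only: derive the expected base-tarball name for a given .backup file,
# assuming the <cluster>.BASE.<label>.tar.gz convention seen in the listings above.
f=dak.WAL.000000010000003B00000090.00000028.backup

# Recreate the sample .backup file from the howto so this sketch is self-contained.
cat > "$f" <<'EOF'
START WAL LOCATION: 3B/90000028 (file 000000010000003B00000090)
STOP WAL LOCATION: 3B/97CF2138 (file 000000010000003B00000097)
LABEL: backuphost.debian.org-20180204-012001-fasolo.debian.org-dak-9.6-backup
EOF

cluster=${f%%.WAL.*}                 # cluster-name prefix of the file, here "dak"
label=$(sed -n 's/^LABEL: //p' "$f") # backup label recorded at base-backup time
echo "${cluster}.BASE.${label}.tar.gz"
# -> dak.BASE.backuphost.debian.org-20180204-012001-fasolo.debian.org-dak-9.6-backup.tar.gz
```

If the printed tarball (or a later one) exists on the backup host, the stale {{{.backup}}} file can be removed as described above.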