X-Git-Url: https://git.adam-barratt.org.uk/?a=blobdiff_plain;f=input%2Fhowto%2Fpostgres-backup.creole;h=4f497caed40c5ec40eca4b75ac1ca3c80f6e680f;hb=3507c51343b0adb6fd1d352892b40d8654819e4a;hp=23be181f45dd243c0c9f5c07887ebf1eeb4b1df6;hpb=f4ea7e28a138227dfa183ab828c3ae5a2e0a9001;p=mirror%2Fdsa-wiki.git
diff --git a/input/howto/postgres-backup.creole b/input/howto/postgres-backup.creole
index 23be181..4f497ca 100644
--- a/input/howto/postgres-backup.creole
+++ b/input/howto/postgres-backup.creole
@@ -4,49 +4,156 @@ Backing up postgres consists of two main pieces:
 backing up the WALs (write ahead logs) and regular backups of the base.
 See https://www.postgresql.org/docs/9.4/static/continuous-archiving.html
 
-== WAL Shipping ==
+The WALs are copied from the postgres server to the backup hosts using ssh with
+the {{{pg-backup-file}}} script. Base backups are pulled on the backup hosts
+using {{{postgres-make-base-backups}}}. The former requires that the postgres
+server be able to ssh to the backup servers. The latter requires that the postgres
+server listen on the network, have ssl set up correctly, be reachable through the
+firewall, and have a postgres user with replication privileges that is allowed
+to connect in {{{pg_hba}}}.
 
-* On the db server, create an ssh key for postgres:
-{{{
- sudo -H -u postgres ssh-keygen -C "postgres@`hostname` (`date +%Y%m%d`)"
-}}}
-* clone dsa-misc:
-{{{
- cd /usr/local && git clone https://db.debian.org/git/dsa-misc.git &&
- cd bin && ln -s ../dsa-misc/scripts/pg-backup/pg* .
-}}}
-* Add these to /etc/postgresql/9.1/main/postgresql.conf
+== server config ==
+
+(2018-02 - This documentation has not really been tested since it was written -- weasel)
+
+There are two possible ways to configure the server.
+
+=== Single cluster on a host ===
+
+If there is only one cluster on the host, we can use puppet's
+{{{postgresql::server}}} to configure the cluster and any databases on it; see
+{{{modules/salsa/manifests/database.pp}}} for an example. In particular, you
+want to set {{{archive_command}}} and the ssl options in {{{postgresql.conf}}}, as well
+as set {{{listen_addresses}}} correctly.
+
+Add a {{{postgres::backup_cluster}}} stanza to get it backed up.
+
+=== Multiple clusters/compatibility mode ===
+
+If there is potentially more than one cluster, we cannot use the puppet
+{{{postgresql::server}}} class. We also use this mode for clusters that were
+initially set up without puppet.
+
+* Add the server to the postgresql_server role in puppet's
+  {{{hieradata/common.yaml}}}. This causes some scripts to be installed on the
+  host, and an ssh key to be created for the postgres user.
+
+* Add these to {{{/etc/postgresql/9.6/main/postgresql.conf}}} or equivalent:
 {{{
  track_counts = yes
- archive_mode = yes
+ archive_mode = on
  wal_level = archive
  max_wal_senders = 3
  archive_timeout = 1h
  archive_command = '/usr/local/bin/pg-backup-file main WAL %p'
 }}}
-* On the backup server (storace as of 2015), add the ssh public key to /etc/ssh/userkeys/debbackup.
-* Add the new server and cluster name to /etc/nagios/dsa-check-backuppg.conf.
+* Run puppet on the postgresql server.
+
+==== ssh authkeys ====
+* If you need extra options in the {{{debbackup-ssh-wrap}}} call on the backup server
+  (for instance if the host should be allowed to fetch files), manually copy
+  {{{~postgres/.ssh/id_rsa.pub}}} to
+  {{{puppet:modules/postgres/templates/backup_server/sshkeys-manual.erb}}}.
+* Otherwise, add the host to the {{{postgres::backup_server::register_backup_clienthost}}} line
+  in {{{puppet:modules/postgres/manifests/backup_source.pp}}}.
 
-== base backups ==
+==== base backup config ====
 
-* On the db server, create a role. Give a nice, long password.
+
+* Register each cluster in puppet's
+  {{{puppet:modules/postgres/manifests/backup_source.pp}}}.
+  This takes care of adding the replication user to pgpass on the backup servers,
+  adding the firewall rule, and adding the cluster to {{{postgres-make-base-backups}}}.
+  (The module can also create the postgres role and modify the hba file, but we
+  do not do this when we don't configure the entire cluster via puppet.)
+* For historical reasons, some clusters are also hardcoded in
+  {{{puppet:modules/postgres/templates/backup_server/postgres-make-base-backups.erb}}}.
+* Run puppet on the backup hosts (storace and backuphost as of 2018).
+
+* On the db server, create a role. Find the password to use on the backup host in {{{~debbackup/.pgpass}}}:\\
+  {{{sudo -u postgres createuser -D -E -P -R -S debian-backup}}}
+* Give the role replication access:\\
+  {{{sudo -u postgres psql -c 'ALTER ROLE "debian-backup" REPLICATION;'}}}
+* Add an entry to pg_hba to allow access:\\
+  {{{hostssl replication debian-backup 5.153.231.12/32 md5 # backuphost
+hostssl replication debian-backup 2001:41c8:1000:21::21:12/128 md5 # backuphost
+hostssl replication debian-backup 93.94.130.161/32 md5 # storace
+hostssl replication debian-backup 2a02:158:380:280::161/128 md5 # storace}}}
+* Ensure pg is listening on * ({{{listen_addresses = '*'}}}).
+* Ensure the server is using ssl and a proper debian auto-ca cert.
+* Reload the db server.
+* Test running "postgres-make-base-backups host:port".
+* You should see a tarball and WALs.
+
+= Nagios warnings =
+
+== BASE-IS-OLD ==
+
+(2018-02) Our nagios check warns us when a backup server has not successfully fetched
+a base backup recently. Usually this is because the postgres server or the
+backup host was down when the weekly cronjob ran.
+
+To re-run a base backup for a specific cluster, log into the backup server
+(either storace or backuphost), cat /usr/local/bin/postgres-make-base-backups
+to see the port for the cluster, and run
 {{{
- sudo -u postgres createuser -D -E -P -R -S debian-backup
+ sudo -u debbackup /usr/local/bin/postgres-make-base-backups <host>:<port>
 }}}
-* Give the role replication access:
+It is probably best to do that in a screen session, as it might take a while.
+
+== MISSING-BASE ==
+
+e.g.:
 {{{
- sudo -u postgres psql -c 'ALTER ROLE "debian-backup" REPLICATION;'
+sudo -u debbackup /usr/lib/nagios/plugins/dsa-check-backuppg | grep BASE
+[fasolo, dak] MISSING-BASE: dak.BASE.backuphost.debian.org-20180211-012002-fasolo.debian.org-dak-9.6-backup.tar.gz
 }}}
-* Add an entry to pg_hba to allow access:
+
+This means that we started doing a base backup (as witnessed by a .backup file
+next to a WAL), but for some reason we don't have the corresponding base file.
 {{{
- hostssl replication debian-backup 93.94.130.161/32 md5 # storace
+root@backuphost:/srv/backups/pg/fasolo# ls -l *backup*
+-rw------- 1 debbackup debbackup 9201093916 Jan 14 06:18 dak.BASE.backuphost.debian.org-20180114-012001-fasolo.debian.org-dak-9.6-backup.tar.gz
+-rw------- 1 debbackup debbackup 9227651542 Jan 21 06:25 dak.BASE.backuphost.debian.org-20180121-012001-fasolo.debian.org-dak-9.6-backup.tar.gz
+-rw------- 1 debbackup debbackup 9266306750 Jan 28 07:59 dak.BASE.backuphost.debian.org-20180128-012001-fasolo.debian.org-dak-9.6-backup.tar.gz
+-rw------- 1 debbackup debbackup 9312602089 Feb 5 11:00 dak.BASE.backuphost.debian.org-20180204-012001-fasolo.debian.org-dak-9.6-backup.tar.gz
+-rw------- 1 debbackup debbackup 9346830509 Feb 12 10:25 dak.BASE.backuphost.debian.org-20180212-094930-fasolo.debian.org-dak-9.6-backup.tar.gz
+-rw------- 1 debbackup debbackup 353 Jan 14 06:18 dak.WAL.0000000100000033000000A6.00000028.backup
+-rw------- 1 debbackup debbackup 350 Jan 20 11:20 dak.WAL.00000001000000350000008C.00000028.backup
+-rw------- 1 debbackup debbackup 353 Jan 21 06:25 dak.WAL.000000010000003600000068.00000028.backup
+-rw------- 1 debbackup debbackup 353 Jan 28 07:59 dak.WAL.0000000100000038000000E3.00000028.backup
+-rw------- 1 debbackup debbackup 353 Feb 5 11:00 dak.WAL.000000010000003B00000090.00000028.backup
+-rw------- 1 debbackup debbackup 350 Feb 5 15:49 dak.WAL.000000010000003B0000009B.00000108.backup
+-rw------- 1 debbackup debbackup 353 Feb 11 10:09 dak.WAL.000000010000003D000000AC.00000028.backup
+-rw------- 1 debbackup debbackup 353 Feb 12 10:25 dak.WAL.000000010000003E00000027.00000178.backup
 }}}
-* Ensure pg is listening on * and that connections from storace are allowed through the firewall.
-* Ensure the server is using a proper debian auto-ca cert.
-* Reload db server.
-* Add host:port combination to postgres-make-base-backups in dsa-puppet.
+{{{.backup}}} files are created on the postgres server and shipped to the
+backup hosts whenever a base backup is taken. We do some labelling, so
+we know which backup host the corresponding tarball should end up with.
 
-* Test running "postgres-make-base-backups host:port".
-* You should see a tarball and WALs
+e.g.:
+{{{
+root@backuphost:/srv/backups/pg/fasolo# cat dak.WAL.000000010000003B00000090.00000028.backup
+START WAL LOCATION: 3B/90000028 (file 000000010000003B00000090)
+STOP WAL LOCATION: 3B/97CF2138 (file 000000010000003B00000097)
+CHECKPOINT LOCATION: 3B/90000098
+BACKUP METHOD: streamed
+BACKUP FROM: master
+START TIME: 2018-02-05 10:25:28 UTC
+LABEL: backuphost.debian.org-20180204-012001-fasolo.debian.org-dak-9.6-backup
+STOP TIME: 2018-02-05 10:59:50 UTC
+}}}
+
+To fix this, verify that we have a later base tarball (or that we are fine for some other reason),
+and remove the corresponding .backup file. In the case above, we would remove
+{{{dak.WAL.000000010000003D000000AC.00000028.backup}}}.
+
+== WAL-MISSING-AFTER ==
+
+e.g.:
+{{{
+[bmdb1, main] WAL-MISSING-AFTER: bmdb1/main.WAL.0000000100001340000000DB
+}}}
+
+If it's just one WAL file missing, it can be recovered from the other backup host. If more logs are missing, check the server's logs for archive errors.
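The BASE-IS-OLD condition can be approximated by hand on a backup host. The function below is a hypothetical helper. It is not part of dsa-misc, dsa-puppet or the nagios plugin, and it does not describe how dsa-check-backuppg actually decides. It relies on several assumptions: the per-host directory layout and `<cluster>.BASE.<label>.tar.gz` naming seen in the /srv/backups/pg/fasolo listing, GNU find, and a default limit of 8 days. That limit was chosen only as "one weekly run plus some slack".

```shell
# Hypothetical helper (not part of dsa-misc, dsa-puppet or the nagios plugin).
# Succeeds if $dir holds a <cluster>.BASE.*.tar.gz modified within the last
# $days days (default 8); fails otherwise. Assumes GNU find (-maxdepth, -quit).
has_recent_base() {
    local dir=$1 cluster=$2 days=${3:-8}
    # -mtime -N matches files modified less than N*24 hours ago;
    # -print -quit stops after the first match.
    [ -n "$(find "$dir" -maxdepth 1 -name "$cluster.BASE.*.tar.gz" \
            -mtime "-$days" -print -quit)" ]
}
```

For example, `has_recent_base /srv/backups/pg/fasolo dak || echo 'dak: no recent base backup'` prints a warning only if no dak tarball in that directory is younger than 8 days. The fix is still to re-run postgres-make-base-backups for that cluster.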
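The MISSING-BASE situation comes down to a `.backup` file whose LABEL has no matching BASE tarball. Below is a rough manual check, sketched as a hypothetical shell helper rather than the logic dsa-check-backuppg really uses. It assumes the naming visible in the fasolo listing: `<cluster>.WAL.<segment>.<offset>.backup` files whose `LABEL:` line names the tarball `<cluster>.BASE.<LABEL>.tar.gz`. The helper checks only labels that start with this backup host's name. Because of the labelling described above, `.backup` files meant for the other backup host are not expected to have a tarball here.

```shell
# Hypothetical helper (not part of dsa-misc or the nagios plugin).
# Lists .backup files whose LABEL starts with $host_prefix but whose
# <cluster>.BASE.<LABEL>.tar.gz tarball is absent from the same directory.
find_orphan_backup_labels() {
    local dir=$1 host_prefix=$2
    local f base cluster label
    for f in "$dir"/*.WAL.*.backup; do
        [ -e "$f" ] || continue                 # glob matched nothing
        base=${f##*/}                           # strip the directory
        cluster=${base%%.WAL.*}                 # e.g. "dak"
        label=$(sed -n 's/^LABEL: //p' "$f")
        case $label in
            "$host_prefix"*) ;;                 # labelled for this host
            *) continue ;;                      # other backup host: skip
        esac
        if [ ! -e "$dir/$cluster.BASE.$label.tar.gz" ]; then
            printf '%s\n' "$base"
        fi
    done
}
```

On backuphost, `find_orphan_backup_labels /srv/backups/pg/fasolo backuphost.debian.org` would be expected to list `dak.WAL.000000010000003D000000AC.00000028.backup` for the listing above, assuming that file's LABEL names the missing 20180211 tarball. Treat the output as a list of candidates to verify by hand, not as files to delete blindly.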
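For WAL-MISSING-AFTER, a gap can be located by walking the archived segment names in order. The sketch below is a hypothetical helper, not the nagios plugin's code. It makes three assumptions. First, archived files are named `<cluster>.WAL.<24 hex digits>`, as in the `bmdb1/main.WAL.0000000100001340000000DB` message. Second, the server runs PostgreSQL 9.3 or later with the default 16 MB segment size, so the last 8 hex digits run from 00000000 to 000000FF before the middle 8 digits increment. Third, the helper ignores timeline switches and compares neighbours only within one timeline.

```shell
# Hypothetical helper (not part of dsa-misc or the nagios plugin).
# Prints every archived WAL segment after which the next segment is missing.
# Assumes <cluster>.WAL.<24 hex digits> names, PostgreSQL >= 9.3 with 16 MB
# segments (segment part runs 00..FF), and no timeline switches.
find_wal_gaps() {
    local dir=$1 cluster=$2
    local f name wal tl log seg exp_log exp_seg
    local prev_tl='' prev_log='' prev_seg='' prev_name=''
    # Exactly 24 '?' so that longer names such as *.backup are skipped.
    for f in "$dir/$cluster".WAL.????????????????????????; do
        [ -e "$f" ] || continue                   # glob matched nothing
        name=${f##*/}
        wal=${name##*.WAL.}
        tl=$(printf '%s' "$wal" | cut -c1-8)      # timeline ID
        log=$(printf '%d' "0x$(printf '%s' "$wal" | cut -c9-16)")
        seg=$(printf '%d' "0x$(printf '%s' "$wal" | cut -c17-24)")
        if [ "$tl" = "$prev_tl" ]; then
            # Successor of the previous segment: bump seg, carry into log.
            exp_log=$prev_log
            exp_seg=$((prev_seg + 1))
            if [ "$exp_seg" -gt 255 ]; then
                exp_log=$((prev_log + 1))
                exp_seg=0
            fi
            if [ "$log" -ne "$exp_log" ] || [ "$seg" -ne "$exp_seg" ]; then
                printf '%s\n' "$prev_name"
            fi
        fi
        prev_tl=$tl prev_log=$log prev_seg=$seg prev_name=$name
    done
}
```

Each printed name is the last segment present before a gap, which matches the "missing after" wording of the warning. For example, you could run `find_wal_gaps /srv/backups/pg/bmdb1 main` on each backup host. The directory path here is assumed by analogy with the fasolo listing. Comparing the output from both hosts shows whether the other host can supply the missing file.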