mirror/dsa-nagios.git

Aurelien Jarno [Thu, 7 Mar 2019 21:09:30 +0000 (22:09 +0100)]
Fix previous commit

Aurelien Jarno [Thu, 7 Mar 2019 20:54:51 +0000 (21:54 +0100)]
Add smit

Julien Cristau [Wed, 20 Feb 2019 16:01:51 +0000 (17:01 +0100)]
Add schmelzer

Moritz Muehlenhoff [Wed, 13 Feb 2019 16:14:29 +0000 (17:14 +0100)]
Adapt timedatectl check for buster

In the systemd version in buster, the output format of timedatectl changed:
- "Network time on: yes" became "NTP service: active"
- "NTP synchronized: yes" became "System clock synchronized: yes"
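
A minimal sketch of a parser that accepts both output formats listed above. This is not the code from the actual check: the function names are hypothetical, and only the four labels and values quoted in this commit message are taken from the source.

```python
# Labels and expected values for the old and the buster format,
# exactly as quoted in the commit message.
NTP_ENABLED = {
    "Network time on": "yes",
    "NTP service": "active",
}
CLOCK_SYNCED = {
    "NTP synchronized": "yes",
    "System clock synchronized": "yes",
}


def parse_timedatectl(output):
    """Split 'label: value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def _matches(fields, expected):
    # True if any known label carries its expected value.
    return any(fields.get(label) == value for label, value in expected.items())


def ntp_status(output):
    """Return (ntp_enabled, clock_synchronized) for either format."""
    fields = parse_timedatectl(output)
    return _matches(fields, NTP_ENABLED), _matches(fields, CLOCK_SYNCED)
```

Keeping the old labels alongside the new ones lets the same check run on hosts that have not yet been upgraded to buster.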

Signed-off-by: Peter Palfrader <peter@palfrader.org>

Julien Cristau [Sun, 17 Feb 2019 18:55:28 +0000 (19:55 +0100)]
Decommission kantuser (RT#7583)

Aurelien Jarno [Sun, 3 Feb 2019 16:40:34 +0000 (17:40 +0100)]
Move ppc64el-osuosl-01 to pijper

Julien Cristau [Mon, 28 Jan 2019 22:49:29 +0000 (23:49 +0100)]
add check for logs on loghost-osuosl-01

Julien Cristau [Mon, 28 Jan 2019 21:33:55 +0000 (22:33 +0100)]
add loghost-osuosl-01

Adam D. Barratt [Mon, 28 Jan 2019 15:05:26 +0000 (15:05 +0000)]
dsa-check-soas: fix error when 0 (or more than 1) records returned

3956f21a moved some processing into resolve_ns. When an unexpected
number of records is returned, an attempt is made to update the
list of warnings, which is not within resolve_ns()'s local scope. The
result is an "undefined local variable" error.

Resolve this by explicitly marking the "warnings" list as in instance
scope. (Although it is not currently a problem, the "OKs" list is also
similarly marked, for future-proofing.)

Signed-off-by: Adam D. Barratt <adam@adam-barratt.org.uk>

Julien Cristau [Sun, 27 Jan 2019 10:04:09 +0000 (11:04 +0100)]
Ignore "Cache Battery 0 in controller 0 is Degraded" on wieck

Julien Cristau [Thu, 17 Jan 2019 19:22:13 +0000 (20:22 +0100)]
Add checks for {www,wiki}.debconf.org

Peter Palfrader [Thu, 17 Jan 2019 11:55:02 +0000 (12:55 +0100)]
dsa-check-running-kernel: handle -unsigned packages

Julien Cristau [Thu, 10 Jan 2019 21:10:40 +0000 (22:10 +0100)]
Ignore cache battery warning on schumann

Tollef Fog Heen [Mon, 7 Jan 2019 20:53:17 +0000 (21:53 +0100)]
RT#7513 Remove moszumanska

Julien Cristau [Sat, 15 Dec 2018 10:00:06 +0000 (11:00 +0100)]
dsa-check-hpssacli: ignore text after "active spare" from pd status

Rather than getting confused by

      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 146 GB, OK, active spare for 1I:1:1)

treat it the same as "active spare".
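
A rough sketch of the normalisation described above, assuming the status is the last comma-separated field inside the parentheses. The regular expression and the function name are illustrative and are not taken from dsa-check-hpssacli itself.

```python
import re

# Matches lines such as the physicaldrive example quoted above.
PD_LINE = re.compile(r"^\s*physicaldrive\s+(?P<id>\S+)\s+\((?P<details>.*)\)\s*$")


def pd_status(line):
    """Return (drive_id, status), or None if the line is not a physicaldrive line."""
    match = PD_LINE.match(line)
    if match is None:
        return None
    fields = [field.strip() for field in match.group("details").split(",")]
    status = fields[-1]
    # Ignore any trailing text such as "for 1I:1:1" after "active spare".
    if status.startswith("active spare"):
        status = "active spare"
    return match.group("id"), status
```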

Peter Palfrader [Mon, 26 Nov 2018 13:25:08 +0000 (14:25 +0100)]
Cleanup dsa-update-unowned-file-status and dsa-update-unowned-file-status creation of statusdir

Peter Palfrader [Mon, 26 Nov 2018 13:23:29 +0000 (14:23 +0100)]
changelog entry

Peter Palfrader [Mon, 26 Nov 2018 13:22:14 +0000 (14:22 +0100)]
Merge remote-tracking branch 'waja/update-apt-statusdir'

* waja/update-apt-statusdir:
  Create directory if not existing

Jan Wagner [Mon, 26 Nov 2018 11:49:34 +0000 (12:49 +0100)]
Create directory if not existing

Julien Cristau [Thu, 22 Nov 2018 18:44:40 +0000 (19:44 +0100)]
manda-node0[34] have many processes

Julien Cristau [Mon, 19 Nov 2018 17:08:41 +0000 (18:08 +0100)]
Add pijper

Peter Palfrader [Sun, 18 Nov 2018 17:06:17 +0000 (18:06 +0100)]
Draghi no longer has a /boot

Peter Palfrader [Sun, 18 Nov 2018 12:42:23 +0000 (13:42 +0100)]
manda-node0[34] run drbd

Julien Cristau [Tue, 13 Nov 2018 14:26:39 +0000 (15:26 +0100)]
Blacklist openmanage battery probe on wieck and schumann

Julien Cristau [Wed, 7 Nov 2018 22:05:11 +0000 (23:05 +0100)]
Add hostgroup for new dell hosts

Peter Palfrader [Wed, 7 Nov 2018 17:16:02 +0000 (18:16 +0100)]
check if we can reach the peer on the backend network at new-manda

Julien Cristau [Tue, 6 Nov 2018 22:26:36 +0000 (23:26 +0100)]
add manda-node03

Julien Cristau [Tue, 6 Nov 2018 21:25:30 +0000 (22:25 +0100)]
add manda-node04

Julien Cristau [Thu, 1 Nov 2018 17:55:06 +0000 (18:55 +0100)]
sibelius no longer runs postgresql

Peter Palfrader [Thu, 25 Oct 2018 07:48:16 +0000 (09:48 +0200)]
dsa-check-zone-rrsig-expiration-many: fix use of uninitialized value in numeric gt (>)

We have a state count array, and we assign each state (ok, warn, etc.) a
nagios error code.  One of the states we use internally is "unsigned",
which is not an error but did not have an integer exit code.  Give it 0
now.
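
A small sketch of the idea, assuming the standard Nagios plugin exit codes. The original script is not reproduced here; the state names other than "unsigned", and the function name, are assumptions made for illustration.

```python
# Standard Nagios plugin exit codes; "unsigned" is an internal state
# that is not an error, so it maps to 0 like "ok".
EXIT_CODES = {
    "ok": 0,
    "warn": 1,
    "critical": 2,
    "unknown": 3,
    "unsigned": 0,
}


def overall_exit_code(state_counts):
    """Return the highest exit code among states seen at least once."""
    worst = 0
    for state, count in state_counts.items():
        code = EXIT_CODES[state]
        # Without an entry for "unsigned", this numeric comparison is the
        # kind of place where an undefined value would surface.
        if count > 0 and code > worst:
            worst = code
    return worst
```

Taking the plain numeric maximum is a simplification: it ranks "unknown" above "critical", which a real check may or may not want.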

Peter Palfrader [Fri, 12 Oct 2018 09:13:11 +0000 (11:13 +0200)]
bendel ("heavy-postfix") also runs fail2ban

Peter Palfrader [Wed, 10 Oct 2018 12:18:22 +0000 (14:18 +0200)]
check if fail2ban is running where it should

Peter Palfrader [Tue, 9 Oct 2018 17:38:57 +0000 (19:38 +0200)]
handel will have an /srv soon

Peter Palfrader [Tue, 9 Oct 2018 07:45:58 +0000 (09:45 +0200)]
Check if all unbound trust anchors are current

Peter Palfrader [Tue, 9 Oct 2018 07:45:16 +0000 (09:45 +0200)]
retire alioth hostgroup

Peter Palfrader [Tue, 9 Oct 2018 07:42:17 +0000 (09:42 +0200)]
Add dsa-check-unbound-anchors

Peter Palfrader [Tue, 7 Aug 2018 07:14:21 +0000 (09:14 +0200)]
conova-node*: ping our drbd/ganeti peer on the mgmt network

Peter Palfrader [Tue, 7 Aug 2018 06:49:13 +0000 (08:49 +0200)]
monitor drbd at conova

Julien Cristau [Mon, 6 Aug 2018 16:29:30 +0000 (18:29 +0200)]
Decommission powerpc-osuosl-01

Julien Cristau [Mon, 6 Aug 2018 15:51:04 +0000 (17:51 +0200)]
Remove powerpc-unicamp-01

Peter Palfrader [Tue, 17 Jul 2018 12:53:53 +0000 (14:53 +0200)]
retire hostgroup sparc

Peter Palfrader [Tue, 17 Jul 2018 12:51:22 +0000 (14:51 +0200)]
retire hostgroup wheezy

Peter Palfrader [Tue, 17 Jul 2018 12:47:30 +0000 (14:47 +0200)]
retire hostgroup wheezy

Peter Palfrader [Mon, 16 Jul 2018 12:18:18 +0000 (14:18 +0200)]
retire smetana

Julien Cristau [Mon, 2 Jul 2018 18:15:20 +0000 (20:15 +0200)]
sw-raid on arm-arm-0[134]

Julien Cristau [Fri, 29 Jun 2018 14:08:58 +0000 (16:08 +0200)]
unicamp renumbering

Peter Palfrader [Sun, 24 Jun 2018 21:21:58 +0000 (23:21 +0200)]
remove parth, re: RT#7334

Peter Palfrader [Fri, 1 Jun 2018 16:51:29 +0000 (18:51 +0200)]
df -h checks on nfs client at lw

Peter Palfrader [Thu, 31 May 2018 13:27:54 +0000 (15:27 +0200)]
remove most of the monitoring for moszumanska

Peter Palfrader [Wed, 30 May 2018 12:21:29 +0000 (14:21 +0200)]
not this varnish process job on jessie

Peter Palfrader [Wed, 30 May 2018 09:18:14 +0000 (11:18 +0200)]
both pkgmirror-csail and sibelius run varnish

Peter Palfrader [Wed, 30 May 2018 08:35:26 +0000 (10:35 +0200)]
monitor varnish, haproxy

Peter Palfrader [Mon, 28 May 2018 22:03:56 +0000 (00:03 +0200)]
move lw0[78] to stretch

Peter Palfrader [Mon, 28 May 2018 20:26:11 +0000 (22:26 +0200)]
move lw0[1234] to stretch

Julien Cristau [Thu, 10 May 2018 14:02:43 +0000 (16:02 +0200)]
kantuser has apache

Julien Cristau [Mon, 30 Apr 2018 08:42:21 +0000 (10:42 +0200)]
sallinen now runs apache

Julien Cristau [Tue, 24 Apr 2018 20:58:21 +0000 (23:58 +0300)]
Add kantuser

Peter Palfrader [Tue, 24 Apr 2018 20:50:10 +0000 (22:50 +0200)]
add grabbe

Julien Cristau [Fri, 13 Apr 2018 11:30:51 +0000 (13:30 +0200)]
Retire check for SSL certs living in puppet, they're all gone

Aurelien Jarno [Mon, 9 Apr 2018 15:21:53 +0000 (17:21 +0200)]
Decommission zemlinsky.d.o (RT#7208)

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Aurelien Jarno [Sun, 8 Apr 2018 11:42:29 +0000 (13:42 +0200)]
zemlinsky.d.o is not a buildd anymore

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Aurelien Jarno [Sun, 8 Apr 2018 10:01:06 +0000 (12:01 +0200)]
Correctly setup x86-bm-01 as a (py)buildd

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Aurelien Jarno [Sun, 8 Apr 2018 09:50:30 +0000 (11:50 +0200)]
Do not check for bacula on pybuildds hosts

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Aurelien Jarno [Sun, 8 Apr 2018 09:42:41 +0000 (11:42 +0200)]
Add a hostgroup and a check for pybuildd, switch x86-grnet-01 and zani to it

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Aurelien Jarno [Sun, 8 Apr 2018 09:04:54 +0000 (11:04 +0200)]
Remove check for lvcreate on buildds

We now use tar based chroots and we are happy with that.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Peter Palfrader [Thu, 29 Mar 2018 08:06:24 +0000 (10:06 +0200)]
make checks/dsa-check-ipv6-default-gw +x

Peter Palfrader [Thu, 15 Mar 2018 09:05:37 +0000 (10:05 +0100)]
effectively stop monitoring godard's process count

Peter Palfrader [Tue, 13 Mar 2018 10:51:37 +0000 (11:51 +0100)]
extinfo stuff has been deprecated the way we do it.  rip it out

Peter Palfrader [Tue, 13 Mar 2018 10:45:06 +0000 (11:45 +0100)]
try this again

Peter Palfrader [Tue, 13 Mar 2018 10:32:24 +0000 (11:32 +0100)]
hide the cd hosts from my web view

Peter Palfrader [Tue, 13 Mar 2018 10:30:32 +0000 (11:30 +0100)]
remove sgran

Peter Palfrader [Tue, 13 Mar 2018 10:09:30 +0000 (11:09 +0100)]
dsa-check-hpssacli: add --ignore-cache

Peter Palfrader [Tue, 13 Mar 2018 08:50:33 +0000 (09:50 +0100)]
fix disk checks

- snapshot farm changes:
  . make warning threshold lower than crit threshold
  . add lw09 and lw10 farm checks
- morgue on lw03:
  . make warning threshold lower than crit threshold
- qnap-big and -tiny on storace:
  . make warning threshold lower than crit threshold
  . raise limits for -tiny

Peter Palfrader [Sun, 11 Mar 2018 08:06:22 +0000 (09:06 +0100)]
start 117

Peter Palfrader [Sun, 11 Mar 2018 08:05:50 +0000 (09:05 +0100)]
release 116

Peter Palfrader [Sun, 11 Mar 2018 08:01:42 +0000 (09:01 +0100)]
Check if we have a v6 gw on all hosts

Peter Palfrader [Sun, 11 Mar 2018 08:04:30 +0000 (09:04 +0100)]
Add dsa-check-ipv6-default-gw

Aurelien Jarno [Sat, 3 Mar 2018 10:11:11 +0000 (11:11 +0100)]
Release 115

Julien Cristau [Tue, 27 Feb 2018 10:05:00 +0000 (11:05 +0100)]
Drop DNS check for alioth zone

Julien Cristau [Mon, 26 Feb 2018 20:56:31 +0000 (21:56 +0100)]
Monitor the freeradius service on vogler

Julien Cristau [Fri, 23 Feb 2018 15:32:40 +0000 (16:32 +0100)]
Set User-Agent in dsa-check-mirrorsync

Aurelien Jarno [Mon, 19 Feb 2018 19:02:14 +0000 (20:02 +0100)]
Decommission mirror-bytemark

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Julien Cristau [Sun, 18 Feb 2018 18:58:22 +0000 (19:58 +0100)]
Run dsa-check-openmanage on schumann and wieck

Julien Cristau [Sun, 18 Feb 2018 09:43:06 +0000 (10:43 +0100)]
Add check_openmanage

From http://folk.uio.no/trondham/software/check_openmanage.html

Aurelien Jarno [Sat, 17 Feb 2018 14:25:50 +0000 (15:25 +0100)]
ganeti-csail uses the csail gateway, not the bytemark one

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Julien Cristau [Fri, 16 Feb 2018 08:00:33 +0000 (09:00 +0100)]
schumann has a /srv

Aurelien Jarno [Thu, 15 Feb 2018 17:28:10 +0000 (18:28 +0100)]
schumann has been repurposed into a security mirror

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Julien Cristau [Wed, 14 Feb 2018 19:41:58 +0000 (20:41 +0100)]
philp's default https vhost now returns 403

Peter Palfrader [Sun, 11 Feb 2018 10:19:38 +0000 (11:19 +0100)]
Use dsa-check-systemd-services instead of systemctl is-system-running

Peter Palfrader [Sun, 11 Feb 2018 10:17:28 +0000 (11:17 +0100)]
Add dsa-check-systemd-services

Peter Palfrader [Fri, 9 Feb 2018 19:50:41 +0000 (20:50 +0100)]
Add casulana to apache2-hosts

Filippo Giunchedi [Tue, 6 Feb 2018 21:27:49 +0000 (22:27 +0100)]
dsa-check-hpssacli: check PDs only when in HBA mode

when using dsa-check-hpssacli with controllers in HBA mode the check
misbehaves on the "HBA Drives" line itself. The patch below fixes things
to check only PDs in the HBA case.

We've discovered this at Wikimedia Foundation where the check is also
deployed, see https://phabricator.wikimedia.org/T185216 for more
context.

Peter Palfrader [Sun, 4 Feb 2018 23:46:30 +0000 (00:46 +0100)]
put godard into manyprocesses group

Peter Palfrader [Sun, 4 Feb 2018 12:12:51 +0000 (13:12 +0100)]
dsa-check-libs: do not report processes younger than 1h.  This should get rid of the warnings when piuparts runs tests
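
The age filter could look roughly like the sketch below. It is an illustration only: the function name and the (pid, start_time) input shape are assumptions, and how dsa-check-libs actually determines process age is not shown in this log.

```python
MIN_AGE_SECONDS = 3600  # "younger than 1h", from the commit title


def old_enough(processes, now, min_age=MIN_AGE_SECONDS):
    """Return the pids of processes that have run for at least min_age seconds.

    processes: iterable of (pid, start_time) pairs, start_time in epoch seconds.
    now: current time in epoch seconds.
    """
    return [pid for pid, start in processes if now - start >= min_age]
```

Short-lived processes, such as those spawned during a piuparts test run, fall below the threshold and are left out of the report.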

Peter Palfrader [Sun, 4 Feb 2018 11:26:00 +0000 (12:26 +0100)]
release 112

Peter Palfrader [Sun, 4 Feb 2018 11:23:29 +0000 (12:23 +0100)]
dsa-check-libs: support ignoring young processes

Peter Palfrader [Sun, 4 Feb 2018 11:03:13 +0000 (12:03 +0100)]
dsa-check-libs: whitespace cleanup

Julien Cristau [Fri, 2 Feb 2018 18:46:04 +0000 (19:46 +0100)]
Add lw10

Aurelien Jarno [Fri, 2 Feb 2018 17:27:37 +0000 (18:27 +0100)]
Decommission fano and finzi

Peter Palfrader [Fri, 2 Feb 2018 17:18:36 +0000 (18:18 +0100)]
kill swap checks