$ID = q$Id: check_bos,v 1.7 2006/03/17 23:06:54 quanah Exp $;

# check_bos -- Monitor AFS bos output for problems in Nagios.
#
# Written by Russ Allbery <rra@stanford.edu>
# Based on an earlier script by Neil Crellin <neilc@stanford.edu>
# Copyright 2003, 2004 Board of Trustees, Leland Stanford Jr. University
#
# This program is free software; you may redistribute it and/or modify it
# under the same terms as Perl itself.
#
# Given an AFS server (file server or VLDB server), runs bos status on it.
# Checks to see if there is a communication failure, and also checks to see
# if anything in the output looks unusual or wrong. If either of these
# conditions is true, prints that information to STDOUT. Suitable for being
# run inside Nagios.

##############################################################################
##############################################################################

# The full path to bos. Make sure that this is on local disk so that
# monitoring doesn't have an AFS dependency.
($BOS) = grep { -x $_ } qw(/usr/bin/bos /usr/local/bin/bos);
$BOS ||= '/usr/bin/bos';

# The default timeout in seconds (implemented by alarm) for bos.
$TIMEOUT = 10;

# The list of regular expressions matching expected output. You may need to
# customize this for what you're running at your site. Any output from bos
# that doesn't match one of these regular expressions will throw a critical
# error.
@OKAY = (
    qr/^Instance\ \S+,\ \(type\ is\ \S+\)(\ has\ core\ file,)?
       \ currently\ running\ normally\.$/x,
    qr/^\s*Auxiliary status is: file server running\.$/,
    qr/^\s*Process last started at /,
    qr/^\s*Last exit at /,
    qr/^\s*Last error exit at /,
    qr/^\s*Command \d+ is /
);

##############################################################################
# Modules and declarations
##############################################################################

use vars qw($BOS $ID @OKAY $TIMEOUT);

use Getopt::Long qw(GetOptions);

##############################################################################
##############################################################################

# Parse command line options.
my ($help, $host, $version);
Getopt::Long::Configure ('bundling', 'no_ignore_case');
GetOptions ('hostname|H=s' => \$host,
            'help|h'       => \$help,
            'timeout|t=i'  => \$TIMEOUT,
            'version|V'    => \$version) or exit 3;
if ($help) {
    print "Feeding myself to perldoc, please wait....\n";
    exec ('perldoc', '-t', $0) or die "Cannot exec perldoc: $!\n";
} elsif ($version) {
    my $version = join (' ', (split (' ', $ID))[1..3]);
    $version =~ s/(\S+)$/($1)/;
    print $version, "\n";
    exit 0;
}
if (@ARGV) {
    warn "Usage: $0 [-hV] [-t <timeout>] -H <host>\n";
    exit 3;
}
unless ($host) {
    print "Usage: $0 [-hV] [-t <timeout>] -H <host>\n";
    exit 3;
}
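
# Report a critical error if bos doesn't finish within the timeout.
$SIG{ALRM} = sub {
    print "BOS CRITICAL - network timeout after $TIMEOUT seconds\n";
    exit 2;
};
alarm ($TIMEOUT);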

# Collect the bos output into an array.
unless (open (BOS, "$BOS status $host -noauth -long 2>&1 |")) {
    print "BOS UNKNOWN - cannot run bos\n";
    exit 3;
}
my @bos = <BOS>;
close BOS;
alarm 0;

# Make sure that bos was successful. Note that bos generally returns success
# even if it can't contact the bosserver, so the output still has to be
# scanned below.
if ($? != 0) {
    print "BOS CRITICAL - bos status failed\n";
    exit 2;
}
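
# Scan the output. If we see anything that we don't expect, immediately
# report it as a critical error.
LINE: for my $line (@bos) {
    chomp $line;

    # Blank lines carry no status information, so skip them.
    next LINE if $line =~ /^\s*$/;
    for my $regex (@OKAY) {
        next LINE if $line =~ /$regex/;
    }
    print "BOS CRITICAL - $line\n";
    exit 2;
}
print "BOS OK\n";
exit 0;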

##############################################################################
##############################################################################

=head1 NAME

check_bos - Monitor AFS bos output for problems in Nagios

=head1 SYNOPSIS

check_bos [B<-hV>] [B<-t> I<timeout>] B<-H> I<host>

=head1 DESCRIPTION

B<check_bos> is a Nagios plugin that queries the AFS bosserver for process
status and reports an alert if there are any unexpected lines in the B<bos>
output. The acceptable lines of output from B<bos> are configured at the
top of this script; they should be suitable for most sites, but may require
some customization.

B<check_bos> always prints a single line of output. If any line of B<bos>
output isn't matched by one of the regexes identifying acceptable lines, it
prints the first such line prefixed by C<BOS CRITICAL - >. Otherwise, it
prints C<BOS OK>.

Note that this monitoring may not catch problems such as a service being
constantly restarted if the service happens to be up and running normally
each time the probe runs; it ignores the last start time, the last error
exit status, the presence of core files, and the like. It mostly just looks
for the "currently running normally" part of the B<bos> output and makes
sure that the auxiliary status of a file server process is "file server
running".
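
The acceptable-line check described above can be sketched in isolation as
follows. This is a minimal, hypothetical example rather than the code of
B<check_bos> itself: it copies the default patterns into a lexical
C<@okay>, the function name C<first_unexpected> is made up for
illustration, skipping blank lines is an assumption of the sketch, and the
sample lines are invented rather than captured from a real server.

    use strict;
    use warnings;

    # Copy of the default acceptable-output patterns.
    my @okay = (
        qr/^Instance\ \S+,\ \(type\ is\ \S+\)(\ has\ core\ file,)?
           \ currently\ running\ normally\.$/x,
        qr/^\s*Auxiliary status is: file server running\.$/,
        qr/^\s*Process last started at /,
        qr/^\s*Last exit at /,
        qr/^\s*Last error exit at /,
        qr/^\s*Command \d+ is /,
    );

    # Return the first line that no pattern accepts, or undef if every
    # non-blank line is accepted.
    sub first_unexpected {
        my @lines = @_;
        LINE: for my $line (@lines) {
            next LINE if $line =~ /^\s*$/;
            for my $regex (@okay) {
                next LINE if $line =~ $regex;
            }
            return $line;
        }
        return;
    }

    # Hypothetical sample output, not taken from a real server.
    my @sample = (
        'Instance fs, (type is fs) currently running normally.',
        '    Auxiliary status is: file server running.',
        'Instance vlserver, (type is simple) temporarily disabled.',
    );
    my $bad = first_unexpected (@sample);
    print defined $bad ? "BOS CRITICAL - $bad\n" : "BOS OK\n";

With this sample, the sketch reports the C<vlserver> line as critical,
since none of the patterns accepts it.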

=head1 OPTIONS

=over 4

=item B<-H> I<host>, B<--hostname>=I<host>

The AFS server whose B<bos> status B<check_bos> should check. This option
is required.

=item B<-h>, B<--help>

Print out this documentation (which is done simply by feeding the script
to C<perldoc -t>).

=item B<-t> I<timeout>, B<--timeout>=I<timeout>

Change the timeout for the B<bos> command. The default timeout is 10
seconds.

=item B<-V>, B<--version>

Print out the version of B<check_bos> and quit.

=back
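
As an illustration only, a Nagios object configuration using B<check_bos>
might look like the following. C<$USER1$> and C<$HOSTADDRESS$> are standard
Nagios macros; the installation directory behind C<$USER1$>, the
C<generic-service> template, the host name C<afs1>, and the 30-second
timeout are assumptions to adapt to the local setup.

    define command {
        command_name    check_bos
        command_line    $USER1$/check_bos -H $HOSTADDRESS$ -t 30
    }

    define service {
        use                  generic-service
        host_name            afs1
        service_description  AFS bos status
        check_command        check_bos
    }

Run by hand, the equivalent check is C<check_bos -H afs1 -t 30>, which
prints one status line and exits with a Nagios status code.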

=head1 EXIT STATUS

B<check_bos> follows the standard Nagios exit status requirements. This
means that it exits with status 0 if there are no problems and with status
2 if a problem is detected. For other errors, such as invalid syntax,
B<check_bos> exits with status 3.
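
Because this script can also be run outside Nagios, a caller may need to
interpret these exit codes itself. The following is a minimal sketch,
assuming the caller runs B<check_bos> through Perl's C<system>; the function
name C<nagios_state> and the path in the commented-out usage lines are
hypothetical.

    use strict;
    use warnings;

    # Standard Nagios plugin states, indexed by exit code. check_bos itself
    # only exits with 0 (OK), 2 (CRITICAL), or 3 (UNKNOWN).
    my @STATES = qw(OK WARNING CRITICAL UNKNOWN);

    # Map a raw wait status, as left in $? by system(), to a Nagios state.
    # A failure to start, death by a signal, or an unexpected exit code is
    # treated as UNKNOWN.
    sub nagios_state {
        my ($status) = @_;
        return 'UNKNOWN' if $status == -1 || ($status & 127);
        my $code = $status >> 8;
        return $code <= $#STATES ? $STATES[$code] : 'UNKNOWN';
    }

    # Hypothetical usage:
    #   system ('/usr/local/libexec/check_bos', '-H', 'afs1.example.com');
    #   print nagios_state ($?), "\n";

Treating signals and out-of-range codes as UNKNOWN keeps a crashed or
killed check from being mistaken for a healthy server.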

=head1 BUGS

The standard B<-v> verbose Nagios plugin option is not supported. It should
display the complete B<bos> status output.

The usage message for invalid options and for the B<-h> option doesn't
conform to Nagios standards.

This script does not use the Nagios util library or any of the defaults that
it provides, which makes it somewhat deficient as a Nagios plugin. This is
intentional, though, since this script can be used with other monitoring
systems as well. It's not clear what a good solution to this would be.

=head1 SEE ALSO

The current version of this and other AFS monitoring plugins for Nagios is
available from the AFS monitoring tools page at
L<http://www.eyrie.org/~eagle/software/afs-monitor/>.

=head1 AUTHORS

The original idea behind this script was from Neil Crellin. Russ Allbery
<rra@stanford.edu> updated it to work with Nagios and stripped out some
rather neat but now unnecessary code to look for any changes in the bos
output, instead just scanning it for acceptable lines.

=head1 COPYRIGHT AND LICENSE

Copyright 2003, 2004 Board of Trustees, Leland Stanford Jr. University.

This program is free software; you may redistribute it and/or modify it
under the same terms as Perl itself.