$ID = q$Id: check_rxdebug,v 1.11 2006/03/17 23:06:54 quanah Exp $;
#
# check_rxdebug -- Nagios AFS server check for waiting connections.
#
# Written by Quanah Gibson-Mount based on work by Neil Crellin
# Updated by Russ Allbery <rra@stanford.edu>
# Copyright 2003, 2004, 2005 Board of Trustees, Leland Stanford Jr. University
#
# This program is free software; you may redistribute it and/or modify it
# under the same terms as Perl itself.
#
# Expects a file server with the -H option and runs rxdebug against that file
# server, looking for any connections that are waiting for a thread. Exits
# with status 1 (a warning) if at least two connections are in that state and
# with status 2 (critical) if at least eight are. The thresholds can be
# overridden from the command line.

##############################################################################
# Site configuration
##############################################################################

# The default count of blocked connections at which to warn or send a critical
# alert. These can be overridden with the -w and -c command-line options.
$WARNINGS = 2;
$CRITICAL = 8;

# The default timeout in seconds (implemented by alarm) for rxdebug.
$TIMEOUT = 60;

# The full path to rxdebug. Make sure that this is on local disk so that
# monitoring doesn't have an AFS dependency.
($RXDEBUG) = grep { -x $_ } qw(/usr/bin/rxdebug /usr/local/bin/rxdebug);
$RXDEBUG ||= '/usr/bin/rxdebug';

##############################################################################
# Modules and declarations
##############################################################################

use strict;
use vars qw($CRITICAL $ID $RXDEBUG $TIMEOUT $WARNINGS);

use Getopt::Long qw(GetOptions);

##############################################################################
# Implementation
##############################################################################

# Parse command line options.
my ($help, $host, $version);
Getopt::Long::config ('bundling', 'no_ignore_case');
GetOptions ('critical|c=i' => \$CRITICAL,
            'hostname|H=s' => \$host,
            'help|h'       => \$help,
            'timeout|t=i'  => \$TIMEOUT,
            'version|V'    => \$version,
            'warning|w=i'  => \$WARNINGS) or exit 3;
if ($help) {
    print "Feeding myself to perldoc, please wait....\n";
    exec ('perldoc', '-t', $0) or die "Cannot exec perldoc: $!\n";
} elsif ($version) {
    my $version = join (' ', (split (' ', $ID))[1..3]);
    $version =~ s/(\S+)$/($1)/;
    print $version, "\n";
    exit 0;
}
if (@ARGV != 0 || !defined $host) {
    warn "Usage: $0 [-hV] [-c <level>] [-w <level>] [-t <timeout>] -H <host>\n";
    exit 3;
}
if ($WARNINGS > $CRITICAL) {
    warn "$0: warning level $WARNINGS greater than critical level $CRITICAL\n";
    exit 3;
}

# Set up the alarm so that a hung rxdebug is reported as a critical timeout.
$SIG{ALRM} = sub {
    print "AFS CRITICAL - network timeout after $TIMEOUT seconds\n";
    exit 2;
};
alarm ($TIMEOUT);

# Run rxdebug and parse the output, counting the number of connections that
# are waiting for a thread.
unless (open (RXDEBUG, "$RXDEBUG $host -noconn |")) {
    warn "$0: cannot run rxdebug\n";
    exit 3;
}
my $blocked;
while (<RXDEBUG>) {
    if (/^(\d+) calls waiting for a thread/) {
        $blocked = $1;
    }
}
close RXDEBUG;
if ($? != 0) {
    print "AFS CRITICAL - cannot contact server\n";
    exit 2;
}
unless (defined $blocked) {
    print "AFS CRITICAL - cannot parse rxdebug output\n";
    exit 2;
}

# Check the connection count against our limits and make sure that it's okay.
if ($blocked >= $CRITICAL) {
    print "AFS CRITICAL - $blocked blocked connections\n";
    exit 2;
} elsif ($blocked >= $WARNINGS) {
    print "AFS WARNING - $blocked blocked connections\n";
    exit 1;
} else {
    print "AFS OK - $blocked blocked connections\n";
    exit 0;
}

##############################################################################
# Documentation
##############################################################################

=head1 NAME

check_rxdebug - Check AFS servers for blocked connections in Nagios

=head1 SYNOPSIS

check_rxdebug [B<-hV>] [B<-c> I<threshold>] [B<-w> I<threshold>]
    [B<-t> I<timeout>] B<-H> I<host>


=head1 DESCRIPTION

B<check_rxdebug> is a Nagios plugin for checking AFS file servers to see if
there are client connections waiting for a free thread. If there are more
than a few of these, AFS performance tends to be very slow; this is a fairly
reliable way to catch overloaded file servers. By default, B<check_rxdebug>
returns a critical error if there are eight or more connections waiting
for a free thread and a warning if there are two or more. These
thresholds can be changed with the B<-c> and B<-w> options.

B<check_rxdebug> will always print out a single line of output including the
number of blocked connections, displaying whether this is critical, a
warning, or okay.


=head1 OPTIONS

=over 4

=item B<-c> I<threshold>, B<--critical>=I<threshold>

Change the critical blocked connection count threshold to I<threshold>,
which should be an integer. The default is 8.

=item B<-H> I<host>, B<--hostname>=I<host>

The AFS file server whose connections B<check_rxdebug> should check. This
option is required.

=item B<-h>, B<--help>

Print out this documentation (which is done simply by feeding the script
to C<perldoc -t>).

=item B<-t> I<timeout>, B<--timeout>=I<timeout>

Change the timeout for the B<rxdebug> command. The default timeout is 60
seconds.

=item B<-V>, B<--version>

Print out the version of B<check_rxdebug> and quit.

=item B<-w> I<threshold>, B<--warning>=I<threshold>

Change the warning blocked connection threshold to I<threshold>, which
should be an integer. The default is 2.

=back

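
=head1 EXAMPLES

The invocations below are illustrative sketches rather than tested
configurations. The host name C<afs1.example.org> is a placeholder and the
thresholds are arbitrary. To check a file server with the default
thresholds, and then with a warning level of 4, a critical level of 12, and
a 30-second timeout:

    check_rxdebug -H afs1.example.org
    check_rxdebug -H afs1.example.org -w 4 -c 12 -t 30

A Nagios command definition along the following lines should work, assuming
the plugin is installed in the directory that the C<$USER1$> resource macro
points to:

    define command {
        command_name    check_rxdebug
        command_line    $USER1$/check_rxdebug -H $HOSTADDRESS$ -w 2 -c 8
    }
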

=head1 EXIT STATUS

B<check_rxdebug> follows the standard Nagios exit status requirements. This
means that it will exit with status 0 if there are no problems, with status
1 if there is a warning, and with status 2 if there is a critical problem.
For other errors, such as invalid syntax, B<check_rxdebug> will exit with
status 3.


=head1 BUGS

The standard B<-v> verbose Nagios plugin option is not supported, although
it's not entirely clear what it would add.

The usage message for invalid options and for the B<-h> option doesn't
conform to Nagios standards.

=head1 CAVEATS

This script does not use the Nagios util library or any of the defaults that
it provides, which makes it somewhat deficient as a Nagios plugin. This is
intentional, though, since this script can be used with other monitoring
systems as well. It's not clear what a good solution to this would be.


=head1 SEE ALSO

The current versions of this and other AFS monitoring plugins for Nagios are
available from the AFS monitoring tools page at
L<http://www.eyrie.org/~eagle/software/afs-monitor/>.

=head1 AUTHORS

The original idea behind this script was from Neil Crellin. It was updated
by Quanah Gibson-Mount to work with Nagios, and then further updated by Russ
Allbery <rra@stanford.edu> to support more standard options and to use a
more uniform coding style.

=head1 COPYRIGHT AND LICENSE

Copyright 2003, 2004, 2005 Board of Trustees, Leland Stanford Jr. University.

This program is free software; you may redistribute it and/or modify it
under the same terms as Perl itself.