$ID = q$Id: check_udebug,v 1.3 2006/03/17 23:06:54 quanah Exp $;
#
# check_udebug -- Check AFS database servers using udebug for Nagios.
#
# Written by Russ Allbery <rra@stanford.edu>
# Copyright 2004 Board of Trustees, Leland Stanford Jr. University
#
# This program is free software; you may redistribute it and/or modify it
# under the same terms as Perl itself.
#
# Takes a hostname and a port number and checks the udebug output for that
# host and port. Reports an error if the recovery state is not 1f on the sync
# site (ensuring that it considers all of the other servers up-to-date) or if
# any of the servers don't believe there is a sync site.

##############################################################################
# Site configuration
##############################################################################

# The default timeout in seconds (implemented by alarm) for udebug.
$TIMEOUT = 60;

# The full path to udebug. Make sure that this is on local disk so that
# monitoring doesn't have an AFS dependency.
($UDEBUG) = grep { -x $_ } qw(/usr/bin/udebug /usr/local/bin/udebug);
$UDEBUG ||= '/usr/bin/udebug';

##############################################################################
# Modules and declarations
##############################################################################

use strict;
use vars qw($ID $TIMEOUT $UDEBUG);

use Getopt::Long qw(GetOptions);

##############################################################################
# Implementation
##############################################################################

# Parse command line options.
my ($help, $host, $port, $version);
Getopt::Long::config ('bundling', 'no_ignore_case');
GetOptions ('hostname|H=s' => \$host,
            'help|h'       => \$help,
            'port|p=i'     => \$port,
            'timeout|t=i'  => \$TIMEOUT,
            'version|V'    => \$version) or exit 3;
if ($help) {
    print "Feeding myself to perldoc, please wait....\n";
    exec ('perldoc', '-t', $0) or die "Cannot exec perldoc: $!\n";
} elsif ($version) {
    my $version = join (' ', (split (' ', $ID))[1..3]);
    $version =~ s/(\S+)$/($1)/;
    print $version, "\n";
    exit 0;
}
if (@ARGV || !(defined ($host) && defined ($port))) {
    warn "Usage: $0 [-hV] [-t <timeout>] -H <host> -p <port>\n";
    exit 3;
}

# Set up the alarm so that a hung udebug is reported as a critical problem.
$SIG{ALRM} = sub {
    print "UBIK CRITICAL - network timeout after $TIMEOUT seconds\n";
    exit 2;
};
alarm ($TIMEOUT);

# Run udebug and parse the output. We're looking for three things: first,
# we're looking to see if this host claims to be the sync site. If so, check
# that recovery state is 1f. Otherwise, make sure that there's a defined sync
# site.
unless (open (UDEBUG, "$UDEBUG $host $port |")) {
    warn "$0: cannot run udebug\n";
    exit 3;
}
my ($issync, $recovery, $synchost);
while (<UDEBUG>) {
    $issync = 1 if /^I am sync site /;
    $recovery = 1 if /^Recovery state 1f/;
    $synchost = 1 if /^Sync host \d+(\.\d+){3} was set /;
}
close UDEBUG;
if ($? != 0) {
    print "UBIK CRITICAL - udebug failed\n";
    exit 2;
}
if ($issync && !$recovery) {
    print "UBIK CRITICAL - recovery state not 1f\n";
    exit 2;
} elsif (!$issync && !$synchost) {
    print "UBIK CRITICAL - no sync site\n";
    exit 2;
} else {
    print "UBIK OK\n";
    exit 0;
}

##############################################################################
# Documentation
##############################################################################

=head1 NAME

check_udebug - Check AFS database servers using udebug for Nagios

=head1 SYNOPSIS

check_udebug [B<-hV>] [B<-t> I<timeout>] B<-H> I<host> B<-p> I<port>

=head1 DESCRIPTION

B<check_udebug> is a Nagios plugin for checking AFS database servers to make
sure the Ubik replication between the database servers is running correctly.
B<udebug> is used to connect to the specified port, which should generally
be one of 7002 (ptserver), 7003 (vlserver), or 7004 (kaserver), on the
specified server. The resulting output is checked to make sure that the
recovery state is 1f if that server is the sync site, or that a sync site is
known if that server doesn't claim to be the sync site.

B<check_udebug> will always print out a single line of output. That line
will be C<UBIK OK> if everything is fine, or C<UBIK CRITICAL - > followed by
an error message otherwise.

=head1 OPTIONS

=over 4

=item B<-H> I<host>, B<--hostname>=I<host>

The AFS database server whose Ubik status B<check_udebug> should check.
This option is required.

=item B<-h>, B<--help>

Print out this documentation (which is done simply by feeding the script to
C<perldoc -t>).

=item B<-p> I<port>, B<--port>=I<port>

The port to connect to on the AFS database server. This should generally be
one of 7002 (ptserver), 7003 (vlserver), or 7004 (kaserver). This option is
required.

=item B<-t> I<timeout>, B<--timeout>=I<timeout>

Change the timeout for the B<udebug> command. The default timeout is 60
seconds.

=item B<-V>, B<--version>

Print out the version of B<check_udebug> and quit.

=back

=head1 EXIT STATUS

B<check_udebug> follows the standard Nagios exit status requirements. This
means that it will exit with status 0 if there are no problems or with
status 2 if there are critical problems. For other errors, such as invalid
syntax, B<check_udebug> will exit with status 3.
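
A wrapper for another monitoring system could translate these exit statuses
into state names. The following is a minimal sketch rather than part of
B<check_udebug>: the function name C<ubik_state> and the host name in the
comment are hypothetical, and the label C<UNKNOWN> for status 3 is taken
from the usual Nagios plugin convention.

    # Map a check_udebug exit status to a Nagios state name.
    # ubik_state is an illustrative helper, not part of check_udebug.
    sub ubik_state {
        my ($status) = @_;
        my %name = (0 => 'OK', 2 => 'CRITICAL', 3 => 'UNKNOWN');
        return exists $name{$status} ? $name{$status}
                                     : "UNEXPECTED ($status)";
    }

    # Example use (the host name is a placeholder):
    #   system ('check_udebug', '-H', 'afsdb1.example.org', '-p', '7003');
    #   print ubik_state ($? >> 8), "\n";

Status 1 (WARNING) is left out of the table because B<check_udebug> never
exits with it.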

=head1 BUGS

The standard B<-v> verbose Nagios plugin option is not supported. If
implemented, it should print out the full B<udebug> output.

The usage message for invalid options and for the B<-h> option doesn't
conform to Nagios standards.

=head1 CAVEATS

This script does not use the Nagios util library or any of the defaults that
it provides, which makes it somewhat deficient as a Nagios plugin. This is
intentional, though, since this script can be used with other monitoring
systems as well. It's not clear what a good solution to this would be.

=head1 SEE ALSO

The current versions of this and other AFS monitoring plugins for Nagios are
available from the AFS monitoring tools page at
L<http://www.eyrie.org/~eagle/software/afs-monitor/>.

=head1 AUTHOR

Russ Allbery <rra@stanford.edu>

=head1 COPYRIGHT AND LICENSE

Copyright 2004 Board of Trustees, Leland Stanford Jr. University.

This program is free software; you may redistribute it and/or modify it
under the same terms as Perl itself.