The input unicast packet counter has rolled over, but the error
counter hasn't; the right number is in fact 4,294,967,295 + 1,000.
Nothing within SNMP will disambiguate these three cases. Furthermore,
if the problem is that the device is confused, and the device is
actually SNMP-compliant, there's nothing you can do about it: there is
no way to reset the counters. The official position on this is
that the actual counter values aren't meaningful; it's the rate of
change of the counter values that is meaningful. This is great if
you're running network-management software that sits around monitoring
things continuously, but less than useful if you're just poking at
things once in a while.
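Since the rate of change is the part that is supposed to be meaningful, here is a minimal sketch of turning two samples of a 32-bit counter into a rate. The function names are my own invention, not part of any SNMP library, and the sketch assumes the counter rolled over at most once between the samples and that nothing reset it in between, which SNMP itself gives you no way to confirm.

```perl
use strict;
use warnings;

# A 32-bit counter wraps from 2**32 - 1 back to 0.
my $WRAP = 2**32;

# Difference between two samples of the same counter, assuming it
# wrapped at most once and was not reset between the samples.
sub counter_delta {
    my ($old, $new) = @_;
    return $new >= $old ? $new - $old : $new + $WRAP - $old;
}

# Change per second between two (time, value) samples;
# undef if no time has passed.
sub counter_rate {
    my ($t_old, $v_old, $t_new, $v_new) = @_;
    my $elapsed = $t_new - $t_old;
    return undef if $elapsed <= 0;
    return counter_delta($v_old, $v_new) / $elapsed;
}

print counter_delta(4294967290, 1000), "\n";   # the counter wrapped once
print counter_rate(0, 100, 10, 250), "\n";     # 150 counts in 10 seconds
```

If you only poke at a device once in a while, the "wrapped at most once" assumption is exactly the one you can't rely on, which is the complaint above in code form.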
Fortunately or unfortunately, compliance with this part of the SNMP
spec is just as good as compliance with any other piece. The devices
I've looked at fall into three classes. My favorite ones reset the
counter on reboot, which completely confuses continuous
network-management software, but works well for me. My second favorites
are the ones that actually implement the spec; it's somewhat annoying,
but I understand the reasoning, and at least I know what to expect. The
ones that truly annoy me are our Octel voicemail machines, which have
counters that not only fail to reset, they also fail to roll
over. This would be marginally less annoying if they counted up to the
full value of an SNMP counter (2^32 - 1), but instead they
only count up to 65,535 (2^16 - 1). As a result, most of the
more interesting ones pegged their counters within a week or so.
In practice, what I do about these counter issues is ignore them. I'm
not writing professional SNMP management tools; I'm doing
quick-and-dirty network debugging. I have a program to calculate error
rates, and then I look at them. If they're insane, I apply obvious
human-mediated debugging techniques to sort out the three possible
reasons. (For instance, does the network work at all? If so, then there
is not a 500% error rate. Does the device return equally bizarre
information if queried otherwise? If so, then perhaps it's confused,
and rebooting it will improve the world.)
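The first pass of that sanity check can be automated, at least roughly. The sketch below is a heuristic of my own, not anything the SNMP spec blesses: it flags an error count that exceeds the packet count, and any value sitting exactly at the top of a 16-bit or 32-bit counter. The function name and the wording of the reasons are made up for illustration, and a flag is only a hint to go look, not a diagnosis.

```perl
use strict;
use warnings;

# Largest values a 16-bit and a 32-bit counter can hold.
my @CEILINGS = (2**16 - 1, 2**32 - 1);

# Return a list of reasons to distrust an error count (empty if none).
sub suspicious {
    my ($errors, $packets) = @_;
    my @reasons;
    push @reasons, "more errors than packets"
        if $errors > $packets;
    for my $value ($errors, $packets) {
        push @reasons, "counter value $value looks pegged"
            if grep { $value == $_ } @CEILINGS;
    }
    return @reasons;
}

print "$_\n" for suspicious(5000, 1000);   # a 500% error rate
print "$_\n" for suspicious(12, 65535);    # a counter stuck at 2^16 - 1
```

Telling the possible causes apart still takes the human-mediated techniques described above.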
With that in mind, let's keep working through MIB-II. We've pretty
thoroughly mined the "system" group; the remaining parts of MIB-II are:
interfaces
at
ip
icmp
tcp
udp
egp
transmission
snmp
First, let's decide to ignore some of these forever. "at" is the
address translation group, which is officially deprecated because its
meaning is entirely dependent on the protocols your device happens to
be speaking. If TCP/IP is the device's favorite protocol, the "at"
group is basically an exceedingly annoying way of representing the ARP
cache.
The "transmission" group has data about the transmission media
underlying the interfaces, if it has anything at all, which doesn't
happen all that often. You probably don't care; I certainly don't.
The "egp" and "icmp" groups have information about their respective
protocols, if the device implements them. Once again, this is all very
well, but they're just not very interesting protocols for most
purposes. The "snmp" group is one of the best examples, outside of
particle physics, of the Heisenberg effect, whereby observing something
changes the value observed. Of course if you send an SNMP get
command to get the value of snmp.snmpInPkts, which is the
number of SNMP packets received, you increment the counter. Aside from
this minor and recondite pleasure, the snmp group doesn't have many
applications. If you track how many requests you make to
a machine, you can see if anybody else is playing with SNMP, but
tracking them down is a separate and thornier problem.
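The bookkeeping for that trick is just subtraction, sketched below. The function name and the sample numbers are hypothetical, and the sketch assumes one packet per request, no retransmissions, and no counter rollover between the two readings.

```perl
use strict;
use warnings;

# Estimate how many SNMP packets came from somebody else between two
# readings of snmpInPkts. $own_requests counts every request we sent
# after the first reading, including the one that fetched the second.
sub foreign_snmp_packets {
    my ($before, $after, $own_requests) = @_;
    my $others = ($after - $before) - $own_requests;
    return $others > 0 ? $others : 0;
}

# 15 packets arrived between the readings and 10 of them were ours.
print foreign_snmp_packets(1200, 1215, 10), "\n";
```

A nonzero answer tells you somebody else is asking questions; it says nothing about who.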
That leaves us with the apparently useful "interfaces," "ip," "tcp,"
and "udp" groups. Here's a walk through part of the interfaces group,
to illustrate how SNMP tables work:
interfaces.ifNumber.0 = 2
interfaces.ifTable.ifEntry.ifIndex.1 = 1
interfaces.ifTable.ifEntry.ifIndex.2 = 2
interfaces.ifTable.ifEntry.ifDescr.1 = Silicon Graphics ec Ethernet controller
interfaces.ifTable.ifEntry.ifDescr.2 = Silicon Graphics lo Loopback interface
interfaces.ifTable.ifEntry.ifType.1 = ethernetCsmacd(6)
interfaces.ifTable.ifEntry.ifType.2 = softwareLoopback(24)
interfaces.ifTable.ifEntry.ifMtu.1 = 1500
interfaces.ifTable.ifEntry.ifMtu.2 = 8304
interfaces.ifTable.ifEntry.ifSpeed.1 = Gauge: 10000000
interfaces.ifTable.ifEntry.ifSpeed.2 = Gauge: 200000000
interfaces.ifTable.ifEntry.ifPhysAddress.1 = 8:0:69:2:f6:ff
interfaces.ifTable.ifEntry.ifPhysAddress.2 =
interfaces.ifTable.ifEntry.ifAdminStatus.1 = up(1)
interfaces.ifTable.ifEntry.ifAdminStatus.2 = up(1)
interfaces.ifTable.ifEntry.ifOperStatus.1 = up(1)
interfaces.ifTable.ifEntry.ifOperStatus.2 = up(1)
interfaces.ifTable.ifEntry.ifInOctets.1 = 2081101543
interfaces.ifTable.ifEntry.ifInOctets.2 = 31835092
interfaces.ifTable.ifEntry.ifInUcastPkts.1 = 3224161
interfaces.ifTable.ifEntry.ifInUcastPkts.2 = 500898
interfaces.ifTable.ifEntry.ifInNUcastPkts.1 = 926910
interfaces.ifTable.ifEntry.ifInNUcastPkts.2 = 0
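The walk above comes back column by column: every ifDescr, then every ifType, and so on, each with the row's instance number tacked on the end. A minimal sketch of regrouping that into one record per interface follows. It works on text lines copied from the walk rather than on a live device, and the function name and the parsing pattern are my own assumptions about this particular output format.

```perl
use strict;
use warnings;

# A few lines copied from the walk above.
my @walk = (
    'interfaces.ifTable.ifEntry.ifDescr.1 = Silicon Graphics ec Ethernet controller',
    'interfaces.ifTable.ifEntry.ifDescr.2 = Silicon Graphics lo Loopback interface',
    'interfaces.ifTable.ifEntry.ifMtu.1 = 1500',
    'interfaces.ifTable.ifEntry.ifMtu.2 = 8304',
    'interfaces.ifTable.ifEntry.ifInUcastPkts.1 = 3224161',
    'interfaces.ifTable.ifEntry.ifInUcastPkts.2 = 500898',
);

# Regroup "column.instance = value" lines into one record per instance.
sub rows_from_walk {
    my @lines = @_;
    my %rows;
    for my $line (@lines) {
        # Dotted name, numeric instance, then the (possibly empty) value.
        next unless $line =~ /^([\w.]+)\.(\d+)\s*=\s*(.*)$/;
        my ($name, $instance, $value) = ($1, $2, $3);
        my ($column) = $name =~ /(\w+)$/;   # last component, e.g. ifMtu
        $rows{$instance}{$column} = $value;
    }
    return \%rows;
}

my $rows = rows_from_walk(@walk);
for my $instance (sort { $a <=> $b } keys %$rows) {
    my $row = $rows->{$instance};
    print "$instance: $row->{ifDescr}, MTU $row->{ifMtu}, ",
          "$row->{ifInUcastPkts} unicast packets in\n";
}
```

The instance number is the only thing tying a row together, which is why it matters so much how the instances are numbered.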
interfaces.ifNumber is a familiar, single-instance variable
that tells how many interfaces the machine has. You were probably
assuming that we'd been using "0" because SNMP counts starting at 0. As
you can see, this is false. Actually, if there's anything to count, it
is not allowed to start below 1. (In most cases, it will start at 1,
but trusting in anything is unwise with SNMP.) And it gets worse;
check this out:
interfaces.ifNumber.0 = 15
interfaces.ifTable.ifEntry.ifDescr.1 = Serial0/0
interfaces.ifTable.ifEntry.ifDescr.2 = Serial0/1
interfaces.ifTable.ifEntry.ifDescr.3 = Serial0/2
interfaces.ifTable.ifEntry.ifDescr.4 = Serial0/3
interfaces.ifTable.ifEntry.ifDescr.5 = Serial0/4
interfaces.ifTable.ifEntry.ifDescr.6 = Serial0/5
interfaces.ifTable.ifEntry.ifDescr.7 = Serial0/6
interfaces.ifTable.ifEntry.ifDescr.8 = Serial0/7
interfaces.ifTable.ifEntry.ifDescr.9 = Ethernet1/0
interfaces.ifTable.ifEntry.ifDescr.10 = Ethernet1/1
interfaces.ifTable.ifEntry.ifDescr.11 = Ethernet1/2
interfaces.ifTable.ifEntry.ifDescr.12 = Ethernet1/3
interfaces.ifTable.ifEntry.ifDescr.13 = FastEthernet2/0
interfaces.ifTable.ifEntry.ifDescr.14 = FastEthernet2/1
interfaces.ifTable.ifEntry.ifDescr.23 = Serial0/7.110
That's right, it has 15 interfaces, numbered 1 through 14, and 23.
That's OK. It's allowed to do that. Of course, if you're trying to loop
through all the interfaces, this makes life unpleasant. Fortunately,
SNMP allows you to do a "get next." A get next on
interfaces.ifTable.ifEntry.ifDescr.0 (which, you will note,
doesn't exist, and is guaranteed not to) returns
interfaces.ifTable.ifEntry.ifDescr.1 and its value. If you
have a handy indicator like interfaces.ifNumber, you can "get
next" the appropriate number of times. Otherwise, you may just have to
keep going until the next object is either an error or something in
another part of the tree.
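To make the "keep going until you fall off the end" approach concrete, here is a small simulation. It does not talk to a real agent: fake_getnext is a stand-in I made up that behaves like a get next restricted to one column, using the index numbers from the router above. A real agent would hand back the next object in the tree instead of nothing, so a real loop also has to check that the answer is still in the column it asked about.

```perl
use strict;
use warnings;

# Stand-in for a device: the ifIndex values from the router above.
my @indexes = (1 .. 14, 23);

# Simulated get next: the first instance after $after, or undef once
# the column is exhausted.
sub fake_getnext {
    my ($after) = @_;
    for my $index (@indexes) {
        return $index if $index > $after;
    }
    return undef;
}

# Start at instance 0, which never exists, and feed each answer back in.
sub walk_column {
    my @seen;
    my $current = 0;
    while (defined(my $next = fake_getnext($current))) {
        push @seen, $next;
        $current = $next;
    }
    return @seen;
}

my @found = walk_column();
print scalar(@found), " interfaces: @found\n";
```

The point is that each request starts from the previous answer rather than from a loop counter, so the jump from 14 to 23 costs nothing.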
So here's a version of a program I've actually used to debug network
problems:
#!/usr/bin/perl5
#
# tcpprobs
# Elizabeth D. Zwicky
# zwicky@sgi.com
# July 1998

use CGI qw(:all);
use SNMP;

# This turns on formatted printing of variables
$SNMP::use_sprint_value = 1;

# Error percentages above these values are shown in orange or red
$ORANGE_THRES = 5;
$RED_THRES = 20;

print header;
print start_html(-title=>"TCP/IP error rates", -bgcolor=>"ffffff");
print h1("TCP/IP error rates");

# Up to you to figure out how to get this set as a parameter;
# the elegant way is to write up a form, but you could always just
# hand-type it as part of the URL, as in
# http://yourhost/tcpprobs?hostname=hosttocheck
$hostname = param('hostname');

if ($sess = new SNMP::Session(DestHost=>"$hostname")){
    # First we pull a bunch of nice, straightforward single-instance
    # variables.
    $tcpout     = $sess->get(["tcp.tcpOutSegs", "0"]);
    $tcpretrans = $sess->get(["tcp.tcpRetransSegs", "0"]);
    $ipin       = $sess->get(["ip.ipInReceives", "0"]);
    $ipinhdr    = $sess->get(["ip.ipInHdrErrors", "0"]);
    $ipinaddr   = $sess->get(["ip.ipInAddrErrors", "0"]);
    $ipdiscard  = $sess->get(["ip.ipInDiscards", "0"]);

    print h3("$hostname");
    print p("TCP: $tcpout packets out, $tcpretrans (".
            &ppercent($tcpretrans, $tcpout).
            " percent) TCP retransmission errors <br>"
           );
    print p("IP: $ipin packets received, $ipinhdr (".
            &ppercent($ipinhdr, $ipin).
            " percent) header errors, $ipinaddr (".
            &ppercent($ipinaddr, $ipin) .
            " percent) address errors"
           );

    # And then we wander off into manipulating tables and multiple
    # instances...

    # This is the number of interfaces on the machine
    $interfaces = $sess->get(["interfaces.ifNumber", "0"]);

    print "<table border = 2>\n";
    print TR (
        th('Interface'), th('Adm. Stat.'), th('Op. Stat.'),
        th(' '),
        th('Input Packets'), th('Input Errors'), th('Input Discards'),
        th(' '),
        th('Output Packets'), th('Output Errors'), th('Output Discards')
    );

    # And now we loop. Interface indexes may have gaps, so each get next
    # starts from the index found on the previous pass (0 to begin with,
    # which never exists).
    $interface = 0;
    foreach $count (1..$interfaces){
        $interface =
            $sess->getnext(["interfaces.ifTable.ifEntry.ifIndex", $interface]);
        $descr =
            $sess->get(["interfaces.ifTable.ifEntry.ifDescr", "$interface"]);
        $admin =
            $sess->get(["interfaces.ifTable.ifEntry.ifAdminStatus", "$interface"]);
        $oper =
            $sess->get(["interfaces.ifTable.ifEntry.ifOperStatus", "$interface"]);
        $unknown =
            $sess->get(["interfaces.ifTable.ifEntry.ifInUnknownProtos", "$interface"]);
        # Unicast and non-unicast packets are counted separately;
        # add them together for a total
        $input =
            $sess->get(["interfaces.ifTable.ifEntry.ifInNUcastPkts", "$interface"]);
        $input +=
            $sess->get(["interfaces.ifTable.ifEntry.ifInUcastPkts", "$interface"]);
        $inerrs =
            $sess->get(["interfaces.ifTable.ifEntry.ifInErrors", "$interface"]);
        $indisc =
            $sess->get(["interfaces.ifTable.ifEntry.ifInDiscards", "$interface"]);
        $output =
            $sess->get(["interfaces.ifTable.ifEntry.ifOutNUcastPkts", "$interface"]);
        $output +=
            $sess->get(["interfaces.ifTable.ifEntry.ifOutUcastPkts", "$interface"]);
        $outdisc =
            $sess->get(["interfaces.ifTable.ifEntry.ifOutDiscards", "$interface"]);
        $outerrs =
            $sess->get(["interfaces.ifTable.ifEntry.ifOutErrors", "$interface"]);

        print TR (
            td($descr), td($admin), td($oper),
            td(' '),
            td($input),
            td("$inerrs (" . &ppercent($inerrs, $input) . ")"),
            td("$indisc (" . &ppercent($indisc, $input) . ")"),
            td(' '),
            td($output),
            td("$outerrs (" . &ppercent($outerrs, $output) . ")"),
            td("$outdisc (" . &ppercent($outdisc, $output) . ")"),
        );
    }
    print "</table>\n";
}
else {
    print p(b("Could not bind to $hostname: $!"));
}
print end_html;

# Return $num as a percentage of $denom, colored by threshold;
# returns a plain 0 if there is nothing to divide by
sub ppercent {
    my($num) = $_[0];
    my($denom) = $_[1];
    if ($denom <= 0 ){
        return 0;
    }
    else {
        my($percent) = ($num * 100)/ $denom;
        if ($percent > $RED_THRES){
            return sprintf("<font color=red>%3.2f%%</font>", $percent);
        }
        elsif ($percent > $ORANGE_THRES){
            return sprintf("<font color=orange>%3.2f%%</font>", $percent);
        }
        else {
            return sprintf("%3.2f%%", $percent);
        }
    }
}
There are a few tricks here that we haven't already discussed. The
"interfaces" group gives me separate numbers for "UcastPkts" (unicast
packets) and "NUcastPkts" (non-unicast packets, i.e., multicasts and
broadcasts). For my purposes, this is irrelevant, so I add them together.
There's also this unpleasant-looking result:
TCP: Wrong Type (should be Counter): NULL packets out, Wrong Type (should be Counter): NULL (0 percent) TCP retransmission errors
IP: Wrong Type (should be Counter): NULL packets received, (0 percent) header errors, Wrong Type (should be Counter): NULL (0 percent) address errors
That's a UNIX machine that doesn't keep these statistics in its kernel
and therefore doesn't feed them to its SNMP agent, running into a
combination of beautiful error handling (the library's) and completely
laissez-faire error nonhandling (mine). You could make the output more
beautiful, but you can't get blood out of a stone, or TCP
retransmission statistics out of a machine running IRIX 5.3's default
SNMP agent.
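If you did want the output to be more beautiful, one approach is to check that what came back is actually a number before doing arithmetic on it. The following is a minimal sketch with helper names of my own; it treats anything that isn't a plain non-negative integer as missing, which is an assumption about what the library hands back rather than a documented guarantee.

```perl
use strict;
use warnings;

# Return the value if it is a plain non-negative integer, else undef.
sub counter_or_undef {
    my ($value) = @_;
    return undef unless defined $value;
    return $value =~ /^\d+$/ ? $value : undef;
}

# Format a percentage, or "n/a" when either number is unusable.
sub safe_percent {
    my $num   = counter_or_undef($_[0]);
    my $denom = counter_or_undef($_[1]);
    return "n/a" unless defined $num && defined $denom && $denom > 0;
    return sprintf("%.2f%%", $num * 100 / $denom);
}

print safe_percent("Wrong Type (should be Counter): NULL", 1000), "\n";
print safe_percent(25, 1000), "\n";
```

This only tidies the display; the statistics themselves are still not there to be had.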
You may wonder how I picked the precise variables I show here. It's
clear even from the excerpts I've shown that these are not all the
variables that are available to me. This is more or less pure
empiricism; I started with a program that displayed pretty nearly
everything and got rid of all the ones that I never actually needed,
until the remaining information fit well on a page. Depending on your
point of view, this is either science at its best or hackery at its
worst.
Next: Exploring MIBs on your own; we discover device-specific MIBs.