System Monitoring with Xymon/Other Docs/HOWTO/Devmon SNMP

Devmon SNMP Hobbit Setup HOWTO
This HOWTO walks through adding SNMP polling to a hobbit server using Devmon. The assumptions and procedures are from http://www.techagent.com/devmon_snmp_hobbit_setup.htm with some minor editing.

Assumptions

 * Hobbit is installed and working.
 * Devmon is installed with no apparent problems.
 * You have a target device that has snmp enabled.
 * You know the IP address of the target device.
 * You know the read only (query) community string of the target device.
 * No firewall is installed on the Linux machine or between your Linux machine and the snmp target device.
 * You have net-snmp-utils (or equivalent) installed on the Linux machine.
 * This procedure uses a Windows XP machine as an example SNMP device. Instructions for installing/configuring an SNMP agent for a variety of operating systems are available.

Ping the target snmp device
Do this from the hobbit server, of course. NOTE: In the examples below the snmp target device will be called winserver.

    [root@hob tmp]# ping winserver
    PING winserver (192.168.100.105) 56(84) bytes of data.
    64 bytes from winserver (192.168.100.105): icmp_seq=0 ttl=124 time=48.7 ms

This tells us that the device is up and is reachable across the network.

If you can't ping the snmp device, you must troubleshoot this problem before going any further. Remember sometimes devices are configured to not respond to a ping.

Try to read the sysDescr from the snmp device
This will tell us that we can reach the snmp device via snmp and that we are using the correct read only (query) community string. We will also use the returned information in future steps.

    [root@hob tmp]# snmpget -v2c -c public winserver 1.3.6.1.2.1.1.1.0
    SNMPv2-MIB::sysDescr.0 = STRING: Hardware: x86 Family 15 Model 3 Stepping 4 AT/AT COMPATIBLE - Software: Windows 2000 Version 5.0 (Build 2195 Multiprocessor Free)

If you were successful, hooray! This is a good thing:
 * We can talk snmp to the device.
 * We are using the correct community string.
 * We are using the correct snmp version.
 * We now know the sysDescr for the device, Devmon setup will need to know this.

If you weren't successful then you must troubleshoot this step before going any further. Try snmpget with the -d (debug) option to gather more information, and consider using tcpdump to listen for any communication between the Linux server and the snmp target device.
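When scripting this check, it can help to pull the value out of the snmpget output. The following is a minimal Python sketch, assuming the `name = TYPE: value` line layout shown in the snmpget examples in this howto. The function name `parse_snmpget_line` is made up for illustration and is not part of net-snmp or Devmon.

```python
import re

# One line of net-snmp output, for example:
#   SNMPv2-MIB::sysDescr.0 = STRING: Hardware: x86 ...
SNMP_LINE = re.compile(
    r"^(?P<name>\S+)\s+=\s+(?P<type>[A-Za-z0-9]+):\s*(?P<value>.*)$"
)

def parse_snmpget_line(line):
    """Split one snmpget output line into (object name, type, value)."""
    match = SNMP_LINE.match(line.strip())
    if match is None:
        raise ValueError("not an snmpget result line: %r" % line)
    return match.group("name"), match.group("type"), match.group("value")

# Sample line copied from the sysDescr query in this howto.
sample = (
    "SNMPv2-MIB::sysDescr.0 = STRING: Hardware: x86 Family 15 Model 3 "
    "Stepping 4 AT/AT COMPATIBLE - Software: Windows 2000 Version 5.0 "
    "(Build 2195 Multiprocessor Free)"
)
name, value_type, sys_descr = parse_snmpget_line(sample)
print(sys_descr)
```

The extracted value is the string that the sysdesc entry in a Devmon specs file has to match later on.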

Add the target device to your hobbit setup
Add a line for the device to your bb-hosts file. See the Devmon documentation for more information on the options you can add here.

    192.168.100.105 winserver # DEVMON:cid(public)
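To see what information such a line carries, here is a rough Python sketch that pulls the IP address, hostname and DEVMON options out of a bb-hosts line. It only assumes the `DEVMON:name(value)` layout of the example line in this howto; Devmon's real parser is written in Perl and may accept more forms. The function name `parse_bbhosts_line` is hypothetical.

```python
import re

# "DEVMON" optionally followed by ":" and an option string.
DEVMON_TAG = re.compile(r"DEVMON(?::(\S+))?")
# A single option written as name(value), for example cid(public).
OPTION = re.compile(r"(\w+)\(([^)]*)\)")

def parse_bbhosts_line(line):
    """Return (ip, hostname, options), or None if there is no DEVMON tag."""
    head, _, comment = line.partition("#")
    fields = head.split()
    tag = DEVMON_TAG.search(comment)
    if len(fields) < 2 or tag is None:
        return None
    options = dict(OPTION.findall(tag.group(1) or ""))
    return fields[0], fields[1], options

print(parse_bbhosts_line("192.168.100.105 winserver # DEVMON:cid(public)"))
```

Hosts without a DEVMON tag return None, which mirrors the behaviour described in this howto: devmon only looks at hosts that carry the tag.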

Consider how Devmon operates
This was written by eschwim (the project admin). The way Devmon works is fairly simple.


 * 1) An outside process (most likely devmon running with the --readbbhosts flag) updates the Devmon database from the Hobbit or BigBrother bb-hosts file.
    * In a single-node installation, the Devmon database is stored in the hosts.db file.
    * In a multi-node installation, it is kept in a MySQL database.
 * 2) Devmon reads its templates.
    * A single-node installation reads the templates from disk at the beginning of every polling cycle.
    * The multi-node version reads the templates from the database, but only if they have been updated/changed since the last time it read them.
 * 3) Devmon does SNMP queries on all of the devices in its database. SNMP queries are optimized so that if the same SNMP OID is specified in multiple tests for a device, it is only queried once.
 * 4) Devmon applies template logic against the returned SNMP data. This involves doing transforms, applying thresholds, and then finally rendering the message to be sent to the display server.
 * 5) Devmon sends the rendered messages to the display server.
 * 6) Devmon sleeps for any remaining time in the poll cycle.
 * 7) Usually, return to step 1. If the interval at which your external process updates your Devmon database is not the same as your poll interval, Devmon might actually go to step 2 instead.
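The query optimization mentioned in the list above (an OID shared by several tests is only fetched once) can be pictured with a small Python sketch. This is not Devmon's code, only an illustration of the idea; the `plan_queries` function and the second test named "uptime" are made up for the example.

```python
def plan_queries(tests):
    """Collect the unique OIDs needed by all tests of one device.

    `tests` maps a test name to the list of OIDs that test uses.
    The order of first appearance is kept.
    """
    unique = []
    seen = set()
    for oids in tests.values():
        for oid in oids:
            if oid not in seen:
                seen.add(oid)
                unique.append(oid)
    return unique

device_tests = {
    "disk": [
        ".1.3.6.1.2.1.1.1.0",
        ".1.3.6.1.2.1.1.3.0",
        ".1.3.6.1.4.1.9600.1.1.1.1.5.2.67.58",
    ],
    # Hypothetical second test that reuses the uptime OID.
    "uptime": [".1.3.6.1.2.1.1.3.0"],
}
print(plan_queries(device_tests))
```

Four OID references across the two tests collapse into three SNMP queries.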

Create a template for the device
We now have a basic understanding of Devmon. We can talk to our target device via snmp, we know the sysDescr, and we have the device entered into our bb-hosts file. Since the device (a Windows 2000 server) is not one of the supplied templates, the next step is to create a new template using a preexisting one as a guide.

 * 1) In the templates directory, make a new directory for the new device.
 * 2) Copy a specs file from another device directory to use as a guide.
 * 3) Edit the specs file:
    * Change the vendor to a vendor name you want to use.
    * Change the model to a model number you want to use.
    * Change the snmpver to the snmpver of the device (you confirmed the version in an earlier step).
    * Most importantly, change the sysdesc to match the sysDescr you received from the device earlier.

The specs file used in this example:

    [root@hob win2000server]# more specs
    vendor : Microsoft
    model : Win2000
    snmpver : 2
    sysdesc : Hardware: x86 Family 15 Model 3 Stepping 4 AT/AT COMPATIBLE - Software : Windows 2000 Version 5.0

In this example, the full sysDescr was used, but you can use something shorter like "Software : Windows 2000 Version 5.0" or even "Windows 2000" as long as it's matched. It depends on how granular you want to be (e.g. treat x86 vs x64 devices differently).

Don't worry yet about the test directories that all the other devices have. We are just trying to make sure Devmon is going to work with hobbit.

Confirm template matched
Run devmon with the --readbbhosts -vvv flags. When we do this the devmon process will read the bb-hosts file looking for the DEVMON tag. When it finds one, it will query the device for sysDescr and try to find a matching device in the templates directory's specs files. If a match is found then the device is added to the Devmon hosts.db. If devmon doesn't find a match then the device will be ignored.

    [root@hob devmon]# ./devmon --readbbhosts -vvv
    [07-01-31@04:42:07] SNMP querying all hosts in bb-hosts file, please wait...
    [07-01-31@04:42:07] Querying pre-existing hosts
    [07-01-31@04:42:08] Querying new hosts /w custom cids using snmp v2
    [07-01-31@04:42:08] Discovered winserver as a Microsoft Win2000

If the discovery was unsuccessful (i.e. no Discovered line), you need to double check the sysDescr against the sysdesc in the specs file. Try copy/paste to avoid any transcription errors.
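The matching step can be pictured with the following Python sketch. It assumes that the sysdesc entry of each specs file is treated as a regular expression searched for inside the device's sysDescr, which is what patterns such as `linux|Linux` in the stock templates suggest; the first-match-wins order and the `match_template` function are assumptions made for illustration, not Devmon's actual implementation.

```python
import re

# (vendor, model, sysdesc pattern) as found in templates/*/specs files.
SPECS = [
    ("compaq", "server", "linux|Linux"),
    ("cisco", "2950", "C2950"),
    ("Microsoft", "Win2000", "Windows 2000"),
]

def match_template(sys_descr, specs):
    """Return (vendor, model) of the first matching pattern, or None."""
    for vendor, model, pattern in specs:
        if re.search(pattern, sys_descr):
            return vendor, model
    return None

windows = ("Hardware: x86 Family 15 Model 3 Stepping 4 AT/AT COMPATIBLE - "
           "Software: Windows 2000 Version 5.0 (Build 2195 Multiprocessor Free)")
solaris = "SunOS snmpsolaris10 5.10 Generic_118833-36 sun4u"

print(match_template(windows, SPECS))
print(match_template(solaris, SPECS))
```

If the pattern really is a regular expression, characters such as parentheses in a pasted sysDescr would have a special meaning, which is one more reason to prefer a short, plain fragment like "Windows 2000".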

Decide on a test for the device
We now know devmon has successfully identified the target device and has entered it into the hosts.db. Our next step is to add a test. You should pick a value from the MIB for the device that you want hobbit to check. On my test device I have loaded the free standard edition snmp package from http://www.snmp-informant.com/. This package allows me to check the free disk space on the machine using this oid (object identifier) .1.3.6.1.4.1.9600.1.1.1.1.5.2.67.58. So my test will be a check of free disk space. Let's test with snmpget first to confirm that this is a good oid:

    [root@hob win2000server]# snmpget -v2c -c public winserver .1.3.6.1.4.1.9600.1.1.1.1.5.2.67.58
    SNMPv2-SMI::enterprises.9600.1.1.1.1.5.2.67.58 = Gauge32: 71

As you can see a value was successfully retrieved, so this is going to be the basis of the test. Create its directory.

    [root@hob win2000server]# mkdir disk
    [root@hob win2000server]# cd disk
    [root@hob disk]#

Create a test: files
Now it is time to deal with the files we skipped earlier; they go in the new test directory we just created. Five files are used, but only oids, message and thresholds are required. The others are useful and you will probably end up using them, but our goal is just to get going.

 * oids: This file contains the OIDs of the tests to be performed. Notice I just made up a name in the first field. This name will be used in the other files also, so you want to make it unique. The documentation that comes with Devmon is very good for these files, so I won't be going into great detail.

    [root@hob disk]# more oids
    win2000model     : .1.3.6.1.2.1.1.1.0                  :leaf
    win2000uptime    : .1.3.6.1.2.1.1.3.0                  :leaf
    win2000freespace : .1.3.6.1.4.1.9600.1.1.1.1.5.2.67.58 :leaf
    [root@hob disk]#

 * message: This is used to construct the message that will go to the hobbit server. It uses the names you made up in the oids file. The win2000uptime_m is a variable created in the transforms file. Again the documentation is fine for this file.

    [root@hob disk]# more message
    {win2000model.errors}
    {win2000uptime.errors}
    {win2000freespace.errors}

    Disk Free Space

    Model {win2000model}
    System up time {win2000uptime_m} minutes
    Free Space {win2000freespace}%

 * thresholds: This file sets your desired thresholds for the value you are watching. This is very self-explanatory. See the documentation.

    [root@hob disk]# more thresholds
    win2000freespace : red    : <=10 : free space is very low
    win2000freespace : yellow : <=20 : free space is low
    [root@hob disk]#

Optional Files:

 * transforms: This file gives you the ability to do some magic with the values retrieved from the snmp target. I've simply used it to divide the uptime to get minutes, but it seems very powerful. See the documentation.

    [root@hob disk]# more transforms
    win2000uptime_s : MATH : {win2000uptime} / 100
    win2000uptime_m : MATH : {win2000uptime} / 100 / 60
    win2000uptime_h : MATH : {win2000uptime} / 100 / 60 / 60
    [root@hob disk]#

 * exceptions: This file is currently outside the scope of this howto. Sorry! See the documentation.
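To make the arithmetic behind the transforms and thresholds files concrete, here is a small Python sketch of the same logic. It is an illustration only, not how Devmon evaluates its templates; the function names are made up, and the rule that anything not matching red or yellow is green is an assumption. sysUpTime is reported in hundredths of a second, which is why the transforms divide by 100 first.

```python
def uptime_minutes(timeticks):
    """Same arithmetic as win2000uptime_m: {win2000uptime} / 100 / 60."""
    return timeticks / 100 / 60

# (color, limit) pairs, most severe first, mirroring the thresholds file.
FREE_SPACE_THRESHOLDS = [("red", 10), ("yellow", 20)]

def free_space_color(percent_free):
    """Return the first color whose '<=' limit applies, else green."""
    for color, limit in FREE_SPACE_THRESHOLDS:
        if percent_free <= limit:
            return color
    return "green"

def render_message(model, timeticks, percent_free):
    """Fill in the body of the message file from this howto."""
    return "\n".join([
        "Disk Free Space",
        "",
        "Model %s" % model,
        "System up time %.2f minutes" % uptime_minutes(timeticks),
        "Free Space %d%%" % percent_free,
    ])

# 7763940 timeticks is a made-up reading of roughly 21.5 hours.
print(free_space_color(71))
print(render_message("Win2000", 7763940, 71))
```

Checking the red limit before the yellow one matters here: a value of 5 satisfies both `<=10` and `<=20`, and the more severe color should win.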

Test!
Run devmon with the -f -p -vvvvvvvvv options, this will:
 * keep devmon in the foreground
 * not send the message to the hobbit server
 * and be very verbose

    [root@hob disk]# ../../../devmon -f -p -vvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    [07-01-31@06:45:29] Nodename autodetected as hob
    [07-01-31@06:45:29] ---Initilizing devmon...
    [07-01-31@06:45:29] Verbosity level: 30
    [07-01-31@06:45:29] Logging to /var/log/devmon.log
    [07-01-31@06:45:29] Node 0 reporting to localhost
    [07-01-31@06:45:29] Running under process id: 29301
    [07-01-31@06:45:29] Entering poll loop
    [07-01-31@06:45:30] Starting snmp queries
    [07-01-31@06:45:30] Querying winserver for tests disk
    [07-01-31@06:45:31] Performing test logic
    [07-01-31@06:45:31] Done with test logic
    [07-01-31@06:45:31] Sending messages to display server
    status winserver.disk green Wed Jan 31 06:45:31 2007

    Disk Free Space

    Model Hardware: x86 Family 15 Model 3 Stepping 4 AT/AT COMPATIBLE - Software: Windows 2000 Version 5.0 (Build 2195 Multiprocessor Free)
    System up time 1293.99 minutes
    Free Space 71%

    Devmon version 0.2.2 running on hob
    status hob.dm green Wed Jan 31 06:45:31 2007

    devmon, version 0.2.2

    Node name: hob
    Node number: 0
    Process ID: 29301

    Cycle time: 60
    Dead time: 180

    Polled devices: 1
    Polled tests: 1
    Avg tests/node: n/a
    # clear msgs: 0

    SNMP test time: 1
    Test logic time: 0
    BB msg xfer time: 0
    This poll period: 1

    Avg poll time: wait

    [07-01-31@06:45:31] Sleeping for 59 seconds.

Success: devmon is working and the snmp device is responding. Now just follow the documentation to set up devmon in cron.
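The status lines in the output above have the form "status host.test color text". As a rough illustration of how such a line breaks down, here is a short Python sketch; splitting the host from the test name at the last dot is an assumption made for this example, and `parse_status_line` is a made-up helper, not part of hobbit or Devmon.

```python
def parse_status_line(line):
    """Split 'status HOST.TEST COLOR TEXT' into its parts."""
    keyword, target, color, text = line.split(None, 3)
    if keyword != "status":
        raise ValueError("not a status line: %r" % line)
    # Assumption: the test name is whatever follows the last dot.
    host, _, test = target.rpartition(".")
    return {"host": host, "test": test, "color": color, "text": text}

# Status line copied from the devmon test run in this howto.
parsed = parse_status_line(
    "status winserver.disk green Wed Jan 31 06:45:31 2007"
)
print(parsed["host"], parsed["test"], parsed["color"])
```

The test name ("disk" here) is the name of the directory created under the device's template directory, and it becomes the column name on the hobbit display.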

A Successful match
 * The sysdesc is "SunOS snmpsolaris10 5.10 Generic_118833-36 sun4u".
 * The devmon specs files for Solaris 10 were created from compaq-server:
 * "cp -rp templates/compaq-server templates/Solaris-5.10"
 * Note: the sysdesc field in the Solaris-5.10/specs file needs to be 5.10 or SunOS, otherwise devmon won't be able to match it.
 * Note: the raid directory needs to exist, otherwise the match will fail.
 * Modify Solaris-5.10/specs like the following:

    bash-3.00$ cat Solaris-5.10/specs
    vendor : Solaris
    model  : 5.10
    snmpver : 2
    sysdesc : 5.10
    bash-3.00$

 * Matched log:

    bash-3.00$ ./devmon --readbbhosts -vvvvvvvv --debug
    [07-08-12@06:41:26] Saw 9 vendors, 25 models, 25 sysdescs & 75 templates
    [07-08-12@06:41:26] SNMP querying all hosts in bb-hosts file, please wait...
    [07-08-12@06:41:26] Querying pre-existing hosts
    [07-08-12@06:41:26] DEBUG SNMP: Dethawing data for snmpsolaris10
    [07-08-12@06:41:26] snmpsolaris10 sysdesc = ::: SunOS snmpsolaris10 5.10 Generic_118833-36 sun4u :::
    [07-08-12@06:41:26] snmpsolaris10 did not match apc : 9609 : MN: AP9606
    [07-08-12@06:41:26] snmpsolaris10 did not match apc : 9205 : Mod: AP9205
    [07-08-12@06:41:26] snmpsolaris10 did not match apc : 9619 : MN:AP9619
    [07-08-12@06:41:26] snmpsolaris10 did not match f5 : bigip : bigip
    [07-08-12@06:41:26] snmpsolaris10 did not match compaq : server : linux|Linux
    [07-08-12@06:41:26] Discovered snmpsolaris10 as a Solaris  5.10
    bash-3.00$

A mismatch
 * The templates installed from SourceForge contain some cisco, apc, powerware and NetApp template files.
 * The following is an example of a missing/wrong template file for an snmp agent on Solaris 10.

    bash-3.00$ ../../devmon --readbbhosts -vvv --debug
    [07-08-11@08:50:05] Saw 8 vendors, 24 models, 24 sysdescs & 72 templates
    [07-08-11@08:50:05] SNMP querying all hosts in bb-hosts file, please wait...
    [07-08-11@08:50:06] Querying new hosts /w custom cids using snmp v2
    [07-08-11@08:50:06] DEBUG SNMP: Dethawing data for snmpsolaris10.mydomain.com
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com sysdesc = ::: SunOS snmpsolaris10 5.10 Generic_118833-36 sun4u :::
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match apc : 9609 : MN: AP9606
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match apc : 9205 : Mod: AP9205
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match apc : 9619 : MN:AP9619
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match f5 : bigip : bigip
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match compaq : server : linux|Linux
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match cisco : 3725 : C3725
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match cisco : 2970 : C2970
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match cisco : 1700 : C1700
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match cisco : 3550 : C3550
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match cisco : 2900 : C2900
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match cisco : 1841 : C1841
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match cisco : 6509 : c6sup|s72033_rp|s222_rp
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match cisco : 3750 : C3750
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match cisco : 2801 : C2801
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match cisco : 2600 : C2600
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match cisco : 2960 : C2960
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match cisco : 7206 : 7200
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match cisco : 2950 : C2950
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match cisco : 3500 : C3500
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match powerware : xups : ConnectUPS
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match powerware : 9170 : BestLink
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match powerware : bestlink : BestLink
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match aruba : 5000 : Aruba5000
    [07-08-11@08:50:06] snmpsolaris10.mydomain.com did not match Network Appliances : NetApp v1.0 : NetApp
    [07-08-11@08:50:06] No matching templates for device: snmpsolaris10.mydomain.com
    [07-08-11@08:50:06] Querying new hosts using cid 'public' and snmp v2
    [07-08-11@08:50:06] Querying new hosts using cid 'private' and snmp v2
    [07-08-11@08:50:06] Querying new hosts /w custom cids using snmp v1
    [07-08-11@08:50:06] Querying new hosts using cid 'public' and snmp v1
    [07-08-11@08:50:06] Querying new hosts using cid 'private' and snmp v1
    bash-3.00$

 * Incorrect specs file under the templates directory for snmpsolaris10:

    bash-3.00$ cat specs
    vendor : Sun
    model  : Ultra60
    snmpver : 2
    sysdesc : SunOS snmpsolaris10 5.10 Generic_118833-36 sun4u
    bash-3.00$
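With many templates installed, the debug output gets long. The following Python sketch shows one way to boil a saved --readbbhosts log down to the interesting facts. The regular expressions are written against the log lines shown in this howto only; other devmon versions may word their messages differently, and `summarize_discovery` is a made-up helper.

```python
import re

SYSDESC = re.compile(r"(\S+) sysdesc = ::: (.*) :::")
DISCOVERED = re.compile(r"Discovered (\S+) as a (.+)$")
NO_MATCH = re.compile(r"No matching templates for device: (\S+)")

def summarize_discovery(log_text):
    """Collect reported sysdescs, discovered hosts and unmatched hosts."""
    summary = {"sysdesc": {}, "discovered": {}, "unmatched": []}
    for line in log_text.splitlines():
        found = SYSDESC.search(line)
        if found:
            summary["sysdesc"][found.group(1)] = found.group(2)
            continue
        found = DISCOVERED.search(line)
        if found:
            # Collapse repeated spaces, e.g. "Solaris  5.10".
            summary["discovered"][found.group(1)] = " ".join(found.group(2).split())
            continue
        found = NO_MATCH.search(line)
        if found:
            summary["unmatched"].append(found.group(1))
    return summary

# Lines copied from the two logs in this howto.
sample_log = """\
[07-08-12@06:41:26] snmpsolaris10 sysdesc = ::: SunOS snmpsolaris10 5.10 Generic_118833-36 sun4u :::
[07-08-12@06:41:26] snmpsolaris10 did not match compaq : server : linux|Linux
[07-08-12@06:41:26] Discovered snmpsolaris10 as a Solaris  5.10
[07-08-11@08:50:06] No matching templates for device: snmpsolaris10.mydomain.com
"""
print(summarize_discovery(sample_log))
```

For any host in the unmatched list, compare its reported sysdesc with the sysdesc lines of your specs files.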

How to override a mismatch or no match
One can override the detected specs via the model tag in bb-hosts. For example, one could use  for a Dell PowerEdge server running Microsoft Windows (which only reports " " for Microsoft Windows Server 2003).

Linksys BEFSX41
    bash-3.00# /opt/bin/snmpwalk -v1 -c public  192gw system
    SNMPv2-MIB::sysDescr.0 = STRING: BEFSX41
    SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.3955.1.1
    SNMPv2-MIB::sysUpTime.0 = Timeticks: (638239) 1:46:22.39
    SNMPv2-MIB::sysContact.0 = STRING: Linksys
    SNMPv2-MIB::sysName.0 = STRING: none
    SNMPv2-MIB::sysLocation.0 = STRING:
    SNMPv2-MIB::sysServices.0 = INTEGER: 4
    bash-3.00#

ONStor 2260
The following is the system description of an ONStor 2260 device.

    bash-3.00$ /opt/bin/snmpwalk -v 2c -c mysecret  onstor_ip  system
    SNMPv2-MIB::sysDescr.0 = STRING: ONStor 2260 NAS Gateway
    SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.10110.1.1.2.1
    SNMPv2-MIB::sysUpTime.0 = Timeticks: (6976921) 19:22:49.21
    SNMPv2-MIB::sysContact.0 = STRING: support@onstor.com
    SNMPv2-MIB::sysName.0 = STRING: onstor_name
    SNMPv2-MIB::sysLocation.0 = STRING: OnStor
    SNMPv2-MIB::sysORLastChange.0 = Timeticks: (28) 0:00:00.28
    bash-3.00$

Cisco 2950 with snmp agent turned on
myswitch.net switch
 * 2950 Manuals about snmp
 * Enable snmp on the cisco 2950 by connecting to it via telnet/ssh and entering these commands:
    * enable
    * configure terminal
    * snmp-server community public ro
    * end
    * show running-config
    * copy running-config startup-config

    User Access Verification

    Password:
    Switch>enable
    Password:
    Switch#configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    Switch(config)#

 * Get the sysDescr with the snmpget command:

    [root@rh9 root]# snmpget -v2c -c public 192.168.1.32 1.3.6.1.2.1.1.1.0
    SNMPv2-MIB::sysDescr.0 = STRING: Cisco Internetwork Operating System Software
    IOS (tm) C2950 Software (C2950-I6K2L2Q4-M), Version 12.1(22)EA6, RELEASE SOFTWARE (fc1)
    Copyright (c) 1986-2005 by cisco Systems, Inc.
    Compiled Fri 21-Oct-05 02:22 by yenanh
    [root@rh9 root]#

Solaris 9
    [root@rh9 root]# snmpwalk -v 1 -c public 192.168.1.149 system
    SNMPv2-MIB::sysDescr.0 = STRING: Sun SNMP Agent, Sun-Blade-100
    SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.42.2.1.1
    SNMPv2-MIB::sysUpTime.0 = Timeticks: (624162552) 72 days, 5:47:05.52
    SNMPv2-MIB::sysContact.0 = STRING: System administrator
    SNMPv2-MIB::sysName.0 = STRING: oss
    SNMPv2-MIB::sysLocation.0 = STRING: System administrators office
    SNMPv2-MIB::sysServices.0 = INTEGER: 72
    [root@rh9 root]#

Solaris 10
 * Check if the snmp packages are installed:

    bash-3.00# pkginfo |egrep 'SUNWsacom|SUNWsasnm|SUNWsadmi|SUNWmibii'
    system     SUNWmibii     Solstice Enterprise Agents 1.0.3 SNMP daemon
    system     SUNWsacom     Solstice Enterprise Agents 1.0.3 files for root file system
    system     SUNWsadmi     Solstice Enterprise Agents 1.0.3 Desktop Management Interface
    system     SUNWsasnm     Solstice Enterprise Agents 1.0.3 Simple Network Management Protocol
    bash-3.00#

 * Configure the snmp conf file.
 * Start/stop the snmp agent.
 * Running processes:

    bash-3.00# ps -eaf |egrep -i 'snmp|dmi'
    root  224     1   0   Aug 02 ? 0:00 /usr/lib/dmi/dmispd
    root   77     1   0   Aug 02 ? 0:57 /usr/sfw/sbin/snmpd
    root   95     1   0   Aug 02 ? 0:00 /usr/lib/snmp/snmpdx -y -c /etc/snmp/conf
    root  232     1   0   Aug 02 ? 0:00 /usr/lib/dmi/snmpXdmid -s snmpsolaris10
    bash-3.00#

Graphing
Some documentation for graphing is provided in the  file. There are two common ways to get graphs from Devmon data. If you are collecting data from an SNMP branch, you would use the instructions for "Graphing a Table". If you are collecting data from an SNMP leaf, you would use the instructions for "Graphing one or several Values".

Graphing a Table
In the message file, one can add RRD definitions to a table definition. This is preferred if the oids are branches. The example given in the documentation is

    TABLE:rrd(DS:ds0:ifInOctets:COUNTER; DS:ds1:ifOutOctets:COUNTER)

This rrd tag does not belong on its own line; it goes on the line that defines the table to follow. This will add a comment to the generated HTML page with information that can be parsed by the devmon-rrd.pl script located in the extras directory under your devmon install. That will begin to generate the .rrd files for the table.
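To show what the rrd tag encodes, here is a hedged Python sketch that splits the documented example into its dataset definitions. Reading the three fields after "DS" as dataset name, oid alias and RRD data type is my interpretation of the example, not something confirmed by the Devmon documentation, and `parse_rrd_tag` is a made-up helper (the real parsing is done by devmon-rrd.pl).

```python
import re

RRD_TAG = re.compile(r"rrd\(([^)]*)\)")

def parse_rrd_tag(table_line):
    """Return a list of (ds_name, oid_alias, ds_type) from an rrd() tag."""
    found = RRD_TAG.search(table_line)
    if found is None:
        return []
    datasets = []
    for part in found.group(1).split(";"):
        part = part.strip()
        if not part:
            continue
        fields = part.split(":")
        if len(fields) != 4 or fields[0] != "DS":
            raise ValueError("unexpected dataset definition: %r" % part)
        datasets.append((fields[1], fields[2], fields[3]))
    return datasets

example = "TABLE:rrd(DS:ds0:ifInOctets:COUNTER; DS:ds1:ifOutOctets:COUNTER)"
print(parse_rrd_tag(example))
```

Each tuple corresponds to one data source that ends up in the generated .rrd file.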

To parse the .rrd files to make a graph, you must create a definition for the graph in the hobbitgraph.cfg file in your Xymon install directory.

Then add the  setting to the TEST2RRD variable in hobbitserver.cfg.

Note that old versions of Devmon could only use exactly two datasets with specific names of  and.

Note that, if you add a table which will have multiple rows, you should add a  setting to  ; see the FAQ (Why does my custom graph only show in the trends column but not its own column ?) for more information. If you don't do this, the graphs will only appear in the trends column and not in the expected column.

Graphing one or several Values
One can format the message file as something hobbitd_rrd's "ncv" module will recognize. This is preferred if the oids are leaves. NCV stands for "Name Colon Value" and the "ncv" module will parse the generated HTML page for this data. In the message file within your devmon template, when you add a line to display the temperature of a UPS, you would add:

    Temperature: {tempVar}

This would show the value on the web page, but also give the "ncv" module something to parse. It would set a variable of "Temperature" equal to whatever value {tempVar} happens to be, and add it to the .rrd with a DS name of "Temperature". After the .rrd files are being generated, you can add a definition to the hobbitgraph.cfg file that will read the .rrd and display the graph on the page.
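The "Name Colon Value" idea can be illustrated with a few lines of Python. This is only a sketch of the concept; the real "ncv" module in hobbitd_rrd has its own parsing rules (for example for names containing spaces or units after the value), so treat `parse_ncv` as a made-up approximation.

```python
def parse_ncv(message):
    """Pick out 'Name: number' lines from a rendered status message."""
    values = {}
    for line in message.splitlines():
        name, colon, rest = line.partition(":")
        if not colon:
            continue  # no colon on this line
        try:
            values[name.strip()] = float(rest.split()[0])
        except (IndexError, ValueError):
            continue  # nothing numeric after the colon
    return values

# Made-up rendered message for a UPS template.
rendered = """\
UPS environment
Temperature: 23.5
Status: ok
"""
print(parse_ncv(rendered))
```

Only the Temperature line yields a dataset here; lines without a numeric value after the colon are skipped.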

To begin generating the .rrd files, one could specify an NCV setting in  like the following for an APC UPS:

    NCV_env="Temperature:GAUGE"

If one anticipates adding or removing values then it's better to use SPLITNCV instead of NCV, again for an APC UPS:

    SPLITNCV_power="RuntimeRemaining:GAUGE,BatteryCapacity:GAUGE,UPSLoad:GAUGE,Voltageactual:GAUGE,Voltagein:GAUGE,Voltageout:GAUGE,Timeonbattery:GAUGE"

In both cases, add a  setting to the TEST2RRD variable in hobbitserver.cfg. Use only  within the TEST2RRD variable, not. See the hobbitd_rrd documentation for more information.
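When a template exposes many values, typing the setting by hand is error-prone. The following Python sketch builds the setting string from a list of names; it follows the format of the two examples above and nothing more, and `ncv_setting` is a made-up helper name.

```python
def ncv_setting(test_name, datasets, split=False):
    """Build an NCV_<test> or SPLITNCV_<test> setting line.

    `datasets` is a list of (name, rrd_type) pairs, e.g. ("Temperature", "GAUGE").
    """
    prefix = "SPLITNCV" if split else "NCV"
    body = ",".join("%s:%s" % (name, ds_type) for name, ds_type in datasets)
    return '%s_%s="%s"' % (prefix, test_name, body)

print(ncv_setting("env", [("Temperature", "GAUGE")]))
print(ncv_setting(
    "power",
    [("RuntimeRemaining", "GAUGE"), ("BatteryCapacity", "GAUGE")],
    split=True,
))
```

The names must be the same as the "Name" part of the "Name: value" lines in the message file.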

Confirming data is available for a graph
The hobbitd_rrd daemons only read these values at initialization, so if Xymon is running, you'll need to kill the daemons or restart Xymon. Once either of these settings is in use, wait a few minutes for new data, then check the rrd directory for the usual data files.

Output a graph from the data
If one or more rrd files are available, add the appropriate code to hobbitgraph.cfg. Then add  setting to the GRAPHS variable, in hobbitserver.cfg.

If there are multiple graphs, add  or something similar to each host's TRENDS setting in bb-hosts. Note that you'll only get one graph in the normal column if it isn't a "multigraph" type, but many can appear in the trends column.