System Monitoring with Xymon/Administration Guide

All things related to system administration are documented here.

Xymon Protocol

 * The Xymon author provides a description of the Xymon protocol in ASCII text format.
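As a rough illustration of what travels over this protocol, a status update is a single text line of the form "status HOSTNAME.TESTNAME COLOR text", where the dots in the host name are written as commas. The minimal sketch below builds such a line and hands it to the xymon command-line tool. It assumes the xymon binary is on the PATH; the fallback server address 127.0.0.1 and the test names are placeholders, and the exact message grammar should be checked against the protocol description.

```sh
#!/bin/sh
# Sketch: build and send a Xymon "status" message.
# Assumes the xymon CLI is on PATH; XYMSRV falls back to a placeholder address.

# Print "status host,name.test color text"; dots in the host name become commas.
xymon_status_msg() {
    host=$(printf '%s' "$1" | tr '.' ',')
    printf 'status %s.%s %s %s' "$host" "$2" "$3" "$4"
}

# Send one status line: send_status HOST TEST COLOR "TEXT"
send_status() {
    xymon "${XYMSRV:-127.0.0.1}" "$(xymon_status_msg "$@")"
}
```

For example, send_status www.example.com mytest green "all OK" would send "status www,example,com.mytest green all OK".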

Architecture of a Xymon System Monitoring Environment
TBC

Picking an OS for Xymon Server
These are some notes and advice from Xymon users.

Linux

 * Red Hat Enterprise Linux / CentOS
 * Debian

Solaris

Pros

 * Plus 1: Turbocharged TCP/IP.
 * Plus 2: dtrace
 * Plus 3: Self Heal
 * Plus 4: You can configure root and disk to use zfs and have zfs snapshot enabled.

Cons

 * Minus 1: Xymon depends on other open-source software that does not come with Oracle Solaris by default. The following are three sources where you can get the software in binary or source-code format:
 * 1) http://www.blastwave.org
 * 2) http://www.sunfreeware.com has lots of open source.
 * 3) http://www.thewrittenword.com

List of software required to meet all dependencies, in order of installation:
 * 1) common-1.4.5-SunOS5.8-sparc-CSW.pkg.gz
 * 2) pcre-4.5-SunOS5.8-sparc-CSW.pkg.gz
 * 3) fping-2.4,REV=2004.10.12_rev=b2_to_ipv6-SunOS5.8-sparc-CSW.pkg.gz
 * 4) zlib-1.2.3,REV=2007.05.12-SunOS5.8-sparc-CSW.pkg.gz
 * 5) png-1.2.18-SunOS5.8-sparc-CSW.pkg.gz
 * 6) libiconv-1.9.2-SunOS5.8-sparc-CSW.pkg.gz
 * 7) expat-1.95.7-SunOS5.8-sparc-CSW.pkg.gz
 * 8) ggettext-0.14.1,REV=2005.06.29-SunOS5.8-sparc-CSW.pkg.gz
 * 9) libpopt-1.7,REV=2004.05.15-SunOS5.8-sparc-CSW.pkg.gz
 * 10) chkconfig-1.2.24h,REV=2006.12.12-SunOS5.8-sparc-CSW.pkg.gz
 * 11) libpopt-1.7,REV=2004.05.15-SunOS5.8-sparc-CSW.pkg.gz
 * 12) openssl-0.9.8,REV=2007.05.10_rev=e-SunOS5.8-sparc-CSW.pkg.gz
 * 13) imaprt-2004,REV=2006.09.02_rev=g-SunOS5.8-sparc-CSW.pkg.gz
 * 14) freetype2-2.1.10,REV=2005.12.11-SunOS5.8-sparc-CSW.pkg.gz
 * 15) libart-2.3.16-SunOS5.8-sparc-CSW.pkg.gz
 * 16) berkeleydb44-4.4.20,REV=2007.01.27-SunOS5.8-sparc-CSW.pkg.gz
 * 17) ncurses-5.5,REV=2006.02.10-SunOS5.8-sparc-CSW.pkg.gz
 * 18) readline-5.0,REV=2005.06.07-SunOS5.8-sparc-CSW.pkg.gz
 * 19) gbc-1.06-SunOS5.8-sparc-CSW.pkg.gz
 * 20) gdbm-1.8.3,REV=2006.01.01-SunOS5.8-sparc-CSW.pkg.gz
 * 21) perl-5.8.8,REV=2007.03.16-SunOS5.8-sparc-CSW.pkg.gz
 * 22) cvs-1.11.22-sol10-sparc-local.gz
 * 23) rrdtool-1.2.19,REV=2007.02.07-SunOS5.8-sparc-CSW.pkg.gz
 * 24) libnet-1.0.2,REV=2004.04.08_rev=a-SunOS5.8-sparc-CSW.pkg.gz
 * 25) berkeleydb4-4.2.52,REV=2005.04.28_rev=p4-SunOS5.8-sparc-CSW.pkg.gz
 * 26) sasl-2.1.22,REV=2007.06.19-SunOS5.8-sparc-CSW.pkg.gz
 * 27) openldap_rt-2.3.35,REV=2007.04.14-SunOS5.8-sparc-CSW.pkg.gz
 * 28) xymon-4.2.0,REV=2007.04.12-SunOS5.8-sparc-CSW.pkg.gz
 * 29) xymon_client-4.2.0,REV=2007.04.12-SunOS5.8-sparc-CSW.pkg.gz
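The packages above are gzipped Solaris package datastreams, so each one has to be unpacked and then added with pkgadd, in the listed order. The sketch below is one way to do that; it assumes it runs as root, that the file names are passed in installation order, and that /var/tmp/csw-pkgs is an acceptable (hypothetical) scratch directory.

```sh
#!/bin/sh
# Sketch: install gzipped package datastreams in the order given.
# Run as root on the Solaris machine; WORKDIR is a hypothetical scratch directory.

install_csw_pkgs() {
    workdir=${WORKDIR:-/var/tmp/csw-pkgs}
    mkdir -p "$workdir" || return 1
    for gz in "$@"; do
        pkg="$workdir/$(basename "$gz" .gz)"
        # Unpack the datastream; pkgadd cannot read the .gz file directly.
        gunzip -c "$gz" > "$pkg" || return 1
        # "all" installs every package contained in the datastream.
        pkgadd -d "$pkg" all || return 1
    done
}
```

Usage: install_csw_pkgs common-1.4.5-SunOS5.8-sparc-CSW.pkg.gz pcre-4.5-SunOS5.8-sparc-CSW.pkg.gz ... , continuing through the list. The function stops at the first failure so that a later package is never added on top of a missing dependency.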

Xymon Server: Solaris Intel 11/06 U3 VMware appliance on a 2GB flash pen drive
The main steps for building this portable Xymon server are:


 * Use VMware Server 1.0.1 to create a Solaris 10 VMware session.
 * Create a 1.9 GB partition and select custom install.
 * Modify the partition table to remove /export/home, leaving only swap and /.
 * Decrease the default 512 MB swap size to 300 MB.
 * Select the "Core group" (about 573 MB in size).
 * Install the httpd server.
 * Install the Xymon server.

Xymon Server and Development: Solaris Intel 11/06 U3 VMware appliance on a 4GB flash pen drive

 * Use VMware Server 1.0.1 to create a Solaris 10 VMware session.
 * VMware Player 1.0.3 is needed for DHCP to work.

Xymon Server Test site

 * Solaris Intel 11/06 U3 VMware appliance on a 4GB flash pen drive

Servers
This is a comparison table on how Xymon server is different from BB when performing an administration task.

Clients
This is a comparison on how Xymon is different from BB when performing an administration task.

Capacity Planning
As a rule of thumb, allow 5 MB of disk space on the Xymon server for each machine being monitored.
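The rule of thumb is plain multiplication: for example, 200 monitored machines need about 200 × 5 MB = 1000 MB. The small helper below only encodes that arithmetic; the 5 MB figure is the rule of thumb above, not a measurement for any particular site.

```sh
#!/bin/sh
# Rule-of-thumb disk estimate for the Xymon server.
MB_PER_HOST=5   # from the rule of thumb above

# Print the estimated disk space in MB for N monitored machines.
estimate_xymon_disk_mb() {
    echo $(( $1 * MB_PER_HOST ))
}
```

Running estimate_xymon_disk_mb 200 prints 1000.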

Client

 * Run the BBWin 0.13 installer.
 * Under HKEY_LOCAL_MACHINE\SOFTWARE\BBWin (32-bit) or HKLM\SOFTWARE\Wow6432Node\BBWin (64-bit) in the registry, set the computer name exactly as it appears in the bbhosts file.


 * Make the top of the config file in C:\Program Files\BBWin\etc (or C:\Program Files (x86)\BBWin\etc on Windows x64 systems) look like this:

  
 * Delete or comment out the default lines:

...snip ...

 * This causes these thresholds to be set at the server side. Any settings in the BBWin configuration file override the settings in the server's analysis.cfg file, and it is much easier to manage these settings centrally.
 * Start the BBWin service.
 * Then edit /home/xymon/server/etc/analysis.cfg on the Xymon server and add hostname entries for the BBWin clients, for example:

 HOST=new host name, as it appears in the bbhosts file
     LOAD 65 75      # Load thresholds are in %
     DISK C 80 90
     DISK D 90 95
     MEMPHYS 75 101
     MEMSWAP 75 85
     MEMACT 75 85
     PROC BBWin.exe 1 1

Server
 * /xymon/server/etc/client-local.cfg:

 [win32]
 eventlog:Security ignore Success
 eventlog:System ignore Information
 eventlog:Application ignore Information

 * Filtering in /xymon/server/etc/analysis.cfg:

 CLASS=win32
     LOAD 80 90 # Load thresholds are in %
     PROC BBWin.exe 1 1
     PORT STATE=LISTENING MIN=0 TRACK=Listen TEXT=Listen
     LOG %.* %error -.* COLOR=yellow
     LOG eventlog:Security %failure.* COLOR=yellow
     LOG eventlog:Application %warning.* COLOR=yellow
     LOG eventlog:System %error.* COLOR=yellow

 * Instead, you can use the following, but then every update to the event log is sent to the Xymon server (instead of being filtered locally first).

 CLASS=win32
     LOAD 80 90 # Load thresholds are in %
     PROC BBWin.exe 1 1
     PORT STATE=LISTENING MIN=0 TRACK=Listen TEXT=Listen
     LOG %.* %^error.* COLOR=red
     #IGNORE=TermServDevices \(
     LOG %.* %^warning.* COLOR=yellow IGNORE=%.*TermServDevices.*
     LOG %.* %^failure.* COLOR=yellow

Unix-like

 * AIX
 * Debian (Ubuntu)
 * FreeBSD
 * HP-UX
 * IRIX
 * Mandriva (xymon 4.2.3 is available in contrib as of 2009.0, prior to that hobbit was available in contrib)
 * NSLU2 Unslung OS
 * RedHat Linux / RedHat Enterprise Linux / Fedora Core (http://rpm.razorsedge.org/ or http://staff.telkomsa.net/packages/)
 * Solaris (as of 20 September 2012, Blastwave has ceased operation and its web site is no longer accessible)

Building from package source using TWW HPMS
The TWW Hyper Package Management system can help a software developer or system administrator create native package formats for different operating systems. The package sources for compiling and packaging the Hobbit client and server software are in XML format, so builds can be repeated reliably with TWW's sb and pb tools.

Hobbit server and Hobbit client package source is GPL licensed on TWW's support ftp server.

Building from src RPM
Sometimes it's better to build your own RPMs specifically for your environment. If you are using RH Enterprise or CentOS, the Fedora Core or generic RPM may not install correctly. You could also run into this problem if you have versions of dependent libraries that are not compatible with the system that the RPM was built on.

In order to build the src RPM, you'll need several packages:
 * 1) openssl-devel, openldap-devel, and pcre-devel from the CentOS CDs.
    * You may also have to make a link from /usr/include/pcre/pcre.h to /usr/include/pcre.h
 * 2) rrdtool-devel
    * I recommend getting this from the DAG repository
 * 3) fping
    * Also available from the DAG repository

RPMs from a matching version of RHEL usually work on CentOS with no problem (for example RPMs for EL 4 work fine on CentOS 4)

Once you have all the dependencies installed, download the src RPM from SourceForge and run rpmbuild --rebuild hobbit-xxxx.src.rpm.

The rpmbuild command should compile and build the RPM for you. You can watch the compiler output for any problems. After it is done, you should have new RPMs in the /usr/src/redhat/RPMS/i386 directory (assuming your architecture is i386). This process will build both server and client RPMs for your system. The server RPM also includes the client, so it is not necessary to install both of them.
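The dependency check, the optional pcre.h link and the rebuild can be strung together as below. This is a sketch assuming an RPM-based system where the package names match the list above (openssl-devel, openldap-devel, pcre-devel, rrdtool-devel, fping); the src RPM file name is whatever you downloaded.

```sh
#!/bin/sh
# Sketch: check build dependencies, then rebuild a Hobbit/Xymon src RPM.
XYMON_BUILD_DEPS="openssl-devel openldap-devel pcre-devel rrdtool-devel fping"

# Print the missing packages and return 1 if any dependency is not installed.
check_build_deps() {
    missing=""
    for pkg in $XYMON_BUILD_DEPS; do
        rpm -q "$pkg" >/dev/null 2>&1 || missing="$missing $pkg"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    return 0
}

# Some builds expect pcre.h directly under /usr/include.
link_pcre_header() {
    if [ -f /usr/include/pcre/pcre.h ] && [ ! -e /usr/include/pcre.h ]; then
        ln -s /usr/include/pcre/pcre.h /usr/include/pcre.h
    fi
}

rebuild_src_rpm() {
    check_build_deps || return 1
    link_pcre_header
    rpmbuild --rebuild "$1"
}
```

Usage: rebuild_src_rpm hobbit-xxxx.src.rpm, run as root (the link step writes to /usr/include). The resulting RPMs land in the architecture directory mentioned above.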

SUSE
Dependencies for installation include apache2, apache2-utils, gcc, libstdc++-devel, net-snmp, pcre, pcre-devel, rrdtool and rrdtool-devel. Download the latest Xymon source from http://sourceforge.net/projects/xymon/files/Xymon/. Ensure that mod_rewrite is enabled in apache2, from YAST -> Network Services -> HTTP Server -> Server Modules.

 $ useradd -m xymon
 $ ./configure.server
 [...]
 [...]
 What group-ID does your webserver use [nobody] ? www
 [...]
 $ make
 [...]
 Now run 'make install' as root
 $ make install
 [...]
 Installation complete.

 * When configure.server asks "Where do you want the Xymon installation [/home/xymon] ?", the steps below assume the default, /home/xymon.
 * Copy /home/xymon/server/etc/xymon-apache.conf to /etc/apache2/conf.d/.
 * Create the web password file with htpasswd2 -c /home/xymon/server/etc/xymonpasswd, followed by a user name.
 * Ensure that fping can be executed by user xymon, either via appropriate sudo permissions or by making fping setuid root (chmod u+s).
 * Start the apache2 service.
 * Start Xymon: /home/xymon/server/bin/xymon.sh start
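The post-install steps can be collected into one function, sketched below. It assumes it runs as root, that Xymon was installed into the default /home/xymon, that the SUSE rcapache2 helper is available, and that the setuid option is chosen for fping; starting Xymon through su as the xymon user is also an assumption about how you want it to run.

```sh
#!/bin/sh
# Sketch of the SUSE post-install steps; run as root.
# XYMONHOME is assumed to be the default installation directory.
XYMONHOME=${XYMONHOME:-/home/xymon}

# Usage: suse_xymon_postinstall WEBUSER   (WEBUSER is a login of your choice)
suse_xymon_postinstall() {
    webuser=$1
    cp "$XYMONHOME/server/etc/xymon-apache.conf" /etc/apache2/conf.d/ || return 1
    # -c creates the password file; htpasswd2 prompts for the password.
    htpasswd2 -c "$XYMONHOME/server/etc/xymonpasswd" "$webuser" || return 1
    # Setuid-root option for fping (the sudo alternative is not shown).
    fping_path=$(command -v fping) || return 1
    chmod u+s "$fping_path" || return 1
    rcapache2 start || return 1
    # Start Xymon as the xymon user created earlier.
    su xymon -c "$XYMONHOME/server/bin/xymon.sh start"
}
```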

Ubuntu
With Synaptic, install the PCRE and RRDtool libraries. Then, download xymon and unpack it.

Launch a terminal (Ctrl+Alt+T), then build and install the software into your HTTP directory. With Apache, if it hasn't already been done, Apache must be configured to execute CGI programs. Finally, test the software by opening http://localhost/xymon/server/bin/confreport.cgi.
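The Apache CGI step and the final test can be scripted roughly as follows. This sketch assumes Apache 2 on Ubuntu (where a2enmod enables modules), that curl is installed, and that the confreport.cgi URL is the one given above.

```sh
#!/bin/sh
# Sketch: enable CGI in Apache 2 on Ubuntu, then check the Xymon test page.
# Run as root; assumes curl is installed.
CONFREPORT_URL=${CONFREPORT_URL:-http://localhost/xymon/server/bin/confreport.cgi}

enable_cgi_and_test() {
    a2enmod cgi || return 1              # enable CGI execution
    service apache2 restart || return 1  # reload Apache with the module
    # -f makes curl fail on HTTP errors such as 404 or 500
    curl -fsS -o /dev/null "$CONFREPORT_URL" && echo "confreport.cgi OK"
}
```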

Hobbit in HA
There are two approaches to implementing high availability for Xymon servers: HA-LAN and HA-WAN. Pick one according to your network structure.

HA-LAN approach
This approach uses clustering software to fail over between a set of Xymon servers. Each OS has its own clustering software: on Linux, Linux-HA plus DRBD can be used; on Solaris, there is Sun Cluster software.

The drawback of this approach is that high availability is limited to the LAN, not the WAN: the clustered servers need to reside on the same LAN subnet. If the cluster site goes down, Xymon clients have nowhere to send their messages.

HA-WAN approach
For networks that span states or countries, failing over a primary Xymon server to a standby server across the WAN is not an easy networking task.

The following HA-WAN architecture can fail over without involving the network team in DNS or routing changes.

Diagram: hobbit.test.com (primary, LAN1) and hobbit2.test.com (standby Xymon server, LAN2) exchange a heartbeat; Hobbit clients A, B and C (in LAN 3, LAN 4 and LAN 5) each report to both servers.

LAN1: California, LAN2: Brazil, LAN3: Argentina, LAN4: Mexico, LAN5: Japan
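On the client side, this layout means every client sends each message to both servers. A minimal sketch for xymonclient.cfg follows, assuming Xymon's convention that XYMSRV=0.0.0.0 makes the client send to every address listed in XYMSERVERS (older Hobbit clients use BBDISP and BBDISPLAYS instead). The two addresses are hypothetical placeholders for hobbit.test.com and hobbit2.test.com.

```sh
# Sketch for xymonclient.cfg on each client (shell-style assignments).
# 0.0.0.0 means "send to every server listed in XYMSERVERS".
XYMSRV="0.0.0.0"
# Hypothetical addresses of hobbit.test.com and hobbit2.test.com
XYMSERVERS="192.0.2.10 192.0.2.20"
```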

Requirements

 * A script that can detect the failure of hobbit.test.com services.
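A failure detector of this kind might look like the sketch below. It uses the xymon tool's "ping" message, which asks xymond for its version, and treats the primary as down only after several consecutive empty answers. PRIMARY, MAX_FAILS, CHECK_INTERVAL and whatever failover action you run afterwards are assumptions to adapt.

```sh
#!/bin/sh
# Sketch: decide whether the primary Xymon server is down.
# PRIMARY, MAX_FAILS and CHECK_INTERVAL are assumptions; adjust to taste.
PRIMARY=${PRIMARY:-hobbit.test.com}
MAX_FAILS=${MAX_FAILS:-3}

# A running xymond answers "ping" with its version string.
primary_alive() {
    [ -n "$(xymon "$PRIMARY" ping 2>/dev/null)" ]
}

# Return 0 (and print a notice) after MAX_FAILS consecutive failed checks;
# return 1 as soon as the primary answers.
primary_is_down() {
    fails=0
    while [ "$fails" -lt "$MAX_FAILS" ]; do
        primary_alive && return 1
        fails=$((fails + 1))
        sleep "${CHECK_INTERVAL:-60}"
    done
    echo "primary $PRIMARY down after $MAX_FAILS checks"
    return 0
}
```

A standby-side cron job could then run something like: if primary_is_down; then (your failover action); fi. Requiring several failures avoids failing over on a single dropped connection.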

Pros

 * No need to alter existing network configuration.

Cons

 * Increased network bandwidth, because each message is sent to two different servers.

HA-WAN 2 approach
From Patrick: we have 3 data centres and each data centre contains a Xymon server. All clients in a data centre only report to their local Xymon server. However, the Xymon servers can communicate with each other using BBDISPLAYS (it's a little more complicated than that, as we use a bbproxy in each DC to take the messages and spray them to all 3 Xymon servers).

Diagram: hobbit1.test.com (primary, LAN1) and hobbit2.test.com (standby Xymon server, LAN2) are linked through bbproxy; Hobbit clients A and B are in LAN1 and report to hobbit1, and Hobbit client C is in LAN2 and reports to hobbit2.

LAN1: Hobbit clients A and B, LAN2: Hobbit client C

HA-WAN3 approach
This is a two-node, loosely coupled Hobbit cluster across the WAN. It has the following challenges that need to be resolved:


 * The DNS name hobbit.test.com needs to fail over from hobbit1 to hobbit2 when hobbit1 is down.
 * The web pages on hobbit1 and hobbit2 are not in sync.
 * Maintenance records are not in sync between the two servers.
 * RRD databases on the two Hobbit servers are not in sync after either server has been down for a while.

DNS chain: hobbit.test.com -> hobbitdynamic.test.com (managed with Cisco DD software) -> hobbit1.test.com, or hobbit2.test.com when hobbit1 is down.

Diagram: hobbit1.test.com (primary, LAN1) and hobbit2.test.com (standby Xymon server, LAN2) are linked by a heartbeat on port 1985, history synchronization on port 1986 and a second heartbeat on port 1987; Hobbit clients A, B and C (in LAN 3, LAN 4 and LAN 5) each report to both servers.

LAN1: California, LAN2: Brazil, LAN3: Argentina, LAN4: Mexico, LAN5: Japan

Requirements

 * A script that can detect the failure of hobbit.test.com services.

Pros

 * No need to alter existing network configuration.

Cons

 * Increased network bandwidth, because each message is sent to two different servers.

Hobbit HA on LAN
Diagram: hobbit.test.com and hobbit2.test.com run HA software with a heartbeat between them on LAN1 (192.168.1.0); Hobbit clients A, B and C are in LAN 2, LAN 3 and LAN 4.

LAN1: California, LAN2: Brazil, LAN3: Argentina, LAN4: Mexico

Pros

 * Close to real-time fail-over.

Cons

 * Failover happens only within a LAN, not across a WAN.

SunCluster
Free and open-source clustering software from Sun. Commercial technical support is available.
 * Using two sol-nv-b68-x86 VMware sessions with Sun Cluster express 07/07.

FST HA
An open-source clustering solution specifically for Solaris.
 * FSTha vs other HA

Hobbit(bb)/XyMon port 1984 encryption Using Stunnel

 * References: http://www.stunnel.org/

Plain-text BB messages are an obstacle to making Hobbit an enterprise solution where high security standards are required. The following is an attempt to make your CIO smile about a Hobbit solution. Note: instead of an Stunnel server and client, it is possible to use reverse SSH tunnels with Padraig Lennon's ssh_tunnels.sh script. See Monitor Hobbit clients in a DMZ using reverse SSH tunnels for more details.


 * 1) Machine A: runs both the HB (Hobbit) server and the Stunnel server.
 * 2) Machine B: a BB client.
 * 3) Machine C: a Hobbit client with the stunnel client enabled. The HB client sends its BB messages via encrypted port 1999.
 * 4) Machine D: an HB client.
 * 5) Note: the old BB protocol on port 1984 is one-way; Hobbit's version of the BB protocol is bidirectional.

Machine A (192.168.1.111)

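 * HB server process, listening on port 1984:
    * BB client (Machine B) -> port 1984 (one-way)
    * HB client (Machine D) <-> port 1984 (bidirectional)
 * Stunnel server process, listening on port 1999 and forwarding to local port 1984:
    * Stunnel client on Machine C (192.168.1.141) <-> port 1999; the HB client on Machine C talks to its local stunnel client on port 1984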

Configure stunnel server to run in hobbit server

 * 1) Create an stunnel config file on the server that directs port 1999 to local port 1984.
 * 2) Start the stunnel server on machine A and check that the port redirection to the Hobbit server works.
 * 3) Make sure stunnel is running.
 * 4) Test port 1999 on the Hobbit server directly: type a garbage message such as "asdf", then press Control+D to quit.
 * 5) The stunnel log file on machine A should show the incoming message on port 1999 from 192.168.1.141 (machine C).
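A server-side stunnel configuration for step 1 could be generated as below. This is a sketch assuming stunnel 4 configuration syntax (a global cert option plus a service section with accept and connect); the certificate path and the service name "hobbit" are hypothetical.

```sh
#!/bin/sh
# Sketch: print an stunnel config for machine A that accepts TLS on port 1999
# and forwards the decrypted traffic to the local Hobbit server on port 1984.
# The certificate path is a hypothetical example.
stunnel_server_conf() {
    cat <<EOF
cert = ${STUNNEL_CERT:-/etc/stunnel/stunnel.pem}

[hobbit]
accept = 1999
connect = 127.0.0.1:1984
EOF
}
```

Usage: stunnel_server_conf > /etc/stunnel/hobbit-server.conf, then start stunnel with that file as its argument. Because port 1999 expects TLS, a TLS-aware tool such as openssl s_client -connect 192.168.1.111:1999 is a convenient way to type test messages into it.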

Configuring hb client to use port 1999

 * 1) Add hobbitclientLocalIP to the hobbitclient.cfg file: the Hobbit client should send its BB messages to itself, where the local stunnel client forwards them to the server.
 * 2) The result is a Hobbit client successfully tunneling to the Hobbit server over port 1999.
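The matching client-side configuration on machine C might look like this. It is a sketch assuming stunnel 4 syntax, a local listener on 127.0.0.1:1984 (so the Hobbit client on C is pointed at 127.0.0.1), and the server address 192.168.1.111 from the example; if your hobbitclient.cfg uses the machine's own LAN address instead, set LOCAL_IP accordingly.

```sh
#!/bin/sh
# Sketch: print an stunnel client config for machine C. The local Hobbit
# client sends to LOCAL_IP:1984; stunnel encrypts and forwards to port 1999.
stunnel_client_conf() {
    cat <<EOF
client = yes

[hobbit]
accept = ${LOCAL_IP:-127.0.0.1}:1984
connect = ${HOBBIT_SERVER:-192.168.1.111}:1999
EOF
}
```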

Using HTTPS Transport
A posting at http://lists.xymon.com/archive/2011-October/032866.html describes a technique where Xymon clients can submit client messages using a web connection. It requires a CGI script to be installed on the Xymon server. This method can be used to connect via web proxies, and authentication can be achieved by configuring the web server to enforce client-side certificates or user/password logins.

Encryption via Secure-Shell (ssh) Tunnel
Xymon can be configured to use the IP address of an ssh tunnel, and thus its traffic will be encrypted. This section describes two ways to establish a tunnel between the Xymon server and Xymon client.

Persistent Tunnel
This method is essentially creating a kind of VPN between the Xymon server and the client. Once established, the Xymon client is configured with XYMSRV set to 127.0.0.1, and all updates are sent down the tunnel.

The simplest way to set up a persistent tunnel is with a tool such as Autossh. There is also a Xymon-specific add-on for establishing tunnels, called ssh_tunnel.

Ephemeral ssh Tunnel
An ephemeral tunnel is a temporary tunnel created only when Xymon data needs to be collected. Secure shell tunnels use key authentication, so no passwords are required. They can be established by an ssh connection made in either direction, depending on requirements. In both cases, XYMSRV is set to 127.0.0.1.

Xymon Server to Client
For a server-to-client connection, the Xymon server runs an ssh connection to the client with a remote tunnel on port 1984, sets up some variables, and runs the Xymon client scripts. An example is shown here.

ssh -R1984:127.0.0.1:1984 -o batchmode=yes xymon@xymon-client '/usr/lib/xymon/client/bin/xymoncmd sh -c "XYMSRV=127.0.0.1 /usr/lib/xymon/client/bin/xymonclient.sh"'

This command can be put into tasks.cfg and run every 5 minutes.
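When several clients are collected this way, wrapping the command in a loop keeps tasks.cfg simple: one CMD entry with INTERVAL 5m can call a script like the sketch below. The remote user and client paths are the ones from the example above and may differ on your systems; the host names passed as arguments are up to you, and the ConnectTimeout value is an assumption.

```sh
#!/bin/sh
# Sketch: run the server-to-client ssh collection against several clients.
# Usage: poll_clients host1 host2 ...
CLIENTBIN=/usr/lib/xymon/client/bin

poll_clients() {
    rc=0
    for client in "$@"; do
        # Remote tunnel: the client's port 1984 leads back to this server.
        ssh -R1984:127.0.0.1:1984 -o BatchMode=yes -o ConnectTimeout=10 \
            "xymon@$client" \
            "$CLIENTBIN/xymoncmd sh -c \"XYMSRV=127.0.0.1 $CLIENTBIN/xymonclient.sh\"" \
            || { echo "collection failed for $client" >&2; rc=1; }
    done
    return $rc
}
```

A failure on one client is reported on stderr but does not stop collection from the remaining clients.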

Xymon Client to Server
For a client-to-server connection, the Xymon client establishes a connection to the server with a local tunnel on port 1984, and runs the Xymon client scripts. An example is shown here.

ssh -f -L1984:127.0.0.1:1984 xymon@xymon-server sleep 15 && /usr/lib/xymon/client/bin/xymoncmd sh -c "XYMSRV=127.0.0.1 /usr/lib/xymon/client/bin/xymonclient.sh"

This command should be run every 5 minutes on the Xymon client, and can be run from cron or from clientlaunch.cfg.

32 bit vs 64 bit binary for hobbit on Solaris

 * This article describes this subject in great detail.

LDAP Authentication
Example httpd.conf (Apache 2.0.x with LDAP authenticated against Active Directory):

Replace LDAPSERVER.DOMAIN.COM with the name of your LDAP server.

For the bind account, use an account with permission to view the LDAP directory.

For its password, note that you should limit what this account can do.
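An Apache 2.0 access block of this kind could look like the output of the sketch below, which assumes the mod_auth_ldap module is loaded. The protected directory, the base DN, the bind DN and the password are all hypothetical placeholders; sAMAccountName is the usual Active Directory login attribute.

```sh
#!/bin/sh
# Sketch: print an Apache 2.0 (mod_auth_ldap) block that protects the Xymon
# web pages with Active Directory logins. Every DN, path and password below
# is a placeholder to substitute.
ldap_auth_conf() {
    cat <<'EOF'
<Directory "/home/xymon/server/www">
    AuthType Basic
    AuthName "Xymon"
    AuthLDAPURL "ldap://LDAPSERVER.DOMAIN.COM:389/DC=domain,DC=com?sAMAccountName?sub?(objectClass=*)"
    AuthLDAPBindDN "CN=xymonbind,CN=Users,DC=domain,DC=com"
    AuthLDAPBindPassword "changeme"
    Require valid-user
</Directory>
EOF
}
```

Usage: ldap_auth_conf >> httpd.conf, then edit the placeholders and restart Apache.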

Same for a Novell-edir ldap server:

Alerts setting
Using sms_client [smsclient.org]
 * Pager

Create a shell-script (/usr/bin/hobbitsms) like this:

Edit hobbit-alerts.cfg and add the lines for the alerts you want to receive:
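A minimal /usr/bin/hobbitsms might look like the sketch below. Hobbit runs alert scripts with the recipient in RCPT and the alert details in variables such as BBHOSTNAME, BBSVCNAME and BBCOLORLEVEL. The sms_client argument order used here is an assumption; check it against the sms_client documentation.

```sh
#!/bin/sh
# Sketch of /usr/bin/hobbitsms.
# ASSUMPTION: sms_client accepts "recipient message" in this order.

# Build a short text that fits in one 160-character SMS.
sms_text() {
    printf '%s:%s %s' "$BBHOSTNAME" "$BBSVCNAME" "$BBCOLORLEVEL" | cut -c1-160
}

# Only send when Hobbit has supplied a recipient.
if [ -n "$RCPT" ]; then
    sms_client "$RCPT" "$(sms_text)"
fi
```

In hobbit-alerts.cfg, a rule would then call the script with the SCRIPT keyword, followed by the script path and the recipient, for instance SCRIPT /usr/bin/hobbitsms 5551234 (the number is made up).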

Using snpp sendpage.org
 * Pager.

Create a shell-script (/usr/bin/hobbitsnpp) like this:


 * Email.

How can I shorten the Xymon server's DNS lookup time?
Every five minutes, the Xymon server performs many DNS lookups for the machines that need to be pinged.

Install a local DNS cache server. I use djbdns for this.
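With djbdns, the local cache is dnscache. The sketch below assumes djbdns and daemontools are installed, that the dnscache and dnslog accounts already exist, and that it runs as root; it keeps a copy of the old resolver file before pointing the system at 127.0.0.1.

```sh
#!/bin/sh
# Sketch: set up djbdns dnscache on 127.0.0.1 and point the resolver at it.
# Assumes djbdns + daemontools and the "dnscache"/"dnslog" accounts exist.
setup_dnscache() {
    resolv=${1:-/etc/resolv.conf}
    dnscache-conf dnscache dnslog /etc/dnscache 127.0.0.1 || return 1
    ln -s /etc/dnscache /service || return 1   # daemontools starts the service
    cp "$resolv" "$resolv.orig" || return 1    # keep the old resolver config
    echo "nameserver 127.0.0.1" > "$resolv"
}
```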

Overview
Remedy ticket system has a web interface for opening up a ticket to a particular ticket queue.

The Perl approach is to use the following software to automate the ticket request when an alert occurs.


 * perl
 * LWP
 * trouble_ticket.tgz on http://www.deadcat.net
 * an entrance URL on remedy server web interface.
 * A perl subroutine to open up remedy ticket.

System and Inventory Monitoring
System monitoring and inventory monitoring can be achieved with an external module that reports a system's inventory information. (TBC)

Q. When I click on a status icon I get the message "Status not available". What should I check?
A. First make sure that the server is actually running. ps -ef | grep hobbitd

You should see several processes similar to:

 hobbit  32717 32716  0 Nov07 ? 00:01:07 hobbitd --pidfile....
 hobbit  32726 32716  0 Nov07 ? 00:00:03 hobbitd_channel --channel=page...
 hobbit  32727 32716  0 Nov07 ? 00:01:58 hobbitd_channel --channel=status...
 hobbit  32728 32716  0 Nov07 ? 00:00:01 hobbitd_channel --channel=data...
 hobbit  32725 32716  0 Nov07 ? 00:00:00 hobbitd_channel --channel=stachg...

If the server fails to start, look in the Hobbit logs directory; one possible location is /var/log/hobbit.
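The ps check can be turned into a small script that reports exactly which part is missing. This sketch looks for hobbitd and the four channel workers shown in the sample listing; your hobbitlaunch.cfg may start additional channels that it does not check.

```sh
#!/bin/sh
# Sketch: check that hobbitd and the channel workers from the sample
# listing are running; prints one line per missing process.
check_hobbitd() {
    procs=$(ps -ef)
    rc=0
    printf '%s\n' "$procs" | grep -q 'hobbitd --pidfile' \
        || { echo "hobbitd not running"; rc=1; }
    for ch in page status data stachg; do
        printf '%s\n' "$procs" | grep -q "hobbitd_channel --channel=$ch" \
            || { echo "channel $ch not running"; rc=1; }
    done
    return $rc
}
```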

Q. After installing the Hobbit client, my msgs tests are "clear" (sometimes referred to as "white")
A. As of the time of this writing, the Hobbit client does NOT have msgs functionality like the BB client does. This can be added by installing the bb-msgs.sh file from the BB client as an external test. Even so, the Hobbit server will turn the test to "clear" instead of the expected status. To correct this issue, edit the hobbitlaunch.cfg file (usually found in /etc/hobbit/ or /usr/lib/hobbit/server/etc/) to add --no-clear-msgs to the client channel, and restart the server:

 CMD hobbitd_channel --channel=client hobbitd_client --no-clear-msgs --log=$BBSERVERLOGS/clientdata.log ...