Linux Guide/Monitoring

= Introduction =

This page is in a TODO state. Anyone is free to complete or contribute to it. For now (2010-06-11) it contains assorted notes I've been collecting over time.

TODO is a marker meaning "to do" ("TODO" is automatically recognized by some editing tools as a pending task).

= HARDWARE MONITORING =

Rescanning the SCSI Bus
The following link provides a quick script to rescan the SCSI bus in Linux.

There is a simpler way that most of the time will work properly:

echo "- - -" > /sys/class/scsi_host/host0/scan

A slightly more complex script example for a QLogic card:

#!/bin/bash
for HBA in $(ls -A /proc/scsi/qla2xxx/)
do
  echo "scsi-qlascan" > "/proc/scsi/qla2xxx/${HBA}"
done
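
The single-host command above can be extended to every SCSI host on the machine. The following is a minimal sketch, not a tested tool: the function name rescan_all_scsi_hosts and its optional base-directory argument are assumptions introduced here so that the loop can be tried against a scratch directory before touching /sys.

```shell
#!/bin/bash
# Sketch: write the "- - -" wildcard (channel, target, LUN) to the scan
# file of every SCSI host. The function name and the optional
# base-directory argument are illustrative assumptions.
rescan_all_scsi_hosts() {
  local base="${1:-/sys/class/scsi_host}"
  local scan
  for scan in "$base"/host*/scan; do
    # Skip the unexpanded glob when no host directory exists.
    [ -e "$scan" ] || continue
    echo "- - -" > "$scan"
    echo "rescanned ${scan%/scan}"
  done
}

# Usage (as root): rescan_all_scsi_hosts
```

Writing to the real /sys/class/scsi_host files requires root privileges.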

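Alternatively, iscsiadm can be used if available (PORTAL and TARGETNAME are placeholders for the values of your own setup):

iscsiadm -m discovery --type sendtargets --portal PORTAL
iscsiadm -m node --targetname TARGETNAME --portal PORTAL --login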

Amongst other documents available on the net, the Red Hat Enterprise Linux 5 Online Storage Reconfiguration Guide can also be helpful.

DMIDECODE
Dmidecode reports information about your system's hardware as described in your system BIOS according to the SMBIOS/DMI standard (see a sample output). This information typically includes system manufacturer, model name, serial number, BIOS version, asset tag as well as a lot of other details of varying level of interest and reliability depending on the manufacturer. This will often include usage status for the CPU sockets, expansion slots (e.g. AGP, PCI, ISA) and memory module slots, and the list of I/O ports (e.g. serial, parallel, USB).
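
As a rough illustration of the slot-usage information mentioned above, the sketch below lists a few common dmidecode invocations and adds a small helper that counts populated and empty memory slots. The helper name memory_slot_usage is an assumption introduced here, and it assumes that each memory device is reported with a "Size:" line that reads "No Module Installed" for an empty slot; the exact wording can vary with the BIOS and dmidecode version.

```shell
#!/bin/bash
# Common invocations (run as root); see dmidecode(8):
#   dmidecode -s system-manufacturer
#   dmidecode -s system-product-name
#   dmidecode -s system-serial-number
#   dmidecode -s bios-version
#   dmidecode -t memory

# Illustrative helper (the name is an assumption): read the output of
# "dmidecode -t memory" on stdin and print how many memory slots are
# populated and how many are empty.
memory_slot_usage() {
  awk '
    /^[[:space:]]*Size:/ {
      if ($0 ~ /No Module Installed/) empty++; else used++
    }
    END { printf "used=%d empty=%d\n", used, empty }
  '
}

# Usage (as root): dmidecode -t memory | memory_slot_usage
```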

TODO IPMI
What is IPMI?

The Intelligent Platform Management Interface (IPMI) specification defines a set of interfaces for platform management. It is implemented by a large number of hardware manufacturers to support system management on motherboards. The features of IPMI that most users will be interested in are sensor monitoring (i.e. CPU temperatures, fan speeds), remote power control, and serial-over-LAN (SOL).

What is FreeIPMI?

FreeIPMI provides in-band and out-of-band IPMI software based on the IPMI v1.5/2.0 specification. FreeIPMI provides tools and libraries for users to access and read IPMI sensor readings, system event log (SEL) entries, serial-over-LAN (SOL), remote power control functions, field replaceable unit (FRU) device information, and more.

More information about FreeIPMI can be found at the FreeIPMI webpage at: http://www.gnu.org/software/freeipmi/index.html
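
A possible starting point for the features listed above is sketched below. ipmi-sensors, ipmi-sel and ipmipower are FreeIPMI tools; the wrapper ipmi_overview, the run helper and the DRY_RUN switch are hypothetical names introduced only for this example, and the host and user in the usage line are placeholders. Option names may differ between FreeIPMI releases, so check the man pages of the installed version.

```shell
#!/bin/bash
# Sketch only: ipmi_overview, run and DRY_RUN are illustrative names,
# not part of FreeIPMI.
run() {
  # With DRY_RUN=1 print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

ipmi_overview() {
  local host="$1" user="$2"
  # In-band: query the local BMC (usually requires root).
  run ipmi-sensors    # temperatures, fan speeds, voltages
  run ipmi-sel        # system event log (SEL) entries
  # Out-of-band: ask a remote BMC for its power state.
  # -P prompts for the password instead of taking it on the command line.
  run ipmipower -h "$host" -u "$user" -P --stat
}

# Usage: DRY_RUN=1 ipmi_overview bmc.example.org admin
```

Running with DRY_RUN=1 first shows which commands would be executed without contacting any BMC.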

TODO smartctl:
************************************************************************
~# smartctl -d cciss,0 -a /dev/cciss/c0d0
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: HP      DH072ABAA6       Version: HPD7
Serial number: 3PD19ZMN0000983153B8
Device type: disk
Transport protocol: SAS
Local Time is: Sat Jul 19 20:09:09 2008 CEST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     29 C
Drive Trip Temperature:        68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 899299930
  Blocks received from initiator = 14843797
  Blocks read from cache and sent to initiator = 3793967485
  Number of read and write commands whose size <= segment size = 48565840
  Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 945.00
  number of minutes until next internal SMART test = 7

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:         0        0         0         0          0          0.000           0

Non-medium error count:       0
No self-tests have been logged
Long (extended) Self Test duration: 840 seconds [14.0 minutes]
************************************************************************
~# smartctl -d cciss,1 -a /dev/cciss/c0d0
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: HP      DH072ABAA6       Version: HPD7
Serial number: 3PD19ZPV000098315CX2
Device type: disk
Transport protocol: SAS
Local Time is: Sat Jul 19 20:09:12 2008 CEST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     30 C
Drive Trip Temperature:        68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 920490987
  Blocks received from initiator = 14368268
  Blocks read from cache and sent to initiator = 3755437180
  Number of read and write commands whose size <= segment size = 48820139
  Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 945.02
  number of minutes until next internal SMART test = 8

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:         0        0         0         0          0          0.000           0

Non-medium error count:       0
No self-tests have been logged
Long (extended) Self Test duration: 840 seconds [14.0 minutes]
************************************************************************
~# smartctl -d cciss,2 -a /dev/cciss/c0d0
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: HP      DH072ABAA6       Version: HPD7
Serial number: 3PD1A0SD000098300K39
Device type: disk
Transport protocol: SAS
Local Time is: Sat Jul 19 20:09:15 2008 CEST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     31 C
Drive Trip Temperature:        68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 913141941
  Blocks received from initiator = 11455509
  Blocks read from cache and sent to initiator = 3697098775
  Number of read and write commands whose size <= segment size = 49159966
  Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 944.93
  number of minutes until next internal SMART test = 18

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:         0        0         0         0          0          0.000           0

Non-medium error count:       0
No self-tests have been logged
Long (extended) Self Test duration: 840 seconds [14.0 minutes]
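
When only the overall verdict is of interest, the full report can be reduced to the "SMART Health Status" line. The sketch below is an assumption-laden example rather than a finished check: the function names smart_health_status and check_cciss_drives are made up here, the drive indexes 0 to 2 simply mirror the three drives in the sample output, and the parser expects the SCSI/SAS style status line shown in that output (ATA drives report their health with a differently worded line).

```shell
#!/bin/bash
# Illustrative helper (the name is an assumption): read smartctl output
# on stdin and print the value of the "SMART Health Status:" line.
smart_health_status() {
  awk '
    /^SMART Health Status:/ {
      sub(/^SMART Health Status:[ \t]*/, "")
      sub(/[ \t]+$/, "")
      print
    }
  '
}

# Query the three physical drives behind the controller, as in the sample.
check_cciss_drives() {
  local dev="${1:-/dev/cciss/c0d0}" n status
  for n in 0 1 2; do
    status=$(smartctl -d cciss,"$n" -H "$dev" | smart_health_status)
    echo "cciss,$n: ${status:-unknown}"
  done
}

# Usage (as root): check_cciss_drives
```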

DELL OMSA monitoring
Installing OMSA for hardware monitoring on Dell servers:

OMSA allows you to monitor the health of RAID arrays and the motherboard/disk/chassis temperatures, generate alarms, view or modify BIOS settings, and list the installed devices.

To install under Debian:

1.- Add the following line to /etc/apt/sources.list:

deb ftp://ftp.sara.nl/pub/sara-omsa dell sara

2.- Execute apt-get update && apt-get install dellomsa

This installs OMSA in /opt/dell.

3.- Start the OMSA services:

~# /opt/dell/srvadmin/dataeng/bin/dsm_sa_datamgr32d -run
~# /opt/dell/srvadmin/dataeng/bin/dsm_sa_eventmgr32d -run

OMSA Usage Examples:
To check the health of the disks connected to controller 0:

~# /etc/delloma.d/oma/bin/omreport.sh storage pdisk controller=0

The output will look similar to:

List of Physical Disks on Controller PERC 4e/Di (Embedded)

Controller PERC 4e/Di (Embedded)
ID                        : 0:0
Status                    : Ok
Name                      : Physical Disk 0:0
State                     : Online
Failure Predicted         : No
Progress                  : Not Applicable
Type                      : SCSI
Capacity                  : 68.24 GB (73274490880 bytes)
Used RAID Disk Space      : 68.24 GB (73274490880 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare                 : No
Vendor ID                 : MAXTOR
Product ID                : ATLAS10K5_73SCA
Revision                  : JNZY
Serial No.                : J20KVCTK
Negotiated Speed          : 320
Capable Speed             : 320
Manufacture Day           : Not Available
Manufacture Week          : Not Available
Manufacture Year          : Not Available
SAS Address               : Not Available

ID                        : 0:1
Status                    : Ok
Name                      : Physical Disk 0:1
State                     : Online
Failure Predicted         : No
Progress                  : Not Applicable
Type                      : SCSI
Capacity                  : 68.24 GB (73274490880 bytes)
Used RAID Disk Space      : 68.24 GB (73274490880 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare                 : No
Vendor ID                 : MAXTOR
Product ID                : ATLAS10K5_73SCA
Revision                  : JNZY
Serial No.                : J20KV5RK
Negotiated Speed          : 320
Capable Speed             : 320
Manufacture Day           : Not Available
Manufacture Week          : Not Available
Manufacture Year          : Not Available
SAS Address               : Not Available

ID                        : 0:2
Status                    : Ok
Name                      : Physical Disk 0:2
State                     : Online
Failure Predicted         : No
Progress                  : Not Applicable
Type                      : SCSI
Capacity                  : 68.24 GB (73274490880 bytes)
Used RAID Disk Space      : 68.24 GB (73274490880 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare                 : No
Vendor ID                 : MAXTOR
Product ID                : ATLAS10K5_73SCA
Revision                  : JNZY
Serial No.                : J20KTS8K
Negotiated Speed          : 320
Capable Speed             : 320
Manufacture Day           : Not Available
Manufacture Week          : Not Available
Manufacture Year          : Not Available
SAS Address               : Not Available
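
For unattended checks, the physical disk report can be filtered down to the disks that need attention. The following sketch assumes that omreport prints one "field : value" pair per line and that the ID, Status and Failure Predicted fields appear in that order for each disk, as in the sample above; the function name pdisk_problems is an assumption introduced for this example.

```shell
#!/bin/bash
# Illustrative helper (the name is an assumption): read the output of
# "omreport storage pdisk controller=N" on stdin and print every disk
# whose Status is not "Ok" or whose "Failure Predicted" is not "No".
# The exit status is 1 when at least one such disk is found, else 0.
pdisk_problems() {
  awk '
    {
      # Split on the first colon only, because IDs such as "0:1"
      # contain colons themselves.
      pos = index($0, ":")
      if (pos == 0) next
      key = substr($0, 1, pos - 1)
      val = substr($0, pos + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
    }
    key == "ID"     { id = val }
    key == "Status" { status = val }
    key == "Failure Predicted" {
      if (status != "Ok" || val != "No") {
        print id ": status=" status ", failure predicted=" val
        bad = 1
      }
    }
    END { exit bad }
  '
}

# Usage:
#   /etc/delloma.d/oma/bin/omreport.sh storage pdisk controller=0 | pdisk_problems \
#     || echo "at least one physical disk needs attention"
```

The non-zero exit status makes the helper easy to call from cron jobs or monitoring wrappers.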

To check the state/configuration of the RAID:

~# /etc/delloma.d/oma/bin/omreport.sh storage vdisk controller=0

That will look like:

Virtual Disk 0 on Controller PERC 4e/Di (Embedded)

Controller PERC 4e/Di (Embedded)
ID                  : 0
Status              : Ok
Name                : Virtual Disk 0
State               : Ready
Progress            : Not Applicable
Layout              : RAID-5
Size                : 136.48 GB (146548981760 bytes)
Device Name         : /dev/sda
Type                : SCSI
Read Policy         : Adaptive Read Ahead
Write Policy        : Write Back
Cache Policy        : Direct I/O
Stripe Element Size : 64 KB

To get a summary of the server:

~# /etc/delloma.d/oma/bin/omreport.sh system summary

System Summary

--
Software Profile
--

Systems Management
Name                       : Information not available.
Version                    : 3.2.0
Description                : Systems Management Software

Operating System
Name                       : Linux
Version                    : Kernel 2.6.18.2 (i686)
System Time                : Sun Nov 25 18:30:37 2007
System Bootup Time         : Fri Oct 12 15:20:31 2007

System

System
Host Name                  : MySuperServidor
System Location            : Please set the value

-
Main System Chassis
-

Chassis Information
Chassis Model              : PowerEdge 2850
Chassis Service Tag        :
Chassis Lock               : Present
Chassis Asset Tag          :

Processor 1
Processor Manufacturer     : Intel
Processor Family           : Xeon
Processor Version          : Model 4 Stepping 3
Current Speed              : 3200 MHz
Maximum Speed              : 3600 MHz
External Clock Speed       : 800 MHz
Voltage                    : 1400 mV

Processor 2
Processor Manufacturer     : Intel
Processor Family           : Xeon
Processor Version          : Model 4 Stepping 3
Current Speed              : 3200 MHz
Maximum Speed              : 3600 MHz
External Clock Speed       : 800 MHz
Voltage                    : 1400 mV

Memory
Total Installed Capacity   : 2048 MB
Memory Available to the OS : 2023 MB
Total Maximum Capacity     : 16384 MB
Memory Array Count         : 1

Memory Array 1
Location                   : System Board or Motherboard
Use                        : System Memory
Installed Capacity         : 2048 MB
Maximum Capacity           : 16384 MB
Slots Available            : 6
Slots Used                 : 2
ECC Type                   : Multibit ECC

Slot PCI1
Adapter                    : [Not Occupied]
Type                       : PCI X
Data Bus Width             : 64 Bits
Speed                      : 133 MHz
Slot Length                : Long
Voltage Supply             : 3.3 Volts

Slot PCI2
Adapter                    : [Not Occupied]
Type                       : PCI X
Data Bus Width             : 64 Bits
Speed                      : 133 MHz
Slot Length                : Long
Voltage Supply             : 3.3 Volts

Slot PCI3
Adapter                    : PRO/100 S Server Adapter
Type                       : PCI X
Data Bus Width             : 64 Bits
Speed                      : 133 MHz
Slot Length                : Short
Voltage Supply             : 3.3 Volts

BIOS Information
Manufacturer               : Dell Inc.
Version                    : A04
Release Date               : 09/22/2005

--
Network Data
--

IP Address Data
IP Address 0               : 192.168.2.2
IP Address 1               : 192.168.0.115

Storage Enclosures

Storage Enclosures
Name                       : Backplane
Service Tag                : 62P00P8

TODO logwatch
= SOFTWARE MONITORING =