Contact Information
-------------------

Owner: sysadmin-main, sysadmin-noc
Contact: #fedora-admin, #fedora-noc
Location: Anywhere
Servers: noc01, noc02, noc01.stg, lockbox01
Purpose: This SOP describes the nagios configurations

Configuration
-------------

Instances:
    Fedora Project runs two nagios instances: nagios (noc01) at
    https://admin.fedoraproject.org/nagios and nagios-external (noc02) at
    http://admin.fedoraproject.org/nagios-external. You must be in the
    'sysadmin' group to access them.

Staging Instances:
    Apart from the two production instances, we are currently running a
    staging instance for testing purposes, available through SSH at
    noc01.stg.

nagios (noc01):
    The nagios configuration on noc01 should only monitor general host
    statistics: puppet status, uptime, apache status (up/down), SSH, etc.

    The configurations are found in the nagios puppet module:
    puppet/modules/nagios

nagios-external (noc02):
    noc02 is located outside of our main datacenter, and its nagios
    configuration should monitor our user websites/applications
    (fedoraproject.org, FAS, PackageDB, Bodhi/Updates).

    The configurations are found in the nagios puppet module:
    puppet/modules/nagios

Production and staging instances through SSH:
    Note: Please make sure you are in the 'sysadmin' and 'sysadmin-noc' FAS
    groups before trying to access these hosts. See the SSH Access SOP.

NRPE:
    We are currently using NRPE to execute remote Nagios plugins on any host
    of our network.

    A good guide to NRPE and its usage, with diagrams of its structure, can
    be found at: http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf

Understanding the Messages
--------------------------

General:
    Nagios notifications are generally easy to read, and follow this
    consistent format:

        ** PROBLEM/ACKNOWLEDGEMENT/RECOVERY alert - hostname/Check is WARNING/CRITICAL/OK **
        ** HOST DOWN/UP alert - hostname **

    The body of the message provides extra information on what is wrong.
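
As an illustration of the notification format above, the following Python sketch splits a notification subject into its parts. It is not part of the Fedora nagios configuration: the regular expressions, the function name parse_subject and the host name host01.example.org are assumptions made for this example, and the patterns only cover the two subject shapes shown above.

```python
import re

# Service alerts, e.g.
# "** PROBLEM alert - hostname/Check is CRITICAL **"
# Assumption: the hostname contains no "/" and no whitespace.
SERVICE_RE = re.compile(
    r"^\*\* (PROBLEM|ACKNOWLEDGEMENT|RECOVERY) alert - "
    r"([^/\s]+)/(.+) is (WARNING|CRITICAL|OK) \*\*$"
)

# Host alerts, e.g. "** HOST DOWN alert - hostname **"
HOST_RE = re.compile(r"^\*\* HOST (DOWN|UP) alert - (\S+) \*\*$")


def parse_subject(subject):
    """Return a dict describing the alert, or None if the subject
    does not match either of the two formats."""
    subject = subject.strip()

    match = SERVICE_RE.match(subject)
    if match:
        kind, host, check, state = match.groups()
        return {
            "scope": "service",
            "type": kind,
            "host": host,
            "check": check,
            "state": state,
        }

    match = HOST_RE.match(subject)
    if match:
        state, host = match.groups()
        return {"scope": "host", "host": host, "state": state}

    return None


if __name__ == "__main__":
    # Hypothetical subjects, used only to exercise the parser.
    print(parse_subject(
        "** PROBLEM alert - host01.example.org/Disk Space is CRITICAL **"
    ))
    print(parse_subject("** HOST DOWN alert - host01.example.org **"))
```

A parser like this can be handy for filtering or counting alerts in a mailbox, but the authoritative subject format is whatever the notification commands in puppet/modules/nagios define.
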

Disk Space Warning/Critical:
    Disk space warnings normally include the following information:

        DISK WARNING/CRITICAL/OK - free space: mountpoint freespace(MB) (freespace(%) inode=freeinodes(%)):

    Both percentages report what is still free. A message stating
    "(1% inode=99%)" therefore means that disk space, not inode usage, is
    critical: only 1% of the disk space is free while 99% of the inodes are
    still free. This is a sign that more disk space is required.

Further Reading
---------------

* Puppet SOP
* Outages SOP