In addition to the monitoring performed by our data center provider, Rackspace, each key service and server is monitored closely by our own engineering team. Our monitoring systems are designed to watch for service degradation and to record current and historical performance data for growth planning.
The primary monitoring system was built around Nagios, the well-known open source monitoring tool. We have written a number of custom Nagios plug-ins that allow us to monitor every important system function. For instance, we are alerted as soon as an SMTP server's mail queue holds enough waiting email to push delivery past our normal 1-2 seconds, but well before the queue is long enough for customers to notice. This sort of early "heads-up" alert lets our engineers make minor adjustments that keep the system running smoothly and improve the overall customer experience.
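As an illustration of the mail-queue alert described above, here is a minimal sketch of a Nagios-style check. It follows the standard Nagios plugin convention: exit code 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN), and one line of status text with performance data after a `|`. The sketch assumes a Postfix mail server, whose `postqueue -p` output ends in a line like `-- 12 Kbytes in 3 Requests.` The source does not say which MTA is used. The thresholds `WARN_MESSAGES` and `CRIT_MESSAGES` are hypothetical and would need tuning against real delivery times. This is not our actual plug-in.

```python
import re
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical thresholds: WARNING should fire once the queue is long enough to
# push delivery past the normal 1-2 seconds, well before customers notice.
WARN_MESSAGES = 50
CRIT_MESSAGES = 200


def parse_postqueue(output: str) -> int:
    """Return the number of queued messages from Postfix `postqueue -p` output."""
    if "Mail queue is empty" in output:
        return 0
    # Summary line looks like "-- 12 Kbytes in 3 Requests." ("Request." when 1).
    match = re.search(r"in (\d+) Requests?\.", output)
    if match is None:
        raise ValueError("unrecognized postqueue output")
    return int(match.group(1))


def evaluate(queued: int, warn: int = WARN_MESSAGES, crit: int = CRIT_MESSAGES) -> tuple[int, str]:
    """Map a queue length to a Nagios status code and a one-line status message."""
    # Perfdata format: label=value;warn;crit;min
    perfdata = f"queue={queued};{warn};{crit};0"
    if queued >= crit:
        return CRITICAL, f"MAILQ CRITICAL - {queued} messages queued | {perfdata}"
    if queued >= warn:
        return WARNING, f"MAILQ WARNING - {queued} messages queued | {perfdata}"
    return OK, f"MAILQ OK - {queued} messages queued | {perfdata}"


def main() -> int:
    """Run postqueue, print the status line, and return the Nagios exit code."""
    try:
        result = subprocess.run(
            ["postqueue", "-p"], capture_output=True, text=True, timeout=30, check=False
        )
        queued = parse_postqueue(result.stdout)
    except (OSError, subprocess.TimeoutExpired, ValueError) as exc:
        # Any failure to read the queue is reported as UNKNOWN, not OK.
        print(f"MAILQ UNKNOWN - {exc}")
        return UNKNOWN
    status, message = evaluate(queued)
    print(message)
    return status
```

To deploy it as a plug-in, the script would end with `if __name__ == "__main__": sys.exit(main())` (after `import sys`). Nagios then reads the exit code to decide the alert state. It graphs the text after `|` as performance data.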
Adding server capacity ahead of actual growth is important for maintaining high levels of system performance. We identify trends and make our growth projections with Ganglia, a tool for tracking and charting a wide range of system metrics. Hourly, daily, weekly, monthly, and yearly views are available for everything from spam volume and login counts to storage and CPU utilization. These data points help us project our server needs and scale the system ahead of demand.
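The source does not say how projections are derived from the Ganglia data. One simple approach is an ordinary least-squares trend line over exported samples, extrapolated to a capacity ceiling. The sketch below assumes daily samples have already been pulled out of Ganglia; the export step is not shown. The ceiling value passed as `threshold` is a hypothetical planning target. `fit_trend` and `day_reaching` are illustrative names, not part of Ganglia.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Trend:
    slope: float      # change in the metric per day
    intercept: float  # fitted value at day 0


def fit_trend(days: list[float], values: list[float]) -> Trend:
    """Fit an ordinary least-squares line through (day, value) samples."""
    n = len(days)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    slope = sxy / sxx
    return Trend(slope=slope, intercept=mean_y - slope * mean_x)


def day_reaching(trend: Trend, threshold: float) -> float | None:
    """Day on which the fitted line crosses `threshold`, or None if the trend is flat or falling."""
    if trend.slope <= 0:
        return None
    return (threshold - trend.intercept) / trend.slope
```

For example, four daily CPU samples of 40, 42, 44 and 46 percent fit a slope of 2 points per day. The line therefore reaches a hypothetical 80 percent ceiling around day 20, which gives a rough deadline for adding servers. A straight line is the crudest possible model. Seasonal metrics such as spam volume would likely need something more careful.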