Health and Monitoring of Server
Date | 2012/03/26 |
Contact(s) | Jesse Eichar |
Last edited | |
Status | draft |
Assigned to release | 2.8 |
Resources | R&D Camptocamp |
Overview
Provide a system for monitoring the health of a GeoNetwork instance, as well as metrics for some important functions.
Proposal Type
- Type: New Module
- App: GeoNetwork
- Module:
Links
- Email discussions:
- IRC discussions:
Voting History
- None as yet
Motivations
At the moment one must make several calls to a GeoNetwork instance to ensure that the important functions are running, and even then spurious or hard-to-detect instabilities may go unnoticed. It would be useful to have a consistent way to both register and view important characteristics such as the database connection, errors encountered, a corrupt index, failed logins, etc.
Proposal
The Metrics library (http://metrics.codahale.com/) by Yammer has excellent support for monitoring the performance and health of a system. It provides a consistent API for developers to register the vital statistics of an application. For example, in GeoNetwork we might want a monitoring system (such as Nagios or collectd) to check the health of the system, which would include checking the database connection, the ability to open files, the state of the index, etc. In addition, we might attach a Metrics appender to the logging framework to track the number of errors being logged, so that the monitoring system could warn of a potentially unstable system based on the number of errors being logged.
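To illustrate the error-counting appender idea, here is a minimal sketch that uses only the JDK. It is not the Metrics API and not GeoNetwork's logging setup: it assumes `java.util.logging` as a stand-in for whatever logging framework is actually used, and the class name `ErrorCountingHandler` is hypothetical. The handler counts every record at `SEVERE` level or above, and a monitor could poll that count.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;

/**
 * Hypothetical handler that counts log records at SEVERE level or above.
 * A monitoring system could poll getErrorCount() and raise an alarm if
 * the count grows faster than expected.
 */
class ErrorCountingHandler extends Handler {
    private final AtomicLong errorCount = new AtomicLong();

    @Override
    public void publish(LogRecord record) {
        // Count only SEVERE (java.util.logging's closest match to ERROR) or above.
        if (record != null && record.getLevel().intValue() >= Level.SEVERE.intValue()) {
            errorCount.incrementAndGet();
        }
    }

    @Override
    public void flush() {
        // Nothing is buffered, so there is nothing to flush.
    }

    @Override
    public void close() {
        // No resources to release.
    }

    long getErrorCount() {
        return errorCount.get();
    }
}
```

Attaching it is a single `logger.addHandler(new ErrorCountingHandler())` call; warnings pass through uncounted, while each `logger.severe(...)` increments the counter.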
Metrics has two APIs: one for configuring the health checks and another for performing the configured health checks. The output APIs include JMX, JSON and HTML pages that could be integrated into the admin user interface.
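The split between registering checks and running them can be sketched with plain JDK types. This is a hypothetical stand-in, not the Metrics classes: `SimpleHealthRegistry` and `CheckResult` are illustrative names, checks are modelled as `Callable`s, and the JSON rendering is deliberately minimal (it escapes only quotes and backslashes). A check that throws is reported as unhealthy instead of aborting the whole run.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;

/** Outcome of one health check: a healthy flag plus an optional message. */
final class CheckResult {
    final boolean healthy;
    final String message;

    CheckResult(boolean healthy, String message) {
        this.healthy = healthy;
        this.message = message;
    }
}

/** Hypothetical registry: one side registers checks, the other runs and reports them. */
final class SimpleHealthRegistry {
    // TreeMap keeps checks (and therefore the output) ordered by name.
    private final Map<String, Callable<CheckResult>> checks = new TreeMap<>();

    /** Configuration side: register a named check, e.g. "database" or "index". */
    synchronized void register(String name, Callable<CheckResult> check) {
        checks.put(name, check);
    }

    /** Execution side: run every check, turning exceptions into unhealthy results. */
    synchronized Map<String, CheckResult> runAll() {
        Map<String, CheckResult> results = new TreeMap<>();
        for (Map.Entry<String, Callable<CheckResult>> e : checks.entrySet()) {
            CheckResult r;
            try {
                r = e.getValue().call();
            } catch (Exception ex) {
                r = new CheckResult(false, ex.toString());
            }
            results.put(e.getKey(), r);
        }
        return results;
    }

    /** Output side: render the results as a small JSON object. */
    static String toJson(Map<String, CheckResult> results) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, CheckResult> e : results.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":{\"healthy\":")
              .append(e.getValue().healthy);
            if (e.getValue().message != null) {
                sb.append(",\"message\":\"").append(escape(e.getValue().message)).append('"');
            }
            sb.append('}');
        }
        return sb.append('}').toString();
    }

    // Minimal escaping: backslashes and double quotes only.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

With a passing "database" check and an "index" check that throws `IllegalStateException("index corrupt")`, `toJson(runAll())` yields `{"database":{"healthy":true},"index":{"healthy":false,"message":"java.lang.IllegalStateException: index corrupt"}}`, a shape a tool like Nagios could poll over HTTP.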
It is important to realize that, in this use case, metrics are not the same as statistics. While Metrics could be used in some capacity for statistics, in this proposal it will be used as a standard API and set of utilities for building a monitoring subsystem that is flexible, extensible and interoperable with many existing monitoring systems.
Some monitors I propose to make are:
- Database Gauge - checks that the database is accessible
- Index Gauge - checks that the Lucene index is searchable
- Error Meter - monitors the frequency that errors are logged
- Request Meter - monitors the number of requests being made, to help detect potential DoS attacks
- Pending Request Counter - tracks the current number of requests being processed
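The last two monitors can be combined into one small wrapper around request handling. The sketch below is hypothetical and JDK-only (`RequestMonitor` and `track` are illustrative names, not Metrics or GeoNetwork APIs). It counts every request, keeps a pending counter that is decremented in a `finally` block so failed requests cannot leave it inflated, and reports a mean rate since start-up.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/**
 * Hypothetical monitor combining a request meter (total count and mean rate)
 * with a pending-request counter (requests currently in progress).
 */
final class RequestMonitor {
    private final AtomicLong total = new AtomicLong();
    private final AtomicLong pending = new AtomicLong();
    private final long startNanos = System.nanoTime();

    /** Runs one request, counting it and marking it pending while it executes. */
    <T> T track(Supplier<T> request) {
        total.incrementAndGet();
        pending.incrementAndGet();
        try {
            return request.get();
        } finally {
            // Decrement even when the request throws, so the counter cannot drift.
            pending.decrementAndGet();
        }
    }

    long totalRequests() {
        return total.get();
    }

    long pendingRequests() {
        return pending.get();
    }

    /** Mean requests per second since this monitor was created. */
    double meanRatePerSecond() {
        double elapsedSeconds = (System.nanoTime() - startNanos) / 1e9;
        return elapsedSeconds > 0 ? total.get() / elapsedSeconds : 0.0;
    }
}
```

A mean rate smooths out short bursts, so for spotting a sudden spike a real meter is more useful; the Metrics `Meter` also reports exponentially weighted 1-, 5- and 15-minute rates.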
Backwards Compatibility Issues
A new dependency (the Metrics library) and, optionally, an update to web.xml
Risks
Nothing notable
Participants
- As above