Changes between Initial Version and Version 1 of DependencyInjection


Timestamp:
Jul 7, 2012, 12:34:53 PM (12 years ago)
Author:
jesseeichar
Comment:

--

  • DependencyInjection

    v1 v1  
     1= Dependency Injection =
     2
     3|| '''Date''' || 2012/03/26 ||
     4|| '''Contact(s)''' || [http://wiki.osgeo.org/wiki/User:Jeichar Jesse Eichar] ||
     5|| '''Last edited''' || ||
     6|| '''Status''' || draft ||
     7|| '''Assigned to release''' || 2.9.x ||
     8|| '''Resources''' || R&D Camptocamp ||
     9|| '''Code''' || https://github.com/jesseeichar/geonetwork/tree/feature/injection ||
     10|| '''Ticket''' || #846 ||
     11
     12== Overview ==
     13
     14Provide a system for monitoring the health of a Geonetwork instance, together with metrics for some of its important functions. Metrics will be made available via HTTP/JSON and JMX.  A common usage would be to configure nagios or collectd to collect data from the Geonetwork service and warn administrators when the system is becoming unstable.
     15
     16=== Proposal Type ===
     17 * '''Type''': New Module
     18 * '''App''': !GeoNetwork
     19 * '''Module''':
     20
     21=== Links ===
     22 
     23 * '''Email discussions''':
     24 * '''IRC discussions''':
     25 
     26=== Voting History ===
     27
     28 * None as yet
     29
     30----
     31
     32== Motivations ==
     33At the moment one must make several calls to a Geonetwork instance to ensure that the important functions are running, and even that cannot detect intermittent or otherwise hard-to-detect instabilities of Geonetwork.  It would be useful to have a consistent way to both register and view important characteristics such as the database connection, errors encountered, a corrupt index, failed logins, etc.
     34
     35== Proposal ==
     36The Metrics library (http://metrics.codahale.com/) by Yammer has excellent support for monitoring the performance and health of a system.  It provides a consistent API for developers to register vital statistics of an application.  For example, in Geonetwork we might want a monitoring system (like nagios or collectd) to check the health of the system, which would include checking the database connection, the ability to open files, the index, etc.  In addition, we might want to attach a Metrics appender to the logging to track the number of errors being logged, so that the monitoring system can warn of a potentially unstable system based on that number.
     37
     38Metrics has 2 APIs: one for configuring the health checks and another for performing the configured health checks.  The 'out' APIs include JMX and JSON.  For this proposal 4 new servlet mappings will be defined for accessing the monitoring information:
     39  - /monitor/metrics?[pretty=(true|false)][class=metric.name] - returns a JSON response with all of the registered metrics
     40  - /monitor/threads - returns a text representation of the stack dump taken at the moment of the call
     41  - /monitor/healthcheck - returns 200 if all checks pass or 500 Internal Server Error if one fails (with a human-readable description of the failures)
     42  - /monitor - provides links to the pages listed above.
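
The /monitor/healthcheck behaviour described above (200 when every check passes, 500 plus a readable list of failures otherwise) can be sketched roughly as follows. This is only an illustration in plain Java and is not the Metrics library API; the class name `HealthCheckSketch`, its methods and the check names are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of aggregating health checks; NOT the Metrics library API. */
final class HealthCheckSketch {

    /** Outcome of running every registered check. */
    static final class Report {
        final int status;                    // 200 if all checks passed, 500 otherwise
        final Map<String, String> failures;  // check name -> human-readable reason

        Report(int status, Map<String, String> failures) {
            this.status = status;
            this.failures = failures;
        }
    }

    // Each check returns null when healthy, or a failure message otherwise.
    private final Map<String, Supplier<String>> checks = new LinkedHashMap<>();

    void register(String name, Supplier<String> check) {
        checks.put(name, check);
    }

    Report runAll() {
        Map<String, String> failures = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<String>> entry : checks.entrySet()) {
            String problem;
            try {
                problem = entry.getValue().get();
            } catch (RuntimeException ex) {
                // A check that throws is reported as a failure.
                problem = ex.toString();
            }
            if (problem != null) {
                failures.put(entry.getKey(), problem);
            }
        }
        return new Report(failures.isEmpty() ? 200 : 500, failures);
    }

    public static void main(String[] args) {
        HealthCheckSketch registry = new HealthCheckSketch();
        registry.register("database", () -> null);
        registry.register("index", () -> "Lucene index is not searchable");
        Report report = registry.runAll();
        System.out.println(report.status + " " + report.failures);
    }
}
```

A monitoring system such as nagios would then only need to look at the status code, while an administrator can read the failure messages in the response body.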
     43
     44A link to these servlets will be added to the Admin/config.info page so an administrator can easily access this data.  In a future implementation we can possibly add a more attractive UI for viewing the information.
     45All /monitor/* URLs will be restricted by a servlet filter so that only administrators can access the information.
     46
     47It is important to realize that metrics are not exactly the same as statistics in my use case.  While Metrics could be used in some capacity for statistics, in this proposal it will be used as a standard API and set of utilities for creating a monitoring subsystem that is flexible, extensible and can interoperate with many existing monitoring systems.
     48
     49Some monitors I propose to make are:
     50
     51 - Database Health Monitor - checks that the database is accessible
     52 - Index Health Monitor - checks that the Lucene index is searchable
     53 - Index Error Health Monitor - checks that there are no index errors in the index (documents with _indexError field == 1)
     54 - CSW !GetRecords Health Monitor - Checks that GetRecords does not return an error for a basic hits search
     55 - CSW !GetCapabilities Health Monitor - Checks that the GetCapabilities is returned and is not an error document
     56 - Database Access timer - Time taken to access a DBMS instance.  This gives an idea of the level of contention over the database connections
     57 - Database Open Timer - Tracks the length of time a Database access is kept open
     58 - Database Connection Counter - Counts the number of open Database connections
     59 - Harvester Error Counter - Tracks errors that are raised during harvesting
     60 - Service timer - Tracks the time of service execution
     61 - Gui Services timer - Tracks the time spent executing Gui services
     62 - XSL output timer - Tracks the time of the output XSL transform
     63 - Log4j integration - monitors how frequently log entries are made at each log level so that (for example) the rate at which errors are logged can be monitored.  See http://metrics.codahale.com/manual/log4j
     64 - Webapp integration - monitors number of active requests, error codes returned and length of time requests take. See http://metrics.codahale.com/manual/webapps/
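
To make the difference between the counter and timer monitors above concrete, here is a minimal sketch in plain Java. It does not use the Metrics library; `MonitorSketch`, `Counter` and `Timer` are hypothetical stand-ins, and the "database access" in `main` is simulated.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical stand-ins for counter and timer monitors; NOT the Metrics library API. */
final class MonitorSketch {

    /** Tracks how many resources (e.g. database connections) are currently open. */
    static final class Counter {
        private final AtomicLong value = new AtomicLong();

        void inc() { value.incrementAndGet(); }
        void dec() { value.decrementAndGet(); }
        long count() { return value.get(); }
    }

    /** Records how often an operation ran and the total time it took. */
    static final class Timer {
        private final AtomicLong calls = new AtomicLong();
        private final AtomicLong totalNanos = new AtomicLong();

        <T> T time(Supplier<T> work) {
            long start = System.nanoTime();
            try {
                return work.get();
            } finally {
                // Recorded even when the timed operation throws.
                calls.incrementAndGet();
                totalNanos.addAndGet(System.nanoTime() - start);
            }
        }

        long calls() { return calls.get(); }
        long totalNanos() { return totalNanos.get(); }
    }

    public static void main(String[] args) {
        Counter openConnections = new Counter();
        Timer databaseAccess = new Timer();

        // Simulated database access: the counter is raised while the
        // "connection" is open and lowered again when it is released.
        String row = databaseAccess.time(() -> {
            openConnections.inc();
            try {
                return "row";
            } finally {
                openConnections.dec();
            }
        });

        System.out.println(row + " calls=" + databaseAccess.calls()
                + " open=" + openConnections.count());
    }
}
```

A counter answers "how many right now", while a timer answers "how long and how often", which is why the list proposes both for database connections.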
     65
     66The Metrics and !HealthService Monitors will be registered in the !ServletContext so multiple Geonetwork instances can exist in the same web application without interfering with each other.
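
The point of registering monitors per context rather than in a static singleton can be illustrated with the sketch below. It is an assumption-laden illustration: a plain `Map<String, Object>` stands in for the attributes of a servlet context, and the attribute key, `ContextRegistrySketch` and `Registry` are hypothetical names that do not come from the proposal or the Metrics library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of one monitor registry per context; NOT the actual implementation. */
final class ContextRegistrySketch {

    // Hypothetical attribute key; the proposal does not name one.
    static final String ATTRIBUTE = "monitor.registry";

    /** A tiny registry of named counters. */
    static final class Registry {
        private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

        AtomicLong counter(String name) {
            return counters.computeIfAbsent(name, n -> new AtomicLong());
        }
    }

    /**
     * Returns the registry stored in the given context attributes, creating it
     * on first use. The map is a stand-in for a servlet context's attributes.
     */
    static Registry registryFor(Map<String, Object> contextAttributes) {
        return (Registry) contextAttributes.computeIfAbsent(ATTRIBUTE, key -> new Registry());
    }

    public static void main(String[] args) {
        // Two Geonetwork instances, each with its own context attributes.
        Map<String, Object> instanceA = new HashMap<>();
        Map<String, Object> instanceB = new HashMap<>();

        registryFor(instanceA).counter("harvester.errors").incrementAndGet();

        System.out.println("A=" + registryFor(instanceA).counter("harvester.errors").get());
        System.out.println("B=" + registryFor(instanceB).counter("harvester.errors").get());
    }
}
```

Because each instance looks up its registry through its own context, an error counted in one instance never shows up in the other.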
     67
     68See below for an example of the JSON data accessible for the metrics
     69=== Backwards Compatibility Issues ===
     70
     71This proposal adds a new dependency (the Metrics library) and new servlet and filter definitions in web.xml.  A Monitor Manager is added to ServiceContext, ResourceManager and ServiceManager.
     72
     73== Risks ==
     74
     75Nothing notable
     76
     77== Participants ==
     78 * As above