Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#1258 closed defect (fixed)

Healthcheck using wrong query for database healthcheck

Reported by: ianwallen Owned by: jesseeichar
Priority: major Milestone: v2.8.1
Component: General Version: v2.8.0
Keywords: Cc:

Description

As pointed out by Montse in http://osgeo-org.1560.n6.nabble.com/Oracle-Database-health-check-issues-td5044740.html

DatabaseHealthCheck.java is using query "SELECT id from Metadata LIMIT 1" which is not compatible with other databases.

As this is a simple check to see if the database is responding, I believe it should be using the query that was identified in the config.xml "validationQuery" as this query should be compatible with the database.

Change History (10)

comment:1 by ianwallen, 11 years ago

Summary: Helthcheck using wrong query for database healthcheckHealthcheck using wrong query for database healthcheck

comment:2 by drh, 11 years ago

My sense is that the validationQuery is to test the JDBC connection, but the DatabaseHealthCheck should be a query (queries) aimed at checking the status of the database as well as the connection. In 2.8.0, the DatabaseHealthCheck fails on Oracle:

* Csw GetCapabilities: OK
* Csw GetRecords: OK
! Database Connection: ERROR
!  ORA-00933: SQL command not properly ended


java.sql.SQLSyntaxErrorException: ORA-00933: SQL command not properly ended

	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:440)
	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
	at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:837)
	at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:445)
	at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191)
	at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:523)
	at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:207)
	at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:863)
	at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1153)
	at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1275)
	at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3576)
	at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3620)
	at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1491)
	at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
	at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
	at jeeves.resources.dbms.Dbms.selectFull(Dbms.java:213)
	at jeeves.resources.dbms.Dbms.select(Dbms.java:176)
	at org.fao.geonet.monitor.health.DatabaseHealthCheck$1.check(DatabaseHealthCheck.java:25)
	at com.yammer.metrics.core.HealthCheck.execute(HealthCheck.java:165)
	at com.yammer.metrics.core.HealthCheckRegistry.runHealthChecks(HealthCheckRegistry.java:53)
	at com.yammer.metrics.reporting.HealthCheckServlet.doGet(HealthCheckServlet.java:61)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
	at com.yammer.metrics.reporting.AdminServlet.doGet(AdminServlet.java:103)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.fao.geonet.monitor.webapp.WebappMetricsFilter.doFilter(WebappMetricsFilter.java:96)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:176)
	at org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:145)
	at org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:92)
	at org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:381)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.fao.geonet.monitor.MonitorSecurityFilter.doFilter(MonitorSecurityFilter.java:51)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.fao.geonet.monitor.webapp.MetricsRegistryInitializerFilter.doFilter(MetricsRegistryInitializerFilter.java:31)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
	at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
	at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291)
	at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:769)
	at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:698)
	at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:891)
	at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
	at java.lang.Thread.run(Thread.java:662)

* Lucene Index: OK
* Metadata Index Errors: OK

comment:3 by jesseeichar, 11 years ago

Owner: changed from geonetwork-devel@… to jesseeichar

Often the database configuration has a validation check sql. That should probably be used for this check since it is database specific.

in reply to:  2 comment:4 by jesseeichar, 11 years ago

Do you have some sql suggestions? Should the sql just have a ; ending the statement for oracle?

Replying to drh:

My sense is that the validationQuery is to test the JDBC connection, but the DatabaseHealthCheck should be a query (queries) aimed at checking the status of the database as well as the connection. In 2.8.0, the DatabaseHealthCheck fails on Oracle:

* Csw GetCapabilities: OK
* Csw GetRecords: OK
! Database Connection: ERROR
!  ORA-00933: SQL command not properly ended


java.sql.SQLSyntaxErrorException: ORA-00933: SQL command not properly ended

	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:440)
	at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
	at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:837)
	at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:445)
	at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191)
	at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:523)
	at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:207)
	at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:863)
	at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1153)
	at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1275)
	at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3576)
	at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3620)
	at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1491)
	at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
	at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
	at jeeves.resources.dbms.Dbms.selectFull(Dbms.java:213)
	at jeeves.resources.dbms.Dbms.select(Dbms.java:176)
	at org.fao.geonet.monitor.health.DatabaseHealthCheck$1.check(DatabaseHealthCheck.java:25)
	at com.yammer.metrics.core.HealthCheck.execute(HealthCheck.java:165)
	at com.yammer.metrics.core.HealthCheckRegistry.runHealthChecks(HealthCheckRegistry.java:53)
	at com.yammer.metrics.reporting.HealthCheckServlet.doGet(HealthCheckServlet.java:61)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
	at com.yammer.metrics.reporting.AdminServlet.doGet(AdminServlet.java:103)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.fao.geonet.monitor.webapp.WebappMetricsFilter.doFilter(WebappMetricsFilter.java:96)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:176)
	at org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:145)
	at org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:92)
	at org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:381)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.fao.geonet.monitor.MonitorSecurityFilter.doFilter(MonitorSecurityFilter.java:51)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.fao.geonet.monitor.webapp.MetricsRegistryInitializerFilter.doFilter(MetricsRegistryInitializerFilter.java:31)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
	at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
	at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291)
	at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:769)
	at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:698)
	at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:891)
	at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
	at java.lang.Thread.run(Thread.java:662)

* Lucene Index: OK
* Metadata Index Errors: OK

comment:5 by ianwallen, 11 years ago

The ";" is just to signify the end of the statement when using sqlplus or sql developer or similar tools. It cannot be used when supplying a statement via jdbc. So no it is not required.

If converting the following statement to oracle

SELECT id from Metadata LIMIT 1

then it would be

SELECT id from Metadata where rownum < 2

But I'm still not clear what the healtcheck is suppose to do?

This (DatabaseHealthCheck) will ensure that jdbc connection is working and that the database is up.

SELECT 1 from dual

This will ensure that the account that was logged in contains a table called metadata with at least 1 record.

SELECT id from Metadata where rownum < 2

If there is no data in the metadata table then it will return nothing (Which does not mean there is a database or application error). And if the table does not exist it will fail (which does not mean the database is down - just that the table does not exist which most likely means that geonetwork is not available). Also why the metadata table - i.e. the metadata table may exist however one of the other tables (Settings, languages, ....) may not. Seems like we are simply playing statistics - chances are that if the metadata exist then the others will exist as well.

So in the end I believe we are probably better off using the DatabaseHealthCheck query.

But if we want a quick fix for the existing logic then we can probably change the query to

SELECT count(*) from Metadata

As I believe all the databases understand this query. Note: It may not be as fast as the previous query.

comment:6 by jeichar, 11 years ago

Basically the check is to make sure that the system is running and usable. If there is no metadata or the table does not exist, then the system is not usable, although typically the failure of the healthcheck indicates the database is not reachable or something catastrophic has happened.

I wanted a little more than just a simple connection check and I wanted it to be an extremely fast check because in production I have nagios check it every 10 seconds.

I haven't looked into the performance characteristics of database queries to much but I would expect that SELECT count(*) from MEtadata; should be a very query even on very large datasets. Am I right?

comment:7 by ianwallen, 11 years ago

I have geonetwork setup with the metadata table containing 4,000 records on a basic desktop system running oracle (nothing special). It took 1.638 seconds to perform the count(*) and 0.016 seconds to "select 1 from metadata where rownum < 2"

On the same system I have a table with a few column and 30,000,000 records - It took 5.085 seconds to perform a count(*) and 0.062 seconds to get the first row.

The point being - as the amount of data increases the count(*) will degrade. Whether this degradation is acceptable or not is debatable.

Why not choose a table that does not grow with the database. The "settings" table should not change as the database grows so it may be a good candidate. As I mentioned before, if the metadata table is all intact but the settings table is missing, geonetwork will not be functioning. So I think as long a you do a select count(*) from a core table that does not increase as the database increases then you should be ok.

comment:8 by jeichar, 11 years ago

Sounds good. I will use count(*) on another table. I have to take a look.

comment:9 by jeichar, 11 years ago

Resolution: fixed
Status: newclosed

comment:10 by ianwallen, 11 years ago

Commit 8b4db504213e9cf03db6bf9d578d215c192853de in 2.8 branch

Note: See TracTickets for help on using tickets.