Opened 15 years ago

Last modified 12 years ago

#165 reopened defect

"unable to open resource main-db" returned by metadata.show and xml.iso19139 services - stalls server?

Reported by: simonp
Owned by: geonetwork-devel@…
Priority: major
Milestone: v2.6.5
Component: General
Version: v2.6.3
Keywords: unable to open resource main-db
Cc:

Description

Reported by Heikki Doeleman:

From: James Wilson [james.q.wilson@…]
Sent: Wednesday, 21 October 2009 3:02 AM
To: geonetwork-devel@…
Subject: Re: [GeoNetwork-devel] "unable to open resource main-db"

Heikki,

I'm also seeing this problem after doing repeated saves. I'm using Oracle, with pool size = 10 and reconnectTime=undefined. The problem is not easily repeatable, but it happens regularly.

Is this a problem with the connection pool running out of objects?

Reading the code, I'm somewhat confused as to how the connection pool is supposed to work, and resources get released. By my reading, the only place connections get released is by calling close on connection pool object (DbmsPool) with the resource (Dbms) as an argument, which unlocks the resource.

However, in the code, this never seems to be done. The default pattern seems to be to get a new resource with

dbms = context.getResourceManager().open(Geonet.Res.MAIN_DB)

and then to leave the resource to go out of scope.

Yours, mystified, and frustrated at not being able to track down the error....

Anybody else got any ideas?

James
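The pattern James describes (open a pooled resource, never hand it back) can be illustrated with a toy pool. This is only a sketch: `ToyPool`, `LeakSketch` and the retry loop are hypothetical stand-ins, not the Jeeves `DbmsPool` or `ResourceManager` API, and the assumption that the error comes from a bounded retry loop is inferred from the "after 20 attempts" wording of the message.

```java
import java.util.concurrent.Semaphore;

class LeakSketch {
    /** Hypothetical fixed-size pool; each permit stands for one lockable connection. */
    static final class ToyPool {
        private final Semaphore permits;

        ToyPool(int size) {
            permits = new Semaphore(size);
        }

        /** Tries to lock a connection, giving up after maxAttempts tries (assumed behaviour). */
        void open(int maxAttempts) {
            for (int i = 0; i < maxAttempts; i++) {
                if (permits.tryAcquire()) {
                    return;
                }
            }
            throw new IllegalStateException(
                "unable to open resource after " + maxAttempts + " attempts");
        }

        /** Unlocks one connection so that another caller can use it. */
        void close() {
            permits.release();
        }
    }

    /** Calls open() `requests` times and returns how many of those calls failed. */
    static int failures(int poolSize, int requests, boolean release) {
        ToyPool pool = new ToyPool(poolSize);
        int failed = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.open(20);
                if (release) {
                    pool.close(); // hand the connection back
                }
                // when release is false the connection simply "goes out of scope"
            } catch (IllegalStateException e) {
                failed++;
            }
        }
        return failed;
    }
}
```

With a pool of 10, `LeakSketch.failures(10, 15, false)` reports failures for the last five calls, while `LeakSketch.failures(10, 15, true)` reports none.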

Heikki Doeleman wrote:

hello lists,

I'm seeing this error, every so often :

"unable to open resource main-db after 20attempts"

This happens when you're using the editor for a bit. When it happens, you must either restart GeoNetwork or wait a while (session timeout?).

It started to happen for us after moving to Postgres and moving to a GN 2.4-based application.

A search on Google shows that quite a lot of GeoNetwork nodes have it. Moreover, from a WHO node that displays a stack trace, it seems they have it when using MySQL, not Postgres ( http://209.85.229.132/search?q=cache:5Z0sxBf6fS8J:apps.who.int/geonetwork/srv/en/metadata.show%3Fid%3D74%26currTab%3Dsimple+%22unable+to+open+resource+main-db%22+%22Communications+link+failure+due+to+underlying+exception:%22&cd=1&hl=en&ct=clnk ).

Does anybody know what to do to prevent this from ever happening ? My JDBC settings in config.xml are poolSize=10, reconnectTime=undefined (defaulting to Jeeves).

Thank you so much

Kind regards Heikki Doeleman

Change History (10)

comment:1 by simonp, 15 years ago

Looking at Heikki's original message and the references to stack traces in Google, it's actually taking place in the services metadata.show and iso19139.xml. I can reproduce it by using curl to thump the server with lots of requests to the metadata.show service.

It seems that some code was opening a Dbms resource without going through the usual Jeeves clean-up of Dbms resources (commit on success or rollback on exception) at the end of a service. As a result, the 10 Dbms connections in the pool would eventually all end up locked, and an exception with the message 'unable to open main-db after 20 attempts' would be thrown. The code responsible appears to be the timertask thread that is used to increase popularity asynchronously in metadata.show (it's also called by iso19139.xml). One fix is to add:

context.getResourceManager().close();

as the last line of the try block in the run method of the IncreasePopularityTask class in src/org/fao/geonet/services/metadata/Show.java
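The effect of such a fix can be sketched with a toy model of a background task that borrows a pooled connection. Everything here is an assumption made for illustration: `PopularitySketch` and its `Pool` are hypothetical, not the `IncreasePopularityTask` or Jeeves classes, and the sketch releases the connection in a `finally` block (a variation on the "last line of the try block" placement) so that the exception path is covered as well.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class PopularitySketch {
    /** Hypothetical pool: one permit per pooled connection. */
    static final class Pool {
        private final Semaphore permits;

        Pool(int size) {
            permits = new Semaphore(size);
        }

        boolean open() {
            return permits.tryAcquire();
        }

        void close() {
            permits.release();
        }

        int free() {
            return permits.availablePermits();
        }
    }

    /** Runs `tasks` background updates and returns the number of free connections afterwards. */
    static int run(int poolSize, int tasks, boolean closeWhenDone) {
        Pool pool = new Pool(poolSize);
        ExecutorService exec = Executors.newFixedThreadPool(4);
        for (int i = 0; i < tasks; i++) {
            exec.submit(() -> {
                if (!pool.open()) {
                    return; // pool exhausted, nothing to release
                }
                try {
                    // ... the popularity update would happen here ...
                } finally {
                    if (closeWhenDone) {
                        pool.close(); // release even if the update throws
                    }
                }
            });
        }
        exec.shutdown();
        try {
            exec.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.free();
    }
}
```

In this model, `PopularitySketch.run(10, 25, false)` leaves no free connections once the first ten tasks have run, whereas `PopularitySketch.run(10, 25, true)` ends with all ten connections free again.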

I'm not sure this is good though as when thumping the server (albeit with a somewhat unrealistic load) as described above, the thread that commits the increase in popularity tends to overlap with other threads that attempt to select from the Metadata table - the select threads fail because of pending commits. Safest might be to remove the async popularity update task and just run dm.increasePopularity(dbms, id); synchronously (no exceptions there)?

Cheers, Simon

comment:2 by simonp, 15 years ago

context.getResourceManager().close(); added in commit 5384

comment:3 by simonp, 15 years ago

Resolution: fixed
Status: new → closed

comment:4 by yecarrillo, 13 years ago

Keywords: unable to open resource main-db added
Milestone: v2.4.3 → v2.7.0
Resolution: fixed
Status: closed → reopened
Version: v2.5.0 → v2.6.3

comment:5 by simonp, 13 years ago

The problem was originally caused by using a Dbms connection in a thread to increase the popularity of the metadata (this was a performance improvement). The connection was either left hanging or, with the fix in place, caused other issues. In 2.6.x the threaded behaviour was removed, and the problem should have disappeared because the Dbms connection is handled in the usual Jeeves manner. Did you actually have a problem, or did you just notice that the original fix was removed?

In 2.7 the threaded behaviour has been restored as code to open a specific database connection and close only that connection has been implemented. Prior to 2.7 the ResourceManager-ResourceProvider-DbmsPool-Dbms class hierarchy and relationships did not contain the methods to do this. These classes have been reworked and if you check the code for 2.7 (in particular DataManager.java, the internal class IncreasePopularityTask) you will see this. This is also used in other situations now where a thread needs to open, maintain and close a database connection outside of the usual Jeeves way of doing this.
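The idea of a thread opening its own connection and closing only that connection might look roughly like the sketch below. This is not the 2.7 code: `HandlePoolSketch`, `Conn` and `demo` are hypothetical names, and the real ResourceManager, DbmsPool and DataManager classes may be structured quite differently.

```java
import java.util.HashSet;
import java.util.Set;

class HandlePoolSketch {
    /** Hypothetical connection handle; identity is all that matters here. */
    static final class Conn {}

    private final int size;
    private final Set<Conn> inUse = new HashSet<>();

    HandlePoolSketch(int size) {
        this.size = size;
    }

    /** Returns a new handle, or null when every connection is in use. */
    synchronized Conn open() {
        if (inUse.size() >= size) {
            return null;
        }
        Conn c = new Conn();
        inUse.add(c);
        return c;
    }

    /** Closes exactly the given handle and leaves all others untouched. */
    synchronized boolean close(Conn c) {
        return inUse.remove(c);
    }

    synchronized int inUseCount() {
        return inUse.size();
    }

    /** A request keeps its connection while a worker opens and closes its own. */
    static int demo() {
        HandlePoolSketch pool = new HandlePoolSketch(10);
        Conn requestConn = pool.open(); // still in use when demo() returns
        Thread worker = new Thread(() -> {
            Conn own = pool.open();
            if (own == null) {
                return;
            }
            try {
                // ... the popularity update would happen here ...
            } finally {
                pool.close(own); // only the worker's own connection is released
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.inUseCount();
    }
}
```

After `demo()` returns, one connection (the request's) is still counted as in use, which is the point of the design: the background thread cleans up after itself without touching connections that the normal request handling still owns.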

comment:6 by yecarrillo, 13 years ago

The problem persists in 2.6.x even with Jeeves managing all connections. I have a lot of "unable to open resource main-db" errors in the logs.

At this time I have no way to test 2.7.x, but I think this bug is still a problem when more than 5 users are logged in and editing metadata.

comment:7 by yecarrillo, 13 years ago

bump. Lots of "unable to open resource main-db" in my logs. Any hints?

comment:8 by simonp, 13 years ago

Can you try a patch I developed for 2.6.4 which replaces the Jeeves database pool code with the Apache database connection pool code (dbcp) from 2.7? I have supplied this patch to others who were also having problems with the old Jeeves database pool code in 2.6.4.

comment:9 by ianwallen, 12 years ago

Milestone: v2.7.0 → v2.8.0

comment:10 by simonp, 12 years ago

Milestone: v2.8.0 → v2.6.5