
WORK IN PROGRESS


Proposal number : Persistence framework

Date 2008/06/19
Contact(s) etj
Last edited Timestamp
Status draft
Assigned to release to be determined
Resources ???

Overview

Suggestions for using a persistence framework.

Proposal Type

  • Type: GUI Change, Core Change, Module Change, Guideline and project governance procedures, ...
  • App: GeoNetwork
  • Module: Data Manager, DB access

Voting History

  • No vote requested yet.

Motivations

Snip from the aforementioned email discussion:

As of version 2.2.0 the GeoNetwork application cannot be deployed to a cluster. Existing deployments probably haven't gotten to the size where clustering is necessary, but if this were to happen, deployment to a cluster will fail.

There are several reasons for this. Firstly, the application is storing non-Serializable objects in the HttpSession. This is not terribly difficult to fix, but it is still a show stopper.
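
As a rough illustration of the first point, the sketch below finds session attributes that Java serialization cannot write, which is what session replication in a cluster typically requires. The class `SessionAttributeCheck` and its methods are hypothetical helpers and are not part of GeoNetwork or Jeeves; the attribute map stands in for the contents of an HttpSession.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of GeoNetwork or Jeeves. */
final class SessionAttributeCheck {

    private SessionAttributeCheck() {}

    /** Returns the names of attributes that Java serialization cannot write. */
    static List<String> findNonSerializable(Map<String, Object> attributes) {
        List<String> offenders = new ArrayList<>();
        for (Map.Entry<String, Object> entry : attributes.entrySet()) {
            if (!canSerialize(entry.getValue())) {
                offenders.add(entry.getKey());
            }
        }
        return offenders;
    }

    /** Attempts a real serialization, so non-serializable nested fields are caught too. */
    static boolean canSerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (IOException e) {
            // NotSerializableException is a subclass of IOException.
            return false;
        }
    }
}
```

Running such a check over the session attributes in a test environment would list the objects that need to implement Serializable (or be moved out of the session) before clustering is attempted.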

Secondly, and this is the real killer, the current mechanism of generating unique primary keys in jeeves.util.SerialFactory will fail in a cluster because it produces duplicate primary keys. The SerialFactory caches the max primary key value for each table. In a cluster, multiple SerialFactory instances will exist, each oblivious to the others. The first node to insert a record with a given key will succeed; other nodes using the same key will fail.
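
The failure mode can be reproduced in a few lines. The sketch below is not the actual jeeves.util.SerialFactory code: `FakeTable` and `CachingKeyGenerator` are hypothetical stand-ins that assume only what the text states, namely that each node reads the max key once and then increments its own cached copy.

```java
import java.util.HashSet;
import java.util.Set;

/** Stand-in for a database table: a set of primary keys with a uniqueness constraint. */
final class FakeTable {
    private final Set<Integer> keys = new HashSet<>();

    int maxKey() {
        int max = 0;
        for (int k : keys) {
            max = Math.max(max, k);
        }
        return max;
    }

    /** Returns false where a real database would raise a duplicate-key error. */
    boolean insert(int key) {
        return keys.add(key);
    }
}

/** Hypothetical per-node generator that caches the max key, as the text describes. */
final class CachingKeyGenerator {
    private int lastKey;

    CachingKeyGenerator(FakeTable table) {
        this.lastKey = table.maxKey(); // read once at startup, never refreshed
    }

    int next() {
        return ++lastKey;
    }
}
```

If two generators are created against the same table (one per cluster node), both hand out the same next key: the first insert succeeds and the second is rejected as a duplicate.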

Geoscience Australia has deployed GeoNetwork using Oracle. The correct way to deal with this in Oracle is to use a SEQUENCE. This requires generating Oracle specific SQL, something the project has avoided doing.
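
The idea behind a SEQUENCE is that key allocation lives in one shared place instead of in each node's memory. The sketch below only simulates that idea in plain Java: `SharedSequence` is a hypothetical stand-in for a database sequence (here just an in-process counter), and `NodeKeyAllocator` is a hypothetical per-node component that reserves keys in blocks so it does not have to contact the shared counter for every insert. It is an illustration under those assumptions, not a proposal for the actual implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Stand-in for a database sequence: one counter shared by every node. */
final class SharedSequence {
    private final AtomicLong next = new AtomicLong(1);

    /** Atomically reserves `size` consecutive values and returns the first one. */
    long reserveBlock(int size) {
        return next.getAndAdd(size);
    }
}

/** Hypothetical per-node allocator that hands out keys from its reserved block. */
final class NodeKeyAllocator {
    private final SharedSequence sequence;
    private final int blockSize;
    private long nextKey;
    private long blockEnd; // exclusive upper bound of the current block

    NodeKeyAllocator(SharedSequence sequence, int blockSize) {
        this.sequence = sequence;
        this.blockSize = blockSize;
    }

    synchronized long next() {
        if (nextKey == blockEnd) { // block exhausted, or none reserved yet
            nextKey = sequence.reserveBlock(blockSize);
            blockEnd = nextKey + blockSize;
        }
        return nextKey++;
    }
}
```

Because every block comes from the same counter, two nodes can never hand out the same key, at the cost of possible gaps in the key space when a node stops before using up its block.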

In my humble opinion, if GeoNetwork is to achieve its full potential it needs to be scalable. Issues like in-memory key generation prevent this from occurring. The bottom line is that you need to be DB independent but scalable. The project should seriously consider the adoption of a persistence framework such as Hibernate.

We'll also have to become independent of the spatial DBMS used, so a persistence framework with spatial capabilities would be the better choice.

Proposal

The suggested framework is Hibernate Spatial. ...etc

Backwards Compatibility Issues

Risks

  • It has been reported (aaime) that Hibernate relies heavily on its cache. When a search returns a high number of results, the cache could cause an OutOfMemory error. This may be avoided using directQueries, which somehow don't use the cache. This is a good solution for read-only queries (and catalog queries are like that). Such queries may have drawbacks, since lazy loading (an internal Hibernate feature) becomes unusable, and this could lead to problems with an ebRIM-based schema because of the high number of related objects. This issue was reported against an old Hibernate version (from about the start of 2007), so it may no longer be valid.

Participants

  • ETj
  • Some ideas and discussions with A Aime, S Giannecchini, A Fabiani.
Last modified on Jun 19, 2008, 5:18:36 PM