Opened 12 years ago

Last modified 12 years ago

#1127 new defect

recommendations on optimizing lucene indexes needed

Reported by: plcking
Owned by: geonetwork-devel@…
Priority: major
Milestone: v2.10.0 RC0
Component: General
Version: v2.6.4
Keywords:
Cc:

Description

I have 400K records (ISO 19115) loaded into GeoNetwork. I have implemented all the recommendations in http://geonetwork-opensource.org/manuals/trunk/users/admin/advanced-configuration/index.html. The CSW search below sometimes takes longer than 60 seconds. I am assuming that optimizing the Lucene indexes will help, but I am worried about how long the optimization itself will take. Should I use lukeall-1.0.1.jar, or is there a way to call a servlet directly (say, using wget) to invoke the optimization? Can I make CSW calls with a reasonable response time while the optimization is running?

<?xml version="1.0" encoding="UTF-8"?>
<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" service="CSW" version="2.0.2" resultType="hits" outputSchema="http://www.isotc211.org/2005/gmd" maxRecords="10">
  <csw:Query typeNames="dataset,application,datasetcollection,service">
    <csw:ElementSetName>summary</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter xmlns:ogc="http://www.opengis.net/ogc" xmlns:gml="http://www.opengis.net/gml">
        <ogc:And>
          <ogc:PropertyIsLike wildCard="*" singleChar="#" escapeChar="\">
            <ogc:PropertyName>dc:title</ogc:PropertyName>
            <ogc:Literal>%Radarsat-2%</ogc:Literal>
          </ogc:PropertyIsLike>
          <ogc:PropertyIsGreaterThanOrEqualTo>
            <ogc:PropertyName>TempExtent_begin</ogc:PropertyName>
            <ogc:Literal>2012-10-01t00:00:00Z</ogc:Literal>
          </ogc:PropertyIsGreaterThanOrEqualTo>
          <ogc:PropertyIsLessThanOrEqualTo>
            <ogc:PropertyName>TempExtent_end</ogc:PropertyName>
            <ogc:Literal>2012-10-31t07:30:00Z</ogc:Literal>
          </ogc:PropertyIsLessThanOrEqualTo>
        </ogc:And>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>
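For anyone who wants to reproduce the timing outside a browser or GUI client, the request above can be rebuilt and posted from a script. The following is only a minimal sketch: the function names `build_get_records` and `timed_post` are made up for illustration, and the endpoint URL in the trailing comment is an assumption about a local install, not something stated in this ticket.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET

CSW = "http://www.opengis.net/cat/csw/2.0.2"
OGC = "http://www.opengis.net/ogc"


def build_get_records(title_pattern, begin, end):
    """Build a CSW 2.0.2 GetRecords 'hits' request shaped like the one above."""
    ET.register_namespace("csw", CSW)
    ET.register_namespace("ogc", OGC)
    root = ET.Element(f"{{{CSW}}}GetRecords", {
        "service": "CSW", "version": "2.0.2", "resultType": "hits",
        "outputSchema": "http://www.isotc211.org/2005/gmd", "maxRecords": "10",
    })
    query = ET.SubElement(root, f"{{{CSW}}}Query", {
        "typeNames": "dataset,application,datasetcollection,service"})
    ET.SubElement(query, f"{{{CSW}}}ElementSetName").text = "summary"
    constraint = ET.SubElement(query, f"{{{CSW}}}Constraint", {"version": "1.1.0"})
    flt = ET.SubElement(constraint, f"{{{OGC}}}Filter")
    both = ET.SubElement(flt, f"{{{OGC}}}And")

    def comparison(op, prop, literal, attrs=None):
        # One <ogc:op> element holding a PropertyName/Literal pair.
        node = ET.SubElement(both, f"{{{OGC}}}{op}", attrs or {})
        ET.SubElement(node, f"{{{OGC}}}PropertyName").text = prop
        ET.SubElement(node, f"{{{OGC}}}Literal").text = literal

    comparison("PropertyIsLike", "dc:title", title_pattern,
               {"wildCard": "*", "singleChar": "#", "escapeChar": "\\"})
    comparison("PropertyIsGreaterThanOrEqualTo", "TempExtent_begin", begin)
    comparison("PropertyIsLessThanOrEqualTo", "TempExtent_end", end)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def timed_post(url, body, timeout=120):
    """POST the request body and return (elapsed_seconds, response_bytes)."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/xml"})
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = resp.read()
    return time.perf_counter() - start, payload


# Example use (the URL is an assumed local endpoint; adjust to your install):
# body = build_get_records("%Radarsat-2%", "2012-10-01t00:00:00Z",
#                          "2012-10-31t07:30:00Z")
# elapsed, _ = timed_post("http://localhost:8080/geonetwork/srv/en/csw", body)
# print(f"{elapsed:.1f} s")
```

Because `resultType="hits"` returns only a count, the elapsed time mostly reflects the search itself rather than record retrieval and formatting.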

Change History (7)

comment:1 by heikki, 12 years ago

I would be careful assuming that optimizing will help at all, rather than make things worse (see http://www.searchworkings.org/blog/-/blogs/simon-says%3A-optimize-is-bad-for-you).

Maybe we should consider moving to Lucene 4.0 (see e.g. http://blog.thetaphi.de/2012/02/is-your-indexreader-atomic-major.html).

It may also be helpful to run a profiler like YourKit first, to determine whether your slowness really comes from Lucene.
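Before attaching a profiler, it may be worth confirming how reproducible the slowness is, since the report says the query is slow only "at times". The sketch below is a generic timing harness, not a substitute for a profiler: it measures end-to-end wall-clock time only and cannot attribute that time to Lucene. The name `measure` and its parameters are made up for illustration; `call` is any zero-argument function that issues the request.

```python
import statistics
import time


def measure(call, runs=5, pause=0.0):
    """Invoke call() `runs` times and summarize wall-clock latency in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
        time.sleep(pause)  # optional gap between runs
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
        "samples": samples,
    }
```

A large gap between the median and the maximum would suggest intermittent stalls (for example garbage collection or concurrent indexing) rather than a uniformly slow query, which is useful to know when reading the profiler output afterwards.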

comment:2 by fxp, 12 years ago

I agree with Heikki: you should first check where your slowness is. Applying the same approach as in http://trac.osgeo.org/geonetwork/wiki/proposals/LuceneOnlySearch to the CSW service could probably also help (but it may not be that easy if you need ISO19139 as output).

comment:3 by plcking, 12 years ago (in reply to comment:2)

Replying to fxp:

I agree with Heikki: you should first check where your slowness is. Applying the same approach as in http://trac.osgeo.org/geonetwork/wiki/proposals/LuceneOnlySearch to the CSW service could probably also help (but it may not be that easy if you need ISO19139 as output).

I'm in the process of installing YourKit now.

Pat

comment:4 by plcking, 12 years ago

It looks like the Lucene index search routines are the culprit. I'm assuming that you have YourKit installed, so I would like to send you CPU and memory snapshots of the query. I can't upload the snapshots to this site because of their file size (> 4 MB), so do you have an alternative?

Pat

comment:5 by jesseeichar, 12 years ago

Do you have an FTP server we can download them from?

comment:6 by plcking, 12 years ago (in reply to comment:5)

Replying to jesseeichar:

Do you have an FTP server we can download them from?

Please try the following URLs:

http://ceocat.ccrs.nrcan.gc.ca/memory.snapshot
http://ceocat.ccrs.nrcan.gc.ca/cpu.snapshot

Pat

comment:7 by jesseeichar, 12 years ago

I have downloaded them, but I am really busy right now, so I am not sure when I will have time to analyze them. Perhaps you can extract the queries, run them with Luke, and see what can be done to speed them up?
