Opened 12 years ago
Last modified 12 years ago
#1127 new defect
Recommendations on optimizing Lucene indexes needed
Reported by: | plcking | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | v2.10.0 RC0 |
Component: | General | Version: | v2.6.4 |
Keywords: | | Cc: | |
Description
I have 400K records (ISO19115) loaded into GeoNetwork and have implemented all the recommendations in http://geonetwork-opensource.org/manuals/trunk/users/admin/advanced-configuration/index.html. I am finding that the following CSW search (below) at times takes longer than 60 seconds. I am assuming that optimizing the Lucene indexes will help, but I am worried about the amount of time that optimizing will consume. Should I use lukeall-1.0.1.jar, or is there a way to call a servlet directly (say, using wget) to invoke the optimization? Can I still make CSW calls with a reasonable response time while the optimization is running?
<?xml version="1.0" encoding="UTF-8"?> <csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" service="CSW" version="2.0.2" resultType="hits" outputSchema="http://www.iso tc211.org/2005/gmd" maxRecords="10">
<csw:Query typeNames="dataset,application,datasetcollection,service">
<csw:ElementSetName>summary</csw:ElementSetName> <csw:Constraint version="1.1.0">
<ogc:Filter xmlns:ogc="http://www.opengis.net/ogc" xmlns:gml="http://www.opengis.net/gml">
<ogc:And>
<ogc:PropertyIsLike wildCard="*" singleChar="#" escapeChar="\">
<ogc:PropertyName>dc:title</ogc:PropertyName>
<ogc:Literal>%Radarsat-2%</ogc:Literal>
</ogc:PropertyIsLike> <ogc:PropertyIsGreaterThanOrEqualTo>
<ogc:PropertyName>TempExtent_begin</ogc:PropertyName> <ogc:Literal>2012-10-01t00:00:00Z</ogc:Literal>
</ogc:PropertyIsGreaterThanOrEqualTo> <ogc:PropertyIsLessThanOrEqualTo>
<ogc:PropertyName>TempExtent_end</ogc:PropertyName> <ogc:Literal>2012-10-31t07:30:00Z</ogc:Literal>
</ogc:PropertyIsLessThanOrEqualTo>
</ogc:And>
</ogc:Filter>
</csw:Constraint>
</csw:Query>
</csw:GetRecords>
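For reference, a minimal sketch of optimizing the GeoNetwork Lucene index directly with Lucene's own API, as an alternative to Luke or a servlet call. The index path is an assumption (it depends on where your installation's data/index directory lives), and the sketch assumes the Lucene 3.1+ API; the Lucene jar actually bundled with GeoNetwork 2.6.x may be older and need the legacy IndexWriter constructors instead. It should be run while GeoNetwork is stopped, since only one IndexWriter may hold the index's write lock at a time.

```java
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class OptimizeIndex {
    public static void main(String[] args) throws Exception {
        // Hypothetical path: point this at the Lucene index directory of your
        // GeoNetwork data directory (the exact location varies by installation).
        File indexDir = new File(args[0]);

        Directory dir = FSDirectory.open(indexDir);
        IndexWriterConfig conf = new IndexWriterConfig(
                Version.LUCENE_30, new StandardAnalyzer(Version.LUCENE_30));
        IndexWriter writer = new IndexWriter(dir, conf);
        try {
            // Merges all segments into one; this rewrites the entire index on
            // disk, so expect heavy I/O for the duration.
            writer.optimize();
        } finally {
            writer.close();
            dir.close();
        }
    }
}
```

Because optimize() rewrites the whole index, any searches running against the same disk during that window will compete for I/O, which is part of why the response-time question matters.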
Change History (7)
comment:1 by , 12 years ago
comment:2 by , 12 years ago (follow-up: comment:3)
I agree with Heikki: you should first check where your slowness is. Using the same approach as in http://trac.osgeo.org/geonetwork/wiki/proposals/LuceneOnlySearch for the CSW service could probably also help (but maybe not that easy if you need ISO19139 as output).
comment:3 by , 12 years ago
Replying to fxp:
I agree with Heikki: you should first check where your slowness is. Using the same approach as in http://trac.osgeo.org/geonetwork/wiki/proposals/LuceneOnlySearch for the CSW service could probably also help (but maybe not that easy if you need ISO19139 as output).
I'm in the process of installing YourKit now...
Pat
comment:4 by , 12 years ago
It looks like the Lucene indexing and search routines are the culprit. I'm assuming that you have YourKit installed, so I would like to send you CPU and memory snapshots of the query. I can't upload the snapshots to this site due to file size (> 4 MB), so do you have an alternative?
Pat
comment:6 by , 12 years ago
Replying to jesseeichar:
Do you have an FTP server we can download them from?
Please try the following URLs:
http://ceocat.ccrs.nrcan.gc.ca/memory.snapshot
http://ceocat.ccrs.nrcan.gc.ca/cpu.snapshot
Pat
comment:7 by , 12 years ago
I have downloaded them, but I am really busy right now, so I am not completely sure when I will have time to analyze them. Perhaps you can extract the queries and run them with Luke to see what can be done to speed them up?
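If it helps, here is a rough sketch of timing the title clause directly against the index outside GeoNetwork (the same thing you would do interactively in Luke). It assumes the Lucene 3.x API; the index path and the field name "title" are assumptions, since the Lucene field GeoNetwork actually indexes dc:title into may be named differently in your build.

```java
import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class TimeQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical index path: pass the GeoNetwork Lucene index directory.
        Directory dir = FSDirectory.open(new File(args[0]));
        IndexReader reader = IndexReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);

        // The CSW %Radarsat-2% constraint effectively becomes a query with a
        // leading wildcard; terms are assumed to be lowercased in the index.
        WildcardQuery query = new WildcardQuery(new Term("title", "*radarsat-2*"));

        long start = System.currentTimeMillis();
        TopDocs hits = searcher.search(query, 10);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(hits.totalHits + " hits in " + elapsed + " ms");

        searcher.close();
        reader.close();
        dir.close();
    }
}
```

A wildcard query with a leading * has to enumerate every term in the field before matching, so if this clause dominates the time, optimizing the index is unlikely to fix it on its own.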
I would be careful assuming that optimizing will help at all, rather than making things worse (see http://www.searchworkings.org/blog/-/blogs/simon-says%3A-optimize-is-bad-for-you).
Maybe we should consider moving to Lucene 4.0 (see e.g. http://blog.thetaphi.de/2012/02/is-your-indexreader-atomic-major.html).
It may also be helpful to run a profiler like YourKit first, to determine whether your slowness really comes from Lucene.