Lucene-Only search mode

Date 2011-10-31
Contact(s) François Prunayre
Last edited 2011-10-31T18:25:00
Status Being discussed, in progress, initial implementation in progress
Assigned to release 2.7.x
Resources Available
Ticket # #652

Overview

The current search relies on the following steps:

  • Lucene Search
  • Retrieve Metadata from Database (according to search criteria and paging)
  • XSL presentation.

The aim of this proposal is to add a new search mode relying only on Lucene. This mode requires that all information needed for results presentation be structured and/or added in the index.

The Lucene-only search is available through the "q" service, which returns an XML response. The widgets interface could be configured to use this service instead of xml.search (i.e. the classic user interface will not change).

Proposal Type

  • Type: Lucene, Core Change
  • App: GeoNetwork
  • Module: LuceneSearcher, Widgets

Voting History

  • Vote not yet proposed.

Motivations

  • Performance improvements: search could be 10 to 20 times faster and would better support concurrent users.

The following charts compare 3 services:

  • xml.search in default mode (i.e. with DB, with XSL)
  • xml.search in fast mode (i.e. with DB, no XSL)
  • q service (#485), which dumps all index fields (i.e. no DB, no XSL)

[Chart: number of concurrent users increasing]

[Chart: number of records returned per page]

Proposal

The main requirement is to store all information available in the "brief" format in the index so that it can be retrieved at search time. The brief format is the pivot format used by GeoNetwork to display search results via the xml.search or main.search.embeded services. The brief format fields are the following:

  • id
  • uuid
  • title
  • abstract
  • keyword
  • parentId
  • datasetcreationdate
  • geoBox
    • westBL
    • eastBL
    • southBL
    • northBL
  • Constraints (not supported/used in widget GUI, complex XML or as CData ?)
  • SecurityConstraints (not supported/used in widget GUI, complex XML or as CData ?)
  • LegalConstraints (not supported/used in widget GUI, complex XML or as CData ?)
  • temporalExtent (not supported/used in widget GUI)
    • begin
    • end
  • image type="unknown|thumbnail|overview"
  • responsibleParty role="{$role}" appliesTo="resource" logo=""
  • link title="" href="" name="" protocol="" type=""
  • category
  • + geonet:info/*

The q service (#485) must retrieve this information directly from the index instead of dumping all index fields. Some of these fields are already available:

  • id=_id
  • schema=_schema
  • createDate=_createDate
  • changeDate=_changeDate
  • isTemplate=_isTemplate
  • isHarvested=_isHarvested
  • popularity=_popularity
  • rating=_rating
  • displayOrder=_displayOrder
  • view=_view
  • notify=_notify
  • download=_download
  • dynamic=_dynamic
  • featured=_featured
  • owner=_owner
  • isPublishedToAll=_isPublishedToAll
  • ownername=_ownername
  • category=_category
  • valid=_valid
  • baseUrl=?
  • locService=?
  • selected=selected
  • source=_source
  • edit=_edit
  • uuid=_uuid
  • title=title
  • abstract=abstract
  • keyword=keyword
  • parentId=parentUuid
  • datasetcreationdate=createDate
  • changeDate=changeDate
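
The mapping above could be applied at search time roughly as follows. This is a minimal sketch, not the q service implementation: it assumes each entry reads as "brief name=index field", it models a stored index document as a plain dictionary of field name to list of values, and the names `BRIEF_TO_INDEX` and `brief_from_index` are hypothetical. Only a subset of the listed fields is shown.

```python
from typing import Dict, List

# Brief-format name -> index field name, copied from a subset of the list
# above (assumption: each entry reads as "brief name=index field").
BRIEF_TO_INDEX: Dict[str, str] = {
    "id": "_id",
    "uuid": "_uuid",
    "schema": "_schema",
    "title": "title",
    "abstract": "abstract",
    "keyword": "keyword",
    "parentId": "parentUuid",
    "datasetcreationdate": "createDate",
}


def brief_from_index(doc: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build a brief-style record from stored index fields.

    Fields missing from the index document are skipped, and index fields
    that are not in the mapping are ignored.
    """
    brief: Dict[str, List[str]] = {}
    for brief_name, index_name in BRIEF_TO_INDEX.items():
        values = doc.get(index_name)
        if values:
            brief[brief_name] = list(values)
    return brief


# Hypothetical stored document, for illustration only.
example_doc = {
    "_id": ["42"],
    "_uuid": ["abc-123"],
    "title": ["Sample record"],
    "keyword": ["hydrology", "rivers"],
    "parentUuid": ["parent-1"],
    "_popularity": ["7"],  # not in the subset above, so it is ignored
}
record = brief_from_index(example_doc)
```

Keeping the mapping in one table means a renamed index field (such as parentUuid for parentId) only has to be changed in a single place.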

Re-worked fields:

  • image=image (| separated)
  • link=link (| separated)
  • geoBox (| separated)
  • responsibleParty (| separated)
  • Constraints=accessConstr (Codelist value only)
  • Constraints=otherConstr
  • SecurityConstraints=classif (Codelist value only)
  • Constraints=conditionApplyingToAccessAndUse
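
A client reading the "|"-separated fields would need to split them back into named components. The sketch below shows one way this might look. The component order is an assumption: it follows the order of the geoBox children and of the link attributes in the brief format list, and the order actually stored in the index may differ. `split_field`, `GEOBOX_KEYS` and `LINK_KEYS` are hypothetical names.

```python
from typing import Dict, Sequence

# Assumed component order, taken from the brief format list above;
# the actual order used in the index may differ.
GEOBOX_KEYS = ("westBL", "eastBL", "southBL", "northBL")
LINK_KEYS = ("title", "href", "name", "protocol", "type")


def split_field(value: str, keys: Sequence[str], sep: str = "|") -> Dict[str, str]:
    """Split a separator-joined index value into named components.

    Raises ValueError when the number of components does not match,
    which also happens if a component itself contains the separator.
    """
    parts = value.split(sep)
    if len(parts) != len(keys):
        raise ValueError(
            f"expected {len(keys)} parts, got {len(parts)}: {value!r}"
        )
    return dict(zip(keys, parts))


# Hypothetical stored values, for illustration only.
box = split_field("-5.0|10.0|41.0|51.5", GEOBOX_KEYS)
link = split_field(
    "Data download|http://example.org/data.zip|data.zip|WWW:DOWNLOAD|application/zip",
    LINK_KEYS,
)
```

One limitation of this encoding is that a component containing "|" (a link title, for instance) cannot be split reliably, so the indexing side would have to escape or strip that character.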

New fields added:

  • datasetLang=datasetLang
  • language=language
  • spatialRepresentationType=spatialRepresentationType
  • serviceType=serviceType

Backwards Compatibility Issues

None.

New libraries added

None.

Further improvements

  • Support multilingual metadata: the classic mechanism of using the GUI language and falling back to the main language is not implemented for this service. It should be part of the multilingual metadata indexing proposal (http://trac.osgeo.org/geonetwork/wiki/MultilingualIndexMechanism) and requires more work.
  • Use it for CSW search when the complete ISO record is not needed (e.g. dublin-core output)

Risks

Participants

  • François Prunayre
  • Others?
