Version 2 (modified by jesseeichar, 11 years ago)

Fast Index Update

Date 2012/09/14
Contact(s) Jesse Eichar
Last edited
Status Not proposed; brainstorm only
Assigned to release 2.9.x
Resources Swisstopo
Code https://github.com/jesseeichar/core-geonetwork/tree/improvement/fastupdate

Overview

Currently, changing any field in the Lucene index requires completely reindexing the metadata document. This proposal outlines how certain field changes could be performed efficiently.

The types of actions this would make efficient are:

  • Update popularity
  • Update privileges

Proposal Type

  • Type: Improvement
  • App: Geonetwork
  • Module: Index

  • Email discussions:
  • IRC discussions:
  • Related work:

Voting History

  • None as yet

Proposal

Background

Lucene does not support updating a document in place. The normal way of updating a Lucene document is to:

  1. Recreate the Lucene document from the source data (in our case the metadata XML)
  2. Delete the old document from the index
  3. Add the new (recreated) document to the index
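
The three steps above can be sketched with a toy in-memory index. This is only an illustration, not GeoNetwork or Lucene code: the class ReindexSketch, the map standing in for the index, and the buildDocument helper are all hypothetical names introduced here.

```java
import java.util.HashMap;
import java.util.Map;

public class ReindexSketch {
    // Toy index: metadata id -> indexed fields. Stands in for a Lucene index.
    static final Map<String, Map<String, String>> index = new HashMap<>();

    // Hypothetical stand-in for the expensive step: deriving every field
    // from the metadata XML.
    static Map<String, String> buildDocument(String metadataXml, int popularity) {
        Map<String, String> doc = new HashMap<>();
        doc.put("any", metadataXml.toLowerCase());
        doc.put("popularity", Integer.toString(popularity));
        return doc;
    }

    static void reindex(String id, String metadataXml, int popularity) {
        Map<String, String> doc = buildDocument(metadataXml, popularity); // 1. recreate
        index.remove(id);                                                 // 2. delete
        index.put(id, doc);                                               // 3. add
    }

    public static void main(String[] args) {
        reindex("42", "<title>Salt Water</title>", 0);
        // Only the popularity changed, yet the whole document is rebuilt.
        reindex("42", "<title>Salt Water</title>", 1);
        System.out.println(index.get("42").get("popularity"));
    }
}
```

Even though only one small field changes on the second call, buildDocument runs in full, which is the cost this proposal tries to avoid.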

If recreating the Lucene document is expensive, then the whole update is expensive. There are workarounds, for example:

  1. Load the document from the index
  2. Modify the document
  3. Delete the document from the index
  4. Insert the modified document into the index

However, this is a naive solution that fails in practice because a document has both stored and non-stored fields. A document loaded from the index contains only its stored fields, so re-inserting it loses the non-stored ones.
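
A minimal sketch of that failure, using a toy model rather than the real Lucene API. The assumption that "title" and "popularity" are stored while "any" is indexed but not stored is made up for this example, as are the class and method names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class StoredFieldLossSketch {
    // Assumption for this sketch: these fields are stored, "any" is not.
    static final Set<String> STORED = Set.of("title", "popularity");

    // What the toy index can search on, per metadata id.
    static final Map<String, Map<String, String>> searchable = new HashMap<>();
    // What the toy index can hand back when a document is loaded.
    static final Map<String, Map<String, String>> stored = new HashMap<>();

    static void add(String id, Map<String, String> doc) {
        searchable.put(id, new HashMap<>(doc));
        Map<String, String> kept = new HashMap<>(doc);
        kept.keySet().retainAll(STORED); // non-stored values are not retrievable
        stored.put(id, kept);
    }

    static Map<String, String> load(String id) {
        return new HashMap<>(stored.get(id));
    }

    static void delete(String id) {
        searchable.remove(id);
        stored.remove(id);
    }

    static boolean matches(String id, String field, String term) {
        Map<String, String> doc = searchable.get(id);
        String value = doc == null ? null : doc.get(field);
        return value != null && value.contains(term);
    }

    // The naive workaround: load, modify, delete, insert.
    static void naiveUpdate(String id, String field, String value) {
        Map<String, String> doc = load(id); // 1. load (stored fields only)
        doc.put(field, value);              // 2. modify
        delete(id);                         // 3. delete
        add(id, doc);                       // 4. insert
    }

    public static void main(String[] args) {
        add("42", Map.of("title", "salt lake", "any", "salt lake water", "popularity", "0"));
        boolean before = matches("42", "any", "water");
        naiveUpdate("42", "popularity", "1");
        boolean after = matches("42", "any", "water");
        System.out.println(before + " -> " + after);
    }
}
```

In this model the search on "any" matches before the update and no longer matches afterwards, because the reloaded document never contained that field.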

A potential solution comes to mind:

  • Create two documents from the metadata XML:
    • a document containing only the stored fields
    • a document containing only the non-stored fields
  • Insert both documents into the index

Now the document with the stored fields can be loaded, modified and re-inserted without losing any data, because everything it contains can be read back from the index.
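
The split can be sketched as follows. This is a toy model under stated assumptions: two maps stand in for the two kinds of Lucene document, the set of stored field names is invented for the example, and SplitDocumentSketch and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class SplitDocumentSketch {
    // Assumption for this sketch: these fields are stored, all others are not.
    static final Set<String> STORED = Set.of("title", "popularity");

    // Two documents per metadata record, both keyed by metadata id.
    static final Map<String, Map<String, String>> storedDocs = new HashMap<>();
    static final Map<String, Map<String, String>> nonStoredDocs = new HashMap<>();

    // Full (expensive) indexing: split the fields into the two documents.
    static void index(String id, Map<String, String> allFields) {
        Map<String, String> storedPart = new HashMap<>();
        Map<String, String> nonStoredPart = new HashMap<>();
        for (Map.Entry<String, String> e : allFields.entrySet()) {
            (STORED.contains(e.getKey()) ? storedPart : nonStoredPart)
                    .put(e.getKey(), e.getValue());
        }
        storedDocs.put(id, storedPart);
        nonStoredDocs.put(id, nonStoredPart);
    }

    // Cheap update: only the stored document is loaded, changed and re-inserted.
    static void updateStoredField(String id, String field, String value) {
        if (!STORED.contains(field)) {
            throw new IllegalArgumentException(field + " is not a stored field");
        }
        Map<String, String> doc = new HashMap<>(storedDocs.get(id)); // complete copy
        doc.put(field, value);
        storedDocs.remove(id);
        storedDocs.put(id, doc);
    }

    public static void main(String[] args) {
        index("42", Map.of("title", "salt lake", "any", "salt lake water", "popularity", "0"));
        updateStoredField("42", "popularity", "1");
        System.out.println(storedDocs.get("42"));
        System.out.println(nonStoredDocs.get("42")); // untouched by the update
    }
}
```

The update never touches the non-stored document and never goes back to the metadata XML, which is where the saving would come from.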

This affects how searching is done: a search that involves both stored and non-stored fields has to be split into separate searches whose results are then recombined.

For example, take the query &(any:water title:salt).

If title is stored and any is not, then two searches have to be run and combined: only the documents that appear in both result sets (matched on metadata id) are true results. Ordering the results will be very difficult.
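
A rough sketch of the recombination step for this query, again on a toy model rather than Lucene. The sample records, the substring matching, and the names CombinedSearchSketch, search and intersect are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CombinedSearchSketch {
    // Returns the metadata ids of all documents whose field contains the term.
    static Set<String> search(Map<String, Map<String, String>> docs, String field, String term) {
        Set<String> ids = new TreeSet<>();
        for (Map.Entry<String, Map<String, String>> e : docs.entrySet()) {
            String value = e.getValue().get(field);
            if (value != null && value.contains(term)) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    // AND of two partial searches: keep only ids present in both result sets.
    static Set<String> intersect(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Stored-field documents and non-stored-field documents, keyed by metadata id.
        Map<String, Map<String, String>> storedDocs = Map.of(
                "1", Map.of("title", "salt lake"),
                "2", Map.of("title", "fresh lake"),
                "3", Map.of("title", "salt mine"));
        Map<String, Map<String, String>> nonStoredDocs = Map.of(
                "1", Map.of("any", "salt lake water"),
                "2", Map.of("any", "fresh lake water"),
                "3", Map.of("any", "salt mine rock"));

        Set<String> anyWater = search(nonStoredDocs, "any", "water");
        Set<String> titleSalt = search(storedDocs, "title", "salt");
        System.out.println(intersect(anyWater, titleSalt)); // prints [1]
    }
}
```

Note that the combined result here is just a set of ids sorted by id. Each partial search would produce its own relevance scores, and the sketch makes no attempt to merge them, which illustrates why ordering is the hard part.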

Backwards Compatibility Issues

Requires a rebuild of the index.

Risks

Participants

  • As above