Opened 13 years ago
Closed 13 years ago
#623 closed defect (fixed)
Implement NullCheck operator in CSW
Reported by: | josegar74 | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | v2.6.5 |
Component: | Metadata standards | Version: | v2.6.3 |
Keywords: | | Cc: | |
Description
The NullCheck operator is mandatory in INSPIRE, but although GeoNetwork advertises the operator in its Capabilities document, it is not implemented.
Searching for null or empty values with Lucene is not easy. A solution that seems to work well, proposed in some Lucene-related forums, is to index these values as a predefined dummy string. It seems later versions of Lucene will handle null values better, so this will need some changes when GeoNetwork updates its Lucene version.
With this solution, a PropertyIsNull filter in a CSW query such as:
<PropertyIsNull> <PropertyName>Abstract</PropertyName> </PropertyIsNull>
is translated to Lucene as follows (assuming the dummy value is ZZZZZZZZZZZZZZZ):
abstract: ZZZZZZZZZZZZZZZ
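A minimal sketch of the dummy-value idea, written in Python purely for illustration (this is not GeoNetwork or Lucene code). A plain dict stands in for the index, and mapping the CSW queryable Abstract to the Lucene field abstract by lowercasing is an assumption; the helper names are hypothetical.

```python
# Toy model of the dummy-value approach (not GeoNetwork code).
# Assumptions: CSW queryable names map to Lucene field names by lowercasing,
# and ZZZZZZZZZZZZZZZ is the chosen sentinel value.

NULL_SENTINEL = "ZZZZZZZZZZZZZZZ"


def index_term(value):
    """Return the term to index: empty or missing values become the sentinel."""
    if value is None or not value.strip():
        return NULL_SENTINEL
    return value


def property_is_null_query(property_name):
    """Translate <PropertyIsNull> on a CSW queryable into a Lucene query string."""
    return f"{property_name.lower()}:{NULL_SENTINEL}"


# Toy in-memory "index": record id -> indexed abstract term.
records = {"r1": "Road network of the region", "r2": "", "r3": None}
index = {rid: index_term(abstract) for rid, abstract in records.items()}

# Records matched by the null query are those whose indexed term is the sentinel.
null_hits = sorted(rid for rid, term in index.items() if term == NULL_SENTINEL)
print(property_is_null_query("Abstract"), null_hits)
```

A record whose abstract genuinely contained the sentinel string would also match, which is exactly the limitation of this approach.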
One obvious limitation arises if the "dummy" string actually occurs as a meaningful value in some document, but choosing an unusual value should be fine in practice.
I will try to provide a patch for testing later; any comments on this solution are very welcome.
Attachments (1)
Change History (9)
comment:1 by , 13 years ago
comment:2 by , 13 years ago
Instead of storing a dummy value in the actual Lucene index fields, and to avoid side effects in the XSL/Java code that uses Lucene index values, I'm implementing this solution:
If a metadata field is empty, a field named fieldName_Null is indexed
with the value "yes". A search for null values in that field becomes:
fieldName_Null: yes
This way, the content of the existing Lucene index fields does not change.
I also checked that this approach does not double the number of indexed fields in Lucene: when fieldName_Null is indexed, fieldName is not indexed, and vice versa.
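A toy Python model of the fieldName_Null scheme, for illustration only (it is not the attached patch, and the real indexing is done by GeoNetwork's XSL). Records are modelled as dicts of field to value, and the helpers index_fields and search_null are hypothetical.

```python
# Toy model of the fieldName_Null indexing scheme (not the committed patch).


def index_fields(record):
    """Return the (field -> term) pairs to index for one record.

    A non-empty value is indexed under its own field name; an empty value is
    indexed as <field>_Null = "yes" instead, so each field yields exactly one entry.
    """
    indexed = {}
    for field, value in record.items():
        if value is None or not str(value).strip():
            indexed[f"{field}_Null"] = "yes"
        else:
            indexed[field] = value
    return indexed


def property_is_null_query(field):
    """Lucene query string for PropertyIsNull on `field`."""
    return f"{field}_Null:yes"


def search_null(index, field):
    """Return the ids of records whose `field` was indexed as null."""
    key = f"{field}_Null"
    return sorted(rid for rid, fields in index.items() if fields.get(key) == "yes")


records = {
    "r1": {"title": "Roads", "abstract": "Road network"},
    "r2": {"title": "Rivers", "abstract": ""},
}
index = {rid: index_fields(rec) for rid, rec in records.items()}
print(property_is_null_query("abstract"), search_null(index, "abstract"))
```

Each field contributes exactly one indexed entry, either under its own name or under the _Null name. A field missing from a record altogether is not covered by this sketch.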
by , 13 years ago
Attachment: | #623__Implement_NullCheck_operator_in_CSW_For_Branch.patch added |
---|
Patch to review for branch code
comment:3 by , 13 years ago
Maybe the last suggestion in http://www.gossamer-threads.com/lists/lucene/java-dev/64663 is worth a try?
comment:4 by , 13 years ago
Hi Simon
Changed to:
<xsl:template match="ogc:PropertyIsNull">
  <BooleanQuery>
    <BooleanClause required="true" prohibited="false">
      <MatchAllDocsQuery required="true" prohibited="false"/>
    </BooleanClause>
    <BooleanClause required="false" prohibited="true">
      <RangeQuery fld="{ogc:PropertyName}" lowerTxt="*" upperTxt="*" inclusive="true"/>
    </BooleanClause>
  </BooleanQuery>
</xsl:template>
The query is translated to:
+(+*:* -abstract:[ TO ]) +_isTemplate:n
One problem with this solution is that the * is removed by the analyzer. Also, when trying this query directly in Luke:
+(+*:* -abstract:[* TO *])
I still get all results. In the linked Lucene thread, one comment says the author could not remember whether *:* is a Solr extension or part of the core QueryParser.
So it is not clear whether this should work.
Anyway, if you find a bug in the new expression that prevents it from working properly, I can change it (I'm not a Lucene expert). Otherwise, I'll go with the proposed patch, and we can update it later if a simpler solution turns up.
comment:5 by , 13 years ago
Jose,
A variation on the suggestion in that email seems to work for me. My Lucene index has 31 documents, 2 of which have a value indexed in the field altTitle (alternate title).
If I enter the following query in the search tab of Luke (1.0.1):
*:* -altTitle:*
I get the 29 documents that have no value indexed for altTitle. However, this requires 'Allow leading * in wildcard queries' to be checked in the query parser settings of Luke's search tab, so some additional work may be needed to make sure GeoNetwork's Lucene calls can do this too?
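To make the semantics of `*:* -altTitle:*` concrete, here is a small set-based model in Python (an illustration, not Lucene): `*:*` matches every document, `altTitle:*` matches documents with at least one altTitle term, and the prohibited clause subtracts the second set from the first. The index is synthetic and only reproduces the counts reported above (31 documents, 2 with altTitle); the helper names are hypothetical.

```python
# Set-based model of "*:* -altTitle:*" (a sketch, not Lucene itself):
#   *:*         -> every document in the index (MatchAllDocsQuery)
#   -altTitle:* -> prohibited clause: documents with at least one altTitle term
# The result is the set difference between the two.


def match_all(docs):
    """What *:* matches: every document id."""
    return set(docs)


def has_any_term(docs, field):
    """What `field:*` matches: documents with at least one indexed term in `field`."""
    return {doc_id for doc_id, fields in docs.items() if fields.get(field)}


def property_is_null(docs, field):
    return match_all(docs) - has_any_term(docs, field)


# Synthetic index mirroring the reported counts: 31 documents, 2 with altTitle.
docs = {f"doc{i}": {"title": [f"title {i}"]} for i in range(31)}
docs["doc3"]["altTitle"] = ["alt three"]
docs["doc7"]["altTitle"] = ["alt seven"]

result = property_is_null(docs, "altTitle")
print(len(result))  # 29
```

In Lucene itself, the query parser rejects a term that starts with `*` unless leading wildcards are enabled (QueryParser has a setAllowLeadingWildcard option for this), which is presumably what the Luke checkbox toggles.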
comment:6 by , 13 years ago
Actually, the following template in csw/filter-to-lucene.xsl:
<xsl:template match="ogc:PropertyIsNull">
  <BooleanQuery>
    <BooleanClause required="true" prohibited="false">
      <MatchAllDocsQuery required="true" prohibited="false"/>
    </BooleanClause>
    <BooleanClause required="false" prohibited="true">
      <WildcardQuery fld="{ogc:PropertyName}" txt="*"/>
    </BooleanClause>
  </BooleanQuery>
</xsl:template>
and a CSW query with:
<PropertyIsNull> <PropertyName>altTitle</PropertyName> </PropertyIsNull>
generates a Lucene query in GeoNetwork like the one used in Luke:
Lucene query: +(+(+*:* -altTitle:*) +_isTemplate:n) +(_op0:2 _op0:1 _op0:0 _op0:-1 _owner:1)
which seems to work, i.e. it returns all the records that have no altTitle field indexed.
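The translation performed by the filter-to-lucene.xsl template can be mimicked in a few lines of Python to check the expected query string. This is a sketch under stated assumptions: it handles a single PropertyIsNull element, accepts both namespaced (ogc:) and un-namespaced input, uses the PropertyName text directly as the Lucene field name (any field-name mapping in GeoNetwork is not modelled), and returns the string form of the query rather than building Lucene objects. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace: '{http://www.opengis.net/ogc}PropertyName' -> 'PropertyName'."""
    return tag.rsplit("}", 1)[-1]


def property_is_null_to_lucene(filter_xml):
    """Translate <PropertyIsNull> into "+*:* -<field>:*", i.e. a required
    MatchAllDocsQuery plus a prohibited WildcardQuery on the field."""
    elem = ET.fromstring(filter_xml)
    if local_name(elem.tag) != "PropertyIsNull":
        raise ValueError(f"expected PropertyIsNull, got {local_name(elem.tag)}")
    for child in elem:
        if local_name(child.tag) == "PropertyName" and child.text:
            return f"+*:* -{child.text.strip()}:*"
    raise ValueError("PropertyIsNull without a PropertyName")


plain = "<PropertyIsNull> <PropertyName>altTitle</PropertyName> </PropertyIsNull>"
namespaced = ('<ogc:PropertyIsNull xmlns:ogc="http://www.opengis.net/ogc">'
              "<ogc:PropertyName>altTitle</ogc:PropertyName></ogc:PropertyIsNull>")
print(property_is_null_to_lucene(plain))       # +*:* -altTitle:*
print(property_is_null_to_lucene(namespaced))  # +*:* -altTitle:*
```

Because the template builds the WildcardQuery object directly instead of parsing a query string, it presumably avoids the query parser's leading-wildcard restriction, which would fit with the query working in GeoNetwork without extra settings.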
comment:7 by , 13 years ago
I tried it too and it works for me. Going to commit.
Many thanks Simon, you're great!
comment:8 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
It sounds OK to me, but I have 2 remarks: