Opened 13 years ago

Closed 13 years ago

#623 closed defect (fixed)

Implement NullCheck operator in CSW

Reported by: josegar74 Owned by: geonetwork-devel@…
Priority: major Milestone: v2.6.5
Component: Metadata standards Version: v2.6.3
Keywords: Cc:

Description

The NullCheck operator is mandatory in INSPIRE. GeoNetwork advertises the operator in its Capabilities document, but it is not implemented.

Searching for null/empty values with Lucene is not straightforward. A solution that seems to work well, proposed in several Lucene-related forums, is to index these values as a predefined dummy string. Future versions of Lucene seem likely to handle null values better, so this will need some changes when the Lucene version in GeoNetwork is updated.

The solution will work like this when using PropertyIsNull in CSW queries:

<PropertyIsNull>
  <PropertyName>Abstract</PropertyName>
</PropertyIsNull>

is translated to Lucene as follows (assuming the dummy value is ZZZZZZZZZZZZZZZ):

abstract: ZZZZZZZZZZZZZZZ

One limitation is that a document whose field legitimately contains the "dummy" string would be reported as null, but choosing an unusual value should make this unlikely.
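A minimal sketch of the dummy-value idea, using a toy in-memory index in Python rather than Lucene itself. The field names, the sample records and the `build_index`/`property_is_null` helpers are hypothetical; `NULL_SENTINEL` reuses the example dummy value from above. Record 4 shows the limitation just described: a field that literally contains the dummy string is reported as null.

```python
# Hypothetical dummy value, taken from the example above.
NULL_SENTINEL = "ZZZZZZZZZZZZZZZ"
# Hypothetical indexed fields.
FIELDS = ("title", "abstract")


def build_index(docs):
    """Toy inverted index: field -> term -> set of doc ids.

    Missing or empty values are indexed as NULL_SENTINEL so that
    "no value" becomes a searchable term.
    """
    index = {}
    for doc_id, fields in docs.items():
        for field in FIELDS:
            value = (fields.get(field) or "").strip()
            term = value if value else NULL_SENTINEL
            index.setdefault(field, {}).setdefault(term, set()).add(doc_id)
    return index


def property_is_null(index, field):
    """PropertyIsNull on `field` becomes the term query field:NULL_SENTINEL."""
    return index.get(field, {}).get(NULL_SENTINEL, set())


docs = {
    1: {"title": "Rivers", "abstract": "Hydrography of the region"},
    2: {"title": "Roads", "abstract": ""},          # empty value
    3: {"title": "Parcels"},                        # missing value
    4: {"title": "Oddity", "abstract": NULL_SENTINEL},  # false positive
}
index = build_index(docs)
print(sorted(property_is_null(index, "abstract")))  # [2, 3, 4]
```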

I will try to provide a patch to test later; any comments on this solution are very welcome.

Attachments (1)

#623__Implement_NullCheck_operator_in_CSW_For_Branch.patch (41.0 KB ) - added by josegar74 13 years ago.
Patch to review for branch code


Change History (9)

comment:1 by heikki, 13 years ago

It sounds OK to me, but two remarks:

  • maybe use an even less common dummy value than ZZZZZZZZZZ, for example "dsfgerg3453fdsgdfgdf";
  • I think you should make sure that these values are filtered / transformed away whenever the metadata is viewed, edited, exported, retrieved by CSW, etc.

comment:2 by josegar74, 13 years ago

Instead of storing a dummy value in the actual Lucene index fields, and to avoid side effects in XSL/Java code that uses Lucene index values, I'm implementing this solution:

If a metadata field is empty, a field named fieldName_Null is indexed with the value "yes". Searches for null values in that field become:

fieldName_Null: yes

This way, the content of the existing Lucene index fields does not change.

I also checked that this approach doesn't double the number of indexed fields in Lucene: when fieldName_Null is indexed, fieldName is not, and vice versa.
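A rough sketch of this indexing rule, assuming a record is a plain dict and the index receives a list of (field, value) pairs. The helper names and sample records are hypothetical and this is not GeoNetwork's indexing code; it only illustrates that each field produces either `fieldName` or `fieldName_Null`, never both, so the number of indexed fields stays the same.

```python
def index_fields(record, fields):
    """Return the (name, value) pairs to index for one metadata record."""
    pairs = []
    for field in fields:
        value = (record.get(field) or "").strip()
        if value:
            # Normal field: content is indexed unchanged.
            pairs.append((field, value))
        else:
            # Empty field: index a flag field instead of the field itself.
            pairs.append((field + "_Null", "yes"))
    return pairs


def null_query(field):
    """Lucene query string for PropertyIsNull on `field`."""
    return f"{field}_Null:yes"


def matches_null(pairs, field):
    """Would an indexed record match `<field>_Null:yes`?"""
    return (field + "_Null", "yes") in pairs


records = {
    1: {"title": "Rivers", "abstract": "Hydrography of the region"},
    2: {"title": "Roads", "abstract": "  "},  # whitespace only counts as empty
}
fields = ["title", "abstract"]
indexed = {rid: index_fields(rec, fields) for rid, rec in records.items()}

print(null_query("abstract"))  # abstract_Null:yes
print([rid for rid, pairs in indexed.items() if matches_null(pairs, "abstract")])  # [2]
```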

by josegar74, 13 years ago

Patch to review for branch code

comment:3 by simonp, 13 years ago

Maybe the last suggestion in http://www.gossamer-threads.com/lists/lucene/java-dev/64663 is worth a try?

comment:4 by josegar74, 13 years ago

Hi Simon,

I changed it to:

<xsl:template match="ogc:PropertyIsNull">
    <BooleanQuery>
      <BooleanClause required="true" prohibited="false">
        <MatchAllDocsQuery required="true" prohibited="false"/>
      </BooleanClause>

      <BooleanClause required="false" prohibited="true">
        <RangeQuery fld="{ogc:PropertyName}" lowerTxt="*" upperTxt="*" inclusive="true"/>
      </BooleanClause>
    </BooleanQuery>
</xsl:template>

The query is translated to:

+(+*:* -abstract:[ TO ]) +_isTemplate:n

One problem with this solution is that the * is removed by the analyzer. I also tried this query directly in Luke:

+(+*:* -abstract:[* TO *])

but still got all results. In the Lucene thread you linked, there's a comment saying the author couldn't remember whether *:* is a Solr extension or part of the core QueryParser.

So it's not clear whether this should work.

Anyway, if you find a bug in the new expression that prevents it from working properly, I can change it (I'm not a Lucene expert). Otherwise, I'll go with the proposed patch, and we can update it later if a simpler solution turns up.
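To illustrate how the range bounds can disappear, here is a sketch with a hypothetical, simplified analyzer that lowercases the text and keeps only alphanumeric runs. GeoNetwork's real analyzer may behave differently; the point is only that any analyzer which discards `*` turns `[* TO *]` into the empty `[ TO ]` seen in the translated query above.

```python
import re


def simple_analyze(text):
    """Hypothetical analyzer: lowercase, keep only runs of [a-z0-9]."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def range_clause(field, lower, upper):
    """Build a range clause after passing both bounds through the analyzer."""
    return f"{field}:[{simple_analyze(lower)} TO {simple_analyze(upper)}]"


def property_is_null_query(field):
    # Required match-all clause plus a prohibited range over the field.
    return f"+*:* -{range_clause(field, '*', '*')}"


print(range_clause("title", "A", "M"))     # title:[a TO m]
print(property_is_null_query("abstract"))  # +*:* -abstract:[ TO ]
```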

comment:5 by simonp, 13 years ago

Jose,

A variation on that email seems to work for me. My Lucene index has 31 documents, 2 of which have a value indexed in the field altTitle (alternate title).

If, using Luke (1.0.1), I enter the following query in the search tab:

*:* -altTitle:*

I get the 29 documents that don't have a value indexed for altTitle. However, this requires 'Allow leading * in wildcard queries' to be checked in the query parser settings of Luke's search tab, so will some additional work be needed to make sure GeoNetwork's Lucene calls can do this too?
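The query needs `*:*` because a Lucene BooleanQuery made only of prohibited clauses matches nothing: the required match-all clause supplies the set of documents to subtract from. The sketch below models that set difference in Python with the numbers above (31 documents, 2 with altTitle); which documents carry altTitle, and their terms, are made up for illustration.

```python
def docs_with_field(index, field):
    """Model of `field:*`: every doc id with at least one term in the field."""
    return set().union(*index.get(field, {}).values())


def property_is_null(all_docs, index, field):
    """Model of `+*:* -field:*`: all documents minus those with the field."""
    return all_docs - docs_with_field(index, field)


all_docs = set(range(1, 32))  # 31 documents
# Hypothetical postings (term -> doc ids): only docs 5 and 17 have an altTitle.
index = {"altTitle": {"roads": {5}, "rivers": {17}}}

print(len(property_is_null(all_docs, index, "altTitle")))  # 29
# Without the match-all clause there is nothing to subtract from:
print(len(set() - docs_with_field(index, "altTitle")))  # 0
```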

comment:6 by simonp, 13 years ago

Actually, the following template in csw/filter-to-lucene.xsl:

<xsl:template match="ogc:PropertyIsNull">
    <BooleanQuery>
      <BooleanClause required="true" prohibited="false">
        <MatchAllDocsQuery required="true" prohibited="false"/>
      </BooleanClause>

      <BooleanClause required="false" prohibited="true">
        <WildcardQuery fld="{ogc:PropertyName}" txt="*"/>
      </BooleanClause>
    </BooleanQuery>
</xsl:template>

and a CSW query with:

<PropertyIsNull>
  <PropertyName>altTitle</PropertyName>
</PropertyIsNull>

generates a Lucene query in GeoNetwork like the one used in Luke:

Lucene query: +(+(+*:* -altTitle:*) +_isTemplate:n) +(_op0:2 _op0:1 _op0:0 _op0:-1 _owner:1)

which seems to work, i.e. it returns all the records that don't have an altTitle field indexed.
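For checking the translation outside GeoNetwork, a hypothetical Python helper that mirrors the structure of the XSL template can be handy: it reads a `PropertyIsNull` filter (with or without the `ogc` namespace) and emits the required match-all clause plus the prohibited wildcard clause. It is a sketch only: it does not map CSW queryable names (such as Abstract) to index field names, and it leaves out the clauses GeoNetwork adds around the result, such as `+_isTemplate:n` and the `_op0`/`_owner` permission terms.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an ElementTree namespace prefix such as '{http://...}'."""
    return tag.rsplit("}", 1)[-1]


def property_is_null_to_lucene(filter_xml):
    """Translate a PropertyIsNull filter into `+*:* -<property>:*`."""
    elem = ET.fromstring(filter_xml.strip())
    if local_name(elem.tag) != "PropertyIsNull":
        raise ValueError("expected a PropertyIsNull element")
    names = [(c.text or "").strip() for c in elem if local_name(c.tag) == "PropertyName"]
    if not names or not names[0]:
        raise ValueError("PropertyIsNull needs a PropertyName")
    # Required MatchAllDocs clause + prohibited wildcard clause on the property.
    return f"+*:* -{names[0]}:*"


query = """
<PropertyIsNull>
  <PropertyName>altTitle</PropertyName>
</PropertyIsNull>
"""
print(property_is_null_to_lucene(query))  # +*:* -altTitle:*
```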

comment:7 by josegar74, 13 years ago

I tried it too and it works for me. Going to commit.

Many thanks Simon, you're great!

comment:8 by josegar74, 13 years ago

Resolution: fixed
Status: new → closed