Changes between Version 13 and Version 14 of MimeTypeCalculationIndexing


Timestamp:
Apr 16, 2010, 4:54:20 AM (14 years ago)
Author:
simonp
Comment:

--

  • MimeTypeCalculationIndexing

    v13 v14  
    1010== Overview ==
    1111
    12 !GeoNetwork uses different code in a few places to calculate mime types for files that are uploaded with metadata records as online resources (usually based on filenames). However the mime type is never indexed with the metadata record that points to the resource and the mime type calculation is usually done based on the filename so may not reflect the true content of the file or return the correct registered mime type for alternatives. This means that searches cannot be done on the content of online resources.
     12!GeoNetwork uses different code in a few places to calculate mime types for files that are uploaded with metadata records as online resources. However, the mime type is not indexed with the metadata record that points to the resource, and the calculation is usually based on the filename, so it may not reflect the true content of the file or return the correct registered mime type for alternatives. This means that searches cannot be done on the content of online resources.
    1313
    1414=== Proposal Type ===
     
    2727== Motivations ==
    2828
    29 The motivation for this proposal is to get the mime type into the Lucene index for use by that do searches via CSW to determine whether !GeoNetwork has metadata records with attached online resources that could be of interest. The particular use case this was developed for is a data visualization program which wanted to search the !GeoNetwork catalog for files of interest.
     29The motivation for this proposal is to get the mime type into the Lucene index for use by applications that search via CSW (or similar interfaces) to determine whether !GeoNetwork has metadata records with attached online resources that could be of interest. The particular use case this was developed for is a data visualization program that needed to search the !GeoNetwork catalog for files of interest.
    3030
    3131== Proposal ==
     
    3535 * Mime type calculation for online resources (gmd:protocol fields that start with WWW:DOWNLOAD or WWW:LINK - others can be added if required by individual sites) using [http://mime-util.sourceforge.net mime-util] immediately after a metadata record is saved/imported in update-fixed-info.xsl.
    3636 * Calculated mime type is stored in metadata record as gmx:MimeFileType child of gmd:name (replaces gco:CharacterString) and will look like the following example:
    37 {{
     37{{{
    3838                                        <gmd:onLine>
    3939                                                <gmd:CI_OnlineResource>
     
    5252                                                </gmd:CI_OnlineResource>
    5353                                        </gmd:onLine>
    54 }}
     54}}}
    5555 * Indexing of the mime type (from the type attribute of gmx:MimeFileType) in Lucene by index-fields.xsl
    5656 * Inclusion of the mime type Lucene field as an !AdditionalQueryable in the CSW config.
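
As a rough illustration of the indexing step in the list above, the following Java sketch pulls the `type` attribute of each `gmx:MimeFileType` out of a record fragment, which is the value that would be written to the Lucene field. In the proposal this is done in index-fields.xsl, not in Java; the class name, method name, and the sample fragment here are assumptions made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of GeoNetwork.
class MimeTypeFieldSketch {
    static final String GMD = "http://www.isotc211.org/2005/gmd";
    static final String GMX = "http://www.isotc211.org/2005/gmx";

    // Made-up fragment in the shape described by the proposal.
    static final String SAMPLE =
        "<gmd:onLine xmlns:gmd='" + GMD + "' xmlns:gmx='" + GMX + "'>"
      + "<gmd:CI_OnlineResource><gmd:name>"
      + "<gmx:MimeFileType type='image/png'>chart.png</gmx:MimeFileType>"
      + "</gmd:name></gmd:CI_OnlineResource></gmd:onLine>";

    /** Returns the mime types declared on gmx:MimeFileType elements. */
    static List<String> mimeTypes(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for prefixed XPath steps
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    if ("gmd".equals(prefix)) return GMD;
                    if ("gmx".equals(prefix)) return GMX;
                    return XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });

            NodeList nodes = (NodeList) xpath.evaluate(
                "//gmd:CI_OnlineResource/gmd:name/gmx:MimeFileType/@type",
                doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read mime types", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(mimeTypes(SAMPLE));
    }
}
```

A client searching via CSW would then filter on the corresponding queryable instead of parsing each record itself.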
     
    6969== Risks ==
    7070
    71 The update-fixed-info.xsl calls Java objects in src/org/fao/geonet/util/MimeTypeFinder.java to do the mime-util based calculation. This may slow down indexing of records with attached online resources - haven't noticed much of a slow down in the 3 months or so this has been in the BlueNetMEST branch.
     71The update-fixed-info.xsl calls Java objects in src/org/fao/geonet/util/MimeTypeFinder.java to do the mime-util based calculation. This may slow down indexing of records with attached online resources, but no slowdown has been noticed in the 3 months or so that this function has been in the BlueNetMEST branch. (Francois has suggested a framework that, when implemented, would move these tasks to a background thread.)
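
To make the difference between filename-based and content-based detection concrete, here is a minimal Java sketch that uses only JDK classes. It is not the MimeTypeFinder code and does not use mime-util; `java.net.URLConnection` recognises only a small set of file signatures, so it stands in for the idea rather than the library. The class and method names are assumptions for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;

// Hypothetical helper, not part of GeoNetwork or mime-util.
class MimeSniffSketch {
    /** Prefers a guess from the leading bytes; falls back to the filename. */
    static String detect(String fileName, byte[] head) {
        // ByteArrayInputStream supports mark/reset, which the JDK sniffer needs.
        try (InputStream in = new ByteArrayInputStream(head)) {
            String byContent = URLConnection.guessContentTypeFromStream(in);
            if (byContent != null) {
                return byContent;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // May be null if the extension is unknown to the JDK's file name map.
        return URLConnection.guessContentTypeFromName(fileName);
    }

    public static void main(String[] args) {
        // PNG signature bytes, stored under a misleading .gif filename.
        byte[] png = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        System.out.println(detect("chart.gif", png));
    }
}
```

Reading only the first few bytes of each resource keeps the extra cost per record small, which is relevant to the indexing slowdown discussed above.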
    7272
    7373== Participants ==
    7474
    7575 * CSIRO: Gary Carroll and Uwe Rosebrock
     76 * Thanks to Steve Richard (AZGS) and Francois for suggesting gmx:MimeFileType and especially to Francois for adding gmx: support as part of the gco:CharacterString substitution proposal.
    7677