Mime Type Calculation and Indexing
Date | 2010/04/14 |
Contact(s) | Simon Pigot |
Last edited | Timestamp |
Status | Complete |
Assigned to release | 2.5.0 |
Resources | Available |
Overview
GeoNetwork calculates mime types for files that are uploaded with metadata records as online resources in a few places, each using different code. The calculation is usually based on the filename, so it may not reflect the true content of the file or return the correct registered mime type where alternatives exist. In addition, the mime type is never indexed with the metadata record that points to the resource, so searches cannot be restricted by the content type of online resources.
Proposal Type
- Sandbox: BlueNetMEST
- App: GeoNetwork
- Module: Lucene Index, Metadata schemas
Links
- Documents: mime-util
Voting History
- Vote proposed.
Motivations
The motivation for this proposal is to get the mime type into the Lucene index so that client applications searching via CSW can determine whether GeoNetwork has metadata records with attached online resources that could be of interest. The particular use case this was developed for is a data visualization program that needed to search the GeoNetwork catalog for files of interest.
Proposal
This proposal implements:
- Mime type calculation for online resources using mime-util. The calculation runs in update-fixed-info.xsl immediately after a metadata record is saved or imported, and applies to online resources whose gmd:protocol starts with WWW:DOWNLOAD or WWW:LINK (other protocols can be added if required by individual sites).
- The calculated mime type is stored in the metadata record as a gmx:MimeFileType child of gmd:name (replacing gco:CharacterString) and will look like the following example:
{{
<gmd:onLine>
  <gmd:CI_OnlineResource>
    <gmd:linkage>
      <gmd:URL>http://localhost:8080/geonetwork/srv/en/file.disclaimer?id=10&fname=basins.zip&access=private</gmd:URL>
    </gmd:linkage>
    <gmd:protocol>
      <gco:CharacterString>WWW:DOWNLOAD-1.0-http--download</gco:CharacterString>
    </gmd:protocol>
    <gmd:name xmlns:gmx="http://www.isotc211.org/2005/gmx" xmlns:srv="http://www.isotc211.org/2005/srv">
      <gmx:MimeFileType type="application/x-zip">basins.zip</gmx:MimeFileType>
    </gmd:name>
    <gmd:description>
      <gco:CharacterString>Hydrological basins in Africa (Shapefile Format)</gco:CharacterString>
    </gmd:description>
  </gmd:CI_OnlineResource>
</gmd:onLine>
}}
- Indexing of the mime type (from the type attribute of gmx:MimeFileType) in Lucene by index-fields.xsl
- Inclusion of the mime type Lucene field as an AdditionalQueryable in the CSW config.
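To make the selection rule in the list above concrete, here is a minimal Java sketch that pulls the type attribute of gmx:MimeFileType out of a metadata record, but only for online resources whose protocol starts with WWW:DOWNLOAD or WWW:LINK. It is an illustration only: the actual indexing is done by index-fields.xsl, and the class name, method name and XPath expression here are assumptions rather than code from the patch. The gmd and gco namespace URIs are the standard ISO 19139 ones.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative only: not the index-fields.xsl template from the patch. */
final class MimeTypeIndexSketch {

    // Prefix-to-namespace bindings for the ISO 19139 elements used below.
    private static final Map<String, String> NS = Map.of(
            "gmd", "http://www.isotc211.org/2005/gmd",
            "gco", "http://www.isotc211.org/2005/gco",
            "gmx", "http://www.isotc211.org/2005/gmx");

    // Select the type attribute only for WWW:DOWNLOAD and WWW:LINK resources.
    private static final String MIME_TYPE_XPATH =
            "//gmd:CI_OnlineResource["
            + "starts-with(gmd:protocol/gco:CharacterString, 'WWW:DOWNLOAD') or "
            + "starts-with(gmd:protocol/gco:CharacterString, 'WWW:LINK')]"
            + "/gmd:name/gmx:MimeFileType/@type";

    /** Returns, in document order, the mime types that qualify for indexing. */
    static List<String> mimeTypes(String metadataXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed XPath steps
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(metadataXml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return NS.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
                }
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) { return null; }
            });

            NodeList hits = (NodeList) xpath.evaluate(
                    MIME_TYPE_XPATH, doc, XPathConstants.NODESET);
            List<String> types = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                types.add(hits.item(i).getNodeValue());
            }
            return types;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read metadata record", e);
        }
    }
}
```

Calling `MimeTypeIndexSketch.mimeTypes(recordXml)` on a record containing the gmd:onLine example above (with the ampersands in the URL escaped and the gmd and gco prefixes declared) would yield the single value `application/x-zip`; resources with other protocols are skipped.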
Mime-util has a plugin architecture to support the addition of new ways to calculate mime types plus a simple file where unregistered types can be added.
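The idea of pluggable detectors backed by a small table of extra types can be sketched in plain Java as follows. This is not mime-util's API: the `Detector` interface, the detector names and the contents of the `EXTRA_TYPES` table are all hypothetical, chosen only to show how a content-based check can take precedence over a filename-based one.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Illustrative detector chain; mime-util's real detector API differs. */
final class MimeDetectionSketch {

    /** Hypothetical plugin contract: inspect the name and the first bytes. */
    interface Detector {
        Optional<String> detect(String fileName, byte[] head);
    }

    /** Content-based detector using two well-known file signatures. */
    static final Detector MAGIC = (name, head) -> {
        if (startsWith(head, new byte[] {'P', 'K', 3, 4})) {
            return Optional.of("application/zip");
        }
        if (startsWith(head, "%PDF-".getBytes(StandardCharsets.US_ASCII))) {
            return Optional.of("application/pdf");
        }
        return Optional.empty();
    };

    // Stand-in for a site-editable file of extension mappings (assumed values).
    static final Map<String, String> EXTRA_TYPES = Map.of(
            "zip", "application/zip",
            "nc", "application/x-netcdf");

    /** Filename-based detector: looks only at the extension. */
    static final Detector EXTENSION = (name, head) -> {
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(EXTRA_TYPES.get(ext));
    };

    /** Runs the detectors in order; the first one with an answer wins. */
    static String detect(List<Detector> detectors, String fileName, byte[] head) {
        for (Detector detector : detectors) {
            Optional<String> type = detector.detect(fileName, head);
            if (type.isPresent()) {
                return type.get();
            }
        }
        return "application/octet-stream"; // generic fallback for unknown content
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With the chain `List.of(MAGIC, EXTENSION)`, a PDF that has been named `report.zip` is reported as `application/pdf`, whereas the extension detector alone would report `application/zip`. Adding a new way to calculate mime types then amounts to adding another `Detector` to the list.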
At this stage the proposal does not:
- include adding a search field in the advanced search interface
- replace mime type calculations done elsewhere (specifically in Jeeves src/jeeves/util/BinaryFile.java) in GeoNetwork with mime-util code
These can be done at a later date.
Note that the patch file attached to this proposal also includes some enhancements to the Lucene Index Reader provider code in SearchManager.java, a nicer file download dialog, and the temporal extent search proposal. For the download dialog, the trunk uses the file.download service; sites that don't want that can switch back to resources.get by editing update-fixed-info.xsl for their schema.
Risks
The update-fixed-info.xsl stylesheet calls Java objects in src/org/fao/geonet/util/MimeTypeFinder.java to do the mime-util based calculation. This may slow down indexing of records with attached online resources, although no significant slowdown has been noticed in the three months or so that this has been in the BlueNetMEST branch.
Participants
- CSIRO: Gary Carroll and Uwe Rosebrock
Attachments (1)
- patch.TemporalExtentSearchAndMimeTypeCalculation (287.5 KB), added 15 years ago.