Version 26 (modified by simonp, 15 years ago)

Topics for GeoNetwork discussion in Bolsena 2010

author: Heikki Doeleman

For the 3rd time, the Bolsena OSGeo Hacking Event is going to be held. This page lists some ideas that can be discussed within the GeoNetwork faction.

Please add any ideas you have here!

Also, it would be nice if we could have a number of presentations, like we did last year. Tell us about your projects, or about some interesting technology or tool, or whatever you want. Please add your presentation proposals to this page too.


discussion topics

  • Javadoc: let's clean up the existing Javadoc and add new where it is missing. It'd be good to familiarize yourself with how Javadoc works, before doing this; e.g. there should be no blank line between the Javadoc and the method it is about; the first sentence should end in a period; and things like that.
  • Automatically add the Javadoc pages to this wiki, updated from a Hudson build process? For all of the branches?
  • Some people really like working with patches; other people prefer using short-lived SVN branches for a similar purpose. Can we all agree on doing it one way or the other?
  • This wiki is a bit of a mess, in my opinion. I think it would be good if we could put maybe 3 people in charge to firstly, clean it up and better structure it; and secondly, to try to keep it that way.
  • Let's create SQL files that can fill an empty GeoNetwork database with only the minimum needed to run the program? The admin user, settings, regions, things like that. Not everyone is really eager to use GAST for anything.
  • Can we release GeoNetwork 3.0 (with the CSW/ebRIM interface)? Maybe we can have simultaneous "current releases" in both the GN2.x and GN3.x lineages, as do for example Lucene and Tomcat?
  • Does anyone like the installer's habit of overwriting your JDBC credentials with randomly generated values? I certainly don't: my DB lives very much longer than the many GeoNetwork installations I do, so I have to edit config.xml every time. How about removing that?
  • Would it be an idea to appoint Language Managers for each of the supported translations? They would form the International Internationalization Committee (IIC, or CII in French) and would be called on to update the i18n files for their language before each new release. This might even be arranged in an OSGeo-wide manner.
  • The class DataManager.java and its sister XMLSerializer.java are in particularly bad shape, in my opinion. There are literally dozens of public methods that all do more or less the same thing. Of course it's not clearly documented why they are all there or when to use which. Would it be too drastic to propose that we keep 1 single public method for each of the functions createMetadata, updateMetadata, validateMetadata, etc. ?
  • In the NGR project, a modification to the code around the editor, called Inflation and Vacuum, has been implemented that makes it much easier to create valid metadata from scratch. In essence it takes the function of update-fixed-info.xsl (which also tries to do some automatic adjustments to help things along) a whole seven miles further. What do the developers think of this? (I'll provide documentation sometime soon.)

Topics extracted from Australia/New Zealand Community GeoNetwork Feedback:

  • GeoNetwork needs a range of metadata editors and the XForms Editor (from geonetworkui sandbox) should be available as part of this range. An XForms engine is an alternative technology that potentially hides details of HTML and JavaScript from developers. (The usefulness of the XForms editor will be determined to a large extent by how well it works across browsers and how responsive it is. What does the "potentially hides details" bit actually mean? That's just wishful thinking, and adding XForms means yet another complicated technology for developers to master.) Justification/Action: Develop the XForms interface to provide a user friendly interface with the flexibility to meet the needs of different users.
  • GeoNetwork needs a range of metadata editors and the ANZMet Lite (a wizard based editor available for download from here) should be part of the toolkit. ANZMet Lite needs to be open sourced (under the GPL) to be distributed with GeoNetwork. Comments: If the web interface were improved, the need for ANZMet Lite would be reduced. There is a need for “offline” metadata creation when researchers or data collectors are not connected to the Internet – this is where ANZMet Lite has unique value. Why not improve the existing GeoNetwork editor (see geocat.ch editor, merge some of the features into the trunk)? Justification/Action: Add ANZMet Lite as a user friendly, wizard based PC editing interface with the flexibility to meet the needs of different users. Simon Pigot has already added GeoNetwork upload/download to ANZMet Lite.
  • GeoNetwork services and JavaScript API need to be documented so that the user interface can be replaced and/or the existing functionality reused or customized. A different user interface skin should be easy to apply. The new Jeeves test framework offers an opportunity to document the inputs and outputs of the services. Action/Justification: The existing JavaScript API (web/geonetwork/scripts/core) needs to be documented and extended – existing code that doesn’t use the API needs to be refactored.
  • The technologies that are used in the user interface are not homogeneous: XSLT, HTML and JavaScript are often mixed and hard to separate - this makes development and modification of the user interface difficult - but given the current architecture of GeoNetwork, a complete separation into components based on implementation language is impossible. Action: Separate the HTML, XML and JavaScript from each other so that a skilled interface designer does not need to know all three technologies to change the interface.
  • Reusing fragments of metadata (XML) – “object reuse”. Fragments are implemented in various sandboxes. Metadata records can be composed from fragments using XLinks and there is an XLinks URL Resolver. Community action needs to be consolidated through the fragments proposal. Many organisations would like GeoNetwork to be able to harvest fragments from relational databases as they often generate full metadata records from relational databases using custom software. If the database information changes, these records then need to be re-harvested. Some organisations would also like to be able to edit the fragments in GeoNetwork and return them to the database from which they were harvested. Action/Justification: Integrating fragments of metadata that are managed in an external system (i.e. relational database, authentication directory). There is a mechanism for implementation for metadata fragment harvesting from relational databases via a WFS in the BlueNetMEST sandbox. This work needs to be consolidated with work in the geocat.ch and geosource sandboxes and added to the trunk. This work should also be extended to allow metadata fragments in the relational database to be updated after editing in GeoNetwork.

Harvesting of fragments from authentication directories (eg. LDAP) should be added.
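
As a rough illustration of the "object reuse" idea above, the sketch below composes a record from fragments by replacing elements that carry an xlink:href with the fragment they point to. It is only a sketch under stated assumptions: the element names, the local:// URLs and the in-memory fragment store are hypothetical, and this is not GeoNetwork's actual XLinks URL Resolver.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xlink:href attribute.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical fragment store; in practice fragments might be harvested
# from a relational database or a directory service.
FRAGMENTS = {
    "local://fragments/contact-1": (
        "<contact><name>Jane Example</name>"
        "<email>jane@example.org</email></contact>"
    ),
}

# Hypothetical metadata record that references the contact by XLink.
RECORD = (
    '<metadata xmlns:xlink="http://www.w3.org/1999/xlink">'
    "<title>Sea surface temperature</title>"
    '<contact xlink:href="local://fragments/contact-1"/>'
    "</metadata>"
)


def resolve_xlinks(record_xml, fragments):
    """Return the record with known xlink:href elements replaced by fragments.

    Elements whose href is not in the store are left untouched, and
    fragments are not resolved recursively.
    """
    root = ET.fromstring(record_xml)
    # Take a snapshot of the elements first so we can safely mutate the tree.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            href = child.get(XLINK_HREF)
            if href in fragments:
                parent[index] = ET.fromstring(fragments[href])
    return root


resolved = resolve_xlinks(RECORD, FRAGMENTS)
print(ET.tostring(resolved, encoding="unicode"))
```

Because the record only stores the link, changing the fragment in the store changes every record that references it the next time the record is resolved.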

  • GeoNetwork needs some form of version control to track changes made to a metadata record over time. Action/Justification: This can be done inside the database without needing to externalise the metadata records. That way you can index and search on the old versions as well, if desired. Alternatively it could be done externally, perhaps using a Java interface to Subversion, an interface to existing enterprise document management systems, or a different database approach for the documents, e.g. CouchDB.
  • Some aspects of project planning for GeoNetwork are not visible to those outside the project steering committee. Action: Continue to adopt and implement OSGeo best practice (e.g. GeoServer).
  • Documentation for ‘Implementing GeoNetwork into your organisation’ should be provided. Rather than changing the perspective of the current documentation from "it does" to "how to", perhaps you can have different documentation for different audiences. The “how to” section of the Trac is very useful. Action: As the “how to” section of the OSGeo GeoNetwork trac site expands, it could be linked into the documentation.
  • GeoNetwork’s current Lucene field / index names and the mapping of metadata fields to these Lucene field names are ad hoc. This has the potential to prevent search interoperability between catalogues.

GA: It is not enough to just provide Lucene field names and the mapping from metadata elements to these Lucene fields for all metadata schemas. These Lucene fields must also be indexed by default so that any search of these fields will return the appropriate response. This will automatically provide interoperability for distributed searches. Which fields are searched can be determined by the interface that is used to allow the user to enter search terms. If a particular field is not to be searchable through an interface, then that field is simply not provided in the GUI. However, the Lucene indexes must still include that field so that other interfaces can search it. GeoNetwork should use the geo profile of Z39.50 (including attributes, data and relations) to define Lucene field names and the mapping from metadata elements to Lucene fields for all metadata schemas.

  • XSD and Schematron Validators return errors that are meaningless to most users. Ability to customise the error messages easily would be useful. Action: Code containing XSD validation messages needs to be modified to include alternative or additional messages to those already in use. Schematron diagnostics specified in rules should be made more useful to users.
  • GeoNetwork requires a generic capability for element help, code list choices and suggestions to be linked to metadata guidelines provided with profiles/standards. Action: GeoNetwork to call documentation components from external sources (e.g. mouse over tool tips from profile/standard and code list documentation).
  • GeoNetwork categories are not related to metadata content – they should be configurable from content. Action: GeoNetwork should be able to configure dynamic categories from a Lucene field. E.g. an administrator could create category names as the unique values of the Lucene field name purpose (which might be mapped to gmd:purpose for ISO) – records would then belong to the category described by purpose. Cf. also the discussion on dynamic categories, i.e. categories that are placeholders for a saved search.
  • GeoNetwork currently does not manage its own tag cloud / Folksonomy. Action: GeoNetwork could optionally manage these things internally rather than using a third party social networking site.
  • Network-crawling for geo-resources. Action: GeoNetwork needs to continue to be aware of and exploit initiatives for automatic harvesting of metadata from geo-resources, e.g. metadata extraction tools such as the Talend Spatial Data Integrator suite.
  • GeoNetwork lacks the ability to consistently reproduce a unique identifier for the same geo resource (e.g. same dataset stored in two different locations) and/or use persistent identifier services. This is somewhere along the range from "easy enough" to "very difficult"; the precise details of the set of features you have in mind need to be spelled out. Action: GeoNetwork needs to be able to generate, store and use metadata identifiers (e.g. gmd:fileIdentifier) as well as data identifiers using the current stand alone UUID, but also (for data objects) MD5 (including what the checksum was generated from) and identifiers from external persistent identifier services. (It should be possible to obtain persistent identifiers for both metadata and data from external persistent identifier services.)
  • Better inter-application interoperability. GeoNetwork needs to rethink the interoperability with the emerging FOSS such as the way that OpenLayers is designing / redeveloping its interface, e.g. use of GeoExt; e.g. GeoNetwork needs to provide simple mechanisms to allow discovered resources to be exploited and utilised in complementary open source software; e.g. drag-and-drop discovered resources into OpenLayers or GeoServer. Action: Better intra-application interoperability. GeoNetwork needs to coordinate the discovery of resources with the publication of those same resources in FOSS such as GeoServer.
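
The identifier item above (the same dataset stored in two locations should yield the same identifier) could be approached along these lines. This is a minimal sketch, not a proposal for the actual implementation: the function names and the shape of the returned dictionary are assumptions made for illustration. It uses a random UUID for the metadata record and an MD5 checksum of the data bytes for the data object, and records what the checksum was generated from.

```python
import hashlib
import uuid


def metadata_identifier():
    """Stand-alone random UUID, e.g. for use as gmd:fileIdentifier."""
    return str(uuid.uuid4())


def data_identifier(data, generated_from):
    """MD5-based identifier for a data object.

    `generated_from` documents what was hashed (hypothetical field name),
    so that another site can reproduce the same checksum.
    """
    return {
        "scheme": "md5",
        "value": hashlib.md5(data).hexdigest(),
        "generated_from": generated_from,
    }


# The same bytes give the same identifier wherever the copy is stored.
copy_a = data_identifier(b"hello", "raw file bytes")
copy_b = data_identifier(b"hello", "raw file bytes")
print(copy_a["value"] == copy_b["value"])  # True
```

A content checksum only identifies byte-identical copies; a re-projected or re-encoded version of the same dataset would need a persistent identifier service instead.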

  • GeoNetwork assumes resources that are tagged as data for download in gmd:protocol are local. Action: GeoNetwork needs to allow for the fact that data tagged as data for download may not be local.
  • Remote search across a number of sites returns a pre-selected number of hits from all remote sites (pre-selected number is a search option) – it should return these hits from each site. Action: Presentation of pre-selected number of hits from each remote site – may require more delving into JZKit.
  • Presentation of returned hits from remote sites may be very slow because search is limited by the speed of the slowest site. Action: Presentations of first returned hits from first responding remote site should not have to wait on the slowest site – may require more delving into JZKit.
  • There are too many configuration files in too many places (e.g. repositories.xml.tem) and not all configuration options are supported by the existing admin interfaces. Comments: CSIRO: Supported by RW. Action: Continue to consolidate configuration options in the system configuration interface.
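
The two remote-search items above (a fixed number of hits from each site, and not waiting on the slowest site) can be illustrated without JZKit. The sketch below is an assumption-laden toy: remote sites are simulated with a sleep, and search_site and distributed_search are hypothetical names. It queries every site in parallel and hands back each site's hits as soon as that site responds.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def search_site(name, delay, hits, max_hits):
    """Simulated remote search; the sleep stands in for network latency."""
    time.sleep(delay)
    # Each site returns at most max_hits of its own hits.
    return name, hits[:max_hits]


def distributed_search(sites, max_hits):
    """Yield (site, hits) pairs in the order in which the sites respond."""
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        futures = [
            pool.submit(search_site, name, delay, hits, max_hits)
            for name, delay, hits in sites
        ]
        for future in as_completed(futures):
            yield future.result()


sites = [
    ("slow-site", 0.3, ["s1", "s2", "s3"]),
    ("fast-site", 0.01, ["f1", "f2", "f3"]),
]
for site, hits in distributed_search(sites, max_hits=2):
    print(site, hits)  # fast-site is printed before slow-site
```

Presenting results incrementally like this means the user interface has to cope with hits arriving in batches, which is a design decision in its own right.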

  • There is no documentation for the implementation of alternative web map clients to Intermap and this makes it appear that the process is far harder than it actually is. Comments: AIMS: Would like to see this as an above average priority. SoftImp: Agreed. However, given the enthusiasm for an OpenLayers-based interface, what "interface" there is will probably soon be rapidly-evolving - if not replaced completely. Action: Document the interface that GeoNetwork uses to call a web map client so that sites can substitute their own.
  • Current capability of GeoNetwork to use distributed searching is given a low priority and is not being developed when compared with the local search. Action: More consideration is required towards distributed searches and proper attention should be given to them.
  • Distributed CSW searches are not available. Action: All OGC CSW standards and specifications should be implemented.

  • Potential for remotely accessed information to be malicious. Action: GeoNetwork should validate all XML inputs and responses (e.g. as it does for CSW) and check expected MIME types, e.g. you ask for a GIF, you get a GIF. And indicate / reject non-conforming content with a warning?
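
The "you ask for a GIF, you get a GIF" check above could start from the well-known leading bytes of each image format. The following is a minimal sketch: the function name and the choice to reject unknown types are assumptions, and a real check would cover more formats and would not rely on the leading bytes alone.

```python
# Leading byte signatures of a few common image formats.
SIGNATURES = {
    "image/gif": (b"GIF87a", b"GIF89a"),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/jpeg": (b"\xff\xd8\xff",),
}


def matches_expected_type(payload, expected_mime):
    """Return True if the payload starts with a signature of expected_mime."""
    prefixes = SIGNATURES.get(expected_mime)
    if prefixes is None:
        # Unknown type: reject rather than guess.
        return False
    return payload.startswith(prefixes)


print(matches_expected_type(b"GIF89a\x01\x00\x01\x00", "image/gif"))    # True
print(matches_expected_type(b"<html>not an image</html>", "image/gif"))  # False
```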

  • Indexing slows as large numbers of records are ingested. Comments: AIMS: Agree. What is a “large” number of records out of interest? Action: Lucene needs to be optimised for initial ingestion / indexing of multiple records – this is a potential scalability issue.
  • GeoNetwork does too much expensive processing of XML documents with XSLT. Action: Continue to seek out and remove unnecessary XSLT processing.
  • The way that GeoNetwork handles timeouts to remote requests is not configurable. Action: In GeoNetwork, the timeout on remote requests, e.g. WMS, should be configurable via the administration interface.

  • Developments in "sand boxes" are not pushed back into the trunk in a timely manner. Comments: CSIRO: Supported by RW. SoftImp: Agreed. The PSC should publish and enforce tighter processes relating to sandboxes. Action: If possible, all sand box developments should be pushed back into the trunk in a predetermined time period (this should be a condition of being granted permission to set up a sandbox). If the sand box feature can't be pushed into the trunk because the trunk code doesn't have the capability (e.g. pluggable profiles, pluggable skins) then priority should be given to developing that capability in the trunk so that the sand box feature can be included into the trunk (relates to project management comment/observation above).
  • GeoNetwork is not distributed with multiple skins and it does not allow pluggable skins. Comments: AIMS: Skinning will help with roll out to agencies with less technical skills. Branding is important. Action: GeoNetwork should be released with multiple skins that can be optionally selected and are pluggable. These skins should be easily modified for an organisation’s needs and not be contained within the XSL or Java code.

  • The installer does not provide the option to choose Tomcat as an alternative to Jetty. Comments: AIMS: Agree. SoftImp: The current situation reflects GeoNetwork’s origins, particularly its funding bodies. Adding Tomcat and supporting it would require fixing some current defects - a good thing. But it would be a lot of work to maintain; in particular, it would significantly increase the time required for testing and release preparation. Action: GeoNetwork should use the existing BlueNet MEST Tomcat configuration to provide an option within the installer to choose Tomcat instead of Jetty as the servlet container. Jetty should continue to be the default.

Additional Comments

  • Parent/child/sibling bidirectional navigation for metadata records. Finding the parent or child of a given record is painful.
  • Use of parent/child/sibling metadata records in the search results as a way to cope with varying levels of record granularity. For example, listing all children under the parent and presenting this within a collapsed tree GUI component. Perhaps provide a way to limit results to only parents and toggle this option on/off. The community is seeking a way to deal with varying granularity of metadata records, such that fine scale records don’t swamp fewer broad scale records. Many fine scale (highly granular) records make the metadata system more powerful (useful). Being forced to limit granularity only as a workaround for basic search result presentation/visualisation would be a shame.
  • Support for external vocabulary services. Vocab services are becoming more common and an ability to connect to a custodian's vocab service would be beneficial and reduce duplication and creation of stale vocabs/thesauruses in GN. An interface is required to query for vocab definitions from external sources.
  • Reusable (Controlled) Objects, allow fields to be reusable. Currently, if a user were to enter multiple records, that user would have to re-enter their “owner” details for each record. Worse, if that person’s details were to change, they would remain unchanged in the old records that person has edited. The person’s details should be held as a managed object that all records reference. This would allow an update of the details to be reflected in each record that uses them.
  • HTTPS support. Currently all logins to GeoNetwork go unsecured through HTTP and the GN configuration doesn’t allow the use of HTTPS, enabling account sniffing attacks.
  • EPSG code data from external service. At the moment EPSG codes have to be entered manually, but external online services are available with that data. GN should utilize this.

  • Hierarchical keywords. Keywords from external vocabularies (see item 35) should utilize hierarchical broader/narrower structures to ease searching capability.
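
One way hierarchical keywords could ease searching is by expanding a search term to include all of its narrower terms. The sketch below assumes a tiny hand-written thesaurus held as a dictionary of broader-to-narrower relations; the terms and the expand_keyword name are made up for illustration, and a real vocabulary would come from an external service.

```python
# Hypothetical thesaurus: each term maps to its direct narrower terms.
NARROWER = {
    "water": ["freshwater", "seawater"],
    "freshwater": ["rivers", "lakes"],
}


def expand_keyword(term, narrower):
    """Return the term plus all of its narrower terms, at any depth."""
    expanded = []
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in the vocabulary
        seen.add(current)
        expanded.append(current)
        stack.extend(narrower.get(current, []))
    return expanded


# A search for "water" would then also match records tagged "lakes".
print(sorted(expand_keyword("water", NARROWER)))
```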

presentation proposals

  • Australia/NZ Community GeoNetwork Feedback: Simon Pigot - see attached document (basic points extracted to discussion topics)

Attachments (1)
