Version 52 (modified by mcoudert, 15 years ago) ( diff )

Topics for GeoNetwork discussion in Bolsena 2010

author: Heikki Doeleman

The Bolsena OSGeo Hacking Event is going to be held for the third time. This page lists ideas that can be discussed within the GeoNetwork faction.

Please add any ideas you have here!

Also, it would be nice if we could have a number of presentations, as we did last year. Tell us about your projects, or about some interesting technology or tool, or whatever you want. Please volunteer your presentation proposals on this page too.


discussion topics

  • Javadoc: let's clean up the existing Javadoc and add new where it is missing. It'd be good to familiarize yourself with how Javadoc works, before doing this; e.g. there should be no blank line between the Javadoc and the method it is about; the first sentence should end in a period; and things like that.
  • Let's move the harvesters' configuration out of the "settings" table into its own, first-class-citizen table. At present, if you have many harvesters, it is nigh impossible to find anything in "settings".
  • Related to the previous topic: rewrite the harvesters' client-side code to remove unnecessary Ajax. Just use "normal" forms for harvester maintenance and avoid Ajax unless some functionality really requires it.
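
As a small illustration of the Javadoc conventions mentioned in the first topic above, the sketch below shows a comment whose first sentence ends in a period and which sits directly above its method, with no blank line in between. The class and method names are hypothetical and are not taken from the GeoNetwork code base.

```java
/**
 * Hypothetical helper that exists only to illustrate Javadoc conventions.
 */
class HarvesterNames {

    /**
     * Returns the display name of a harvester. The first sentence ends in a
     * period because Javadoc uses it as the summary of the method.
     *
     * @param type the harvester type, for example "csw"; must not be null
     * @param id   the numeric identifier of the harvester
     * @return the type and the id joined by a dash
     */
    static String displayName(String type, int id) {
        // No blank line separates the comment above from this method.
        return type + "-" + id;
    }
}
```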

  • While we are at it, can we change the code so that you can save settings from the GUI even if not all expected settings are present in your database?
  • Automatically add the Javadoc pages to this wiki, updated from a Hudson build process? For all of the branches?
  • Some people really like working with patches; other people prefer using short-lived SVN branches for a similar purpose. Can we all agree on doing it one way or the other?
  • This wiki is a bit of a mess, in my opinion. I think it would be good if we could put maybe 3 people in charge to firstly, clean it up and better structure it; and secondly, to try to keep it that way.
  • Let's create SQL files that can fill an empty GeoNetwork database with only the minimum needed to run the program? The admin user, settings, regions, things like that. Not everyone is really eager to use GAST for anything.
  • Can we release GeoNetwork 3.0 (with the CSW/ebRIM interface)? Maybe we can have simultaneous "current releases" in both the GN2.x and GN3.x lineages, as do for example Lucene and Tomcat?
  • Does anyone like the installer overwriting your JDBC credentials with randomly generated values? I certainly don't: my DB lives very much longer than the many GeoNetwork installations I do, so I have to edit config.xml every time. How about removing that?
  • Would it be an idea to appoint Language Managers for each of the supported translations? They would form the International Internationalization Committee (IIC, or CII in French) and would be called upon to maintain the i18n files for their language before each new release. This might even be arranged in an OSGeo-wide manner.
  • The class DataManager.java and its sister XMLSerializer.java are in particularly bad shape, in my opinion. There are literally dozens of public methods that all do more or less the same thing. Of course it's not clearly documented why they are all there or when to use which. Would it be too drastic to propose that we keep 1 single public method for each of the functions createMetadata, updateMetadata, validateMetadata, etc. ?
  • In the NGR project, a modification to the code around the editor, called Inflation and Vacuum, has been implemented that makes it much easier to create valid metadata from scratch. In essence it takes the function of update-fixed-info.xsl (which also tries to do some automatic adjustments to help things along) a whole seven miles further. What do the developers think of this? (I'll provide documentation sometime soon.)
  • Can we agree that we'll provide SQL scripts to create the database, and SQL scripts to fill it with sample data? Can we phase out those DDF files and the unfortunate GAST altogether? And can we provide update SQL scripts with new versions of GeoNetwork, both for changes to the database schema and for content (such as Settings)?
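
One way to read the wish above about saving settings when some expected settings are missing from the database is sketched below: a submitted key that is not yet stored is created rather than making the whole save fail. This is only a sketch under assumptions. The class, the in-memory maps standing in for the "settings" table, and the setting keys used in the usage note are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; this is not the GeoNetwork settings code.
class TolerantSettingsSave {

    /** Keys that were missing from the stored settings and had to be created. */
    final List<String> created = new ArrayList<>();

    /** Keys that already existed and were overwritten. */
    final List<String> updated = new ArrayList<>();

    /**
     * Applies the submitted values to a copy of the stored settings. A key
     * that is not stored yet is inserted instead of causing a failure.
     */
    Map<String, String> save(Map<String, String> stored, Map<String, String> submitted) {
        Map<String, String> result = new LinkedHashMap<>(stored);
        for (Map.Entry<String, String> entry : submitted.entrySet()) {
            if (result.containsKey(entry.getKey())) {
                updated.add(entry.getKey());
            } else {
                created.add(entry.getKey());
            }
            result.put(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

For example, saving a site name that is already stored together with a feedback address that is not would report one updated key and one created key, and both values would end up in the result.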

Topics extracted from Australia/New Zealand Community GeoNetwork Feedback:

  • GeoNetwork needs a range of metadata editors and the XForms Editor (from the geonetworkui sandbox) should be available as part of this range. An XForms engine is an alternative technology that potentially hides details of HTML and JavaScript from developers. (The usefulness of the XForms editor will be determined to a large extent by how well it works across browsers and how responsive it is. What does the "potentially hides details" bit actually mean? That's just wishful thinking, and adding XForms means yet another complicated technology for developers to master.) Justification/Action: Develop an XForms interface to provide a user-friendly interface with the flexibility to meet the needs of different users.

How does it relate to Chiba?

  • GeoNetwork needs a range of metadata editors and ANZMet Lite (a wizard-based editor available for download from here) should be part of the toolkit. ANZMet Lite needs to be open sourced under the GPL to be distributed with GeoNetwork. Comments: If the web interface were improved, the need for ANZMet Lite would be reduced. There is a need for “offline” metadata creation when researchers or data collectors are not connected to the Internet – this is where ANZMet Lite has unique value. Why not improve the existing GeoNetwork editor (see the geocat.ch editor, merge some of its features into the trunk)? Justification/Action: Add ANZMet Lite as a user-friendly, wizard-based PC editing interface with the flexibility to meet the needs of different users. Simon Pigot has already added GeoNetwork upload/download to ANZMet Lite.
  • GeoNetwork services and JavaScript API need to be documented so that the user interface can be replaced and/or the existing functionality reused or customized. A different user interface skin should be easy to apply. The new Jeeves test framework offers an opportunity to document the inputs and outputs of the services. Action/Justification: The existing JavaScript API (web/geonetwork/scripts/core) needs to be documented and extended – existing code that doesn’t use the API needs to be refactored.

Note: GeoNetwork xml services documentation exists in manual.

  • The technologies that are used in the user interface are not homogeneous: XSLT, HTML and JavaScript are often mixed and hard to separate - this makes development and modification of the user interface difficult - but given the current architecture of GeoNetwork, a complete separation into components based on implementation language is impossible. Action: Separate the HTML, XML and JavaScript from each other so that a skilled interface designer does not need to know all three technologies to change the interface.
  • Reusing fragments of metadata (XML) – “object reuse”. Fragments are implemented in various sandboxes. Metadata records can be composed from fragments using XLinks and there is an XLinks URL Resolver. Community action needs to be consolidated through the fragments proposal. Many organisations would like GeoNetwork to be able to harvest fragments from relational databases, as they often generate full metadata records from relational databases using custom software. If the database information changes, these records then need to be re-harvested. Some organisations would also like to be able to edit the fragments in GeoNetwork and return them to the database from which they were harvested. Action/Justification: Integrate fragments of metadata that are managed in an external system (i.e. relational database, authentication directory). There is an implementation of metadata fragment harvesting from relational databases via a WFS in the BlueNetMEST sandbox. This work needs to be consolidated with work in the geocat.ch and geosource sandboxes and added to the trunk. It should also be extended to allow metadata fragments in the relational database to be updated after editing in GeoNetwork. Harvesting of fragments from authentication directories (e.g. LDAP) should be added.
  • GeoNetwork needs some form of version control to track changes made to a metadata record over time. Action/Justification: This can be done inside the database without needing to externalise the metadata records. That way you can index and search on the old versions as well, if desired. Alternatively it could be done externally, perhaps using a Java interface to Subversion, an interface to existing enterprise document management systems, or a different database approach for the documents, e.g. CouchDB.

See also this approach to versioning WFS content?
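
To make the "inside the database" variant of the version control topic above a little more concrete, here is a minimal sketch in which every save appends a new version instead of overwriting the record. The in-memory map stands in for a hypothetical table keyed by record UUID and version number; the class and method names are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an in-memory stand-in for a (uuid, version, xml) table.
class MetadataHistory {

    private final Map<String, List<String>> versions = new HashMap<>();

    /** Stores a new version of the record and returns its 1-based version number. */
    int save(String uuid, String xml) {
        List<String> history = versions.computeIfAbsent(uuid, key -> new ArrayList<>());
        history.add(xml);
        return history.size();
    }

    /** Returns the given version of the record; old versions stay retrievable. */
    String get(String uuid, int version) {
        List<String> history = versions.get(uuid);
        if (history == null || version < 1 || version > history.size()) {
            throw new IllegalArgumentException("No version " + version + " for " + uuid);
        }
        return history.get(version - 1);
    }

    /** Returns the most recent version of the record. */
    String latest(String uuid) {
        List<String> history = versions.get(uuid);
        if (history == null) {
            throw new IllegalArgumentException("Unknown record " + uuid);
        }
        return history.get(history.size() - 1);
    }
}
```

Because old versions remain ordinary rows in this model, they could be indexed and searched like the current one, which is the advantage the topic mentions.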

  • Some aspects of project planning for GeoNetwork are not visible to those outside the project steering committee. Action: Continue to adopt and implement OSGeo best practice (e.g. GeoServer).
  • Documentation for ‘Implementing GeoNetwork into your organisation’ should be provided. Rather than changing the perspective of the current documentation from "it does" to "how to", perhaps you can have different documentation for different audiences. The “how to” section of the Trac is very useful. Action: As the “how to” section of the OSGeo GeoNetwork Trac site expands, it could be linked into the documentation.
  • GeoNetwork’s current Lucene field / index names and the mapping of metadata fields to these Lucene field names are ad hoc. This has the potential to prevent search interoperability between catalogues. Action: GeoNetwork should use an established mapping such as the GEO profile of Z39.50 (including attributes, data and relations) to define Lucene field names and the mapping from metadata elements to Lucene fields for all metadata schemas.
  • XSD and Schematron validators return errors that are meaningless to most users. The ability to customise the error messages easily would be useful. Action: Code containing XSD validation messages needs to be modified to include alternative or additional messages to those already in use. Schematron diagnostics specified in rules should be made more useful to users.

setErrorHandler is already in use - could it be modified to support more meaningful messages? Francois has updated the Schematron support to the Schematron Validation Report Language (SVRL).
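
The sketch below shows how a handler installed through setErrorHandler could translate raw XSD validation messages into friendlier wording. It uses the standard javax.xml.validation and org.xml.sax APIs, but the class name, the lookup table and the replacement texts are hypothetical. It also assumes that raw messages start with a code such as cvc-complex-type.2.4.b followed by a colon, as the validator bundled with the JDK produces; messages without a known code are passed through unchanged.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical sketch; the codes and friendly texts are illustrative only.
class FriendlyErrorHandler implements ErrorHandler {

    private final Map<String, String> friendly = new HashMap<>();
    final List<String> messages = new ArrayList<>();

    FriendlyErrorHandler() {
        friendly.put("cvc-complex-type.2.4.b", "A required element is missing.");
        friendly.put("cvc-datatype-valid.1.2.1", "A value has the wrong type.");
    }

    // Replaces the raw message when its leading code is known, else keeps it.
    private void record(SAXParseException e) {
        String raw = e.getMessage();
        int colon = raw == null ? -1 : raw.indexOf(':');
        String code = colon > 0 ? raw.substring(0, colon) : "";
        messages.add("Line " + e.getLineNumber() + ": " + friendly.getOrDefault(code, raw));
    }

    @Override public void warning(SAXParseException e) { record(e); }

    @Override public void error(SAXParseException e) { record(e); }

    @Override public void fatalError(SAXParseException e) throws SAXParseException {
        record(e);
        throw e; // the document cannot be processed any further
    }

    /** Validates the XML text against the XSD text and returns the collected messages. */
    static List<String> validate(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            FriendlyErrorHandler handler = new FriendlyErrorHandler();
            validator.setErrorHandler(handler);
            validator.validate(new StreamSource(new StringReader(xml)));
            return handler.messages;
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Validation could not be completed", e);
        }
    }
}
```

Keeping the lookup table outside the code (for instance in a per-language properties file) would be one way to make the messages easy to customise, though that is a design choice this sketch does not cover.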

  • GeoNetwork requires a generic capability for element help, code list choices and suggestions to be linked to metadata guidelines provided with profiles/standards. Action: GeoNetwork to call documentation components from external sources (e.g. mouse over tool tips from profile/standard and code list documentation).
  • GeoNetwork categories are not related to metadata content – they should be configurable from content. Action: GeoNetwork should be able to configure dynamic categories from a Lucene field. E.g. an administrator could create category names as the unique values of the Lucene field purpose (which might be mapped to gmd:purpose for ISO); records would then belong to the category described by their purpose. Cf. also the discussion on dynamic categories, i.e. categories that are placeholders for a saved search.
  • GeoNetwork currently does not manage its own tag cloud / Folksonomy. Action: GeoNetwork could optionally manage these things internally rather than using a third party social networking site.

Ticket 96 suggests a way of doing this
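
Returning to the dynamic categories topic above (category names as the unique values of an index field), the idea can be sketched as a simple grouping step. This is not Lucene code: IndexedDoc is a hypothetical stand-in for an indexed record, and the field name purpose is only the example used in the topic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; a real implementation would read the values from the index.
class DynamicCategories {

    /** Minimal stand-in for an indexed record: an id plus its field values. */
    static final class IndexedDoc {
        final String id;
        final Map<String, String> fields;

        IndexedDoc(String id, Map<String, String> fields) {
            this.id = id;
            this.fields = fields;
        }
    }

    /**
     * Builds one category per unique value of the given field and assigns
     * each record to the category named after its value. Records without a
     * value for the field belong to no category.
     */
    static Map<String, List<String>> byField(List<IndexedDoc> docs, String fieldName) {
        Map<String, List<String>> categories = new TreeMap<>();
        for (IndexedDoc doc : docs) {
            String value = doc.fields.get(fieldName);
            if (value == null || value.isEmpty()) {
                continue;
            }
            categories.computeIfAbsent(value, key -> new ArrayList<>()).add(doc.id);
        }
        return categories;
    }
}
```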

  • Network-crawling for geo-resources. Action: GeoNetwork needs to continue to be aware of and exploit initiatives for automatic harvesting of metadata from geo-resources, e.g. metadata extraction tools such as the Talend Spatial Data Integrator suite.
  • GeoNetwork lacks the ability to consistently reproduce a unique identifier for the same geo resource (e.g. same dataset stored in two different locations) and/or use persistent identifier services. Comment: This is somewhere along the range from "easy enough" to "very difficult"; the precise details of the set of features you have in mind need to be spelled out. Action: GeoNetwork needs to be able to generate, store and use metadata identifiers (e.g. gmd:fileIdentifier) as well as data identifiers using the current stand-alone UUID, but also (for data objects) MD5 (including what the checksum was generated from) and identifiers from external persistent identifier services. (It should be possible to obtain persistent identifiers for both metadata and data from external persistent identifier services.)
  • Better inter-application interoperability. GeoNetwork needs to rethink the interoperability with the emerging FOSS such as the way that OpenLayers is designing / redeveloping its interface. e.g. use of GeoExt; e.g. GeoNetwork needs to provide simple mechanisms to allow discovered resources to be exploited and utilised in complementary open source software; e.g. drag-and-drop discovered resources into OpenLayers or GeoServer. Action: Better intra-application interoperability. GeoNetwork needs to coordinate the discovery of resources with the publication of those same resources in FOSS such as GeoServer.
  • GeoNetwork assumes resources that are tagged as data for download in gmd:protocol are local. Action: GeoNetwork needs to allow for the fact that data tagged as data for download may not be local.
  • Remote search across a number of sites returns a pre-selected number of hits from all remote sites (pre-selected number is a search option) – it should return these hits from each site. Action: Presentation of pre-selected number of hits from each remote site – may require more delving into JZKit.
  • Presentation of returned hits from remote sites may be very slow because search is limited by the speed of the slowest site. Action: Presentations of first returned hits from first responding remote site should not have to wait on the slowest site – may require more delving into JZKit.
  • There are too many configuration files in too many places eg. repositories.xml.tem and not all configuration options are supported by the existing admin interfaces. Action: Continue to consolidate configuration options in the system configuration interface.
  • There is no documentation for the implementation of alternative web map clients to Intermap and this makes it appear that the process is far harder than it actually is. Given the enthusiasm for an OpenLayers-based interface, what "interface" there currently is will probably soon be rapidly-evolving - if not replaced completely. Action: Document the interface that GeoNetwork uses to call a web map client so that sites can substitute their own.
  • GeoNetwork's distributed search capability is given a low priority and is not being developed, compared with local search. Action: Distributed search requires more consideration and should be given proper attention.
  • Distributed CSW searches are not available. Action: All OGC CSW standards and specifications should be implemented.
  • Potential for remotely accessed information to be malicious. Action: GeoNetwork should validate all XML inputs and responses (eg. as it does for CSW) and check expected MIME types e.g. you ask for a GIF, you get a GIF. And indicate / reject non-conforming content with a warning?
  • GeoNetwork does too much expensive processing of XML documents with XSLT. Action: Continue to seek out and remove unnecessary XSLT processing.
  • The way that GeoNetwork handles timeouts to remote requests is not configurable. Action: In GeoNetwork, timeout on remote requests e.g. WMS, should be configurable via the administration interface.
  • Developments in "sand boxes" are not pushed back into the trunk in a timely manner. Action: The PSC should publish and enforce tighter processes relating to sandboxes. If possible, all sand box developments should be pushed back into the trunk in a predetermined time period (this should be a condition of being granted permission to set up a sandbox). If the sand box feature can't be pushed into the trunk because the trunk code doesn't have the capability (e.g. Pluggable profiles, pluggable skins) then priority should be given to developing that capability in the trunk so that the sand box feature can be included into the trunk (relates to project management comment/observation above).
  • GeoNetwork is not distributed with multiple skins and it does not allow pluggable skins. Action: GeoNetwork should be released with multiple skins that can be optionally selected and are pluggable. These skins should be easily modified for an organisation’s needs and not be contained within the XSL or Java code.
  • There is no (installer) option to choose Tomcat as an alternative to Jetty. Comment: The current situation reflects GeoNetwork’s origins, particularly its funding bodies. Adding Tomcat and supporting it would require fixing some current defects - a good thing. But it would be a lot of work to maintain it, in particular, it would significantly increase the time required for testing and release preparation. Action: GeoNetwork should use the existing BlueNet MEST Tomcat configuration to provide an option within the installer to choose Tomcat instead of Jetty as the servlet container. Jetty should continue to be the default.
  • Parent/child/sibling bidirectional navigation for metadata records. Finding the parent or child of a given record is painful. Action: Use parent/child/sibling metadata records in the search results as a way to cope with varying levels of record granularity, for example by listing all children under the parent and presenting this within a collapsed tree GUI component. Perhaps provide a way to limit results to only parents and toggle this option on/off.
  • Community is seeking a way to deal with varying granularity of metadata records, such that fine scale records don’t swamp fewer broad scale records. Many fine scale records (highly granular) make the metadata system more powerful (useful). Being forced to limit granularity only as a work around for basic search result presentation/visualisation would be a shame.
  • Support for external vocabulary services. Vocab services are becoming more common, and an ability to connect to a custodian's vocab service would be beneficial and would reduce duplication and the creation of stale vocabs/thesauri in GN. Action: An interface is required to query for vocab definitions from external sources.
  • Reusable (controlled) objects: allow fields to be reusable. Currently, a user who enters multiple records has to re-enter their “owner” details for each record. Worse, if that person’s details change, the old records they edited keep the outdated details. The person’s details should be held as a managed object that all records reference. This would allow an update of the details to be reflected in each record that uses them - see the fragment harvesting of contact info above.
  • HTTPS support. Currently all logins to GeoNetwork go unsecured through HTTP and the GN configuration doesn’t allow the use of HTTPS, which enables account sniffing attacks.
  • EPSG code data from an external service. Action: At the moment EPSG codes have to be entered manually, but external online services are available with that data. GN should utilize these.
  • Hierarchical keywords. Keywords from external vocabularies should utilize hierarchical broader/narrower structures to ease searching capability.
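
For the identifier topic in the list above, the two locally computable kinds of identifier it names can be sketched with the standard Java library: a random UUID for a metadata record and an MD5 checksum for a data object. The class and method names are hypothetical. Identifiers from external persistent identifier services, and a record of what the checksum was generated from, are left out of this sketch.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical sketch; not taken from the GeoNetwork code base.
class ResourceIdentifiers {

    /** Returns a new random UUID, usable for example as a gmd:fileIdentifier value. */
    static String newMetadataIdentifier() {
        return UUID.randomUUID().toString();
    }

    /**
     * Returns the MD5 checksum of the data as lowercase hex. The same bytes
     * always give the same checksum, wherever the data is stored.
     */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}
```

A checksum identifies content rather than location, so two copies of the same dataset stored in different places would get the same data identifier, while each metadata record would still have its own UUID.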

  • Using Apache Tika to index content from files attached to metadata records in GeoNetwork?
  • Replace Lucene interface in GN with Apache Solr?
  • SchemaManager - redesign proposed by Mathieu:
    • use org.geonetwork.utils.xsd.XSD in the project "geonetwork-services-ebrim" to read schemas and query contents for driving the editor
    • GN uses a number of schemas for validation purposes eg. in OAI, could these be managed by schemamanager so that they do not need to be retrieved from net?
    • sometimes a document may introduce a new schema, e.g. a ListSets response in the OAI harvester can introduce the oai_dc schema (e.g. when harvesting jOAI). If we need to validate these responses, then creating a validator with a file-based schema will cause the validation to fail, as the schema is not present on disk. An alternative is to create a validator with no file-based schema, which means that all schemas will be obtained from the net, but to use an entityResolver object to intercept those which are local so as to avoid unnecessary retrieval. Or perhaps use a Java Caching System (JCS) instance to cache all schemas locally, like XLinks?
  • Multi-editing - attempting to introduce the ability to edit more than one document makes the existing trunk interface confusing, e.g. editing documents in tabs, both of which have login details. Things get out of sync - what to do? Maybe something like BlueNetMEST, which is based on one window: the main search window (with tabs for remote, advanced and mapviewer) with search results. Clicking on a title opens the editor/viewer in a new window (so multi-editing is supported). Clicking any of the menu options on the main screen uses a modalbox dialog and separate windows, to keep the search interface and results window untouched. Logging out or in closes all editor/viewer windows, and logging out is not allowed while editing is in progress. This is not perfect, but it might be a way of thinking about how to introduce things like multi-editing to trunk.
  • Improve CSS management: clean up the CSS files and references to unused styles, replace tables with divs; see the discussion on ThemeCustomization
  • Move to Maven as described here: Maven
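
The entityResolver idea raised under the SchemaManager topic above can be sketched with the standard SAX EntityResolver interface: schemas that are known locally are served from a local copy, and anything else is left to the parser's default retrieval. The class name and the register method are hypothetical, and the in-memory map stands in for files on disk or a cache. Note that when validating through javax.xml.validation, the corresponding hook is an LSResourceResolver set on the SchemaFactory rather than an EntityResolver.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical sketch; the map stands in for locally stored schema files.
class LocalSchemaResolver implements EntityResolver {

    private final Map<String, String> localCopies = new HashMap<>();

    /** Registers a local copy of the schema published at the given system id. */
    void register(String systemId, String schemaText) {
        localCopies.put(systemId, schemaText);
    }

    /**
     * Returns the local copy when one is registered. Returning null tells the
     * parser to fall back to its default behaviour, which is to fetch the
     * resource from the system id.
     */
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String text = localCopies.get(systemId);
        if (text == null) {
            return null;
        }
        InputSource source = new InputSource(new StringReader(text));
        source.setSystemId(systemId);
        return source;
    }
}
```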

More 'way out' stuff :-)?

  • Verbal annotations for YouTube videos - verbal annotations for metadata records?
  • CouchDB to hold metadata documents, couchdb-lucene to build Lucene indexes, geocouch for spatial queries? Could CouchDB be worth investigating further?
  • GeoNetwork should be to metadata and data management as iTunes is to managing music or TimeMachine is to backup? I.e. we have a great engine; what about building the 'killer' interface? Who would do this?

presentation proposals

  • Australia/NZ Community GeoNetwork Feedback: Simon Pigot - see attached document (basic points extracted to discussion topics)
