
Version 11 (modified by gregboone, 17 years ago)


FDO Enhanced Schema Name Support

Overview

This document discusses various options for improving the translation of Feature Schema names between FDO and GML.

The FDO API provides functions for translating FDO Feature Schemas to and from the OpenGeospatial GML format. These functions are provided to satisfy 4 main use cases:

  • Export/Import: to provide a text based export format for FDO Feature Schemas. This export format actually covers other types of objects such as Spatial Contexts, Schema Overrides and Features. However, only Feature Schemas are pertinent to this document.
  • Schema Exchange: Allow exchange of schemas between FDO and external GML-based applications.
  • WFS Provider: used by the FDO WFS Provider to translate GML schemas, provided by the connected WFS, to FDO Schemas.
  • Publish as WFS: Allow FDO accessible data to be published via a WFS. This is the opposite of the previous use case.

One of the big challenges in translating schemas between FDO and GML is converting the schema names. In order to support the above use cases, these conversions must satisfy the following general requirements:

  • round trip fidelity. If the schema is translated from GML to FDO to GML, the schema name in the resulting GML schema document must be the same as in the original. Similarly, the name must not change when translated from FDO to GML to FDO.
  • name uniqueness must be preserved. Different GML schemas must get different FDO schema names when read into FDO. Conversely different FDO schemas must get different GML schema names when written to GML. If name uniqueness is not preserved, schemas will be unexpectedly merged on read or write.

The structure of schema names differs greatly between the two formats:

  • In FDO, a schema name is a free-form name, containing any character except '.' and ':'. Names tend to be short; more detailed information is typically kept in the schema description.
  • In GML, the schema name must be a valid URI. Most current FDO schema names are valid URIs. However, most GML schema names conform to the http scheme (see glossary), as seen in the following example, whereas FDO schema names tend not to fit the http scheme.

A typical example might be a Roads schema defined by the municipality "MyCity". The FDO schema might simply be "Roads". However, the GML schema name might look something like this:

http://www.mycity.on.ca/departments/transportation/Roads

where the schema name is qualified by the owning organization. This makes it difficult to perform the schema name conversion in a way that satisfies the abovementioned requirements.

The FDO API provides a number of methods to ensure round trip fidelity and preservation of schema name uniqueness. However, these methods are cumbersome for some of the abovementioned use cases. This document looks at alternatives for making schema name translation easier when performed through the FDO API.

Current API

Feature Schema translation is provided by 2 functions on FdoFeatureSchemaCollection:

  • ReadXml() converts GML schemas to FDO
  • WriteXml() converts FDO schemas to GML (WriteXml() is also present on FdoFeatureSchema to allow the writing of individual schemas).

Both of the above functions take optional FdoXmlFlags parameters, which control how the translation is performed.

The following sub-sections look at the various schema name translation options currently provided:

Default Translation

FDO to GML

When no FdoXmlFlags are specified, the FDO schema name is translated by prepending a default osgeo-defined schema prefix (http://fdo.osgeo.org/schemas/feature/) to the schema name and escaping any characters not allowed in a URI. For example, the FDO Schema "Water Service" becomes:

http://fdo.osgeo.org/schemas/feature/Water-x20-Service
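
The default FDO to GML rule described above can be sketched in standalone C++. This is an illustration only, not the FDO implementation: the names toGmlName and escapeChar are hypothetical, and the set of characters passed through unescaped (the RFC 3986 unreserved characters) is an assumption. Only the prefix and the -x20- escape form are taken from the example above.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

// Default schema prefix quoted in the text above.
const std::string kDefaultPrefix = "http://fdo.osgeo.org/schemas/feature/";

// Hypothetical helper: escape one character as -xHH- (two hex digits),
// the form seen in "Water-x20-Service".
std::string escapeChar(unsigned char c) {
    char buf[8];
    std::snprintf(buf, sizeof buf, "-x%02x-", static_cast<unsigned>(c));
    return buf;
}

// Hypothetical sketch of the default name translation on write.
// Assumption: only RFC 3986 "unreserved" characters pass through unchanged.
std::string toGmlName(const std::string& fdoName) {
    std::string out = kDefaultPrefix;
    for (unsigned char c : fdoName) {
        const bool safe = std::isalnum(c) || c == '-' || c == '.' ||
                          c == '_' || c == '~';
        out += safe ? std::string(1, static_cast<char>(c)) : escapeChar(c);
    }
    return out;
}
```

Calling toGmlName("Water Service") in this sketch yields the prefixed, escaped name shown in the example. Because '-' is passed through unchanged, the sketch does not guarantee that an escaped name can be unescaped unambiguously.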

GML to FDO

When no FdoXmlFlags are specified, GML schema names are translated by dropping any http:// prefix and escaping '.' and ':' to '-dot-' and '-colon-' respectively. This means that the example schema name from the Overview:

http://www.mycity.on.ca/departments/transportation/Roads

becomes:

www-dot-mycity-dot-on-dot-ca/departments/transportation/Roads

This preserves name uniqueness but leads to a rather messy looking FDO schema name that is not easily human-readable.

When the GML schema name begins with the default schema prefix (http://fdo.osgeo.org/schemas/feature/), this whole prefix is removed from the schema name. For example:

http://fdo.osgeo.org/schemas/feature/Roads

simply becomes:

Roads

This is done to preserve round-trip fidelity.
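
The default GML to FDO rule can be sketched the same way. Again this is a hedged, standalone illustration: toFdoName is a hypothetical name, and the sketch covers only the three behaviours described above (removing the default prefix, dropping http://, and escaping '.' and ':'). It does not attempt to unescape sequences such as -x20-.

```cpp
#include <cassert>
#include <string>

// Default schema prefix quoted in the text above.
const std::string kDefaultPrefix = "http://fdo.osgeo.org/schemas/feature/";

// Hypothetical sketch of the default name translation on read.
std::string toFdoName(const std::string& gmlName) {
    // Names under the default prefix lose the whole prefix, so that a name
    // written with the default rule reads back unchanged.
    if (gmlName.compare(0, kDefaultPrefix.size(), kDefaultPrefix) == 0)
        return gmlName.substr(kDefaultPrefix.size());

    // Otherwise drop a leading "http://" ...
    const std::string http = "http://";
    const std::string rest = gmlName.compare(0, http.size(), http) == 0
                                 ? gmlName.substr(http.size())
                                 : gmlName;

    // ... and escape the two characters FDO schema names cannot contain.
    std::string out;
    for (char c : rest) {
        if (c == '.')      out += "-dot-";
        else if (c == ':') out += "-colon-";
        else               out += c;
    }
    return out;
}
```

Applied to the Overview example, the sketch produces the long '-dot-' name shown above, while a name under the default prefix collapses to its last component.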

Use Case Implications

Export/Import

The Default method works well for the Export/Import use case. The schema name is preserved on round trip from FDO to GML to FDO. The http://fdo.osgeo.org/schemas/feature/ prefix is added when the feature schemas are written to GML and removed when they are read back from GML. The fact that the GML schemas all look like they're owned by OSGeo is not an issue. Actors, for this use case, aren't concerned about the GML format itself; they just want to be able to export FDO schemas and re-import them later.

Schema Exchange

The Default method does not work well for the Schema Exchange use case, since it generates rather messy FDO schema names from the GML names. Also, when FDO schemas are written to GML, the schema name always gets the default prefix prepended. Most customers would likely want the GML schema name to be a URI that reflects their own organization, rather than OSGeo.

There is also a defect which occurs when a schema name, that doesn't start with the default prefix, is round tripped from GML to FDO to GML. The GML Schema name:

http://www.mycity.on.ca/departments/transportation/Roads

becomes:

www-dot-mycity-dot-on-dot-ca/departments/transportation/Roads

in FDO. However, when written back to GML, the default prefix is still prepended to the schema name, giving:

http://fdo.osgeo.org/schemas/feature/www.mycity.on.ca/departments/transportation/Roads

Therefore, round trip fidelity is not preserved.

WFS Provider

The Default method is not applicable to the WFS Provider use case. The WFS Provider always passes FdoXmlFlags to the ReadXml function.

Publish as WFS

As with the Schema Exchange use case, the default method does not work well for the Publish as WFS use case since all schema names end up prefixed with http://fdo.osgeo.org/schemas/feature/.

Customized Schema Prefix

The FdoXmlFlags class provides a url attribute that allows the default osgeo schema prefix to be overridden. This url is then used to prepend schema names on writing to GML and for stripping off prefixes when reading from GML. For example, if the url is set to www.mycity.on.ca/departments/transportation, then the FDO schema named "Roads" becomes:

www.mycity.on.ca/departments/transportation/Roads

when the schema is written to GML. If this schema is read back from GML, the url prefix is removed and the FDO schema name becomes "Roads".

This method works better for the Publish as WFS use case than the default method. The customer can control what the GML schema name looks like and can make it reflect the URIs used by their organization. However, there is a limitation in that the same schema prefix gets applied to each schema in the schema collection. For example, if the schema collection contains a "Roads" and a "WaterService" schema, and the desired GML Schema names are:

www.mycity.on.ca/departments/transportation/Roads
www.mycity.on.ca/departments/watersewer/WaterService

then each schema must be written by a separate WriteXml() invocation, with a different url flag.

For the same reason, this method works better than the default method for the Schema Exchange use case. However, there is one caveat: round trip fidelity is only preserved if the same url is specified on both ReadXml() and WriteXml(). Also, the GML schema name must start with the url prefix for round trip fidelity to be preserved. This might be difficult for an application to manage, especially if a different prefix is required for each schema.
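
The round trip caveat can be illustrated with a small standalone sketch. The functions addPrefix and stripPrefix are hypothetical stand-ins for the url handling performed on write and read; they are not FDO calls, and the '/' separator between url and name is inferred from the examples above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the url prefix handling on write.
std::string addPrefix(const std::string& url, const std::string& fdoName) {
    return url + "/" + fdoName;
}

// Hypothetical stand-in for the url prefix handling on read. Strips "url/"
// when present; otherwise returns the GML name unchanged (a real reader
// would fall back to the default translation instead).
std::string stripPrefix(const std::string& url, const std::string& gmlName) {
    const std::string head = url + "/";
    if (gmlName.compare(0, head.size(), head) == 0)
        return gmlName.substr(head.size());
    return gmlName;
}
```

In this sketch, stripPrefix(url, addPrefix(url, "Roads")) returns "Roads" only when both calls use the same url. Reading with the watersewer url a name that was written with the transportation url leaves the full name in place, which is the bookkeeping burden described above.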

This method would not likely be used in the Export/Import use case, due to these extra complications in managing the prefixes for each schema.

This method is not applicable to the WFS Provider use case.

Short Prefix as Schema Name

Most GML schema documents associate a short prefix with each schema name. This short prefix is used to define references to the schema, in other parts of the document. This makes the GML document smaller and more readable. The short prefix is not guaranteed to be unique within a GML document. However, it is unique for the XML element where it is defined plus all sub-elements.

The FdoXmlFlags provides a SchemaNameAsPrefix setting. When set to true (the default is false), the short prefix becomes the FDO Schema name, when reading from GML to FDO. This setting is only applicable to the ReadXml() function. WriteXml() always uses the url setting to generate the GML schema names.

This method is used by the WFS Provider and works well for this use case. Although not globally unique, these prefixes are unique for a particular WFS. The prefixes are also very similar in format to a typical FDO Schema name in that they tend to be short and easily readable by humans.

This method is not applicable to the other use cases, since all of these require the use of the WriteXml() function. GML to FDO to GML round trip fidelity is not preserved. The full URI for the schema name is lost when translating to FDO, so it cannot be reliably re-constituted when translating back to GML.

Requirements and Gap Analysis

The current API handles the Export/Import and WFS Provider use cases fairly well. However, the handling of the Schema Exchange and Publish as WFS use cases is cumbersome for the following reasons:

  • The customer must usually supply a schema prefix and must ensure that consistent prefixes are used when reading and writing the feature schemas.
  • Due to a bug, round trip fidelity of schema names is not preserved when the schema name does not start with the schema prefix.
  • For a single ReadXml() or WriteXml() operation, there is no way to apply a different schema prefix to each schema.
  • On ReadXml(), if the GML schema name is not prefixed by the schema prefix, a messy looking FDO schema name is generated.

The main requirement is to streamline the handling of these two use cases. MapGuide currently supports (or plans to support) publishing FDO data sources through a WFS, so supporting the Publish as WFS use case likely takes priority over Schema Exchange. (TBD: verify MapGuide's WFS publishing requirements).

Recommendation Summary

Section 'Solution Options' below explores a number of solution options. From these options, a number of recommendations can be made:

  • It is recommended that the Schema Name Attributes option be implemented (see Schema Name Attributes under Solution Options). Although not perfect, it is the best option mentioned in this document.
  • It is not currently recommended that we allow '.' and ':' in schema names (see Simple Default Translation under Solution Options). However, this option would make schema names more readable so it should be explored further to see if the potential drawbacks can be addressed.

Solution Options

The following lists some options for improving schema name conversion. These options are not necessarily mutually exclusive. However, it is unlikely that we'd implement them all since that would add a lot of complexity to the FDO API.

Simple Default Translation

Under this option, the url setting for FdoXmlFlags would no longer have a default. When no FdoXmlFlags are specified, ReadXml() and WriteXml() would not usually modify the schema name, meaning that the GML and FDO names would usually be identical.

WriteXml() would only modify the name if it is not a valid URI. In this case, the name would be adjusted by escaping any non-conforming characters.

ReadXml() would only modify the name if it contains any escaped characters, which it would unescape. ReadXml() would no longer escape '.' and ':'. This has implications on the FdoFeatureSchema object in that these two characters would have to be allowed in schema names. This can be done by removing the no '.' or ':' restriction from schema element names. However, we'd need a way to indicate when these are literal characters when they appear as qualified names. One possibility is to mandate that each component of a qualified schema element name must be enclosed in double quotes when it contains a '.' or ':'.
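
The double-quote idea can be sketched as a small parser. This is a hypothetical illustration: splitQualifiedName is not an FDO function, and the sketch assumes qualified names separate their components with ':' and '.', with double quotes acting purely as delimiters that are dropped from the result.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: split a qualified element name into its components.
// '.' and ':' separate components unless they appear inside double quotes.
std::vector<std::string> splitQualifiedName(const std::string& qname) {
    std::vector<std::string> parts(1);   // start with one empty component
    bool quoted = false;
    for (char c : qname) {
        if (c == '"') {
            quoted = !quoted;            // quotes delimit and are not kept
        } else if (!quoted && (c == '.' || c == ':')) {
            parts.emplace_back();        // separator: begin next component
        } else {
            parts.back() += c;           // literal character
        }
    }
    return parts;
}
```

With this sketch, a name such as "www.mycity.on.ca/Roads":Road.Name (quotes included in the input) splits into three components, and the dots inside the quoted schema name are kept as literals. It also shows the compatibility concern raised under Cons: a pre-existing name that contains a literal double quote would be parsed differently.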

When the url FdoXmlFlag is specified, the behaviour would be as now: WriteXml() would tack this url prefix onto the schema name and ReadXml() would remove it if present.

Pros

  • Simpler approach. For the default case, we don't need to worry about tacking on and stripping off prefixes.
  • FDO Schema Names derived from GML are more readable. The names can still be long but at least they won't contain the '-dot-' and '-colon-' sequences.
  • Works well for the Export/Import and Schema Exchange use cases as long as both ReadXml() and WriteXml() are done in FDO 3.3.0. Not applicable to the WFS Provider use case.

Cons

  • Backward Compatibility. For the Export/Import and Schema Exchange use cases, schema name round trip fidelity is not preserved if WriteXml() is done from pre-Slate and ReadXml() from FDO 3.3.0 or vice versa. The reason is that one operation will use a different name translation method from the other. This is more of a problem for Export/Import. The bug mentioned above already introduces round trip fidelity problems for the Schema Exchange use case. The backward compatibility issues could be mitigated if we could detect when one of the operations was done using pre-Slate. In this case the operation done in Slate would use the pre-Slate name translation rules. However, this eliminates the simplicity pro since we still have to keep the old name translation algorithms around.
  • Provider compatibility. Some providers might need changes, since current code is based on the assumption that schema element names do not contain '.' or ':'.
  • Qualified Schema Element name compatibility. This option changes the rules for constructing qualified element names. In pre-Slate, double quotes are always literals but in Slate they would be delimiters. It is unlikely that any pre-existing schema names start and end with a double quote but it is possible.
  • GML Schema names are no longer guaranteed to follow the http scheme. Almost every GML schema we've seen so far follows this URI naming scheme, so customers might complain if we don't always follow it. Therefore, this option does not work well for the Publish as WFS use case. Clients doing this sort of publishing will likely start specifying FdoXmlFlags to WriteXml() to ensure that the GML Schema name follows the http scheme.

Conclusions:

  • If we were starting from scratch, this would be a good option. However, at this stage, it is not really viable, due to backward compatibility issues, since these issues wipe out the advantages in simplicity.

Schema Name Attributes

Under this option, attributes would be added to the GML and FDO schemas to specify the name of the schema in the other domain.

An xs:schema/fdo:name attribute would be added to the FDO XML format, to specify the equivalent FDO name for the schema.

WriteXml() would write this attribute to the GML document.

When present, ReadXml() would take this attribute as the FDO schema name. Otherwise, the FDO Schema name would be generated from the GML name as is currently done.

A globalName attribute would be added to the FdoFeatureSchema class:

ReadXml() would set this attribute.

When present, WriteXml() would use this attribute as the GML schema name. Otherwise, the GML schema name would be generated from the FDO schema name, as is currently done. Ideally, globalName would be set to a valid URI. If not, WriteXml() would escape any non-conforming characters. Alternatively, we could restrict globalName to be a valid URI.

From a semantic standpoint, the name attribute for FdoFeatureSchema would be unique within a particular domain (e.g.: an FDO Datastore, an FdoFeatureSchemaCollection in an application). The globalName attribute would be intended to be universally unique, or at least unique among all organizations that use the feature schema.
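
The intended read and write behaviour under this option can be sketched as follows. Everything here is a hypothetical stand-in: SchemaSketch is not FdoFeatureSchema, and the two default translation helpers are deliberately simplified (prefix handling only, with no character escaping) so that the precedence logic stays visible.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for FdoFeatureSchema with the proposed attribute.
struct SchemaSketch {
    std::string name;        // unique within a datastore or collection
    std::string globalName;  // proposed attribute; empty when not set
};

const std::string kDefaultPrefix = "http://fdo.osgeo.org/schemas/feature/";

// Simplified placeholders for the existing default translations
// (character escaping is omitted for brevity).
std::string defaultGmlName(const std::string& fdoName) {
    return kDefaultPrefix + fdoName;
}
std::string defaultFdoName(const std::string& gmlName) {
    if (gmlName.compare(0, kDefaultPrefix.size(), kDefaultPrefix) == 0)
        return gmlName.substr(kDefaultPrefix.size());
    return gmlName;
}

// Write side: use globalName when set, else fall back to the default rule.
std::string gmlNameFor(const SchemaSketch& s) {
    return s.globalName.empty() ? defaultGmlName(s.name) : s.globalName;
}

// Read side: the fdo:name attribute (empty when absent) wins over the
// generated name; the GML name is always remembered in globalName.
SchemaSketch readSchema(const std::string& gmlName,
                        const std::string& fdoNameAttr) {
    SchemaSketch s;
    s.name = fdoNameAttr.empty() ? defaultFdoName(gmlName) : fdoNameAttr;
    s.globalName = gmlName;
    return s;
}
```

In this sketch a schema read from GML keeps its original URI in globalName, so writing it back reproduces the same GML name without the caller supplying a url flag.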

Pros:

  • Good for Schema Exchange use case. Customer no longer needs to ensure the same url FdoXmlFlag is used for both WriteXml() and ReadXml(). For the Schema Exchange case, it also opens up the possibility of using the Short Prefix as Schema Name method, since it eliminates the problem of reconstituting the GML schema name on WriteXml().
  • Flexible. The GML and FDO schema name correspondences can be set on a per-schema basis.
  • Still a relatively simple option. Much simpler than the Schema Overrides and Schema Namespaces options discussed below.
  • Good for Publish as WFS use cases. Publisher can set the GML name for each FDO schema explicitly.
  • GML schema name can be persisted in an FDO Datastore.
  • Alternatively, GML schema name can be set on the Feature Schema just before it is converted to GML. This can easily be done via the FDO API.

Cons:

  • There is no guarantee that these new settings will be persisted. This limits the cases where these settings can be effective. If an FDO schema goes through the following steps:

o Write to GML
o Read GML into 3rd party application
o Write from 3rd party application to GML
o Read from GML

the fdo:name attribute will likely be lost when the schema goes through the 3rd party application.

FDO providers would not necessarily be immediately modified to support the new globalName attribute on FdoFeatureSchema. Therefore, the following steps:

o Read from GML
o Apply schema to FDO datastore
o Describe schema back from datastore
o Write schema to GML

will lose the GML Schema name if the datastore's provider does not handle it. However, this particular problem can be mitigated by ensuring that the SDF and RDBMS providers support this attribute.

This con would be applicable to the Schema Exchange and Publish as WFS use cases. Export/Import and WFS Provider would be unaffected.

  • For cases where the FDO schema name cannot be persisted in the GML document, it would be possible for the application to add it to the document just before it is read into FDO. This is possible to do but not quite straightforward since there is no simple FDO API to do this. It would have to be done via the Xerces DOM classes or by an XSL transformation.
  • There is overlap between the globalName attribute and the targetNamespace attribute on FdoXmlSchemaMapping (see next section), since both would represent the GML schema name. We could have a precedence rule (e.g. FdoXmlSchemaMapping.targetNamespace trumps FdoFeatureSchema.globalName). However, having multiple places where the same attribute can be set makes the FDO API more complicated.

Conclusions

  • This is a viable option. Although it doesn't help with Schema name translation in all cases, it would still handle a lot of cases.
  • Despite the overlap with GML schema overrides, this would currently be the recommended option, since it is the best one mentioned in this document.

Schema Overrides

The FdoXmlSchemaMapping class already provides a way to match FDO and GML names for each feature schema. This is provided by this class' name (FDO name) and targetNamespace (GML name) settings. This class can already be passed to ReadXml() and WriteXml() via the FdoXmlFlags. We'd just need to start using its mappings in ReadXml() and WriteXml().
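
How such mappings might be consulted can be sketched with a plain lookup table. MappingSketch and both lookup functions are hypothetical stand-ins used only to show the two directions of the lookup; they do not reproduce the FdoXmlSchemaMapping interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one schema mapping: FDO name plus GML name.
struct MappingSketch {
    std::string name;             // FDO schema name
    std::string targetNamespace;  // GML schema name
};

// Write direction: GML name for an FDO schema, or "" when no mapping exists
// (the caller would then fall back to the default translation).
std::string gmlNameFromMappings(const std::vector<MappingSketch>& mappings,
                                const std::string& fdoName) {
    for (const MappingSketch& m : mappings)
        if (m.name == fdoName) return m.targetNamespace;
    return "";
}

// Read direction: FDO name for a GML schema, or "" when no mapping exists.
std::string fdoNameFromMappings(const std::vector<MappingSketch>& mappings,
                                const std::string& gmlName) {
    for (const MappingSketch& m : mappings)
        if (m.targetNamespace == gmlName) return m.name;
    return "";
}
```

The sketch makes the persistence concern concrete: both directions give consistent answers only if the same table is available on every read and write.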

Pros:

  • Flexible. The GML and FDO schema name correspondences can be set on a per-schema basis.
  • More consistent with rest of FDO since GML names specified as GML-specific overrides.
  • Opens up the ability to specify overrides for other FDO schema elements and customize how they are translated to GML. This would allow us to round trip other GML and XML Schema constructs, that don't easily fit into FDO, such as xs:choice elements.

Cons:

  • More complicated API. Client must create set of override classes, instead of just setting the schema name attributes or FdoXmlFlags.
  • Problematic for Schema Exchange use case. Instead of ensuring consistent url prefixes are used by both ReadXml() and WriteXml(), the customer must ensure that consistent GML Schema Overrides are used. Instead of having to manage names, they have to manage object hierarchies or XML fragments.
  • Problematic for Publish as WFS use case. The GML Schema Overrides must be persisted somehow so that consistent GML schema names are used, each time someone retrieves schemas from the WFS. Currently, they must be persisted in some file storage location.

One way to mitigate this con is to allow them to be stored in the provider datastore if the provider supports FdoIApplySchema. The RDBMS provider implementations of this command can take a set of schema overrides. However, they discard all overrides except those for that particular provider. A possible enhancement would be for the RDBMS providers to be able to persist schema overrides from other providers. This could be done by adding a CLOB column to f_schemainfo, where these overrides would be stored in XML format. There is a problem with doing this in that Schema Overrides cannot be serialized to XML without their corresponding provider being present, which is not guaranteed. Perhaps this enhancement should be limited to just handling GML Schema Overrides. These overrides can be serialized using only core FDO.

Conclusions:

  • This is the most elegant solution but it lacks the simplicity of the Schema Name Attributes option:

o it requires the use of Schema Overrides, which are not popular with application developers.
o the persistence problems are worse. Instead of needing to persist a name, a whole set of classes or an XML fragment must be persisted.

Schema Namespace

Under this option, the concept of a Schema Namespace would be added to FDO, where Feature Schemas would be grouped into namespaces. The intent of the namespace name would be to contain the name of the organization that owns the schema. This could be a complex path as per the http URI scheme. The Feature Schema name itself would tend to be a simple short name as it is now.

On WriteXml:

The GML schema name would be fdo_namespace + '/' + fdo_schema_name

On ReadXml:

The GML schema name would be split at the last '/'. The left part would become the FDO namespace, and the right part the schema name. If the GML schema name does not contain '/' then it would become the FDO schema name and the namespace would be the default namespace (see below).
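
The join and split rules can be sketched in a few lines. The function names are hypothetical, and representing the default namespace as an empty string is an assumption made for the sketch; the text does not say how the default namespace would be named or how it would be written to GML.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Assumption for this sketch: the default namespace is the empty string.
const std::string kDefaultNamespace = "";

// Write direction: namespace + '/' + schema name. Assumption: a schema in
// the default namespace is written as the bare schema name.
std::string joinName(const std::string& ns, const std::string& schema) {
    return ns == kDefaultNamespace ? schema : ns + "/" + schema;
}

// Read direction: split at the last '/'. Returns {namespace, schema name}.
std::pair<std::string, std::string> splitName(const std::string& gmlName) {
    const std::string::size_type pos = gmlName.rfind('/');
    if (pos == std::string::npos)
        return {kDefaultNamespace, gmlName};
    return {gmlName.substr(0, pos), gmlName.substr(pos + 1)};
}
```

For the Overview example, the split yields the namespace http://www.mycity.on.ca/departments/transportation and the schema name Roads, and joining the two parts restores the original GML name.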

There are a couple of sub-options for representing the namespace in FDO. These are discussed in the following subsections, along with Pros, Cons and Conclusions.

Full Object

Under this sub-option, the namespace would be a fully fledged Fdo Schema Element. Feature Schemas would be grouped into namespaces. This could be done by adding the following classes:

class FdoSchemaNamespace : public FdoSchemaElement
{
public:
        // Feature schemas grouped under this namespace.
        FdoFeatureSchemaCollection* GetSchemas();
};

class FdoSchemaNamespaceCollection : public FdoSchemaCollection<FdoSchemaNamespace>
{
};

Currently, FdoFeatureSchemaCollection is the highest level schema object. Under this option FdoSchemaNamespaceCollection would become the highest level object.

The SDF and RDBMS providers would be changed to handle namespaces. For the RDBMS providers, this would likely mean adding a new MetaSchema table. For backward compatibility, we'd need to introduce the concept of a default namespace. Feature Schemas in pre-existing datastores would automatically go under the default namespace.

For backward compatibility reasons, it would not likely be possible for FdoIDescribeSchema and FdoIApplySchema to handle namespaces. We'd need new commands:

class FdoIApplyNamespace
{
public:
        // The parameter is called value because namespace is a C++ keyword.
        void SetNamespace( FdoSchemaNamespace* value );
        FdoSchemaNamespace* GetNamespace();
        void Execute();
};

class FdoIDescribeNamespace
{
public:
        // Name of the namespace to describe.
        void SetNamespace( FdoString* value );
        FdoString* GetNamespace();

        FdoSchemaNamespaceCollection* Execute();
};

TBD: Ways to apply and retrieve schema overrides for a namespace or collection of namespaces would also need to be added.

FdoIApplySchema, FdoIDescribeSchema and FdoIDescribeSchemaMappings are all geared to the idea that the FdoFeatureSchemaCollection is the highest level object. Therefore, the addition of namespaces would affect these commands. For backward compatibility, the best way to handle these commands would be to restrict them to handling feature schemas in the default namespace. The new commands must be used to access and manipulate schemas in other namespaces. It would be possible to allow FdoIApplySchema to handle schemas from non-default namespaces, since the namespace would be the feature schema's parent. However, it would be inconsistent to allow FdoIApplySchema to handle these schemas if they cannot be retrieved through FdoIDescribeSchema.

Feature Commands would also be impacted since these can take qualified class and property names. These qualified names must be able to contain the namespace name if they are to be unique within a particular FDO datastore. This would mean changes to providers that add support for namespaces.

Although a new command adds complexity, this could be an opportunity to add incremental schema describing. Currently, when a schema is described, all of its classes and properties are retrieved. There is no command to just list the schemas or just the schemas and classes. This affects performance, since callers to FdoIDescribeSchema always get the entire schema when they might be just interested in part of it.

The ability to just list certain objects could be provided by an elementLevel attribute on FdoIDescribeNamespace:

void SetElementLevel( elementType level );
elementType GetElementLevel();

where elementType is one of (namespace, schema, class, property, constraint, all). When set, Execute() would only return elements down to that level. To list only the namespaces, set elementLevel to namespace; to list namespaces, schemas, and classes set elementLevel to class. Setting elementLevel to property would cause everything except for unique and property value constraints to be retrieved (this would have performance benefits for RDBMS providers). Setting elementLevel to all would retrieve everything.
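
The level filtering can be sketched with an ordered enumeration. The enum and the isReturned helper are hypothetical; the ordering simply follows the list given above, and treating all as "no limit" is the only special case.

```cpp
#include <cassert>

// Levels listed in the text, ordered from shallowest to deepest.
enum class ElementLevel { Namespace, Schema, Class, Property, Constraint, All };

// Hypothetical helper: would an element of kind `kind` be returned by a
// describe request limited to `level`?
bool isReturned(ElementLevel kind, ElementLevel level) {
    // All is a request level only; no schema element has that kind.
    assert(kind != ElementLevel::All);
    return level == ElementLevel::All || kind <= level;
}
```

For a request limited to class, the sketch returns namespaces, schemas and classes but no properties, and a request limited to property leaves out constraints, matching the description above.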

Pros:

  • More generalized than the Schema Name Attribute approach. Instead of putting a schema override on FdoFeatureSchema, we add a new provider-neutral concept to the Fdo Feature schema model.
  • Better than the Schema Override approach since a separate Schema Overrides set is not required to customize schema name translation between FDO and GML.
  • Flexible. The GML and FDO schema name correspondences can be set on a per-schema basis.
  • Provides an opportunity to support incremental schema describing

Cons:

  • Existing providers might not immediately add support for namespaces. When these providers are used, it is not possible to persist the GML to FDO schema name correspondences.
  • Schema names themselves are no longer unique, unless also qualified by namespace. Some applications might rely on schema names being unique.
  • Introduces a whole new level to the Fdo Feature Schema model, adding to its complexity. This is also a departure from the OpenGeospatial model, where a schema is a namespace, rather than being contained in a namespace.
  • This option may seem to get rid of the problem where schemas read from GML have long names. However, it merely splits these long names and puts the more complex part in the namespace name. Clients still need both namespace and schema name to uniquely identify a schema.

Conclusion:

  • This option is viable but not recommended. The main reasons are:

o it departs from the OpenGeospatial schema model
o it adds complication to the FDO Feature Schema model
o the implementation effort is a bit high.

Just a Name

Under this option, namespace would not be a fully fledged Fdo Schema Element but merely a new attribute on FdoFeatureSchema:

class FdoFeatureSchema
{
public:
        // The parameter is called value because namespace is a C++ keyword.
        void SetNamespace( FdoString* value );
        FdoString* GetNamespace();
};

The Schema name would no longer be unique within an FDO Datastore; instead, the combination of namespace and schema name would be unique (otherwise, this option would be the same as the Schema Name Attributes option).

FdoFeatureSchemaCollection would be impacted in that it cannot hold schemas of different namespaces since schema names must be unique within each collection. Because of this, FdoIDescribeSchema cannot retrieve all schemas for an FDO datastore if they are in different namespaces. This necessitates adding a new command:

class FdoIDescribeNamespace
{
public:
        // Name of the single namespace whose schemas are returned.
        void SetNamespace( FdoString* value );
        FdoString* GetNamespace();

        FdoFeatureSchemaCollection* Execute();
};

Note that it is a bit different from the FdoIDescribeNamespace in the previous section, in that it returns an FdoFeatureSchemaCollection. Therefore, this command will only be able to retrieve the schemas for a single namespace. This also means we need an extra command to list all the namespaces on a datastore.

We'd still need to add namespace to qualified schema element names and modify the feature commands to handle the namespace component.

Compared to other options, this option has the same pros and cons as the Schema Namespace - Full Object sub-option. When compared with Schema Namespace – Full Object, it has the following pros and cons:

Pros:

  • introduces a bit less complexity to the FDO Schema model in that there is no new FdoSchemaNamespace class.

Cons:

  • less flexible since FdoIDescribeNamespace cannot retrieve schemas for all namespaces.
  • an extra command is required to list namespaces, since there is no FdoSchemaNamespaceCollection class.

Conclusion:

  • Same Conclusions as for Schema Namespace - Full Object. However, eliminating the FdoSchemaNamespace and FdoSchemaNamespaceCollection objects doesn't buy us much so this sub-option is not recommended.

Open Issues

  • Does MapGuide currently support exposing an FDO datastore as a WFS? If so, have they encountered problems generating the targetNamespace for each schema? If not currently supported, do they plan to support Publish as WFS in the future?