Changes between Initial Version and Version 1 of MapGuideRfc123


Timestamp:
Oct 24, 2011, 5:42:12 AM
Author:
jng
  • MapGuideRfc123

    v1 v1  
     1
     2= !MapGuide RFC 123 - Feature Join optimization shortcut using FDO Join APIs =
     3
     4This page contains a change request (RFC) for the !MapGuide Open Source project.
     5More !MapGuide RFCs can be found on the [wiki:MapGuideRfcs RFCs] page.
     6
     7
     8== Status ==
     9
     10||RFC Template Version||(1.0)||
     11||Submission Date||24 October 2011||
     12||Last Modified||24 October 2011||
     13||Author||Jackie Ng||
     14||RFC Status||draft||
     15||Implementation Status||pending||
     16||Proposed Milestone||2.4||
     17||Assigned PSC guide(s)||(when determined)||
     18||'''Voting History'''||(vote date)||
     19||+1||||
     20||+0||||
     21||-0||||
     22||-1||||
     23||no vote|| ||
     24
     25== Overview ==
     26
     27This RFC proposes to take advantage of recently introduced FDO join APIs to provide an optimization path for Feature Joins in MapGuide under certain Feature Source configurations.
     28
     29== Motivation ==
     30
     31Feature Joins have traditionally been an underperforming and buggy aspect of MapGuide Open Source / MGE / AIMS. The number of tickets in trac related to Feature Joins attests to this.
     32
     33Performance workarounds exist for SQL-based feature sources involving datastore-level joins encapsulated as a view (and possibly some metadata hacks in the datastore so that the view is recognised as a feature class). This approach, while addressing the performance problem, presents its own set of issues, namely constraints imposed by the data store as a result of using a view.
     34
     35As of FDO 3.6, new APIs were introduced allowing support for native joins at the data store level, which are much more efficient than the costly in-memory joins performed by the GWS Query Engine component. With some minor modifications in the feature query logic, MapGuide can take advantage of these new FDO Join APIs if the extended feature class being queried satisfies some conditions (outlined in the Proposed Solution).
     36
     37By using FDO join APIs, we can tackle the performance and bugginess of Feature Joins simultaneously because both aspects are now delegated to the underlying FDO data store.
     38
     39It is hoped that implementing this RFC will give us a more positive user story with regard to Feature Joins.
     40
     41== Proposed Solution ==
     42
     43The solution consists of 3 parts:
     44
     45 * Testing for the FDO join optimization path when performing a SelectFeatures() call against an extended feature class.
     46 * Setting up the FDO join query.
     47 * Handling the iteration logic for the results of an FDO join query.
     48 
     49Each part is explained in detail below.
     50
     51=== Testing for the optimization ===
     52
     53When performing a feature query against an extended feature class that consists of joins, use the FDO join APIs if the extended feature class satisfies the following criteria:
     54
     55 * The Feature Source this extended feature class belongs to supports joins (i.e. the SupportJoins capability returns true)
     56 * The Feature Source this extended feature class belongs to joins with another feature class from '''the same feature source''' (this is because FDO Joins work within the context of the same connection). The feature class being joined on cannot itself be an extended feature class.
     57 * The extended feature class only contains one join. Supporting chained or multiple joins is beyond the scope of this RFC.
     58 * The type of join being performed is supported by the underlying FDO provider.
     59
     60If any of the above criteria are not met, the extended feature class in question is considered not to meet the FDO join requirements and is delegated to the GWS Query Engine, as it currently is.
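
The criteria above could be checked along the following lines. This is only a sketch: the `JoinType`, `JoinDefinition`, `ExtendedClassInfo` and `ProviderCapabilities` types and the `CanUseFdoJoin` function are hypothetical stand-ins invented for illustration, not existing MapGuide or FDO types. A real implementation would read the same information from the Feature Source document and the FDO connection capabilities.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins (not MapGuide or FDO types).
enum class JoinType { Inner, LeftOuter, RightOuter };

struct JoinDefinition {
    std::string secondaryFeatureSource;  // resource id of the joined feature source
    bool secondaryIsExtendedClass;       // true if the joined class is itself extended
    JoinType type;                       // join type requested by the extended class
};

struct ExtendedClassInfo {
    std::string featureSource;           // resource id of the owning feature source
    std::vector<JoinDefinition> joins;   // joins declared on the extended class
};

struct ProviderCapabilities {
    bool supportsJoins;                        // provider reports join support
    std::vector<JoinType> supportedJoinTypes;  // join types the provider can execute
};

// Returns true only when every criterion in the list above holds.
// When it returns false the caller would fall back to the GWS Query Engine.
bool CanUseFdoJoin(const ExtendedClassInfo& cls, const ProviderCapabilities& caps)
{
    if (!caps.supportsJoins)
        return false;                    // provider has no native join support
    if (cls.joins.size() != 1)
        return false;                    // chained or multiple joins are out of scope
    const JoinDefinition& join = cls.joins.front();
    if (join.secondaryFeatureSource != cls.featureSource)
        return false;                    // FDO joins work within a single connection
    if (join.secondaryIsExtendedClass)
        return false;                    // cannot join onto another extended class
    // The requested join type must be one the provider supports.
    return std::find(caps.supportedJoinTypes.begin(),
                     caps.supportedJoinTypes.end(),
                     join.type) != caps.supportedJoinTypes.end();
}
```

Ordering the checks from cheapest to most specific keeps the fallback decision simple: the first failed criterion sends the query down the existing GWS path.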
     61
     62As of writing this RFC, the following FDO providers support datastore-level joins:
     63
     64 * SQLite (3.7/Trunk)
     65 * SQL Server Spatial (3.7/Trunk)
     66
     67Feature Sources using the above providers that contain extended feature classes configured in the above fashion stand to take advantage of this optimization path. If other FDO providers implement the required FDO Join APIs in the future, feature sources using those providers will automatically be eligible for this optimization path, provided they meet the same criteria outlined above.
     68
     69=== Setting up the FDO join query ===
     70
     71We use the FdoIExtendedSelect interface to perform the FDO join query. The extended feature class already uses an optional prefix on the secondary class as a means of disambiguating identically named properties on both sides of the join.
     72
     73Properties from the primary class will be specified as the following FDO computed property:
     74
     75{{{
     76primary_[PrimaryClassName].[PropertyName] AS [PropertyName]
     77}}}
     78
     79primary_PrimaryClassName will be specified as the alias for the FdoIExtendedSelect::SetAlias() method.
     80
     81Properties from the secondary class will be specified as the following FDO computed property:
     82
     83{{{
     84secondary_[SecondaryClassName].[PropertyName] AS [Prefix][PropertyName]
     85}}}
     86
     87secondary_SecondaryClassName will be used as the alias for the join criteria that is added to the FdoIExtendedSelect's join criteria collection.
     88
     89Because the Extended Class Definition in a feature source does not explicitly specify the list of properties from the secondary class to include, we include '''all properties''' from any secondary class that we are joining on by default.
     90
     91Through this setup, the returned feature reader will present the same property list as a reader returned by the GWS Query Engine.
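
To make the naming scheme above concrete, the sketch below builds only the text of the computed property expressions. It does not call FDO at all: `PrimaryAlias`, `SecondaryAlias` and `BuildJoinSelectList` are hypothetical helper names, and the class and property names in the usage note are made up for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helpers that mirror the alias naming described above.
std::string PrimaryAlias(const std::string& className)
{
    return "primary_" + className;
}

std::string SecondaryAlias(const std::string& className)
{
    return "secondary_" + className;
}

// Builds the expression text for every selected property:
//   primary_[PrimaryClassName].[PropertyName] AS [PropertyName]
//   secondary_[SecondaryClassName].[PropertyName] AS [Prefix][PropertyName]
std::vector<std::string> BuildJoinSelectList(
    const std::string& primaryClass,
    const std::vector<std::string>& primaryProps,
    const std::string& secondaryClass,
    const std::vector<std::string>& secondaryProps,
    const std::string& prefix)
{
    std::vector<std::string> exprs;
    for (const std::string& p : primaryProps)
        exprs.push_back(PrimaryAlias(primaryClass) + "." + p + " AS " + p);
    // All secondary properties are included; the prefix disambiguates
    // names that also exist on the primary class.
    for (const std::string& p : secondaryProps)
        exprs.push_back(SecondaryAlias(secondaryClass) + "." + p + " AS " + prefix + p);
    return exprs;
}
```

For example, a primary class `Parcels` with property `ID` joined to a secondary class `Owners` with property `ID` and prefix `Own_` would yield `primary_Parcels.ID AS ID` and `secondary_Owners.ID AS Own_ID`, so both identifiers survive in the result without a name clash.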
     92
     93=== The FDO join feature reader ===
     94
     95The existing MgServerFeatureReader will be modified to incorporate iteration logic that forces 1:1 cardinality on the resulting FDO feature reader.
     96
     97When 1:1 cardinality is not being forced, this reader behaves like a normal feature reader.
     98
     99In the case of the forcing of 1:1 cardinality, we take a different path and employ the following logic for ReadNext():
     100
     101 1. Read the current identity property values. Hash these values into a string and check if this hashed string exists in an internal std::set.
     102 2. If this value exists, keep reading until we either get a hashed string that does not yet exist in the internal set, or we reach the end of the feature reader, in which case we return false.
     103 3. Store this hashed string into the internal set for future comparisons.
     104
     105Iteration logic for the normal case (not 1:1) is simply to pass over to the underlying reader's ReadNext() method. Tracking of identity property values is not required for this case.
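
The ReadNext() logic described above could be sketched as follows. Everything here is illustrative: `FakeJoinReader` is a hypothetical stand-in for the underlying FDO feature reader that exposes only identity values as strings, and `OneToOneReader` is a hypothetical wrapper, not the actual MgServerFeatureReader. The key is built by length-prefixing each value, which is one assumed way to turn the identity values into a collision-free string.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the underlying reader: each row carries only
// its identity property values, already converted to strings.
class FakeJoinReader {
public:
    explicit FakeJoinReader(std::vector<std::vector<std::string>> rows)
        : m_rows(std::move(rows)), m_next(0), m_current(0) {}

    bool ReadNext() {
        if (m_next >= m_rows.size()) return false;
        m_current = m_next++;
        return true;
    }

    const std::vector<std::string>& IdentityValues() const { return m_rows[m_current]; }

private:
    std::vector<std::vector<std::string>> m_rows;
    std::size_t m_next;
    std::size_t m_current;
};

// Hypothetical wrapper implementing the iteration logic described above.
class OneToOneReader {
public:
    OneToOneReader(FakeJoinReader& inner, bool forceOneToOne)
        : m_inner(inner), m_force(forceOneToOne) {}

    bool ReadNext() {
        // Normal case: pass straight through, no identity tracking.
        if (!m_force) return m_inner.ReadNext();
        // Forced 1:1: skip rows whose identity key was already returned.
        while (m_inner.ReadNext()) {
            if (m_seen.insert(MakeKey(m_inner.IdentityValues())).second)
                return true;  // first time this identity is seen
        }
        return false;         // end of the underlying reader
    }

    const std::vector<std::string>& IdentityValues() const { return m_inner.IdentityValues(); }

private:
    // Length-prefix each value so ("ab","c") and ("a","bc") give different keys.
    static std::string MakeKey(const std::vector<std::string>& values) {
        std::string key;
        for (const std::string& v : values)
            key += std::to_string(v.size()) + ":" + v;
        return key;
    }

    FakeJoinReader& m_inner;
    bool m_force;
    std::set<std::string> m_seen;
};
```

Because `std::set::insert` reports whether the key was newly inserted, the lookup and the store from steps 1 and 3 collapse into a single call. The trade-off is that the set grows with the number of distinct primary features read.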
     106
     107Though not reflective of what the final implementation may be performance-wise, support for FDO joins has already been implemented in mg-desktop and the benefits there are clear. The performance times for a 17000 feature by 16000 feature join (of [http://code.google.com/p/mg-desktop/source/browse/DesktopTestData/ParcelsJoinTest.sqlite this dataset]) are outlined below.
     108
     109|| Test Case              || SDF    || SQLite ||
     110|| Inner Join             || 744.1s || 5.4s   ||
     111|| Left Outer Join        || 479.4s || 4.2s   ||
     112|| Inner Join (1:1)       || 595.2s || 4.4s   ||
     113|| Left Outer Join (1:1)  || 320.9s || 4.6s   ||
     114
     115
     116The SQLite provider implements the FDO join APIs. The SDF provider does not, so its queries are delegated to the GWS Query Engine.
     117
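     118We don't expect the same numbers in MapGuide due to server/web tier overhead, but the figures above (SQLite is roughly 70 to 140 times faster than SDF across the four test cases, i.e. about two orders of magnitude) should give a clear indication of the performance gains available if this RFC is implemented.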
     119
     120=== Aggregates ===
     121
     122Support for aggregate operations is beyond the scope of this RFC.
     123
     124== Implications ==
     125
     126No public APIs are affected. This is a server-side modification to take advantage of new APIs introduced by an external component (FDO). At the WebTier level, processing FDO join results is still done through the existing MgProxyFeatureReader. No changes are required on the Web Tier.
     127
     128No schema modifications are required for the Feature Source Schema. The extended class definition already provides enough information to construct an equivalent FDO join query.
     129
     130Though not necessary, it would be nice for the join editors in Maestro and Infrastructure Studio to notify the user of such optimization availability if the edited feature source in question is configured correctly.
     131
     132== Test Plan ==
     133
     134Add unit tests against some sample SQLite feature sources configured to take advantage of these optimizations. Verify that the FDO join optimization path is taken when selecting from these feature sources.
     135
     136== Funding / Resources ==
     137
     138Community