Changes between Initial Version and Version 1 of MapGuideRfc130


Timestamp: Dec 17, 2012, 7:47:33 AM
Author: jng

= !MapGuide RFC 130 - Streamed HTTP feature/data/SQL query responses =

This page contains a change request (RFC) for the !MapGuide Open Source project.
More !MapGuide RFCs can be found on the [wiki:MapGuideRfcs RFCs] page.

== Status ==

||RFC Template Version||(1.0)||
||Submission Date||18 Dec 2012||
||Last Modified||||
||Author||Jackie Ng||
||RFC Status||draft||
||Implementation Status||||
||Proposed Milestone||2.5||
||Assigned PSC guide(s)||(when determined)||
||'''Voting History'''||(vote date)||
||+1||||
||+0||||
||-0||||
||-1||||
||no vote|| ||

== Overview ==

This RFC proposes to add support for streaming feature/aggregate/SQL query results over the HTTP mapagent.

== Motivation ==

The current implementations of the SELECTFEATURES/SELECTAGGREGATES/EXECUTESQLQUERY mapagent operations are woefully inefficient in terms of memory use. Each of these operations buffers the entire contents of the MgFeatureReader/MgDataReader/MgSqlDataReader as an internal XML document before writing out the HTTP response, which causes potentially large memory spikes for large, unfiltered Feature Source queries over the mapagent. If the desired format is JSON, memory usage is even worse: these operations maintain not only the internal XML string buffer, but also an internal DOM of that buffer for the purpose of converting it to JSON.

For example, a full SELECTFEATURES query of the Sheboygan Parcels (17k features) consumes the following amounts of memory:

 * XML: ~50MB
 * JSON: 1.5GB (!!!)

And this is just for one request. If we want MapGuide to be a useful '''data''' platform, one that can support [http://osgeo-org.1560.n6.nabble.com/OpenLayers-SVG-MapGuide-Vector-Viewer-td4210812.html a theoretical map viewer that does full client-side rendering], our methods of querying data from it must be efficient. What we currently have in place is not acceptable.

== Proposed Solution ==

=== New internal MgReader APIs ===

We introduce a series of new internal APIs to MgReader, which allow us to produce the individual parts of the default XML response instead of having to build the entire blob via ToXml():

{{{

class MG_PLATFORMBASE_API MgReader : public MgSerializable
{

INTERNAL_API:
    ///////////////////////////////////////////////////////////////////////////
    /// \brief
    /// Returns the name of the root element of the response as a UTF-8
    /// string.
    ///
    virtual string GetResponseElementName() = 0;

    ///////////////////////////////////////////////////////////////////////////
    /// \brief
    /// Returns the name of the root element of the response body as a UTF-8
    /// string.
    ///
    virtual string GetBodyElementName() = 0;

    ///////////////////////////////////////////////////////////////////////////
    /// \brief
    /// Appends the start of the response to str as UTF-8 text.
    ///
    /// \param str
    /// Destination string.
    ///
    virtual void ResponseStartUtf8(string& str) = 0;

    ///////////////////////////////////////////////////////////////////////////
    /// \brief
    /// Appends the end of the response to str as UTF-8 text.
    ///
    /// \param str
    /// Destination string.
    ///
    virtual void ResponseEndUtf8(string& str) = 0;

    ///////////////////////////////////////////////////////////////////////////
    /// \brief
    /// Appends the start of the response body to str as UTF-8 text.
    ///
    /// \param str
    /// Destination string.
    ///
    virtual void BodyStartUtf8(string& str) = 0;

    ///////////////////////////////////////////////////////////////////////////
    /// \brief
    /// Appends the end of the response body to str as UTF-8 text.
    ///
    /// \param str
    /// Destination string.
    ///
    virtual void BodyEndUtf8(string& str) = 0;

    ///////////////////////////////////////////////////////////////////////////
    /// \brief
    /// Appends the contents of the header in this reader to str as UTF-8 text.
    ///
    /// \param str
    /// Destination string.
    ///
    virtual void HeaderToStringUtf8(string& str) = 0;

    ///////////////////////////////////////////////////////////////////////////
    /// \brief
    /// Appends the contents of the current record/feature in the reader to str
    /// as UTF-8 text.
    ///
    /// \param str
    /// Destination string.
    ///
    virtual void CurrentToStringUtf8(string& str) = 0;
};

}}}

The proxy versions of these readers (MgProxyFeatureReader/MgProxyDataReader/MgProxySqlDataReader) provide implementations of these methods that return the various XML fragments that make up the default XML response.

To explain what these new internal APIs are supposed to do, here is the current implementation of MgProxyDataReader::ToXml():

{{{

void MgProxyDataReader::ToXml(string &str)
{
    CHECKNULL((MgBatchPropertyCollection*)m_set, L"MgProxyDataReader.ToXml");
    CHECKNULL((MgPropertyDefinitionCollection*)m_propDefCol, L"MgProxyDataReader.ToXml");

    // this XML follows the SelectAggregate-1.0.0.xsd schema
    str += "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
    str += "<PropertySet>";
    m_propDefCol->ToXml(str);
    str += "<Properties>";
    while ( this->ReadNext() )
    {
        Ptr<MgPropertyCollection> propCol = m_set->GetItem(m_currRecord-1);
        // check for NULL before calling GetCount() on the collection
        if (propCol != NULL && propCol->GetCount() > 0)
        {
            str += "<PropertyCollection>";
            propCol->ToXml(str,false);
            str += "</PropertyCollection>";
        }
    }
    str += "</Properties>";
    str += "</PropertySet>";
}

}}}

This is how the method looks with the new internal APIs, which should give you an idea of what each of these methods actually produces:

{{{

void MgProxyDataReader::ToXml(string &str)
{
    CHECKNULL((MgBatchPropertyCollection*)m_set, L"MgProxyDataReader.ToXml");
    CHECKNULL((MgPropertyDefinitionCollection*)m_propDefCol, L"MgProxyDataReader.ToXml");

    // this XML follows the SelectAggregate-1.0.0.xsd schema
    ResponseStartUtf8(str);
    HeaderToStringUtf8(str);
    BodyStartUtf8(str);
    while ( this->ReadNext() )
    {
        CurrentToStringUtf8(str);
    }
    BodyEndUtf8(str);
    ResponseEndUtf8(str);
}

}}}

Thus each subclass of MgReader (only the proxy readers; server-side readers will never have these methods invoked) implements each method to produce its own specific fragment of XML, and together these fragments assemble the final result.
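
To make the fragment contract concrete, here is a minimal, self-contained C++ sketch under stated assumptions. FragmentReader and ToyDataReader are hypothetical stand-ins, not MapGuide classes: only the fragment method names follow the RFC, and the <Header/> and <Value> elements are illustrative placeholders rather than the real SelectAggregate schema. It shows that calling the fragment methods in order reproduces the whole buffered document, which is what allows a caller to emit the same fragments one at a time instead.

{{{
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the fragment-producing part of MgReader.
// Only the method names mirror the RFC; everything else is illustrative.
class FragmentReader
{
public:
    virtual ~FragmentReader() {}
    virtual bool ReadNext() = 0;
    virtual void ResponseStartUtf8(std::string& str) = 0;
    virtual void HeaderToStringUtf8(std::string& str) = 0;
    virtual void BodyStartUtf8(std::string& str) = 0;
    virtual void CurrentToStringUtf8(std::string& str) = 0;
    virtual void BodyEndUtf8(std::string& str) = 0;
    virtual void ResponseEndUtf8(std::string& str) = 0;

    // Buffered serialization assembled purely from the fragments, in the
    // same order as the reworked MgProxyDataReader::ToXml() above.
    void ToXml(std::string& str)
    {
        ResponseStartUtf8(str);
        HeaderToStringUtf8(str);
        BodyStartUtf8(str);
        while (ReadNext())
            CurrentToStringUtf8(str);
        BodyEndUtf8(str);
        ResponseEndUtf8(str);
    }
};

// Toy reader over in-memory values, loosely modelled on the PropertySet
// layout. Values are not XML-escaped in this sketch.
class ToyDataReader : public FragmentReader
{
public:
    explicit ToyDataReader(std::vector<std::string> values)
        : m_values(std::move(values)), m_pos(0) {}

    bool ReadNext() override
    {
        if (m_pos >= m_values.size()) return false;
        ++m_pos; // like m_currRecord: one past the current record's index
        return true;
    }
    void ResponseStartUtf8(std::string& str) override
    {
        str += "<?xml version=\"1.0\" encoding=\"UTF-8\"?><PropertySet>";
    }
    void HeaderToStringUtf8(std::string& str) override { str += "<Header/>"; } // placeholder
    void BodyStartUtf8(std::string& str) override { str += "<Properties>"; }
    void CurrentToStringUtf8(std::string& str) override
    {
        str += "<PropertyCollection><Value>" + m_values[m_pos - 1] + "</Value></PropertyCollection>";
    }
    void BodyEndUtf8(std::string& str) override { str += "</Properties>"; }
    void ResponseEndUtf8(std::string& str) override { str += "</PropertySet>"; }

private:
    std::vector<std::string> m_values;
    std::size_t m_pos;
};
}}}

With the values "1" and "2", ToXml() yields the PropertySet start, the placeholder header, one PropertyCollection per record and the closing tags: the same string a single buffered pass would have produced.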

=== Streaming the reader results ===

With these new internal APIs, we have the ability to take an MgReader and output its contents to the HTTP output stream '''as we are iterating through the reader''', instead of having to buffer the entire reader contents up-front. To actually stream the results, we use [http://en.wikipedia.org/wiki/Chunked_transfer_encoding HTTP chunked transfer encoding] to write out the reader content in chunks.
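
For reference, chunked transfer encoding frames each chunk as its size in hexadecimal, a CRLF, the chunk data and another CRLF, and ends the body with a zero-size chunk followed by an empty line. The following C++ sketch shows that framing; FrameChunk and FinalChunk are hypothetical helpers for illustration only, and depending on the hosting technology the web server may apply this framing itself rather than the agent code.

{{{
#include <sstream>
#include <string>

// Hypothetical helper: frames one chunk of an HTTP/1.1 chunked body as
//   <size in hex>\r\n<data>\r\n
// Chunk extensions are not used. An empty payload yields an empty string,
// since a zero-size chunk would tell the client that the body has ended.
std::string FrameChunk(const std::string& data)
{
    if (data.empty())
        return std::string();
    std::ostringstream out;
    out << std::hex << data.size() << "\r\n" << data << "\r\n";
    return out.str();
}

// Hypothetical helper: the zero-size last chunk plus the empty line that
// ends a chunked body without trailer fields.
std::string FinalChunk()
{
    return "0\r\n\r\n";
}
}}}

For example, FrameChunk("hello") gives "5\r\nhello\r\n", and a 26-byte fragment is prefixed with "1a".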

A new MgHttpReaderStreamer class does most of this streaming work:

{{{

/// <summary>
/// Provides a common base class for streaming out the contents of
/// an MgReader instance via chunked transfer encoding
/// </summary>
class MG_MAPAGENT_API MgHttpReaderStreamer : public MgGuardDisposable
{
public:
    //Performs the streaming of the reader contents
    void StreamResult();
    virtual ~MgHttpReaderStreamer();

protected:
    MgHttpReaderStreamer(MgReader* reader, CREFSTRING format);

    //Sets the web-server specific options to enable chunked encoding
    virtual void SetChunkedEncoding();

    //Writes out the specified chunk
    virtual void WriteChunk(const char* str, size_t length);

    virtual void Dispose() { delete this; }

    //Sets web-server specific options to denote the end of the chunked response
    virtual void EndStream();

private:
    void ToJson(string& xmlString, string& jsonString);
    Ptr<MgReader> m_reader;
    STRING m_format;
};

}}}

This class has Apache (ApacheReaderStreamer), ISAPI (IsapiReaderStreamer) and CGI (CgiReaderStreamer) subclasses that handle their respective technology's way of outputting chunked HTTP responses.

The MgHttpReaderStreamer chunks the reader content in the following fashion (using MgFeatureReader as an example):

{{{

- First Chunk
   - Response Root Element (<FeatureSet>)
   - Header (<FDO class definition XML>)
   - Response Body Root Element (<Features>)

- n chunks (n >= 0), one for each successful call to ReadNext()
   - Feature record (<Feature> ... </Feature>)

- Last Chunk
   - Response Body Root Element End (</Features>)
   - Response Root Element End (</FeatureSet>)

}}}
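
The chunking scheme above maps naturally onto a template method: the base class fixes the order of the chunks, while subclasses supply the web-server specific hooks. The following C++ sketch illustrates that control flow under stated assumptions. ChunkSource, ReaderStreamerSketch, RecordingStreamer and ToyFeatureSource are hypothetical, and this is not the actual MgHttpReaderStreamer::StreamResult() implementation: it omits the JSON conversion and the MapGuide reference-counting types, and it makes WriteChunk() pure virtual for simplicity.

{{{
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the reader: only the fragment methods the
// streamer needs (names follow the RFC's internal MgReader APIs).
class ChunkSource
{
public:
    virtual ~ChunkSource() {}
    virtual bool ReadNext() = 0;
    virtual void ResponseStartUtf8(std::string& str) = 0;
    virtual void HeaderToStringUtf8(std::string& str) = 0;
    virtual void BodyStartUtf8(std::string& str) = 0;
    virtual void CurrentToStringUtf8(std::string& str) = 0;
    virtual void BodyEndUtf8(std::string& str) = 0;
    virtual void ResponseEndUtf8(std::string& str) = 0;
};

// Sketch of the streaming flow: first chunk, one chunk per record,
// last chunk. Web-server specifics are left to the virtual hooks.
class ReaderStreamerSketch
{
public:
    explicit ReaderStreamerSketch(ChunkSource* reader) : m_reader(reader) {}
    virtual ~ReaderStreamerSketch() {}

    void StreamResult()
    {
        SetChunkedEncoding();

        std::string buf;                        // holds one chunk at a time
        m_reader->ResponseStartUtf8(buf);       // e.g. <FeatureSet>
        m_reader->HeaderToStringUtf8(buf);      // e.g. class definition
        m_reader->BodyStartUtf8(buf);           // e.g. <Features>
        WriteChunk(buf.c_str(), buf.length());

        while (m_reader->ReadNext())
        {
            buf.clear();
            m_reader->CurrentToStringUtf8(buf); // e.g. <Feature>...</Feature>
            WriteChunk(buf.c_str(), buf.length());
        }

        buf.clear();
        m_reader->BodyEndUtf8(buf);             // e.g. </Features>
        m_reader->ResponseEndUtf8(buf);         // e.g. </FeatureSet>
        WriteChunk(buf.c_str(), buf.length());

        EndStream();
    }

protected:
    virtual void SetChunkedEncoding() {}
    virtual void WriteChunk(const char* str, std::size_t length) = 0;
    virtual void EndStream() {}

private:
    ChunkSource* m_reader; // not owned in this sketch
};

// Test double that records chunks instead of writing to a web server.
class RecordingStreamer : public ReaderStreamerSketch
{
public:
    explicit RecordingStreamer(ChunkSource* reader) : ReaderStreamerSketch(reader) {}
    std::vector<std::string> chunks;

protected:
    void WriteChunk(const char* str, std::size_t length) override
    {
        chunks.push_back(std::string(str, length));
    }
};

// Toy source with 'count' records and FeatureSet-style element names.
class ToyFeatureSource : public ChunkSource
{
public:
    explicit ToyFeatureSource(int count) : m_count(count), m_current(0) {}
    bool ReadNext() override
    {
        if (m_current >= m_count) return false;
        ++m_current;
        return true;
    }
    void ResponseStartUtf8(std::string& s) override { s += "<FeatureSet>"; }
    void HeaderToStringUtf8(std::string& s) override { s += "<Header/>"; } // placeholder
    void BodyStartUtf8(std::string& s) override { s += "<Features>"; }
    void CurrentToStringUtf8(std::string& s) override
    {
        s += "<Feature>" + std::to_string(m_current) + "</Feature>";
    }
    void BodyEndUtf8(std::string& s) override { s += "</Features>"; }
    void ResponseEndUtf8(std::string& s) override { s += "</FeatureSet>"; }

private:
    int m_count;
    int m_current;
};
}}}

Streaming a ToyFeatureSource with two records through a RecordingStreamer produces four chunks: the opening chunk, one chunk per feature and the closing chunk. The buffer never holds more than one chunk at a time, which is where the memory savings come from.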

For JSON, the process is the same, with the extra step of converting each XML fragment to its JSON equivalent before writing it out to the HTTP response stream.

This approach is more memory efficient than the existing one, as the only content buffered internally is the XML/JSON fragment currently being written out. For comparison, the same query of the Sheboygan parcels through this implementation consumes:

 * XML: approx. 500KB
 * JSON: approx. 2MB

For this particular test case, that is approximately '''100 times''' less memory for XML and approximately '''560 times''' less memory for JSON!

== Implications ==

Chunked transfer encoding is only supported in HTTP/1.1. Practically all web browsers and other HTTP clients have supported HTTP/1.1 for a very long time, so protocol support is not an issue.

The streamed responses put newlines into the XML/JSON content, whereas the original implementations do not. This is not an issue, as consumers of XML and JSON are not (and should not be) sensitive to this whitespace.

This RFC does not cover two other areas in MapGuide that could really benefit from streamed HTTP responses:

 * KML rendering
 * WFS GetFeature responses

KML rendering internally buffers KML content at the Renderer/Stylization level. Truly '''streamed''' KML content would require refactoring the KmlRenderer and/or the Rendering/Stylization architecture, so supporting streamed KML responses would have to be the scope of a separate RFC.

WFS GetFeature responses are handled by MapGuide's OGC templating engine. This templating engine suffers from the same problem of internally buffering the full WFS GetFeature response up-front. What's worse, this internal buffering is backed by the MgByte class, which has a hard-coded limit of 64MB. Although rare, WFS queries that result in over 64MB of buffered data will be truncated (see #1070). Supporting streamed WFS GetFeature responses is a monumental task in itself due to the design and use of the templating engine, and would have to be the scope of a separate RFC.

== Test Plan ==

Exercise the affected operations against the Apache/ISAPI/CGI agents. Verify that the XML/JSON content is the same for all three agents and matches the content produced before this RFC, apart from the added whitespace.
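
Because the streamed responses add newlines (see Implications), a byte-for-byte comparison against the pre-RFC output would fail. One hypothetical way to compare the XML responses is to strip whitespace that sits between tags before comparing, as in the following C++ sketch. StripInterElementWhitespace and SameXmlIgnoringLayout are illustrative names, not existing MapGuide test utilities; JSON responses would instead need to be parsed and compared structurally.

{{{
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical test helper: removes whitespace runs that sit between a '>'
// and the next '<' (or the end of the input), so a streamed response with
// added newlines can be compared to the old buffered response. Deliberately
// naive: it also drops whitespace-only text content, which is acceptable
// for these data responses but not for XML in general.
std::string StripInterElementWhitespace(const std::string& xml)
{
    std::string out;
    out.reserve(xml.size());
    std::size_t i = 0;
    while (i < xml.size())
    {
        char c = xml[i];
        out += c;
        ++i;
        if (c == '>')
        {
            std::size_t j = i;
            while (j < xml.size() && std::isspace(static_cast<unsigned char>(xml[j])))
                ++j;
            // Only skip the run if it is followed by another tag or the end.
            if (j == xml.size() || xml[j] == '<')
                i = j;
        }
    }
    return out;
}

// Compares two XML responses while ignoring inter-element whitespace.
bool SameXmlIgnoringLayout(const std::string& a, const std::string& b)
{
    return StripInterElementWhitespace(a) == StripInterElementWhitespace(b);
}
}}}

Whitespace inside text values (for example "x y") is preserved, so genuine content differences still cause the comparison to fail.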

== Funding / Resources ==

Community