MapGuide RFC 130 - Streamed HTTP feature/data/SQL query responses

This page contains a change request (RFC) for the MapGuide Open Source project. More MapGuide RFCs can be found on the RFCs page.

Status

RFC Template Version: 1.0
Submission Date: 18 Dec 2012
Last Modified: 3 Jan 2013
Author: Jackie Ng
RFC Status: ready for review
Implementation Status:
Proposed Milestone: 2.5
Assigned PSC guide(s): (when determined)
Voting History: (vote date)
+1
+0
-0
-1
no vote

Overview

This RFC proposes to add support for streaming feature/aggregate/SQL query results over the HTTP mapagent in a memory-efficient manner.

Motivation

The current implementations of the SELECTFEATURES/SELECTAGGREGATES/EXECUTESQLQUERY mapagent operations are woefully inefficient in terms of memory use. Each of these operations buffers the entire MgFeatureReader/MgDataReader/MgSqlDataReader as a single internal XML document before writing out the HTTP response, which can cause large memory spikes for large unfiltered Feature Source queries over the mapagent. If the desired format is JSON, memory usage is even worse: in addition to the internal XML string buffer, these operations also build an internal DOM of that buffer in order to convert it to JSON.

For example, a full SELECTFEATURES query of the Sheboygan Parcels (17k features) consumes the following amount of memory:

  • XML: ~50MB
  • JSON: 1.5GB (!!!)

And this is just for one request. If we want MapGuide to be a useful data platform, one that can support a theoretical map viewer that does full client-side rendering, our methods of querying data from it must be efficient in both performance and memory use. What we currently have in place is not acceptable.

Proposed Solution

New internal MgReader APIs

We introduce a series of new internal APIs to MgReader, which will allow us to return various parts of the default XML response instead of having to get the entire blob via ToXml():

class MG_PLATFORMBASE_API MgReader : public MgSerializable
{

INTERNAL_API:
    ///////////////////////////////////////////////////////////////////////////
    /// \brief
    /// Returns the starting element name as a UTF-8 string. The mime
    /// type must be a text type, for example text/xml.
    ///
    /// \return
    /// The element name.
    ///
    virtual string GetResponseElementName() = 0;

    ///////////////////////////////////////////////////////////////////////////
    /// \brief
    /// Returns the body starting element name as a UTF-8 string. The mime
    /// type must be a text type, for example text/xml.
    ///
    /// \return
    /// The element name.
    ///
    virtual string GetBodyElementName() = 0;

    ///////////////////////////////////////////////////////////////////////////
    /// \brief
    /// Returns the start of the response as a UTF-8 string. The mime
    /// type must be a text type, for example text/xml.
    ///
    /// \param str
    /// Destination string.
    ///
    virtual void ResponseStartUtf8(string& str) = 0;

    ///////////////////////////////////////////////////////////////////////////
    /// \brief
    /// Returns the end of the response as a UTF-8 string. The mime
    /// type must be a text type, for example text/xml.
    ///
    /// \param str
    /// Destination string.
    ///
    virtual void ResponseEndUtf8(string& str) = 0;

    ///////////////////////////////////////////////////////////////////////////
    /// \brief
    /// Returns the start of the response body as a UTF-8 string. The mime
    /// type must be a text type, for example text/xml.
    ///
    /// \param str
    /// Destination string.
    ///
    virtual void BodyStartUtf8(string& str) = 0;

    ///////////////////////////////////////////////////////////////////////////
    /// \brief
    /// Returns the end of the response body as a UTF-8 string. The mime
    /// type must be a text type, for example text/xml.
    ///
    /// \param str
    /// Destination string.
    ///
    virtual void BodyEndUtf8(string& str) = 0;

    ///////////////////////////////////////////////////////////////////////////
    /// \brief
    /// Returns the contents of the header in this reader as a UTF-8 string.  The mime
    /// type must be a text type, for example text/xml.
    ///
    /// \param str
    /// Destination string.
    ///
    virtual void HeaderToStringUtf8(string& str) = 0;

    ///////////////////////////////////////////////////////////////////////////
    /// \brief
    /// Returns the contents of the current record/feature in the reader as a UTF-8 string.  The mime
    /// type must be a text type, for example text/xml.
    ///
    /// \param str
    /// Destination string.
    ///
    virtual void CurrentToStringUtf8(string& str) = 0;
};

The proxy versions of these readers (MgProxyFeatureReader/MgProxyDataReader/MgProxySqlDataReader) provide implementations of these methods that return the various XML fragments making up the default XML response. The server-side implementations should never have these methods called, so they throw an exception if invoked.

To explain what these new internal APIs are supposed to do, here's the current implementation of MgProxyDataReader::ToXml():

void MgProxyDataReader::ToXml(string &str)
{
    CHECKNULL((MgBatchPropertyCollection*)m_set, L"MgProxyDataReader.ToXml");
    CHECKNULL((MgPropertyDefinitionCollection*)m_propDefCol, L"MgProxyDataReader.ToXml");

    // this XML follows the SelectAggregate-1.0.0.xsd schema
    str += "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
    str += "<PropertySet>";
    m_propDefCol->ToXml(str);
    str += "<Properties>";
    while ( this->ReadNext() )
    {
        Ptr<MgPropertyCollection> propCol = m_set->GetItem(m_currRecord-1);
        INT32 cnt = propCol->GetCount();
        if (propCol != NULL && cnt > 0)
        {
            str += "<PropertyCollection>";
            propCol->ToXml(str,false);
            str += "</PropertyCollection>";
        }
    }
    str += "</Properties>";
    str += "</PropertySet>";
}

This is how the method looks with the new internal APIs, which should give you an idea of what these methods actually return:

void MgProxyDataReader::ToXml(string &str)
{
    CHECKNULL((MgBatchPropertyCollection*)m_set, L"MgProxyDataReader.ToXml");
    CHECKNULL((MgPropertyDefinitionCollection*)m_propDefCol, L"MgProxyDataReader.ToXml");

    // this XML follows the SelectAggregate-1.0.0.xsd schema
    ResponseStartUtf8(str);
    HeaderToStringUtf8(str);
    BodyStartUtf8(str);
    while ( this->ReadNext() )
    {
        CurrentToStringUtf8(str);
    }
    BodyEndUtf8(str);
    ResponseEndUtf8(str);
}

Thus each proxy subclass of MgReader returns its own specific XML fragment from each of these methods, and the fragments are assembled into the final result. Server-side readers will never have these methods invoked.
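To make the fragment contract concrete, here is a minimal self-contained sketch. It is not the MapGuide implementation: FragmentReader, SketchDataReader and AssembleXml are hypothetical stand-ins, records are pre-rendered strings, and the header element is a placeholder. Only the method names, the element names from the ToXml() listing and the call order come from this RFC.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the fragment-returning part of MgReader.
struct FragmentReader {
    virtual ~FragmentReader() = default;
    virtual bool ReadNext() = 0;
    virtual void ResponseStartUtf8(std::string& str) = 0;
    virtual void ResponseEndUtf8(std::string& str) = 0;
    virtual void BodyStartUtf8(std::string& str) = 0;
    virtual void BodyEndUtf8(std::string& str) = 0;
    virtual void HeaderToStringUtf8(std::string& str) = 0;
    virtual void CurrentToStringUtf8(std::string& str) = 0;
};

// Hypothetical data reader whose records are already rendered as XML text.
class SketchDataReader : public FragmentReader {
public:
    explicit SketchDataReader(std::vector<std::string> records)
        : m_records(std::move(records)), m_consumed(0) {}

    bool ReadNext() override {
        if (m_consumed >= m_records.size()) return false;
        ++m_consumed;
        return true;
    }
    void ResponseStartUtf8(std::string& str) override {
        str += "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        str += "<PropertySet>";
    }
    void ResponseEndUtf8(std::string& str) override { str += "</PropertySet>"; }
    void BodyStartUtf8(std::string& str) override { str += "<Properties>"; }
    void BodyEndUtf8(std::string& str) override { str += "</Properties>"; }
    void HeaderToStringUtf8(std::string& str) override {
        // Placeholder (an assumption): the real header is the property
        // definition XML written by m_propDefCol->ToXml().
        str += "<PropertyDefinitions/>";
    }
    void CurrentToStringUtf8(std::string& str) override {
        assert(m_consumed > 0 && "ReadNext() must succeed before reading a record");
        str += "<PropertyCollection>" + m_records[m_consumed - 1] + "</PropertyCollection>";
    }

private:
    std::vector<std::string> m_records;
    std::size_t m_consumed;  // number of records consumed so far
};

// Same call order as the refactored ToXml() listing.
std::string AssembleXml(FragmentReader& reader) {
    std::string str;
    reader.ResponseStartUtf8(str);
    reader.HeaderToStringUtf8(str);
    reader.BodyStartUtf8(str);
    while (reader.ReadNext()) {
        reader.CurrentToStringUtf8(str);
    }
    reader.BodyEndUtf8(str);
    reader.ResponseEndUtf8(str);
    return str;
}
```

Because every fragment is appended independently, a caller is free to flush the string after each call instead of accumulating the whole document.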

Streaming the reader results

With these new internal APIs, we can take an MgReader and write its contents to the HTTP output stream as we iterate through the reader, instead of having to buffer the entire reader contents up-front. To actually stream the results, we use HTTP chunked transfer encoding to write out the reader content in chunks.
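For reference, this is roughly what chunked transfer encoding looks like on the wire in HTTP/1.1: each chunk is its size in hexadecimal, a CRLF, the data, and another CRLF, and the stream ends with a zero-length chunk. The helper names FrameChunk and FrameLastChunk below are hypothetical and not part of this RFC. Depending on the agent, the web server may apply this framing itself, so treat this purely as an illustration of the format.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Frames one payload as an HTTP/1.1 chunk: hex size, CRLF, data, CRLF.
// Hypothetical helper for illustration only.
std::string FrameChunk(const std::string& payload) {
    // A zero-length chunk marks the end of the stream, so an empty payload
    // must not be framed as a regular chunk.
    assert(!payload.empty());
    char sizeLine[32];
    std::snprintf(sizeLine, sizeof(sizeLine), "%zx\r\n", payload.size());
    return std::string(sizeLine) + payload + "\r\n";
}

// The terminating zero-length chunk, with no trailer fields.
std::string FrameLastChunk() {
    return "0\r\n\r\n";
}
```

A response streamed this way carries a "Transfer-Encoding: chunked" header instead of a Content-Length, which is what lets the server start sending before it knows the total size.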

A new MgHttpReaderStreamer class is used to do most of this streaming work:

/// <summary>
/// Purpose of this class is to provide a common base class for streaming out the contents of
/// an MgReader instance via chunked response encoding
/// </summary>
class MG_MAPAGENT_API MgHttpReaderStreamer : public MgGuardDisposable
{
public:
    //Performs the streaming of the reader contents
    void StreamResult();
    virtual ~MgHttpReaderStreamer();

protected:
    MgHttpReaderStreamer(MgReader* reader, CREFSTRING format);

    //Sets the web-server specific options to enable chunked encoding
    virtual void SetChunkedEncoding();

    //Write out the specified chunk
    virtual void WriteChunk(const char* str, size_t length);
    
    virtual void Dispose() { delete this; }

    //Sets web-server specific options to denote the end of the chunked response
    virtual void EndStream();
    
private:
    void ToJson(string& xmlString, string& jsonString);
    Ptr<MgReader> m_reader;
    STRING m_format;
};

This class has Apache (ApacheReaderStreamer), ISAPI (IsapiReaderStreamer) and CGI (CgiReaderStreamer) subclasses that handle their respective technology's way of outputting chunked http responses.

The MgHttpReaderStreamer chunks the reader content in the following fashion (using MgFeatureReader as an example):

- First Chunk
   - Response Root Element (<FeatureSet>)
   - Header (<FDO class definition XML>)
   - Response Body Root Element (<Features>)

- 0 ... n chunks, where n is the number of iterations of ReadNext()
   - Feature record (<Feature> ... </Feature>)

- Last Chunk
   - Response Body Root Element End (</Features>)
   - Response Root Element End (</FeatureSet>)

For JSON, the process is the same with an extra step of converting the XML fragment to its JSON equivalent before writing the content out to the http response stream.
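The chunking order described above can be sketched as follows. This is an illustration under assumptions, not the sandbox code: SketchReader, SketchStreamer and RecordingStreamer are hypothetical names, features are pre-rendered strings, the class definition header is a placeholder comment, and the JSON conversion step is left out. The three virtual hooks mirror the ones declared on MgHttpReaderStreamer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader stand-in that yields XML fragments one at a time.
class SketchReader {
public:
    explicit SketchReader(std::vector<std::string> features)
        : m_features(std::move(features)), m_consumed(0) {}
    bool ReadNext() {
        if (m_consumed >= m_features.size()) return false;
        ++m_consumed;
        return true;
    }
    void ResponseStart(std::string& s) const { s += "<FeatureSet>"; }
    void Header(std::string& s) const { s += "<!-- class definition XML -->"; }
    void BodyStart(std::string& s) const { s += "<Features>"; }
    void Current(std::string& s) const { s += m_features[m_consumed - 1]; }
    void BodyEnd(std::string& s) const { s += "</Features>"; }
    void ResponseEnd(std::string& s) const { s += "</FeatureSet>"; }

private:
    std::vector<std::string> m_features;
    std::size_t m_consumed;  // number of features consumed so far
};

// Hypothetical streamer: fixed chunk order, output left to subclasses.
class SketchStreamer {
public:
    explicit SketchStreamer(SketchReader& reader) : m_reader(reader) {}
    virtual ~SketchStreamer() = default;

    void StreamResult() {
        SetChunkedEncoding();
        std::string buf;

        // First chunk: response root, header, body root.
        m_reader.ResponseStart(buf);
        m_reader.Header(buf);
        m_reader.BodyStart(buf);
        WriteChunk(buf.c_str(), buf.length());

        // One chunk per record; the buffer only ever holds one fragment.
        while (m_reader.ReadNext()) {
            buf.clear();
            m_reader.Current(buf);
            WriteChunk(buf.c_str(), buf.length());
        }

        // Last chunk: closing tags.
        buf.clear();
        m_reader.BodyEnd(buf);
        m_reader.ResponseEnd(buf);
        WriteChunk(buf.c_str(), buf.length());
        EndStream();
    }

protected:
    virtual void SetChunkedEncoding() = 0;
    virtual void WriteChunk(const char* str, std::size_t length) = 0;
    virtual void EndStream() = 0;

private:
    SketchReader& m_reader;
};

// Test double that records chunks instead of writing to a web server.
class RecordingStreamer : public SketchStreamer {
public:
    using SketchStreamer::SketchStreamer;
    std::vector<std::string> chunks;
    bool ended = false;

protected:
    void SetChunkedEncoding() override {}
    void WriteChunk(const char* str, std::size_t length) override {
        assert(str != nullptr);
        chunks.emplace_back(str, length);
    }
    void EndStream() override { ended = true; }
};
```

With n features the recording subclass ends up holding n + 2 chunks, and the working buffer inside StreamResult() never grows beyond the largest single fragment.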

This method is more memory efficient than the existing approach as the only internal buffering done is the current XML/JSON fragment that we want to write out. For comparison, the same query of the Sheboygan parcels through this implementation:

  • XML: approx. 500KB
  • JSON: approx. 2MB

For this particular test case, that is roughly a 100-fold reduction in memory use for XML (~50MB down to ~500KB) and, going by the figures above, roughly a 750-fold reduction for JSON (1.5GB down to ~2MB)!

All of the above changes have been implemented in this sandbox. Upon adoption of this RFC, the changes in this sandbox will be merged back into the trunk code stream.

Implications

Chunked transfer encoding is only supported in HTTP 1.1. Most (if not all) web browsers and other HTTP clients have supported HTTP 1.1 for a very long time, so protocol support is not an issue.

The streamed responses put newlines into the XML/JSON content, whereas the original implementations do not. This should not be an issue, as XML and JSON consumers are not (and should not be) sensitive to such whitespace.

This RFC does not cover 2 other areas in MapGuide that could really benefit from streamed HTTP responses:

  • KML rendering internally buffers KML content at the Renderer/Stylization level. Truly streaming KML content would require refactoring the KmlRenderer and/or the Rendering/Stylization architecture. Supporting streamed KML responses would have to be the subject of a separate RFC if we want to go down this path.

  • WFS GetFeature responses are handled by MapGuide's OGC templating engine, which suffers from the same problem of internally buffering the full WFS GetFeature response up-front. What's worse, this internal buffering is backed by the MgByte class, which has a hard-coded limit of 64MB. Although rare, WFS queries that result in over 64MB of buffered data will be truncated (see #1070). Supporting streamed WFS GetFeature responses is a monumental task in itself due to the design and use of the templating engine, and would have to be the subject of a separate RFC if we ever want to go down this path.

Test Plan

Exercise the affected operations against the Apache/ISAPI/CGI agents. Verify the XML/JSON content is the same for all 3 agents and matches the content as it was before this RFC.
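Since the streamed responses contain newlines that the buffered responses do not, a byte-for-byte comparison against the pre-RFC output would fail. One possible check, assuming the added line breaks are the only difference, is to strip CR and LF characters from both responses before comparing. StripLineBreaks and SameContent are hypothetical helper names, not part of MapGuide's test suite.

```cpp
#include <algorithm>
#include <string>

// Removes every CR and LF character from the text.
std::string StripLineBreaks(std::string text) {
    text.erase(std::remove_if(text.begin(), text.end(),
                              [](char c) { return c == '\n' || c == '\r'; }),
               text.end());
    return text;
}

// True when the two responses are identical once line breaks are ignored.
bool SameContent(const std::string& before, const std::string& after) {
    return StripLineBreaks(before) == StripLineBreaks(after);
}
```

Note that this also ignores line breaks inside property values, so a stricter comparison (for example, parsing both documents) would be needed if such values matter for the test data.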

Funding / Resources

Community
