FDO RFC 52 - Convenience C++ API wrapper for FDO

This page contains a request for comments document (RFC) for the FDO Open Source project. More FDO RFCs can be found on the RFCs page.

Status

RFC Template Version: 1.1
Submission Date: 2010-08-08
Last Modified: Traian Stanev, 2010-08-08
Author: Traian Stanev
RFC Status: draft
Implementation Status: under development
Proposed Milestone: (e.g. 1.1, 1.3)
Assigned PSC guide(s): (when determined)
Voting History: (vote date)
+1
+0
-0
-1

Overview

This RFC proposes a convenience wrapper API for FDO. The basic features of the API are:

  • Mostly procedural API (unlike the object-oriented, command-based structure of FDO Core). Because a single object controls the database connection lifetime, this significantly reduces object lifetime issues, removes the need for refcounting, and eases a possible managed wrapper.
  • Provides shortcuts for commonly used functionality, such as spatial queries, fetching extents, and feature counts.
  • Automatic type conversion (a.k.a. duck typing). Unlike core FDO, the API converts automatically between compatible types (no more need to switch-case on FdoByte, FdoInt16, FdoInt32, FdoInt64, FdoDecimal just to get an integer).
  • Thread-safe connection pooling/caching. This is a common feature that developers often have to implement from scratch.
  • Possibility to provide alternative backend to the API (for example, OGR backend).
  • Possibility to implement common features, like coordinate system transformations, in the wrapper layer, in a single common piece of code for all provider backends.
  • Automatic backend data source resolution, similar to OGR (e.g. if the connection string is a path to an SHP file, the SHP provider will automatically be loaded and used).
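
To make the automatic type conversion feature more concrete, here is a minimal sketch of a value type that hands back an integer, a double, or a string regardless of how the value is stored. SketchValue, its factory functions, and the AsInt64/AsDouble names are hypothetical and are not taken from the attached headers (only AsString() appears in the examples in this RFC). The rule that unparsable text reads as 0 is likewise an assumption made for illustration.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a column value; NOT the Value class from the attached headers.
class SketchValue {
public:
    enum class Kind { Int, Double, String };

    static SketchValue FromInt(std::int64_t v)           { SketchValue s(Kind::Int);    s.i_ = v; return s; }
    static SketchValue FromDouble(double v)              { SketchValue s(Kind::Double); s.d_ = v; return s; }
    static SketchValue FromString(const std::wstring& v) { SketchValue s(Kind::String); s.s_ = v; return s; }

    // Integer view of the value, whatever its stored type.
    std::int64_t AsInt64() const {
        switch (kind_) {
        case Kind::Int:    return i_;
        case Kind::Double: return static_cast<std::int64_t>(d_); // truncates toward zero
        case Kind::String:
            try { return std::stoll(s_); }
            catch (const std::exception&) { return 0; } // assumption: unparsable text reads as 0
        }
        return 0;
    }

    // Floating-point view of the value.
    double AsDouble() const {
        switch (kind_) {
        case Kind::Int:    return static_cast<double>(i_);
        case Kind::Double: return d_;
        case Kind::String:
            try { return std::stod(s_); }
            catch (const std::exception&) { return 0.0; } // same assumption as above
        }
        return 0.0;
    }

    // String view of the value.
    std::wstring AsString() const {
        switch (kind_) {
        case Kind::Int:    return std::to_wstring(i_);
        case Kind::Double: return std::to_wstring(d_);
        case Kind::String: return s_;
        }
        return std::wstring();
    }

private:
    explicit SketchValue(Kind k) : kind_(k), i_(0), d_(0.0) {}

    Kind kind_;
    std::int64_t i_;
    double d_;
    std::wstring s_;
};
```

With a type along these lines, the caller asks for the representation it needs (for instance AsInt64() on a value stored as a string) instead of switching on the stored type first.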

High Level Architecture

There are four main objects in the API, reflecting the database-oriented nature of most FDO providers:

  • Database (top level object)
    • Table (a Database can have several Tables)
    • Row (a Table consists of many Rows)
    • Value (a Row consists of many column Values)

The Database is the top level object, whose lifetime controls the lifetime of all other objects. The API user only has to directly manage the lifetime of the Database object, which can be obtained from and released to the thread-safe connection pool automatically using a RAII-style DbHandle object, avoiding manual object disposal entirely.
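
The following is a rough sketch of how a RAII-style handle and a thread-safe pool could fit together. SketchDb, SketchPool, and SketchHandle are hypothetical names and do not reproduce the DbHandle or DbPool classes from the attachment; the sketch only assumes that a handle takes a connection from the pool when constructed and returns it when it goes out of scope.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical connection type; stands in for the RFC's Database object.
struct SketchDb {
    std::wstring path;
    explicit SketchDb(const std::wstring& p) : path(p) {}
};

// Hypothetical pool: caches idle connections per connection string.
class SketchPool {
public:
    // Take an idle connection for `path`, or open a new one if none is cached.
    std::unique_ptr<SketchDb> Acquire(const std::wstring& path) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<std::unique_ptr<SketchDb> >& idle = idle_[path];
        if (idle.empty())
            return std::unique_ptr<SketchDb>(new SketchDb(path));
        std::unique_ptr<SketchDb> db = std::move(idle.back());
        idle.pop_back();
        return db;
    }

    // Put a connection back so that a later Acquire() can reuse it.
    void Release(std::unique_ptr<SketchDb> db) {
        std::lock_guard<std::mutex> lock(mutex_);
        const std::wstring key = db->path;
        idle_[key].push_back(std::move(db));
    }

    // Number of cached idle connections for `path`.
    std::size_t IdleCount(const std::wstring& path) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = idle_.find(path);
        return it == idle_.end() ? 0 : it->second.size();
    }

private:
    std::mutex mutex_;
    std::map<std::wstring, std::vector<std::unique_ptr<SketchDb> > > idle_;
};

// Hypothetical RAII handle: acquires on construction, releases on destruction.
class SketchHandle {
public:
    SketchHandle(SketchPool& pool, const std::wstring& path)
        : pool_(&pool), db_(pool.Acquire(path)) {}
    ~SketchHandle() { if (db_) pool_->Release(std::move(db_)); }

    SketchHandle(const SketchHandle&) = delete;
    SketchHandle& operator=(const SketchHandle&) = delete;

    SketchDb* operator->() const { return db_.get(); }
    SketchDb& operator*() const { return *db_; }
    explicit operator bool() const { return db_ != nullptr; }

private:
    SketchPool* pool_;
    std::unique_ptr<SketchDb> db_;
};
```

Because the handle releases the connection in its destructor, an early return or an exception still gives the connection back to the pool, which is the property the paragraph above relies on.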

Examples

The detailed API is available in header files attached to this document. Here are some examples which illustrate the basics.

Example 1: Read and print the contents of a data source

This code sample iterates over all features of all tables of a database and prints out the value for each column. Note the automatic data type conversion to string and the simple access to each column by index (via operator[]). Column value access by name is also possible using operator[].

void PrintFile(const wchar_t* path)
{
    printf("Data source: %ls \n", path);

    //get a connection from the pool
    DbHandle srcdb = DbPool::GetConnection(path);

    if (!srcdb)
        return;

    //for all feature classes
    for (int q=0; q<srcdb->Count(); q++)
    {
        Table& srctbl = (*srcdb)[q]; 

        //table name
        printf("Table: %ls\n", srctbl.Def().Name());

        //show the overall extent
        double ext[4];
        srctbl.GetExtent(ext);
        printf("Bounds: %.8g, %.8g, %.8g, %.8g\n", ext[0], ext[1], ext[2], ext[3]);

        //number of features
        printf("Total feature count: %lld\n", srctbl.GetRowCount());
      
        //just run through a full table select
        srctbl.Query();

        while (srctbl.Next())
        {
            const Row& r = srctbl.At();

            printf("\tFeature: %lld\n", r.ID());

            for (int j=0; j<r.Def().ColumnCount(); j++)
            {
                //skip the fid which we printed above
                if (j == r.Def().IndexOfID()) 
                    continue;

                printf("\t\t%ls :\t%ls\n", r.Def()[j].Name(), r[j].AsString());
            }
        }

        srctbl.EndQuery(); //not strictly needed
    }
}

Example 2: Bulk copy from one data source to another (e.g. SHP to SQLite conversion)

This example copies the source data verbatim into the destination file. Note that all that's needed is to call Insert() on the destination with the source row. The Insert() call takes a flag indicating whether to preserve the FID column value, and optionally a pointer which is filled with the FID of the newly created feature in the target data store.

void ConvertFDOToFDO(const wchar_t* src, const wchar_t* dst)
{
    printf("%ls ---> %ls\n\n", src, dst);

    //Create target data store
    if (!DatabaseFdo::Create(dst))
        return;

    //open source and target connections
    DbHandle srcdb = DbPool::GetConnection(src);
    DbHandle dstdb = DbPool::GetConnection(dst);

    if (!srcdb)
        return;

    if (!dstdb)
        return;

    //copy the schema and coord sys defs
    dstdb->SetSchema(srcdb->Def());

    //for all feature classes
    for (int q=0; q<dstdb->Count(); q++)
    {
        Table& dsttbl = (*dstdb)[q]; //get destination table
        Table& srctbl = *(*srcdb)[dsttbl.Def().Name()]; //get corresponding source table by name
      
        int count = 0;

        //basic "select *" kind of query
        srctbl.Query();
        //double bbox[4] = { 0, 0, 10000, 10000 };
        //srctbl.Query(bbox); //query with bounding box

        while (srctbl.Next())
        {
            //directly insert source row into target table
            //without any transformation
            dsttbl.Insert(srctbl.At(), NULL, true);

            count++;

            if (count % 10000 == 0) printf ("# processed: %d\n", count);
        }

        srctbl.EndQuery();

        printf ("\n\nTotal feature count : %d\n", count);
    }
}

Performance Implications

As with every wrapper, there will be some performance overhead compared to using FDO directly. For simple queries accessing multiple columns, this overhead is not very large, but it can be significant in cases where only one or two columns out of a very wide row are needed by the caller. This is fundamentally because the wrapper API always pre-fills all column values before returning a row, something that can be fixed at the price of added code complexity.
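
One possible way to avoid the pre-fill cost is to defer reading each column until it is first accessed. The sketch below illustrates that idea only; LazyRow and its fetch callback are hypothetical and are not part of the attached implementation. The per-column "loaded" flags are an example of the extra bookkeeping such a change would bring.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row that reads a column from the backend only on first access.
class LazyRow {
public:
    // Callback that reads one column from the backend (an assumption of this sketch).
    typedef std::function<std::wstring(std::size_t)> Fetch;

    LazyRow(std::size_t columnCount, Fetch fetch)
        : values_(columnCount), loaded_(columnCount, false), fetch_(std::move(fetch)) {}

    // Returns the cached value, fetching it from the backend the first time only.
    const std::wstring& operator[](std::size_t i) {
        if (!loaded_[i]) {
            values_[i] = fetch_(i);
            loaded_[i] = true;
        }
        return values_[i];
    }

private:
    std::vector<std::wstring> values_; // cached column values
    std::vector<bool> loaded_;         // which columns have been fetched so far
    Fetch fetch_;
};
```

A caller that touches one column of a 100-column row triggers one backend read here instead of 100, at the cost of a check on every access.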

With OGR as backend, the performance overhead is very small, due to the proposed wrapper mapping almost 1:1 to the underlying OGR API. The main overhead with OGR is geometry conversion from WKB to FGF format.

The above weakness (pre-filling of the row) has the potential of becoming a strength if a direct backend is ever implemented, for example for SQLite. Under normal use cases, using FDO requires 2*N virtual function calls to retrieve N column values of a row, while the wrapper API can do the same with a single virtual function call. For very fast backends this can make a huge difference, in particular if column values are accessed by index (and not by name).
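
The call-count argument can be illustrated with a small sketch that counts virtual calls for both access patterns. All types here are hypothetical; the sketch assumes that the two calls per column are a null check followed by a typed getter, and that the bulk interface fills a whole row in one call.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-column reader: two virtual calls per column value.
struct ColumnReader {
    virtual ~ColumnReader() {}
    virtual bool IsNull(std::size_t col) = 0;
    virtual double GetDouble(std::size_t col) = 0;
};

// Hypothetical bulk reader: one virtual call per row.
struct RowReader {
    virtual ~RowReader() {}
    virtual void FillRow(std::vector<double>& out) = 0;
};

// Test doubles that count how many virtual calls they receive.
struct CountingColumnReader : ColumnReader {
    std::vector<double> data;
    std::size_t calls;
    explicit CountingColumnReader(const std::vector<double>& d) : data(d), calls(0) {}
    bool IsNull(std::size_t) override { ++calls; return false; }
    double GetDouble(std::size_t col) override { ++calls; return data[col]; }
};

struct CountingRowReader : RowReader {
    std::vector<double> data;
    std::size_t calls;
    explicit CountingRowReader(const std::vector<double>& d) : data(d), calls(0) {}
    void FillRow(std::vector<double>& out) override { ++calls; out = data; }
};

// Reads n columns one at a time: a null check plus a getter for each column.
std::vector<double> ReadPerColumn(ColumnReader& r, std::size_t n) {
    std::vector<double> out;
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(r.IsNull(i) ? 0.0 : r.GetDouble(i));
    return out;
}

// Reads the whole row through a single virtual call.
std::vector<double> ReadBulk(RowReader& r) {
    std::vector<double> out;
    r.FillRow(out);
    return out;
}
```

For a row of N non-null columns, the per-column reader records 2*N calls and the bulk reader records one, while both return the same values.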

Attachments

Attached is a rough implementation of the wrapper which supports the examples shown above.

For all base API definitions, refer to BaseWrap.h. Working backends are included for both FDO (FdoWrap.h/cpp) and OGR (OgrWrap.h/cpp).

The FDO backend recognizes SHP, SDF and SQLite connections only. The OGR backend recognizes all OGR backends, but does not yet implement data store creation. It does implement insert/update to existing data stores. The connection pooling is implemented in DbPool.h/cpp.

Funding/Resources

This RFC is currently only a request for discussion and feedback on the design of the proposed API.

Last modified on Aug 8, 2010 7:37:00 PM
