Opened 8 years ago

Closed 5 years ago

#6548 closed enhancement (wontfix)

Need a way of closing pg connections created in the PostGISRasterDriver

Reported by: glindsay Owned by: warmerdam
Priority: normal Milestone: closed_because_of_github_migration
Component: GDAL_Raster Version: unspecified
Severity: normal Keywords:
Cc:

Description

Hello,

We have run into an issue with the current implementation of the pg connection pooling in the PostGISRasterDriver.

We are running mapserver in a long running mapserver web service, which loads rasters from a postgres/postgis database (for display in Openlayers). Due to the high number of wms requests, we spawn a new thread for each request to generate the tile images (as to not starve the web request worker threads).

The side effect of this approach is that due to the way that the PostGISRasterDriver pools connections (keyed on db, host, port, user and thread id) it continually opens and stores new connections (as the thread id is changing) until the 'Too many clients' pg error is received.

The workarounds that we have used to manage this are:

  1. run each request synchronously in the web worker threads (assuming threads are reused, with some performance impacts)
  2. recycle the worker process regularly to clear built up connections (with resulting loading performance impact)
  3. install something like pgbouncer on the pg side to manage connections (with lost connection side effects)

I would like to please request that an enhancement is made to the PostGISRasterDriver so that it is possible to close connections made on a specific thread. This would mean that each thread could close the created connections when finished (ideally ultimately through mapserver in a processing directive in a map file).

Thoughts?

Change History (7)

comment:1 by glindsay, 8 years ago

E.g something like this could be added to the postgisraterdriver.cpp:

void PostGISRasterDriver::CloseConnection(const char * pszDbnameIn, const char * pszHostIn, const char * pszPortIn, const char * pszUserIn)
{
    if( pszHostIn == NULL ) pszHostIn = "(null)";
    if( pszPortIn == NULL ) pszPortIn = "(null)";
    if( pszUserIn == NULL ) pszUserIn = "(null)";
    CPLString osKey = pszDbnameIn;
    osKey += "-";
    osKey += pszHostIn;
    osKey += "-";
    osKey += pszPortIn;
    osKey += "-";
    osKey += pszUserIn;
    osKey += "-";
    osKey += CPLSPrintf(CPL_FRMT_GIB, CPLGetPID());
  
    CPLMutexHolderD(&hMutex);
    std::map<CPLString, PGconn*>::iterator oIter = oMapConnection.find(osKey);
    if( oIter != oMapConnection.end() )
    {
        PQfinish(oIter->second); // close connection
        oMapConnection.erase(oIter); // remove entry
    }
}

comment:2 by Even Rouault, 8 years ago

I'm not sure this is the appropriate solution as we wouldn't want MapServer to aware of CloseConnection() as it should remain a driver implementation detail.

A better solution would be for the driver to have a better caching mechanism that takes into account the connection id (dbname,host,port,user), the thread id, and a maximum number of connections. Basically you would have 2 sets: a set "Salive" of active connections indexed on (dbname,host,port,user,thread_id) with a refcount, and a set "Scached" of unused cached connections indexed on (dbname,host,port,user) When a openconnection for a (dbname,host,port,user) arrives, the driver would check in Salive if there's a connection matching (dbname,host,port,user,thread_id), if so it returns it and increases the refcount. If doesn't exist, it checks in Scached : if there's one there, it will transfer it to Salive. And if not, it really opens a new connection and put it in Salive. When the GDAL dataset is closed, the refcount of the connection is decremented. If it goes to 0, it is transfered to Scached. If Scached grows above a given number (let's say 10), the oldest used cached connection is completely closed.

in reply to:  2 ; comment:3 by glindsay, 8 years ago

Replying to rouault:

I'm not sure this is the appropriate solution as we wouldn't want MapServer to aware of CloseConnection() as it should remain a driver implementation detail.

A better solution would be for the driver to have a better caching mechanism that takes into account the connection id (dbname,host,port,user), the thread id, and a maximum number of connections. Basically you would have 2 sets: a set "Salive" of active connections indexed on (dbname,host,port,user,thread_id) with a refcount, and a set "Scached" of unused cached connections indexed on (dbname,host,port,user) When a openconnection for a (dbname,host,port,user) arrives, the driver would check in Salive if there's a connection matching (dbname,host,port,user,thread_id), if so it returns it and increases the refcount. If doesn't exist, it checks in Scached : if there's one there, it will transfer it to Salive. And if not, it really opens a new connection and put it in Salive. When the GDAL dataset is closed, the refcount of the connection is decremented. If it goes to 0, it is transfered to Scached. If Scached grows above a given number (let's say 10), the oldest used cached connection is completely closed.

Yep I completely agree that mapserver should not be aware of any driver specific implementation details.

It would be better if you could configure the connection lifestyle explictly e.g transient, singleton or thread (e.g using a reference counter), so that the end user has complete control over when connections are open and closed. Maybe a new 'lifestyle' option could be included in the 'PG:' connection string (which would be consistent with the way that the 'mode' option is handled).

For reference, the Map Server Connection Pooling (mappool.c) allows different lifestyles to be configured and uses a reference counter to close off shared connections.

Last edited 8 years ago by glindsay (previous) (diff)

in reply to:  3 comment:4 by glindsay, 8 years ago

Replying to myself:

It would be better if you could configure the connection lifestyle explictly e.g transient, singleton or thread (e.g using a reference counter), so that the end user has complete control over when connections are open and closed.

After experimenting with the code, the major issue I found in trying implement a lifestyle type paradigm is that datasets are created and closed sequentially and there appears no way of knowing when a thread has finished processing requests. Eg if you increment a reference counter when a connection is opened and then decrement on close (through the dataset destructor) the connections never get shared as the reference counter never goes above 1.

Last edited 8 years ago by glindsay (previous) (diff)

comment:5 by Even Rouault, 8 years ago

The sharing of the same connection by 2 active datasets in a same thread is probably something that is not hit in your use case, and perhaps not so common. Perhaps just having a pull of inactive connections that are ready to be recycled should be enough.

comment:6 by glindsay, 8 years ago

Thanks for your help, rouault. The use case I was testing was a mapserver request with multiple layers with postgis raster data. Ill try your suggestion.

comment:7 by Even Rouault, 5 years ago

Milestone: closed_because_of_github_migration
Resolution: wontfix
Status: newclosed

This ticket has been automatically closed because Trac is no longer used for GDAL bug tracking, since the project has migrated to GitHub. If you believe this ticket is still valid, you may file it to https://github.com/OSGeo/gdal/issues if it is not already reported there.

Note: See TracTickets for help on using tickets.