Opened 8 years ago

Closed 3 years ago

#5228 closed defect (wontfix)

Loading a plugin on windows retains a library handle reference which causes memory problems

Reported by: damiandixon Owned by: warmerdam
Priority: low Milestone: closed_because_of_github_migration
Component: default Version: 1.10.0
Severity: minor Keywords:
Cc:

Description (last modified by damiandixon)

Loading a plugin on windows retains a library handle reference which causes memory retention problems (potential multiple initialization).

The function CPLGetSymbol on windows calls LoadLibrary?. When LoadLibrary? is called the DLL reference count is incremented.

When I call:

GDALDestroyDriverManager();

I would expect everything to be cleaned up and when I unload my DLL which is using GDAL I would expect the GDAL DLL to also unload. The GDAL DLL does not unload.

I have traced the issue down to the loading of plugins. If a plugin is successfully loaded the GDAL DLL remains in process memory.

I think that CPLGetSymbol should be split into two methods:

The first returns a void* ptr to a Library.
The second takes the ptr and a method name and returns a ptr to the method.

A new method should be introduced to free a library.

In GDALDriverManager::AutoLoadDrivers?() a reference needs to be taken to the library loaded. A reference to the DLL/so also needs to be taken in OGRSFDriverRegistrar::AutoLoadDrivers?().

In GDALDriverManager::~GDALDriverManager() and OGRCleanupAll() the reference to the libraries loaded need to be used to free the libraries.

The approach in CPLGetSymbol is likely to be an issue on other platforms where GDAL is used via a loaded DLL/so.

I'm going to look at doing a local fix but that's unlikely to be acceptable as a patch as I am currently only using GDAL on Windows.

Change History (9)

comment:1 Changed 8 years ago by damiandixon

Description: modified (diff)

comment:2 Changed 8 years ago by damiandixon

Description: modified (diff)

comment:3 Changed 8 years ago by damiandixon

Description: modified (diff)

Rather than initially using LoadLibrary? it might be more sensible to use GetModuleHandle first then fall back to LoadLibrary?.

This would ensure that the library's reference count is only incremented once on first load of the library.

This is I think only a stop gap as I still need away of unloading the shared libraries.

The ideal solution is the match the LoadLibrary? calls to FreeLibrary? calls. From experience this is more than likely going to affect non-windows platforms.

comment:4 in reply to:  3 Changed 8 years ago by damiandixon

Replying to damiandixon:

GetModuleHandleEx? is a better method to use.

Rather than initially using LoadLibrary? it might be more sensible to use GetModuleHandle first then fall back to LoadLibrary?.

This would ensure that the library's reference count is only incremented once on first load of the library.

This is I think only a stop gap as I still need away of unloading the shared libraries.

The ideal solution is the match the LoadLibrary? calls to FreeLibrary? calls. From experience this is more than likely going to affect non-windows platforms.

comment:5 Changed 8 years ago by warmerdam

Damian,

Modifying CPLGetSymbol() to try GetModuleHandle?() or GetModuleHandleEx?() first sounds like a good idea.

Do you think that calling LoadLibrary?() for an already loaded library actually consumes some sort of new resource as opposed to just incrementing the reference count of the loaded DLL?

Ian mentioned you are running out of Windows TLS slots. Is this related to the following?

http://support.microsoft.com/kb/118816

In any event, if you can try the GetModuleHandle?() solution at your end and submit a patch if it helps that would be great.

comment:6 Changed 8 years ago by damiandixon

Severity: majornormal

Frank,

Thanks for getting back to me. Since Ian spoke to you I now I however think this ticket might now be moot.

The TLSAlloc issue arose because we initially thought that the GDAL library was unloaded when we unloaded our wrapper plugin.

When we loaded our wrapper plugin we would initialize GDAL/OGR on first use and when the last instance of the wrapper plugin was unloaded we would call the methods to clean up GDAL/OGR.

This did not work as expected as the LoadLibrary/FreeLibrary? calls did not match leaving GDAL/OGR and its plugins in memory.

Re-initializing GDAL/OGR ended up calling TLSAlloc() and after 1058 loads/unloads Windows reported that it had run out of TLS slots. I can't see why its running out of TLS slots as the cleanup code in GDAL/OGR is being correctly called. I can only guess that there is something buried in the Windows code that does not completely release the TLS slot until the DLL has been unloaded. This is not an area I have much experience in as we tend to go about this in a different way because we port to systems that do not support TLS.

Since Monday I have been attempting to track down a random crash that now appears to be in one of the 3rd party closed source libraries that the GDAL/OGR plugins use.

I am in the process of re-writing the code that uses GDAL/OGR to lock the GDAL/OGR DLL in memory.

For the record I modified CPLGetSymbol as follows:

    // DBLD Begin
    CPLMutexHolderD( &hCPLMutex );

    HMODULE     module ;
    GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT, pszLibrary, &module);
    pLibrary = module;
    if ( pLibrary == NULL)
    {
        pLibrary = LoadLibrary(pszLibrary);
        if( pLibrary <= (void*)HINSTANCE_ERROR ) 
        {
            LPVOID      lpMsgBuf = NULL;
            int         nLastError = GetLastError();
        
            FormatMessage( FORMAT_MESSAGE_ALLOCATE_BUFFER 
                           | FORMAT_MESSAGE_FROM_SYSTEM
                           | FORMAT_MESSAGE_IGNORE_INSERTS,
                           NULL, nLastError,
                           MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), 
                           (LPTSTR) &lpMsgBuf, 0, NULL );
 
            CPLError( CE_Failure, CPLE_AppDefined,
                      "Can't load requested DLL: %s\n%d: %s", 
                      pszLibrary, nLastError, (const char *) lpMsgBuf );
            return NULL;
        }

        // my code to track loaded libraries - I have a function that calls FreeLibrary when I've finished.
        if (gModulesLoaded.find(pszLibrary) == gModulesLoaded.end())
        {
          gModulesLoaded[pszLibrary] = pLibrary;
        }
        else
        {
          CPLError( CE_Warning, CPLE_AppDefined,
                      "DLL Already loaded: %s", 
                      pszLibrary);
        }
    }
    // DBLD End

This works fine for me up until I started to get reports of random crashes from my nightly builds.

The problem I have is that the crash would only occur after 3 hours of running other tests un-related to GDAL. Initially the crash was only with 64bit builds then the 32bit builds started to fail.

Running the test group on its own that crashed did not reproduce the crash.

I could not get Purify to report any problems partly because we have a 3rd party library that purify just will not work with and neither will AQtime.

I finally got a series of consistent crashes under Boundschecker. I'm running all the tests I can using valgrind on Linux but don't expect to see the results of the run for 2 to 3 days.

Until I have completed the code changes and rerun the tests I will not know for definite if this has solved the problems I am now seeing.

If the changes I am making to my code work then this ticket will be moot.

I will keep this ticket up to date.

comment:7 Changed 8 years ago by damiandixon

Mainly for myself: While looking at pining the GDAL DLL in memory on windows I found that the DllMain? method does clean up on process detach and attach.

comment:8 Changed 8 years ago by damiandixon

Priority: normallow
Severity: normalminor

FILE: gdaldllmain.cpp

CHANGE: When a process attaches for the first time we use the address of the DllMain? function to pin the DLL in memory. This means that my DLLs may unload but the GDAL/OGR DLL remains in memory.

This does mean that the methods OGRRegisterAll and GDALAllRegister can only be called once. The problem with this is that we can't tell externally if GDAL/OGR has been initialized so I added a couple of flags to stop the drivers being initialized more than once.

The application code needs to be very careful with setting of callbacks in general because when the application DLL is unloaded the address of the callback becomes invalid.

extern "C" BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved)
{
  HMODULE  hMod = NULL;

  switch (fdwReason)
  {
  case DLL_PROCESS_ATTACH:
    //
    // pin the DLL into memory
    GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS | GET_MODULE_HANDLE_EX_FLAG_PIN, (LPCTSTR)DllMain, &hMod);

    break;

FILE: ogrregisterall.cpp

CHANGE: I added a flag to make sure that if I called OGRRegsiterAll more than once it would not re-register the drivers.

// DBLD Start
static bool m_gdalOgrRegistered = false;
static void *hCPLMutex = NULL;
// DBLD end

/************************************************************************/
/*                           OGRRegisterAll()                           */
/************************************************************************/

void OGRRegisterAll()
{
    // DBLD start
    CPLMutexHolderD( &hCPLMutex );

    if (m_gdalOgrRegistered)
      return ;
    m_gdalOgrRegistered = true;

  // DBLD end

FILE: gdalallregister.cpp

CHANGE: Added a flag so that if I called GDALAllRegister more than once it would not re-register the drivers.

// DBLD start
# include "cpl_multiproc.h"

extern bool m_gdalOgrRegistered;
extern void *hCPLMutex;
// DBLD end

void CPL_STDCALL GDALAllRegister()

{
  // DBLD start
    CPLMutexHolderD( &hCPLMutex );

    if (m_gdalOgrRegistered)
      return ;
    m_gdalOgrRegistered = true;

  // DBLD end

The cleanup methods should probably unset the flags.

comment:9 Changed 3 years ago by Even Rouault

Milestone: closed_because_of_github_migration
Resolution: wontfix
Status: newclosed

This ticket has been automatically closed because Trac is no longer used for GDAL bug tracking, since the project has migrated to GitHub?. If you believe this ticket is still valid, you may file it to https://github.com/OSGeo/gdal/issues if it is not already reported there.

Note: See TracTickets for help on using tickets.