Opened 7 years ago

Closed 6 years ago

#5747 closed defect (fixed)

crash latest trunk, still looks to be locale related

Reported by: vindoctor Owned by: warmerdam
Priority: highest Milestone: 2.1.0
Component: GDAL_Raster Version: svn-trunk
Severity: blocker Keywords:
Cc:

Description

I'm not sure if this is a proj4 issue or a gdal issue... As per the instruction on my last ticket, the latest trunk was thought to take care of the locale issue. I'm not sure if you need more info than what I have below or maybe I can add a breakpoint on something to help narrow down the cause?

msvcr110.dll!malloc(unsigned int size=9371648) Line 91 C

gdal200.dll!CPLMalloc(unsigned int nSize=7) Line 136 C++ gdal200.dll!CPLStrdup(const char * pszString=0x37679a08) Line 255 C++ gdal200.dll!OGR_SRSNode::Clone() Line 353 C++ gdal200.dll!OGR_SRSNode::Clone() Line 357 C++ gdal200.dll!OGR_SRSNode::Clone() Line 357 C++ gdal200.dll!OGRSpatialReference::Clone() Line 533 C++ gdal200.dll!OGRProj4CT::InitializeNoLock?(OGRSpatialReference * poSourceIn=0x254ef1cc, OGRSpatialReference * poTargetIn=0x254ef19c) Line 569 C++ msvcr110.dll!free(void * pBlock=0x316669d0) Line 51 C gdal200.dll!CPLLocaleC::CPLLocaleC() Line 2411 C++ gdal200.dll!OGRProj4CT::OGRProj4CT() Line 484 C++ gdal200.dll!_whirlpool_block_mmx() C++

gdal200.dll!GDALCreateSimilarGenImgProjTransformer(void * hTransformArg=0x2cab4900, double dfRatioX=1.0000000000000000, double dfRatioY=1.0000000000000000) Line 912 C++

gdal200.dll!GDALCloneTransformer(void * pTransformArg=0x2cab4900) Line 3175 C++ gdal200.dll!GWKRun(GDALWarpKernel * poWK=0x24448b1c, const char * pszFuncName=0x2474ff08, void (void *) * pfnFunc=0x2474ff1c) Line 259 C++ gdal200.dll!GWKGeneralCase(GDALWarpKernel * poWK=0x254ef3f0) Line 3683 C++ gdal200.dll!GDALWarpKernel::PerformWarp?() Line 919 C++ gdal200.dll!GDALWarpOperation::WarpRegionToBuffer?(int nDstXOff=0, int nDstYOff=32, int nDstXSize=64, int nDstYSize=32, void * pDataBuf=0x31819bb8, GDALDataType eBufDataType=GDT_Byte, int nSrcXOff=612, int nSrcYOff=8264, int nSrcXSize=345, int nSrcYSize=255, int nSrcXExtraSize=28, int nSrcYExtraSize=28, double dfProgressBase=0.25000000000000000, double dfProgressScale=0.12500000000000000) Line 1834 C++ msvcr110.dll!free(void * pBlock=0x9999999a) Line 51 C gdal200.dll!CSLFetchNameValue(char * * papszStrList=0x00000000, const char * pszName=0x00000000) Line 1569 C++

Change History (15)

comment:1 Changed 7 years ago by Even Rouault

At first sight, that crash doesn't seem to involve PROJ.4 and indeed there was still a potential race in CPLLocaleC::CPLLocaleC()

trunk r27997 "CPLsetlocale(): return a string that is thread-locale storage to avoid potential race in CPLLocaleC::CPLLocaleC() (#5747)"

comment:2 Changed 7 years ago by vindoctor

Thanks for helping. I got the latest and we crash in these 3 places below. I did validate the r27997 change is in what was compiled.

Crashed here twice

msvcr110.dll!strncmp(unsigned char * str1=0x10530a39, unsigned char * str2=0x00000004, unsigned long count=812181996) Line 106 Unknown

gdal200.dll!_pj_param() C gdal200.dll!_pj_init_ctx() C gdal200.dll!_pj_malloc() C msvcr110.dll!_getptd_noexit() Line 312 C msvcr110.dll!_errno() Line 281 C gdal200.dll!_CPLStrtodDelim() C++ gdal200.dll!_CPLAtof() C++ kernel32.dll!755814ad() Unknown [Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll] msvcr110.dll!free(void * pBlock) Line 51 C gdal200.dll!OGRProj4CT::OGRProj4CT(void) C++ gdal200.dll!_whirlpool_block_mmx() C++

Crashed here twice

msvcr110.dll!_unlock(int locknum) Line 366 C msvcr110.dll!setlocale(int _category=7602176, const char * _locale=0x0ece7c2f) Line 119 C msvcr110.dll!malloc(unsigned int size=7602176) Line 91 C 00000001() Unknown msvcr110.dll!malloc(unsigned int size=266923474) Line 91 C gdal200.dll!_pj_malloc() C msvcr110.dll!_getptd_noexit() Line 312 C msvcr110.dll!_errno() Line 281 C gdal200.dll!_CPLStrtodDelim() C++ gdal200.dll!_CPLAtof() C++ kernel32.dll!755814ad() Unknown msvcr110.dll!free(void * pBlock) Line 51 C

gdal200.dll!OGRProj4CT::OGRProj4CT(void) C++

gdal200.dll!_whirlpool_block_mmx() C++

Crashed here twice

ntdll.dll!7774e725() Unknown

[Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll] ntdll.dll!7774f659() Unknown ntdll.dll!776d57ce() Unknown ntdll.dll!776b3cfe() Unknown msvcr110.dll!malloc(unsigned int size=5111808) Line 91 C gdal200.dll!_CPLMalloc() C++ gdal200.dll!_CPLStrdup() C++ gdal200.dll!OGR_SRSNode::importFromWkt(char * *,int,int *) C++ gdal200.dll!OGR_SRSNode::importFromWkt(char * *,int,int *) C++ gdal200.dll!OGR_SRSNode::importFromWkt(char * *,int,int *) C++ gdal200.dll!OGR_SRSNode::importFromWkt(char * *,int,int *) C++ gdal200.dll!OGR_SRSNode::importFromWkt(char * *) C++

gdal200.dll!OGRSpatialReference::importFromWkt(char * *) C++

gdal200.dll!_GDALCreateReprojectionTransformer() C++ gdal200.dll!GDALCloneTransformer(void *) C++ gdal200.dll!_GDALDeserializeTransformer() C++ gdal200.dll!GDALCloneTransformer(void *) C++ gdal200.dll!GDALCloneTransformer(void *) C++ gdal200.dll!GDALCloneTransformer(void *) C++ gdal200.dll!GWKGetFilterRadius(enum GDALResampleAlg) C++ msvcr110.dll!malloc(unsigned int size=5111808) Line 91 C

comment:3 Changed 7 years ago by vindoctor

I reviewed the code change and I question TLS string buffer, but this very well may be due to my lack of knowledge on many of the low level details. First, I don't know what it would take to overrun those 10 buffers/thread, which maybe is whats happening here?

Would something like this be a better and safer design? ( I know you don't use the std::string internally). This would avoid the overhead of the ringbuffer and risk of running out of space, with the only disadvantge of a small heap allocation if your internal string uses the heap instead of a fixed/stack. Just putting this out as maybe a reason exists to really make this huge TLS instead of just returning a copy.

void CPLsetlocale (int category, const char* locale, std::string &outstr )

comment:4 Changed 7 years ago by vindoctor

Sorry. "huge" is suppose to be use. I understand the TLS string buffer may be needed for other things, and I in no means wanted to imply this is a huge buffer.

comment:5 Changed 7 years ago by vindoctor

Just as an FYI, I did try out the above with a hacked in std::string and still crash. Just an FYI its probably not the ring buffer.. which you may have known anyways.

msvcr110.dll!malloc(unsigned int size=875564040) Line 91 C

msvcr110.dll!realloc(void * pBlock=0x34300c08, unsigned int newsize=875564040) Line 85 C gdal200gw3d.dll!CPLRealloc(void * pData=0x34300c08, unsigned int nNewSize=8) Line 203 C++ gdal200gw3d.dll!OGR_SRSNode::Clone(void) C++ gdal200gw3d.dll!OGR_SRSNode::Clone(void) C++ gdal200gw3d.dll!OGR_SRSNode::Clone(void) C++ gdal200gw3d.dll!OGRSpatialReference::Clone(void) C++ gdal200gw3d.dll!OGRProj4CT::InitializeNoLock?(class OGRSpatialReference *,class OGRSpatialReference *) C++ kernel32.dll!755814ad() Unknown msvcr110.dll!free(void * pBlock) Line 51 C ntdll.dll!7769fba1() Unknown KernelBase?.dll!75dc0fb0() Unknown gdal200gw3d.dll!OGRProj4CT::OGRProj4CT(void) C++ gdal200gw3d.dll!_whirlpool_block_mmx

() C++

kernel32.dll!755814ad() Unknown gdal200gw3d.dll!OGRSpatialReference::~OGRSpatialReference(void) C++ gdal200gw3d.dll!GDALCloneTransformer(void *) C++ gdal200gw3d.dll!GWKGetFilterRadius(enum GDALResampleAlg) C++ msvcr110.dll!malloc(unsigned int size=1680633904) Line 91 C

comment:6 Changed 7 years ago by Even Rouault

Yeah, the use of the TLS buffer should be OK because in CPLLocaleC::CPLLocaleC() we just strdup() it right after the return of CPLsetlocale(). There was a potential race here, but a very short one. Obviously you're running into a different problem. The crashes you get seem to suggest a memory corruption somewhere. Is it due to the setlocale() stuff ? Could you disable the use of setlocale() in both GDAL and proj.4 to confirm that's the cause of the issue and you're not running into something else (and if you do so make sure you run with a C locale) ?

comment:7 Changed 7 years ago by vindoctor

I commented out the locale in proj4 and disabled via the setting gdal supports and I do not crash in those places anymore with strings. I have a hunch its something dealing with the init context going into proj4 not being threadsafe somehow. I also see some bad things even commented as bad in gdal when it comes to creating a mutex, in regard that the creation of the mutex to be threadsafe is not threadsafe itself, so I also added some code on my main thread to touch that api before I create my few dozen threads to process the data as well. Previously my main thread only initialized gdal and set some global settings, so I'm sure we were also crashing at times due to how things are currently coded. I'm not sure if we have other APIs I might also need to touch on my main thread first as well before creating threads.

comment:8 Changed 7 years ago by vindoctor

Reading up on the setlocale issue, it seems the only way things work reliably in the real world is to do something like this below for an SDK.

the "_l"-functions, which work using a specific locale like this: _locale_t localeInfo = _create_locale(LC_NUMERIC, "C"); _sprintf_l(string, format, localeInfo, number); _free_locale(localeInfo);

comment:9 Changed 7 years ago by vindoctor

Also, to go from a string to an numerical value is below. Keep in mind we would want to be smart and only creating one _create_locale for the job instead of constantly creating and destroying it. I'm not sure if the locale context needs to be per thread or not since its really a context, but to be safe using TLS might be a safe route to take. Note, MS uses the "_l" at the end of their api's to take the locale context. Proj4 should really do this as well.

_locale_t l = _create_locale(LC_NUMERIC, "C");

double result = _atof_l(str, l);

_free_locale(l);

return result;

comment:10 Changed 7 years ago by vindoctor

Anyone want to comment on this approach? Seems to be the only logical way to go through my eyes.

comment:11 Changed 7 years ago by Even Rouault

Seems reasonable, but should likely be tried. I would be a bit concerned by the performance of getting the locale_t object with TLS. Why not having a static global C++ object that creates that object at GDAL code loading. If you don't modify the object, it should be safe using it from multiple threads, shouldn't it ?

comment:12 Changed 7 years ago by vindoctor

I cannot find anything that says the locale context on windows is not threadsafe, so my guess is a context would be safe to use with different threads IF not being changed to a different locale setting(being familiar with windows APIs and how they normally implement them). My input on implementation is below. Not being as familiar with the gdal codebase as you are, my concerns about not going with the TLS approach may be null and void if gdal and its drivers are not trying to change the locale.

For thread local storage in windows, are you certain using TLS is measurably slow compared to having to use a mutex to share an object across threads, as well as having to reset it back if it was changed? I would expect to see an increase in performance, not a decrease for this use case. I do agree to have a global container, but, if TLS is not used for this container, thread-safe management of this container would be required, and a thread would need to register and unregistered from this container. Thread contention would also exist. This again is assuming gdal and its drivers currently change the locale, and is not a set and forget at startup like you said.

In addition to the above, gdal currently uses a ring buffer in TLS for the strings passed around, I didn't look too closely on this, but would using the TLS for the context remove the need for this ring buffer and its management as well? I do not know the history and reasons for the string ring buffer, so this may not be related to the current locale design in gdal, meaning, the locale strings are just using this ring buffer for some optimization that took place sometime in the past.

So in a nut shell... "If you don't modify the object, it should be safe using it from multiple threads, shouldn't it ?"

If Gdal is not changing the locale in any thread and just needs a global context to the dll itself, from what I'm reading, yes! This would obviously be the easiest way to handle this.

comment:13 Changed 7 years ago by vindoctor

I think the 90% quick fix solution is to just use this on each thread created. It won't work for all situations or older windows operating systems.

_configthreadlocale(_ENABLE_PER_THREAD_LOCALE);

comment:14 Changed 7 years ago by vindoctor

Just for an update on this quick work around, as may testing has yet to crash. It use to crash within 15 seconds. Any thread you create external to gdal that calls into its API, calling _configthreadlocale(_ENABLE_PER_THREAD_LOCALE); on the thread, first thing when the thread is created, is important. I also set this flag in inside gdal. Even if gdal starts using the locale context, the issue would still exist with proj4, so this kinda gets around that too.

static DWORD WINAPI CPLStdCallThreadJacket( void *pData )

{

CPLStdCallThreadInfo *psInfo = (CPLStdCallThreadInfo *) pData;

_configthreadlocale(_ENABLE_PER_THREAD_LOCALE);

psInfo->pfnMain( psInfo->pAppData );

if (psInfo->hThread == NULL)

CPLFree( psInfo ); /* Only for detached threads */

CPLCleanupTLS();

return 0;

}

comment:15 Changed 6 years ago by Even Rouault

Milestone: 2.1.0
Resolution: fixed
Status: newclosed

With latest proj.4 master (with fix for ​https://github.com/OSGeo/proj.4/issues/226) and r29493, setlocale() is no longer called, so this should be solved.

Note: See TracTickets for help on using tickets.