#6163 closed defect (fixed)

Multithreading with several datasets being written - deadlock

Reported by: aghariani
Owned by: warmerdam
Priority: normal
Milestone: 2.0.2
Component: GDAL_Raster
Version: svn-trunk
Severity: normal
Keywords: multithreading write
Cc:

Description

Since roughly April, the trunk version of GDAL has had a multithreading issue: reading and writing two or more files at the same time often ends in a deadlock. I wrote a simple program to reproduce the problem. The issue shows up on both Linux and Windows, though the frequency varies (about 80% of runs on Windows and 5% on Linux with my simple test).

The longer the read/write runs, the more likely the issue is to appear. Most of the time it blocks right after the start, but it can also appear in the middle of the execution.

Using the stable version of GDAL, or trunk from March or earlier, we do not encounter this issue at all.

==================================

#include "cpl_conv.h"
#include "gdal_priv.h"

#include <cstdio>
#include <iostream>
#include <thread>

using namespace std;

// Progress callback passed to CreateCopy(): prints the completed percentage.
int Progress(double dfComplete, const char *pszMessage, void *pData) {
    fprintf(stdout, "%d%% complete.\n", (int) (dfComplete * 100));
    return TRUE;
}

// Copies one source file to a new GTiff dataset; run once per thread.
void task(const char* pszSrcFilename, const char* pszDstFilename) {
    const char *pszFormat = "GTiff";
    GDALDriver *poDriver = GetGDALDriverManager()->GetDriverByName(pszFormat);

    GDALDataset *poSrcDS = (GDALDataset *) GDALOpen(pszSrcFilename, GA_ReadOnly);
    GDALDataset *poDstDS = poDriver->CreateCopy(pszDstFilename, poSrcDS, FALSE, NULL, Progress, NULL);

    if (poDstDS != NULL)
        GDALClose((GDALDatasetH) poDstDS);
    GDALClose((GDALDatasetH) poSrcDS);
}

int main() {
    GDALAllRegister();

    // Two concurrent copies are enough to trigger the deadlock.
    thread t1(task, "SourceFile1.tif", "OutputFile1.tif");
    thread t2(task, "SourceFile2.tif", "OutputFile2.tif");
    t1.join();
    t2.join();

    cout << "Tasks Done\n";
}

Attachments (1)

stack.txt (342.1 KB) - added by aghariani 20 months ago.
Stack overflow, tiff driver init race condition


Change History (9)

comment:1 Changed 20 months ago by Even Rouault

Keywords: multithreading write added
Milestone: 2.0.2

Thanks for the report and the test to reproduce it. Actually I could reproduce it 100% on Linux...

trunk r31094 "Make private member of GDALDataset really opaque so as to be able to extend it without breaking the C++ ABI (preliminary for #6163)"

trunk r31095 "Avoid deadlock when writing 2 datasets in 2 threads (#6163)"

This also affects 2.0 so should be backported if everything goes fine in trunk.

comment:2 Changed 20 months ago by Even Rouault

aghariani, can you confirm that the above commits fix your issues?

comment:3 in reply to:  2 Changed 20 months ago by aghariani

Replying to rouault:

aghariani, can you confirm that the above commits fix your issues?

Unfortunately, it does not fix my issue, though the behavior seems slightly different: if it doesn't deadlock right at the start, the copy proceeds to the end (it used to deadlock in the middle of the copy too). So there still seems to be a deadlock between the two "open" calls.

comment:4 Changed 20 months ago by Even Rouault

Hmm, it would help if you could attach a debugger when this happens and display the stack traces of each thread; otherwise I'm not sure what to do. A debug version of GDAL will likely be needed to get meaningful traces.

comment:5 Changed 20 months ago by aghariani

In debug mode, it crashes all the time. I have a stack overflow in XTIFFDefaultDirectory(tiff * tif).

However, if I write one file first in the main thread before starting the two threads, it works fine, with no crash. There seems to be a race condition in the initialization of the TIFF driver.

See the modified main function:

======================

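int main() {
    GDALAllRegister();

    // Write one file from the main thread first, so that the TIFF driver
    // is initialized before the worker threads start.
    task("SourceFile1.tif", "OutputFile3.tif");

    thread t1(task, "SourceFile1.tif", "OutputFile1.tif");
    thread t2(task, "SourceFile2.tif", "OutputFile2.tif");
    t1.join();
    t2.join();

    cout << "Tasks Done\n";
}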


Changed 20 months ago by aghariani

Attachment: stack.txt added

Stack overflow, tiff driver init race condition

comment:6 Changed 20 months ago by Even Rouault

trunk r31100 "GTiff: call XTIFFInitialize() in LibgeotiffOneTimeInit() as the former isn't thread-safe, so better call it from the later which is thread-safe (#6163)"
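For readers unfamiliar with the pattern behind this fix, here is a minimal sketch of routing a non-thread-safe initializer through a thread-safe one-time entry point. It is not GDAL's actual implementation: `UnsafeLibraryInit`, `EnsureInitialized` and `RunConcurrentInit` are hypothetical names made up for illustration, and `std::call_once` is just one way to get the one-time guarantee (the role of the unsafe initializer is what XTIFFInitialize() plays in this ticket).

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a library initializer that is not thread-safe.
// It only counts how often it is called so the effect can be observed.
static std::atomic<int> g_init_calls{0};
static void UnsafeLibraryInit() { ++g_init_calls; }

// Thread-safe one-time wrapper: std::call_once runs the callable exactly
// once, and concurrent callers block until that single run has finished.
static std::once_flag g_init_once;
static void EnsureInitialized() {
    std::call_once(g_init_once, UnsafeLibraryInit);
}

// Starts n_threads workers that all race to initialize, then returns how
// many times the initializer actually ran.
int RunConcurrentInit(int n_threads) {
    std::vector<std::thread> workers;
    for (int i = 0; i < n_threads; ++i) {
        workers.emplace_back([] {
            EnsureInitialized();
            // By the time call_once returns, the init has run exactly once.
            assert(g_init_calls.load() == 1);
        });
    }
    for (auto &t : workers) t.join();
    return g_init_calls.load();
}
```

Calling `RunConcurrentInit(8)` should report a single initializer call regardless of how the threads interleave, which is the property the unguarded initialization lacked.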

comment:7 Changed 20 months ago by aghariani

It seems to work fine now (at least for the test program). Thanks for the fix!

comment:8 Changed 19 months ago by Even Rouault

Component: default → GDAL_Raster
Resolution: fixed
Status: new → closed
Summary: Multithreading - deadlock → Multithreading with several datasets being written - deadlock

branches/2.0 r31108 "Make private member of GDALDataset really opaque so as to be able to extend it without breaking the C++ ABI (preliminary for #6163)"

branches/2.0 r31109 "Avoid deadlock when writing 2 datasets in 2 threads (#6163)"

branches/2.0 r31110 "GTiff: call XTIFFInitialize() in LibgeotiffOneTimeInit() as the former isn't thread-safe, so better call it from the later which is thread-safe (#6163)"

No test added as I couldn't come up with something that would have reliably reproduced the issue.
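As a side note, a hang can at least be turned into a test failure instead of a blocked test run. The sketch below is one possible harness, not something taken from the GDAL test suite: `AllFinishWithin` is a hypothetical helper that runs each job in a detached thread and reports whether all of them completed before a deadline. It does not make the race any more reproducible; it only bounds how long a run can take.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: runs every job in its own detached thread and
// returns true only if all of them finish before the timeout expires.
// Threads are detached so that a deadlocked job cannot block the caller.
bool AllFinishWithin(const std::vector<std::function<void()>> &jobs,
                     std::chrono::milliseconds timeout) {
    std::vector<std::future<void>> done;
    for (const auto &job : jobs) {
        std::packaged_task<void()> task(job);
        done.push_back(task.get_future());
        std::thread(std::move(task)).detach();
    }

    // One shared deadline for all jobs, not one timeout per job.
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (auto &f : done) {
        if (f.wait_until(deadline) != std::future_status::ready)
            return false;  // at least one job is still running (or stuck)
    }
    return true;
}
```

In this ticket's setting, the jobs would be two calls to the reporter's `task()` function on different files, repeated many times, since the deadlock only showed up in a fraction of runs.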
