Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#3296 closed enhancement (fixed)

[PATCH] Speed up GDALCopyWords with using a few templates

Reported by: Even Rouault Owned by: warmerdam
Priority: normal Milestone: 1.7.0
Component: GDAL_Raster Version: unspecified
Severity: normal Keywords:
Cc: pvachon

Description

The attached patch comes from Phil Vachon who states that more testing is needed.

Attachments (9)

fix-copywords-performance.patch (27.3 KB) - added by Even Rouault 10 years ago.
fix-copywords-performance-v2.patch (28.9 KB) - added by pvachon 10 years ago.
Sanitized version of the patch -- various cleanups
fix-copywords-performance-v4.patch (29.2 KB) - added by pvachon 10 years ago.
Fix some brain damage in the way templates were being instantiated. Fixes a handful of failing tests (and errors when going float types ->integer types)
fix-copywords-performance-v5.patch (27.5 KB) - added by Even Rouault 10 years ago.
Passes succesfully all the autotest tests (patch against latest trunk r18367)
oldgen.txt (3.3 KB) - added by Even Rouault 10 years ago.
Result of testperfcopywords on GDAL 1.6 branch, compiled with GCC 4.4.0 in -O2 on a Athlon 64 3400+
newgen.txt (3.3 KB) - added by Even Rouault 10 years ago.
Result of testperfcopywords on GDAL trunk, compiled with GCC 4.4.0 in -O2 on a Athlon 64 3400+
perf_legacy.txt (3.2 KB) - added by pvachon 10 years ago.
Original version performance, with -O2.
perf_intree.txt (3.2 KB) - added by pvachon 10 years ago.
Test of code, pre-r18399. Built with -O2.
perf_newest.txt (3.2 KB) - added by pvachon 10 years ago.
The performance of the r18399 changes. Built with -O2.

Download all attachments as: .zip

Change History (19)

Changed 10 years ago by Even Rouault

comment:1 Changed 10 years ago by Even Rouault

Component: defaultGDAL_Raster
Type: defectenhancement

comment:2 Changed 10 years ago by warmerdam

Cc: pvachon added; philv removed

Changed 10 years ago by pvachon

Sanitized version of the patch -- various cleanups

Changed 10 years ago by pvachon

Fix some brain damage in the way templates were being instantiated. Fixes a handful of failing tests (and errors when going float types ->integer types)

Changed 10 years ago by Even Rouault

Passes succesfully all the autotest tests (patch against latest trunk r18367)

comment:3 Changed 10 years ago by pvachon

Resolution: fixed
Status: newclosed

As of r18371 the new implementation is in place and enabled. All the cases seem to work quite well. The only outstanding issue is looking into tightening the real pixel code, if necessary. However, at this time I feel it's worthwhile to put off further work until GDAL 1.7.0 is out the door.

comment:4 Changed 10 years ago by Even Rouault

Resolution: fixed
Status: closedreopened

<EvenR>philv: I haven't tested and I can only do it tonight, but I think your 'nOffset = sizeof(T)' (in r18367) is wrong. It will do something weird. nSrcPixelOffset is supposed to be the spacing between two samples, whatever they are of complex type or not, so I don't see why you need to change it if (bInComplex). Or maybe I just need a cup of coffee to get really awaken

comment:5 Changed 10 years ago by Even Rouault

Milestone: 1.7.0

Just to notify people reading this ticket :

I've commited an autotest/cpp/testcopywords.cpp that tests hopefully all cases of conversions from any type to any type. It passes with old-gen GDALCopyWords (after the fix done in r18374). There are a few cases that fail with the current state of the template based GDALCopyWords, but Phil has a patch in the works.

comment:6 Changed 10 years ago by Even Rouault

Phil,

Could you remove the thrown exceptions in the 'default' cases and just leave the CPLAssert(FALSE). MSVC++ 7.1 seems to complain about thrown exceptions by GDALCopyWords(): "warning C4297: 'GDALCopyWords' : function assumed not to throw an exception but does declspec(nothrow) or throw() was specified on the function". GDALCopyWords() is in the C API, so it does not make sense to throw C++ exceptions

comment:7 Changed 10 years ago by Even Rouault

Latest news :

  • pvachon * r18379 /trunk/gdal/gcore/rasterio.cpp: Constness cleanups; added specializations for float/double case for complex values (thanks to Even for the inspiration).
  • rouault * r18380 /trunk/gdal/gcore/rasterio.cpp: Add extra overloads for the float/double -> integer case for complex->real and complex->complex; All tests of testcopywords now pass

Current situation is that the code has got progressively quite bloated. We've needed to overload certain functions because we get different results when the code is compiled with -00 or -O2 ...

In a nutshell, the root cause is that :

    int i = INT_MAX;
    float f = (float)i;
    int i2 = (int)f;
    printf("%d\n", i2);

prints 2147483647 in -O2, but -2147483648 in -O0 ... because 2147483647.0f == 2147483648.0f ... So we must avoid too much casting...

That's what r18380 did to fix the failures that we got with r18379 in -00 :

Test failed at line 332 (intype=Float32,inval=3000000000.000000,outtype=CInt32,g<EvenR>ot -2147483648.000000 expected 2147483647.000000) !!!
Test failed at line 437 (intype=CFloat32,inval=3000000000.000000,outtype=CInt32,got -2147483648.000000 expected 2147483647.000000) !!!

At last, Phil has confirmed that the performance after r18380 is still much better than the old-gen copywords. So we have at least reached that goal. But readability has much suffered... There was some discussion between Phil and Mateusz how to improve that, but it wasn't obvious (to me) if we could make the code look less ugly without sacrificing performance or complicated again our use of templates...

comment:8 Changed 10 years ago by Even Rouault

r18383 /trunk/gdal/gcore/rasterio.cpp: GDALCopyWords() : fix performance regression in new-gen implementation when converting from float/double to GDT_Byte/GDT_UInt16/GDT_UInt32 : floor is not needed (#3296)

r18384 /trunk/autotest/cpp/testperfcopywords.cpp: Add a test to time GDALCopyWords performance (#3296)

Attached the results of testperfcopywords on GDAL 1.6 branch and GDAL trunk, both compiled with GCC 4.4.0 and with -O2

Changed 10 years ago by Even Rouault

Attachment: oldgen.txt added

Result of testperfcopywords on GDAL 1.6 branch, compiled with GCC 4.4.0 in -O2 on a Athlon 64 3400+

Changed 10 years ago by Even Rouault

Attachment: newgen.txt added

Result of testperfcopywords on GDAL trunk, compiled with GCC 4.4.0 in -O2 on a Athlon 64 3400+

comment:9 Changed 10 years ago by pvachon

Resolution: fixed
Status: reopenedclosed

rasterio.cpp | 749 +++++++++++------------------------------------------------ 1 file changed, 144 insertions(+), 605 deletions(-)

I've restructured GDALCopyords a bit more to get a modest improvement in performance. I've also attached a few benchmarks I've generated locally based on this code. All test cases are passing with -O2 and -O0 on gcc 4.2.1.

See r18399.

Changed 10 years ago by pvachon

Attachment: perf_legacy.txt added

Original version performance, with -O2.

Changed 10 years ago by pvachon

Attachment: perf_intree.txt added

Test of code, pre-r18399. Built with -O2.

Changed 10 years ago by pvachon

Attachment: perf_newest.txt added

The performance of the r18399 changes. Built with -O2.

comment:10 Changed 10 years ago by pvachon

Even made a change that revealed a bug when pixel values are less than 0.0, the input type is floating point and the output type is integer. This case is fixed in r18401.

Note: See TracTickets for help on using tickets.