Opened 11 years ago

Closed 10 years ago

Last modified 10 years ago

#5272 closed enhancement (fixed)

Speed improvement for OGR shapefile provider

Reported by: ahuarte47 Owned by: ahuarte47
Priority: normal Milestone: 1.11.0
Component: OGR_SF Version: unspecified
Severity: normal Keywords:
Cc:

Description (last modified by ahuarte47)

It is possible improve the shapefile reader avoiding unnecessary memcpy's and calloc's and replacing the CPLAtof by super-fast OGRFastAtof.

OGRFastAtof gets significant improvement at least in OS windows.

Attachments (1)

parcela_urbana.zip (711.0 KB ) - added by ahuarte47 10 years ago.
dbase file for test

Download all attachments as: .zip

Change History (21)

comment:2 by ahuarte47, 10 years ago

Description: modified (diff)

by ahuarte47, 10 years ago

Attachment: parcela_urbana.zip added

dbase file for test

comment:3 by ahuarte47, 10 years ago

I made tests with 'test_ogrsf.exe' and a dbase file.

Soft/hard: WindowsXP+SP3, 32bits, 4gb RAM. Dataset: 'parcela_urbana.dbf' with 152044 records and a double field (Attached).

Tests Loops Trunk (total) Trunk (avg) Patched (total) Patched (avg) % diff
All tests 20 77,397000 3,869850 64,904000 3,245200 84%
TestOGRLayerFeatureCount 300 125,933000 0,419777 99,794000 0,332647 79%

Regards

comment:5 by ahuarte47, 10 years ago

I made more tests with 'test_ogrsf.exe' (Only executing TestOGRLayerFeatureCount) for some shapefiles.

Datasets Loops Trunk (total) Trunk (avg) Patched_1 (total) Patched_1 (avg) % diff_1 Patched_2 (total) Patched_2 (avg) % diff_2 % diff
http://faunalia.pt/downloads/GTBs_Shapefile.zip 20 1568,72000 78,43600 1564,26000 78,21300 100% 1564,22000 78,21100 100% 100%
http://faunalia.pt/downloads/MAMMTERR.zip 20 139,43300 6,97165 134,66900 6,73345 97% 133,32100 6,66605 99% 96%
http://www.igeo.pt/scrif/crif/CRIF2011shp.zip 20 590,02000 29,50100 555,12000 27,75600 94% 507,16000 25,35800 91% 86%
http://idena.navarra.es/descargas/CARTO1_Lin_6CNivelD.zip 20 19,49700 0,97485 19,18500 0,95925 98% 16,72700 0,83635 87% 86%
http://idena.navarra.es/descargas/GEOLOG_Pol_Litologia.zip 20 17,13900 0,85695 16,38200 0,81910 96% 15,96200 0,79810 97% 93%
Urban parcels of cadastre of navarra (Polygon2D of 142680 records, 66mb) 20 40,57800 2,02890 38,75000 1,93750 95% 32,33600 1,61680 83% 80%
Rustic parcels of cadastre of navarra (Polygon2D of 542658 records, 458mb) 20 121,33800 6,06690 115,38600 5,76930 95% 96,90200 4,84510 84% 80%

Patch_1=Avoid unnecesary callocs for ZM coordinates in SHPObject.

Patch_2=Replace CPLAtof for CPLFastAtof.

Soft/hard: WindowsXP+SP3, 32bits, 4gb RAM

The improvement is more noticeable for large datasets with big linetrings or polygons.

comment:6 by ahuarte47, 10 years ago

New related pull request to avoid unnecesary memory allocations in OGR shapefile provider.

https://github.com/OSGeo/gdal/pull/26

comment:7 by Even Rouault, 10 years ago

r26622 "Shapelib: add SHPSetFastModeReadObject() to avoid calloc() in SHPReadObject() (but only one SHPObject valid at a time) (derived from patch by Alvaro Huarte, #5272)" r26623 "Shapefile driver: use SHPSetFastModeReadObject() for faster shape retrieval (#5272)"

comment:8 by Even Rouault, 10 years ago

r26624 "Shapefile: fix RECOMPUTE EXTENT on 2D linestring / polygon (trunk only, complementary fix for r26622 and r26623, #5272)"

comment:9 by Even Rouault, 10 years ago

Type: defectenhancement

r26625 "Shapelib: avoid using atof() for integer fields (#5272)"

comment:10 by Even Rouault, 10 years ago

r26630 "Shapelib: make bBigEndian a #define in GDAL context (#5272)"

comment:11 by ahuarte47, 10 years ago

Hi Even, thank you very much for your patience!

I see your pull request it is perfect for me (I gets same speed imporvement even bit more), the idea was use a shared buffer for geometry as a shared buffer is used in the dbf manager.

QGIS users will thank: http://hub.qgis.org/issues/8725#note-95

Also I see your improvement parsing the integer attributes!

Even, but I think that there is a possible bit memory leak if the developper call two times to the 'SHPSetFastModeReadObject' function.

If you like I propose some this....

/* In this mode, the content of SHPReadObject() are owned by the SHPHandle. */ /* So you cannot have 2 valid instances of SHPReadObject() simultaneously. */ /* The SHPObject padfZ and padfM members may be NULL depending on the geometry */ /* type. It is illegal to free at hand any of the pointer members of the SHPObject structure */ void SHPAPI_CALL SHPSetFastModeReadObject( SHPHandle hSHP, int bFastMode ) {

if( hSHP->bFastModeReadObject != bFastMode ) {

if( bFastMode ) {

hSHP->bFastModeReadObject = TRUE; hSHP->psCachedObject = (SHPObject*) calloc(1, sizeof(SHPObject));

} else {

hSHP->bFastModeReadObject = FALSE;

if( hSHP->pabyObjectBuf != NULL ) {

free( hSHP->pabyObjectBuf );

} if( hSHP->psCachedObject != NULL ) {

free( hSHP->psCachedObject );

}

}

}

}

It avoids the memory leak and the developer can disable the 'fastmode'

Now, I am going to test filemapping using boost and send results Best regards

comment:12 by ahuarte47, 10 years ago

Resolution: fixed
Status: newclosed

in reply to:  11 comment:13 by ahuarte47, 10 years ago

Replying to ahuarte47:

Hi Even, thank you very much for your patience!

I see your pull request it is perfect for me (I gets same speed imporvement even bit more), the idea was use a shared buffer for geometry as a shared buffer is used in the dbf manager.

QGIS users will thank: http://hub.qgis.org/issues/8725#note-95

Also I see your improvement parsing the integer attributes!

Even, but I think that there is a possible bit memory leak if the developper call two times to the 'SHPSetFastModeReadObject' function.

If you like I propose some this....

/* In this mode, the content of SHPReadObject() are owned by the SHPHandle. */ /* So you cannot have 2 valid instances of SHPReadObject() simultaneously. */ /* The SHPObject padfZ and padfM members may be NULL depending on the geometry */ /* type. It is illegal to free at hand any of the pointer members of the SHPObject structure */ void SHPAPI_CALL SHPSetFastModeReadObject( SHPHandle hSHP, int bFastMode ) {

if( hSHP->bFastModeReadObject != bFastMode ) {

if( bFastMode ) {

hSHP->bFastModeReadObject = TRUE; hSHP->psCachedObject = (SHPObject*) calloc(1, sizeof(SHPObject));

} else {

hSHP->bFastModeReadObject = FALSE;

if( hSHP->pabyObjectBuf != NULL ) {

free( hSHP->pabyObjectBuf );

} if( hSHP->psCachedObject != NULL ) {

free( hSHP->psCachedObject );

}

}

}

}

It avoids the memory leak and the developer can disable the 'fastmode'

Now, I am going to test filemapping using boost and send results Best regards

Better use:

/* In this mode, the content of SHPReadObject() are owned by the SHPHandle. */ /* So you cannot have 2 valid instances of SHPReadObject() simultaneously. */ /* The SHPObject padfZ and padfM members may be NULL depending on the geometry */ /* type. It is illegal to free at hand any of the pointer members of the SHPObject structure */ void SHPAPI_CALL SHPSetFastModeReadObject( SHPHandle hSHP, int bFastMode ) {

if( hSHP->bFastModeReadObject != bFastMode ) {

if( bFastMode ) {

hSHP->bFastModeReadObject = TRUE; hSHP->psCachedObject = (SHPObject*) calloc(1, sizeof(SHPObject));

} else {

hSHP->bFastModeReadObject = FALSE;

if( hSHP->pabyObjectBuf != NULL ) {

free( hSHP->pabyObjectBuf ); hSHP->pabyObjectBuf = NULL;

} if( hSHP->psCachedObject != NULL ) {

free( hSHP->psCachedObject ); hSHP->psCachedObject = NULL;

} hSHP->nObjectBufSize = 0;

}

}

}

comment:14 by Even Rouault, 10 years ago

Resolution: fixed
Status: closedreopened

I'll integrate your fix for SHPSetFastModeReadObject().

Regarding memory mapped file, there's one point that I'm not sure how you can deal with : the fact that on 32 bit OS, the virtual memory space for a process is small ( 2 GB ). So if you memory mapped a whole file, you can run out of virtual memory very quickly (or even worse if you want to simultaneously deal with files whose total size is above 2 GB). So it can be hard to use the capability in a robust way on 32 bit. On 64 bit, you have no such problem.

comment:15 by ahuarte47, 10 years ago

Hi Even, I must study as boost library deal this. All comments are welcome!

I want implement (For validate if MMF can be good for us), a new 'VSIVirtualHandle' class for activate it in 'VSIWin32FilesystemHandler::Open()' using a setting similar to 'CPLGetConfigOption("VSI_CACHE", "FALSE")'.

It looks good?

Sorry, I can not test in no-windows OS's.

I have other question, the 'SHPSetFastModeReadObject' function in that version of GDAL will be released ?

QGIS 2.0 uses GDAL 1.10, I don't know as the issues are flushed in the available versions of GDAL.

Thanks!

in reply to:  15 ; comment:16 by Even Rouault, 10 years ago

Replying to ahuarte47:

Hi Even, I must study as boost library deal this. All comments are welcome!

Make sure that the new class can be conditionnaly compiled, as Boost is not a dependency of GDAL, and we want to avoid to add compulsory dependencies.

I want implement (For validate if MMF can be good for us), a new 'VSIVirtualHandle' class for activate it in 'VSIWin32FilesystemHandler::Open()' using a setting similar to 'CPLGetConfigOption("VSI_CACHE", "FALSE")'.

It looks good?

Yes

Sorry, I can not test in no-windows OS's.

I sympathize to the fact that you must work with Windows ;-)

I have other question, the 'SHPSetFastModeReadObject' function in that version of GDAL will be released ?

The next major version of GDAL. 1.11 or 2.0 depending on how we call it. This is a new feature, with some risk of breakage to risk to put in the 1.10 branch.

QGIS 2.0 uses GDAL 1.10, I don't know as the issues are flushed in the available versions of GDAL.

Thanks!

in reply to:  16 comment:17 by ahuarte47, 10 years ago

Replying to rouault:

Replying to ahuarte47:

Hi Even, I must study as boost library deal this. All comments are welcome!

Make sure that the new class can be conditionnaly compiled, as Boost is not a dependency of GDAL, and we want to avoid to add compulsory dependencies.

I had that in mind thanks, if this goes ahead, I thought adding to OSGeo4W as new package.

I want implement (For validate if MMF can be good for us), a new 'VSIVirtualHandle' class for activate it in 'VSIWin32FilesystemHandler::Open()' using a setting similar to 'CPLGetConfigOption("VSI_CACHE", "FALSE")'.

It looks good?

Yes

Sorry, I can not test in no-windows OS's.

I sympathize to the fact that you must work with Windows ;-)

I have other question, the 'SHPSetFastModeReadObject' function in that version of GDAL will be released ?

The next major version of GDAL. 1.11 or 2.0 depending on how we call it. This is a new feature, with some risk of breakage to risk to put in the 1.10 branch.

QGIS 2.0 uses GDAL 1.10, I don't know as the issues are flushed in the available versions of GDAL.

Thanks!

comment:18 by Even Rouault, 10 years ago

Milestone: 2.0
Resolution: fixed
Status: reopenedclosed

r26634 "Shape: add a parameter to SHPSetFastModeReadObject (derived from patch by Alvaro Huarte, #5272)"

comment:19 by ahuarte47, 10 years ago

Hi Even, I added for test a new VSIVirtualHandle class using memory-mapped-file.

The results do not have significant improvement on VSICachedFile, I think it's not worth adding this new implementation.

Best regards

comment:20 by Even Rouault, 10 years ago

Milestone: 2.01.11.0
Note: See TracTickets for help on using tickets.