#5272 closed enhancement (fixed)
Speed improvement for OGR shapefile provider
Reported by: | ahuarte47 | Owned by: | ahuarte47 |
---|---|---|---|
Priority: | normal | Milestone: | 1.11.0 |
Component: | OGR_SF | Version: | unspecified |
Severity: | normal | Keywords: | |
Cc: |
Description (last modified by )
It is possible improve the shapefile reader avoiding unnecessary memcpy's and calloc's and replacing the CPLAtof by super-fast OGRFastAtof.
OGRFastAtof gets significant improvement at least in OS windows.
Attachments (1)
Change History (21)
comment:1 by , 10 years ago
comment:2 by , 10 years ago
Description: | modified (diff) |
---|
comment:3 by , 10 years ago
I made tests with 'test_ogrsf.exe' and a dbase file.
Soft/hard: WindowsXP+SP3, 32bits, 4gb RAM. Dataset: 'parcela_urbana.dbf' with 152044 records and a double field (Attached).
Tests | Loops | Trunk (total) | Trunk (avg) | Patched (total) | Patched (avg) | % diff |
All tests | 20 | 77,397000 | 3,869850 | 64,904000 | 3,245200 | 84% |
TestOGRLayerFeatureCount | 300 | 125,933000 | 0,419777 | 99,794000 | 0,332647 | 79% |
Regards
comment:5 by , 10 years ago
I made more tests with 'test_ogrsf.exe' (Only executing TestOGRLayerFeatureCount) for some shapefiles.
Datasets | Loops | Trunk (total) | Trunk (avg) | Patched_1 (total) | Patched_1 (avg) | % diff_1 | Patched_2 (total) | Patched_2 (avg) | % diff_2 | % diff |
http://faunalia.pt/downloads/GTBs_Shapefile.zip | 20 | 1568,72000 | 78,43600 | 1564,26000 | 78,21300 | 100% | 1564,22000 | 78,21100 | 100% | 100% |
http://faunalia.pt/downloads/MAMMTERR.zip | 20 | 139,43300 | 6,97165 | 134,66900 | 6,73345 | 97% | 133,32100 | 6,66605 | 99% | 96% |
http://www.igeo.pt/scrif/crif/CRIF2011shp.zip | 20 | 590,02000 | 29,50100 | 555,12000 | 27,75600 | 94% | 507,16000 | 25,35800 | 91% | 86% |
http://idena.navarra.es/descargas/CARTO1_Lin_6CNivelD.zip | 20 | 19,49700 | 0,97485 | 19,18500 | 0,95925 | 98% | 16,72700 | 0,83635 | 87% | 86% |
http://idena.navarra.es/descargas/GEOLOG_Pol_Litologia.zip | 20 | 17,13900 | 0,85695 | 16,38200 | 0,81910 | 96% | 15,96200 | 0,79810 | 97% | 93% |
Urban parcels of cadastre of navarra (Polygon2D of 142680 records, 66mb) | 20 | 40,57800 | 2,02890 | 38,75000 | 1,93750 | 95% | 32,33600 | 1,61680 | 83% | 80% |
Rustic parcels of cadastre of navarra (Polygon2D of 542658 records, 458mb) | 20 | 121,33800 | 6,06690 | 115,38600 | 5,76930 | 95% | 96,90200 | 4,84510 | 84% | 80% |
Patch_1=Avoid unnecesary callocs for ZM coordinates in SHPObject.
Patch_2=Replace CPLAtof for CPLFastAtof.
Soft/hard: WindowsXP+SP3, 32bits, 4gb RAM
The improvement is more noticeable for large datasets with big linetrings or polygons.
comment:6 by , 10 years ago
New related pull request to avoid unnecesary memory allocations in OGR shapefile provider.
comment:7 by , 10 years ago
comment:8 by , 10 years ago
comment:9 by , 10 years ago
Type: | defect → enhancement |
---|
follow-up: 13 comment:11 by , 10 years ago
Hi Even, thank you very much for your patience!
I see your pull request it is perfect for me (I gets same speed imporvement even bit more), the idea was use a shared buffer for geometry as a shared buffer is used in the dbf manager.
QGIS users will thank: http://hub.qgis.org/issues/8725#note-95
Also I see your improvement parsing the integer attributes!
Even, but I think that there is a possible bit memory leak if the developper call two times to the 'SHPSetFastModeReadObject' function.
If you like I propose some this....
/* In this mode, the content of SHPReadObject() are owned by the SHPHandle. */ /* So you cannot have 2 valid instances of SHPReadObject() simultaneously. */ /* The SHPObject padfZ and padfM members may be NULL depending on the geometry */ /* type. It is illegal to free at hand any of the pointer members of the SHPObject structure */ void SHPAPI_CALL SHPSetFastModeReadObject( SHPHandle hSHP, int bFastMode ) {
if( hSHP->bFastModeReadObject != bFastMode ) {
if( bFastMode ) {
hSHP->bFastModeReadObject = TRUE; hSHP->psCachedObject = (SHPObject*) calloc(1, sizeof(SHPObject));
} else {
hSHP->bFastModeReadObject = FALSE;
if( hSHP->pabyObjectBuf != NULL ) {
free( hSHP->pabyObjectBuf );
} if( hSHP->psCachedObject != NULL ) {
free( hSHP->psCachedObject );
}
}
}
}
It avoids the memory leak and the developer can disable the 'fastmode'
Now, I am going to test filemapping using boost and send results Best regards
comment:12 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:13 by , 10 years ago
Replying to ahuarte47:
Hi Even, thank you very much for your patience!
I see your pull request it is perfect for me (I gets same speed imporvement even bit more), the idea was use a shared buffer for geometry as a shared buffer is used in the dbf manager.
QGIS users will thank: http://hub.qgis.org/issues/8725#note-95
Also I see your improvement parsing the integer attributes!
Even, but I think that there is a possible bit memory leak if the developper call two times to the 'SHPSetFastModeReadObject' function.
If you like I propose some this....
/* In this mode, the content of SHPReadObject() are owned by the SHPHandle. */ /* So you cannot have 2 valid instances of SHPReadObject() simultaneously. */ /* The SHPObject padfZ and padfM members may be NULL depending on the geometry */ /* type. It is illegal to free at hand any of the pointer members of the SHPObject structure */ void SHPAPI_CALL SHPSetFastModeReadObject( SHPHandle hSHP, int bFastMode ) {
if( hSHP->bFastModeReadObject != bFastMode ) {
if( bFastMode ) {
hSHP->bFastModeReadObject = TRUE; hSHP->psCachedObject = (SHPObject*) calloc(1, sizeof(SHPObject));
} else {
hSHP->bFastModeReadObject = FALSE;
if( hSHP->pabyObjectBuf != NULL ) {
free( hSHP->pabyObjectBuf );
} if( hSHP->psCachedObject != NULL ) {
free( hSHP->psCachedObject );
}
}
}
}
It avoids the memory leak and the developer can disable the 'fastmode'
Now, I am going to test filemapping using boost and send results Best regards
Better use:
/* In this mode, the content of SHPReadObject() are owned by the SHPHandle. */ /* So you cannot have 2 valid instances of SHPReadObject() simultaneously. */ /* The SHPObject padfZ and padfM members may be NULL depending on the geometry */ /* type. It is illegal to free at hand any of the pointer members of the SHPObject structure */ void SHPAPI_CALL SHPSetFastModeReadObject( SHPHandle hSHP, int bFastMode ) {
if( hSHP->bFastModeReadObject != bFastMode ) {
if( bFastMode ) {
hSHP->bFastModeReadObject = TRUE; hSHP->psCachedObject = (SHPObject*) calloc(1, sizeof(SHPObject));
} else {
hSHP->bFastModeReadObject = FALSE;
if( hSHP->pabyObjectBuf != NULL ) {
free( hSHP->pabyObjectBuf ); hSHP->pabyObjectBuf = NULL;
} if( hSHP->psCachedObject != NULL ) {
free( hSHP->psCachedObject ); hSHP->psCachedObject = NULL;
} hSHP->nObjectBufSize = 0;
}
}
}
comment:14 by , 10 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
I'll integrate your fix for SHPSetFastModeReadObject().
Regarding memory mapped file, there's one point that I'm not sure how you can deal with : the fact that on 32 bit OS, the virtual memory space for a process is small ( 2 GB ). So if you memory mapped a whole file, you can run out of virtual memory very quickly (or even worse if you want to simultaneously deal with files whose total size is above 2 GB). So it can be hard to use the capability in a robust way on 32 bit. On 64 bit, you have no such problem.
follow-up: 16 comment:15 by , 10 years ago
Hi Even, I must study as boost library deal this. All comments are welcome!
I want implement (For validate if MMF can be good for us), a new 'VSIVirtualHandle' class for activate it in 'VSIWin32FilesystemHandler::Open()' using a setting similar to 'CPLGetConfigOption("VSI_CACHE", "FALSE")'.
It looks good?
Sorry, I can not test in no-windows OS's.
I have other question, the 'SHPSetFastModeReadObject' function in that version of GDAL will be released ?
QGIS 2.0 uses GDAL 1.10, I don't know as the issues are flushed in the available versions of GDAL.
Thanks!
follow-up: 17 comment:16 by , 10 years ago
Replying to ahuarte47:
Hi Even, I must study as boost library deal this. All comments are welcome!
Make sure that the new class can be conditionnaly compiled, as Boost is not a dependency of GDAL, and we want to avoid to add compulsory dependencies.
I want implement (For validate if MMF can be good for us), a new 'VSIVirtualHandle' class for activate it in 'VSIWin32FilesystemHandler::Open()' using a setting similar to 'CPLGetConfigOption("VSI_CACHE", "FALSE")'.
It looks good?
Yes
Sorry, I can not test in no-windows OS's.
I sympathize to the fact that you must work with Windows ;-)
I have other question, the 'SHPSetFastModeReadObject' function in that version of GDAL will be released ?
The next major version of GDAL. 1.11 or 2.0 depending on how we call it. This is a new feature, with some risk of breakage to risk to put in the 1.10 branch.
QGIS 2.0 uses GDAL 1.10, I don't know as the issues are flushed in the available versions of GDAL.
Thanks!
comment:17 by , 10 years ago
Replying to rouault:
Replying to ahuarte47:
Hi Even, I must study as boost library deal this. All comments are welcome!
Make sure that the new class can be conditionnaly compiled, as Boost is not a dependency of GDAL, and we want to avoid to add compulsory dependencies.
I had that in mind thanks, if this goes ahead, I thought adding to OSGeo4W as new package.
I want implement (For validate if MMF can be good for us), a new 'VSIVirtualHandle' class for activate it in 'VSIWin32FilesystemHandler::Open()' using a setting similar to 'CPLGetConfigOption("VSI_CACHE", "FALSE")'.
It looks good?
Yes
Sorry, I can not test in no-windows OS's.
I sympathize to the fact that you must work with Windows ;-)
I have other question, the 'SHPSetFastModeReadObject' function in that version of GDAL will be released ?
The next major version of GDAL. 1.11 or 2.0 depending on how we call it. This is a new feature, with some risk of breakage to risk to put in the 1.10 branch.
QGIS 2.0 uses GDAL 1.10, I don't know as the issues are flushed in the available versions of GDAL.
Thanks!
comment:18 by , 10 years ago
Milestone: | → 2.0 |
---|---|
Resolution: | → fixed |
Status: | reopened → closed |
comment:19 by , 10 years ago
Hi Even, I added for test a new VSIVirtualHandle class using memory-mapped-file.
The results do not have significant improvement on VSICachedFile, I think it's not worth adding this new implementation.
Best regards
comment:20 by , 10 years ago
Milestone: | 2.0 → 1.11.0 |
---|
Done:
https://github.com/ahuarte47/gdal/commit/f24d4d39c4fd2989c1decfd08d5e873d3a3967f7