Opened 13 years ago

Closed 13 years ago

#1594 closed defect (fixed)

Memory allocation error during SPATIAL INDEX creation for Shapefile

Reported by: Mateusz Łoskot Owned by: Mateusz Łoskot
Priority: normal Milestone: 1.4.3
Component: OGR_SF Version: unspecified
Severity: normal Keywords: index shapefile spatial
Cc: ftrastour, warmerdam

Description (last modified by Mateusz Łoskot)

Yesterday, a user with the nickname Kredik reported the following problem with creating a spatial index for his Shapefile. Here is the story:


I am trying to index a point shapefile. I use:

ogrinfo -sql "CREATE SPATIAL INDEX ON temp" temp.shp

ogrinfo uses more than 1.5 GB of VM and the process failed with a memory allocation error. If I use shptree temp.shp, the index creation is done in less than 15 seconds...

The shptree.exe binary is taken from FWTools 1.1.4 (I can't find the source of this application).

It's a point shapefile, 1136000 features (PointZ)

I have tried to specify the depth in the CREATE SPATIAL INDEX statement. I think the default depth used is 19. I have tried 8, 7, and so on, but the memory footprint is always very large.


Kredik sent me his file, and I confirm the problem occurs. I also tested index creation with other big files I have (e.g. from MassGIS), and there everything works. I suppose the problem is with Kredik's file (he is going to confirm whether the file is valid or not).

However, Frank suggests investigating the problem with Kredik's file, so we will know the reason for the problem.

Attachments (3)

ogrinfo-winxp-test-1.png (95.9 KB) - added by Mateusz Łoskot 13 years ago.
First spatial index generation test on Windows (note the amount of VM used)
ogrinfo-winxp-test-2.png (104.0 KB) - added by Mateusz Łoskot 13 years ago.
Second spatial index test on Windows, after adding defensive code to shptree.c. Again, compare the amount of VM used with the VM in test No. 1
shptree-depth-size-chart.png (11.4 KB) - added by Mateusz Łoskot 13 years ago.
Chart presenting how the DEPTH value influences the size of the .qix file. This chart was generated using Kredik's test dataset (ptsz.shp, 33 MB)


Change History (14)

comment:1 Changed 13 years ago by Mateusz Łoskot

Description: modified (diff)

comment:2 Changed 13 years ago by warmerdam

Cc: ftrastour warmerdam added
Owner: changed from warmerdam to Mateusz Łoskot

comment:3 Changed 13 years ago by warmerdam

Milestone: 1.4.3

Target to fix this for 1.4.3 ...

I don't know that I still have the data for this bug, so hopefully this won't prove too hard to reproduce.

comment:4 Changed 13 years ago by Mateusz Łoskot

Status: new → assigned

Unfortunately, I've lost Kredik's test file. I tried to reproduce this problem using other big files, but without luck. Kredik or Frank, do you still have it on your disk? Could you send it to me?

I suppose this problem might be related to 3D geometries, perhaps similar to #1790.

comment:5 Changed 13 years ago by Mateusz Łoskot

Trying to reproduce this issue:

~/dev/gdal/bugs/1594 $ ogrinfo -sql "CREATE SPATIAL INDEX ON ptsz" ptsz.shp 
-bash: ogrinfo: command not found
~/dev/gdal/bugs/1594 $ ~/dev/gdal/_svn/trunk/gdal/apps/ogrinfo -sql "CREATE SPATIAL INDEX ON ptsz" ptsz.shp 
INFO: Open of `ptsz.shp'
      using driver `ESRI Shapefile' successful.
ogrinfo(9627) malloc: *** vm_allocate(size=1069056) failed (error code=3)
ogrinfo(9627) malloc: *** error: can't allocate region
ogrinfo(9627) malloc: *** set a breakpoint in szone_error to debug
Bus error

I'm using Mac Pro, 2x2.66 GHz Intel Xeon (4 cores) + 5GB RAM.

~/dev/gdal/bugs/1594 $ ulimit -a
core file size        (blocks, -c) 0
data seg size         (kbytes, -d) 6144
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) unlimited
max memory size       (kbytes, -m) unlimited
open files                    (-n) 256
pipe size          (512 bytes, -p) 1
stack size            (kbytes, -s) 8192
cpu time             (seconds, -t) unlimited
max user processes            (-u) 266
virtual memory        (kbytes, -v) unlimited

comment:6 Changed 13 years ago by Mateusz Łoskot

Analysing the vm_allocate(size=1069056) failed (error code=3) message:

  • code 3 means KERN_NO_SPACE
  • KERN_NO_SPACE is defined in /usr/include/mach/kern_return.h (Mac OS X 10.4)
    #define KERN_NO_SPACE 3
       /* The address range specified is already in use, or
        * no address range of the size specified could be
        * found.
        */
    

comment:7 Changed 13 years ago by Mateusz Łoskot

The last two comments above apply to tests made using Kredik's dataset:

~/dev/gdal/bugs/1594 $ ls -lh
total 109344
-rw-rw-rw-   1 mloskot  mloskot       14M Oct 24 18:27 ptsz.dbf
-rw-rw-rw-   1 mloskot  mloskot       32M Oct 24 18:34 ptsz.shp
-rw-rw-rw-   1 mloskot  mloskot        7M Oct 24 18:34 ptsz.shx

Changed 13 years ago by Mateusz Łoskot

Attachment: ogrinfo-winxp-test-1.png added

First spatial index generation test on Windows (note the amount of VM used)

Changed 13 years ago by Mateusz Łoskot

Attachment: ogrinfo-winxp-test-2.png added

Second spatial index test on Windows, after adding defensive code to shptree.c. Again, compare the amount of VM used with the VM in test No. 1

comment:8 Changed 13 years ago by Mateusz Łoskot

Similarly to the tests I ran under Windows, ogrinfo under Mac OS X fails every time near a VM usage of 1.8 GB.

comment:9 Changed 13 years ago by Mateusz Łoskot

The problem seems to be identified. All tests were made using the fairly big (33 MB) shapefile I got from Kredik:

D:\dev\gdal\bugs\1594>%GDAL%\apps\ogrinfo -so ptsz.shp ptsz
OGR: OGROpen(ptsz.shp/003B57D8) succeeded as ESRI Shapefile.
INFO: Open of `ptsz.shp'
      using driver `ESRI Shapefile' successful.
OGR: GetLayerCount() = 1

Layer name: ptsz
Geometry: 3D Point
Feature Count: 932870
Extent: (440001.000000, 5652001.000000) - (441999.000000, 5653999.000000)
Layer SRS WKT:
(unknown)
X: Integer (6.0)
Y: Integer (7.0)
Z: Integer (2.0)

The indexing algorithm, unless requested otherwise, calculates the number of tree levels automatically, and for the ptsz.shp file it comes out at 17-18 levels. This number of levels requires a large number of memory allocations, which causes the failure.

There are two possible solutions; the first one does not need any changes in the code, the second one does:

  1. Avoid automatic estimation of tree levels by specifying the depth manually, this way:
    CREATE SPATIAL INDEX ON mylayer DEPTH 8
    
  2. Use a max level limit as a hardcoded value in the shptree algorithm, for example 8, 10, or 12 levels.

We will probably fix this issue following the second solution, to avoid similar problems in the future.

Users can try to estimate the maximum level of tree nodes by attempting to create the spatial index a few times with different values and observing whether it succeeds or not.

For example, on my Mac OS box, I found that I can generate a spatial index for Kredik's shapefile with level 16:

D:\dev\gdal\bugs\1594>%GDAL%\apps\ogrinfo -sql "CREATE SPATIAL INDEX ON ptsz DEPTH 16" ptsz.shp

OGR: OGROpen(ptsz.shp/003B5B80) succeeded as ESRI Shapefile.
INFO: Open of `ptsz.shp'
      using driver `ESRI Shapefile' successful.
SHAPE: Creating index file ptsz.qix
OGR: GetLayerCount() = 1

With DEPTH equal to 16, the produced index file is 213 MB in size:

D:\dev\gdal\bugs\1594>ls -lh
total 267M
-rw-rw-rw-  1 mloskot 0  15M 2007-10-24 18:27 ptsz.dbf
-rw-rw-rw-  1 mloskot 0 213M 2007-10-26 07:00 ptsz.qix
-rw-rw-rw-  1 mloskot 0  33M 2007-10-24 18:34 ptsz.shp
-rw-rw-rw-  1 mloskot 0 7.2M 2007-10-24 18:34 ptsz.shx

I hope this makes sense and helps in understanding the problem and how to solve it.

The bug will be closed as fixed after a patch following the second solution is applied.

comment:10 Changed 13 years ago by warmerdam

Mateusz,

If maxdepth is not passed in, I think we should use a value of 12 instead of the current, apparently unbounded, depth.

comment:11 Changed 13 years ago by Mateusz Łoskot

Resolution: fixed
Status: assigned → closed

I applied a fix following the second proposed solution, using the value suggested by Frank.

#define MAX_DEFAULT_TREE_DEPTH 12 

Now, if the user does not specify the depth of the spatial index tree, the algorithm makes a simple estimate based on the number of features in the shapefile. If this calculated number of tree levels is higher than MAX_DEFAULT_TREE_DEPTH, the algorithm falls back to using the MAX_DEFAULT_TREE_DEPTH value (and a short message is printed if CPL_DEBUG=ON is set).

Fixed in trunk (r12543) and branches/1.4 (r12544)

Changed 13 years ago by Mateusz Łoskot

Chart presenting how the DEPTH value influences the size of the .qix file. This chart was generated using Kredik's test dataset (ptsz.shp, 33 MB)
