Opened 16 years ago

Closed 8 years ago

#2411 closed defect (fixed)

gdal_grid inefficiencies

Reported by: pagameba Owned by: dron
Priority: normal Milestone:
Component: Utilities Version: svn-trunk
Severity: normal Keywords: gdal_grid efficiency
Cc:

Description

This ticket is to trac my experience with gdal_grid and offer some feedback on my particular use case in which I was not able to get it to produce a DEM in an reasonable amount of time.

I am working with about 1.8 million points of XYZ data that I originally got in about 1800 separate files, each covering a small rectangular region. The data was captured at about 70m intervals or better in some cases, so I was hoping to produce a DEM at about 70m per pixel or better resolution. For my target area, this resulted in an desired output size of about 6000 x 6000 pixels.

My initial attempt was to combine all the XYZ files into a single CSV file and set up a VRT file, then pass this to gdal_grid. When nothing happened for several hours, I started to wonder about what I had attempted. I started trying alternate algorithms each with similar results. Then I took the drastic step of looking at the code and realized that it does an exhaustive loop over all the input points for every cell in the output raster - I roughly calculated this would take several years to run to completion on my machine and started looking for an alternative approach.

The best approach that I attempted was to work from the original data and try to use a 3 x 3 window approach feeding the XYZ values for the target file plus up to 8 surrounding files into gdal_grid to produce a 40x40 output file that would then be combined using gdal_merge into a full DEM. This ran quite a bit faster but still had not completed in 24 hours.

As a secondary issue, I took some of the produced tif files and tried to merge them. gdal_merge complained that the vertical size was negative, at which time I realized that the coordinate system of the output tifs was upside down (vertical only). I went over what I had done with Frank and all seemed reasonable - just vertically flipped output.

For my specific use case, then, gdal_grid could not do the job - fortunately I was able to do it using GRASS in only a few hours. But I believe this indicates that a somewhat different approach to gdal_grid would produce good results in a more reasonable amount of time.

The real issue is, I believe, that the algorithm uses every point from the input for every cell of the output. It seems unlikely that any more than a few points around the cell will have a substantial impact on the final value. Perhaps gdal_grid could be re-organized to make a single pass over the input points and put them into spatial bins that match small rectangular windows in the output grid - perhaps 10x10, then execute the existing algorithms on the 10x10 grids using the window plus up to 8 surrounding windows. If no data is found in (one or any of) the surrounding windows, the search region could be expanded. The size of the window could be configurable or some heuristic used to calculate it based on the number of input points and size of the output raster (so that some points end up in each window while keeping the number of windows minimized to reduce memory overhead).

For your consideration.

Attachments (5)

gdal_grid_performances_improvement.patch (18.1 KB ) - added by Even Rouault 16 years ago.
gdal_grid_improvements_patch2.patch (2.4 KB ) - added by thurmond 15 years ago.
Patch to be applied OVER patch(es) on this page.
test.vrt (246 bytes ) - added by thurmond 15 years ago.
Sample VRT file for testing
test.csv (56.7 KB ) - added by thurmond 15 years ago.
Sample CSV file for testing
gdal_grid_performances_improvement_v2.patch (18.1 KB ) - added by Even Rouault 15 years ago.
Patch v2 that takes into account the issue with OGR_L_GetExtent and VRT layers

Download all attachments as: .zip

Change History (19)

comment:1 by warmerdam, 16 years ago

Cc: dron added
Component: defaultUtilities
Keywords: gdal_grid added
Version: unspecified1.5.0

comment:2 by warmerdam, 16 years ago

Cc: dron removed
Owner: changed from warmerdam to dron

comment:3 by Even Rouault, 16 years ago

I'm attaching a patch that improves performance and basically implements pagameba's ideas. The patch must be applied in top of r15052 where CPLQuadTree was added.

I've tested it with a shapefile of 3601 * 3601 points to create a 3601 * 3601 pixel grid. I've build the spatial index on the shapefile before doing gdal_grid. I also ask to create the raster with tiles of dimension 256x256 as it will improve the performance when requesting the points. It run on my PC in around 15 minutes.

gdal_grid  -outsize 3601 3601 -ot Int16 -l N38W112 -zfield z N38W112.shp grid.tif -a nearest:radius1=0.0:radius2=0.0:angle=0.0:nodata=0.0 -co TILED=YES -co BLOCKXSIZE=256 -co BLOCKYSIZE=256

A new option, -margin (default = 1), is added to gdal_grid. It means that for each destination pixel, we only apply the gridding algorithm on the points contained in the box [dfXPoint - nMargin * dfDeltaX, dfXPoint + nMargin * dfDeltaX] * [ dfYPoint - nMargin * dfDeltaY, dfYPoint + nMargin * dfDeltaY].

The new algorithm probably requires less memory on huge OGR datasets as we only load the features for each raster block, but the dataset must have fast spatial queries (shapefiles with .qix, postgis) in order that algorithm to be the most efficient.

comment:4 by Even Rouault, 16 years ago

Updated version of the patch based on the changes to quad tree done in r15067

by Even Rouault, 16 years ago

in reply to:  4 comment:5 by thurmond, 15 years ago

Keywords: efficiency added
Type: enhancementdefect
Version: 1.5.0svn-trunk

Replying to rouault:

Updated version of the patch based on the changes to quad tree done in r15067

I had problems getting this patch to work with CSV files (with an associated VRT), but reverting back to the original Compute Grid Geometry block fixed the issue. Attached is a secondary patch which can be applied from within the gdal/apps directory to update the gdal_grid.cpp file to reflect the new changes. This probably could use a second look by someone more familiar with the overall code, as I have only been focused on getting it to work, and there could be extraneous lines of code contained within this block.

by thurmond, 15 years ago

Patch to be applied OVER patch(es) on this page.

comment:6 by Even Rouault, 15 years ago

Thumand,

After a bit of testing, I'm not sure why your patch would be necessary but I ran into an issue when testing with VRT/CSV. In my initial attempt, the X and Y column reported by an ogrinfo on your VRT file were of string type and not real type. The consequence of this is that spatial filtering would not work properly as by default it will rely on attribute filtering on the x and y columns of the source layer, but if the columns are of string type, the comparisons will be made on strings rather on numerical values... and "-79" > "-78" when you compare using lexicographic order.

There are many ways of fixing this :

  • add useSpatialSubquery="false" at the end of the GeometryField line in the VRT, like : <GeometryField encoding="PointFromColumns" x="x" y="y" useSpatialSubquery="false"/>. This is OK as the CSV has no fast attribute select.
  • or add a .CSVT file to indicate that the x and y columns are of type real
  • or add a SrcSQL line in the VRT file, such as <SrcSQL>SELECT CAST(x AS float), CAST(y AS float), CAST(z as float) FROM the_name_of_the_csv_layer</SrcSQL>

Doing one of those did fix the issue I had. But your patch alone without one of the above adjustments didn't change anything.

My understanding of your patch is that it is an alternate way of computing the spatial extent of the layer (and potentially most slower on formats that have fast queries for getting spatial extent -> that's the reason why I used OGR_L_GetExtent). There's certainly something I'm missing in the way you're using gdal_grid. Could you post the exact commandline you use, and ideally attach a small .csv and .vrt that demonstrate that your patch is necessary ?

comment:7 by thurmond, 15 years ago

This is just a standard x,y,z CSV file with a VRT constructed as explained on the gdal_grid help page. I would argue that it is essential that this work correctly, as it is a common entry point to gdal_grid (and it worked previously). I will attach example files.

After your patch, the result from gdal_grid is this: Grid data type is "Float64" Grid size = (256 256). Corner coordinates = (0.000000 0.000000)-(0.000000 0.000000). Grid cell size = (0.000000 0.000000).

Whereas the 'correct' result (from unaltered gdal_grid or with my patch) is this: Grid data type is "Float64" Grid size = (256 256). Corner coordinates = (467793.173828 4786280.019531)-(471301.826172 4786269.980469). Grid cell size = (13.652344 0.039062).

(In this case, the numbers are meaningless, as they are from this test file, which is a tiny subset of a much larger CSV file.)

I simply took out the OGR_L_GetExtent code and reinserted the 'old' gdal_grid code so that it worked for these files.

Maybe the best is to attempt OGR_L_GetExtent, and if it fails to get a result, to fall back on the original method?

The overall performance improvement is fantastic with the patch.

by thurmond, 15 years ago

Attachment: test.vrt added

Sample VRT file for testing

by thurmond, 15 years ago

Attachment: test.csv added

Sample CSV file for testing

comment:8 by thurmond, 15 years ago

I should add that the gdal_grid command line used for the examples above is: gdal_grid -l test -a nearest test.vrt test.tif

by Even Rouault, 15 years ago

Patch v2 that takes into account the issue with OGR_L_GetExtent and VRT layers

comment:9 by Even Rouault, 15 years ago

Thumond,

thanks for your inputs. That helped me to realize that I just had to change the bForce argument of OGR_L_GetExtent from FALSE to TRUE... I've updated my patch in v2 with that only change. So your patch is hopefully no longer necessary now.

However to get the maximum performance out of the patch, when working with a OGR file of several million points, I'd advise you to work on a spatially indexed shapefile :

ogr2ogr test.shp test.vrt ogrinfo test.shp -sql "CREATE SPATIAL INDEX ON test" gdal_grid -l test -a nearest test.shp test.tif

comment:10 by Even Rouault, 15 years ago

Arg, I lost the formatting of the commands :

ogr2ogr test.shp test.vrt
ogrinfo test.shp -sql "CREATE SPATIAL INDEX ON test"
gdal_grid -l test -a nearest test.shp test.tif 

comment:11 by Jukka Rahkonen, 9 years ago

Is the speed of gdal_grid now OK in recent GDAL versions?

comment:12 by Jukka Rahkonen, 9 years ago

Patch by Even is not applied to trunk/gdal/alg/gdal_alg.h (GDAL 2.0, 2015-01-04).

comment:13 by pagameba, 9 years ago

I do not have the original data that I was working with when I first experienced this problem so I can't really comment on whether this provides an improvement in the case I was working on, but based on the patch it seems to me that it would help significantly.

comment:14 by Jukka Rahkonen, 8 years ago

Resolution: fixed
Status: newclosed

Let's trust in @pagameba "based on the patch it seems to me that it would help significantly".

Note: See TracTickets for help on using tickets.