Opened 8 years ago

Last modified 3 years ago

#2428 new enhancement

r.external to accept gdal config options

Reported by: perrygeo
Owned by: grass-dev@…
Priority: normal
Milestone: 7.6.2
Component: Default
Version: svn-trunk
Keywords: r.external
Cc:
CPU: x86-64
Platform: Unspecified

Description

When linking to an external GDAL raster source, it would be useful to pass GDAL configuration options (http://trac.osgeo.org/gdal/wiki/ConfigOptions).

Consider this scenario: I need to specify a GDAL config option to read a NetCDF file correctly. I can set GDAL_NETCDF_BOTTOMUP=NO as an environment variable, which works in most cases. But with a multiprocessing approach to parallelization (as used by t.aggregate, for example), the newly spawned processes don't inherit the same environment and fail.

A possible solution is for r.external to accept GDAL config options that are applied regardless of the environment variables, so that externally linked rasters work properly across processes.
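To make the idea concrete, here is a minimal sketch of how a module could split one user-supplied KEY=VALUE config option before applying it. The helper name split_config_option and the idea of a dedicated r.external option for this are assumptions, not existing GRASS code. In a real implementation the resulting pair would presumably be handed to GDAL's CPLSetConfigOption(key, value) before the dataset is opened, in every process that reads the linked map.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split "KEY=VALUE" at the first '=' into
 * caller-provided buffers.
 * Returns 0 on success, -1 if there is no '=', the key is empty,
 * or either part does not fit its buffer. */
int split_config_option(const char *option,
                        char *key, size_t key_size,
                        char *value, size_t value_size)
{
    assert(option != NULL && key != NULL && value != NULL);

    const char *eq = strchr(option, '=');
    if (eq == NULL || eq == option)
        return -1;

    size_t key_len = (size_t)(eq - option);
    size_t value_len = strlen(eq + 1);
    if (key_len >= key_size || value_len >= value_size)
        return -1;

    memcpy(key, option, key_len);
    key[key_len] = '\0';
    memcpy(value, eq + 1, value_len + 1); /* copies the terminator too */
    return 0;
}
```

Called with "GDAL_NETCDF_BOTTOMUP=NO", this fills key with "GDAL_NETCDF_BOTTOMUP" and value with "NO". Because the pair would be stored with the link rather than in the shell environment, it would not depend on what a spawned process inherits.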

Change History (16)

comment:1 by neteler, 8 years ago

Keywords: r.external added

The request sounds reasonable but I didn't figure out how to pass "papszOptions" to GDAL as used in r.external. The magic line in r.out.gdal is this:

# r.out.gdal/main.c

	hDstDS =
	    GDALCreate(hDriver, output->answer, cellhead.cols, cellhead.rows,
		       ref.nfiles, datatype, papszOptions);

but in r.external GDALOpen() is used. Perhaps a GDAL expert can tell us the trick.

in reply to:  1 ; comment:2 by glynn, 8 years ago

Replying to neteler:

> The request sounds reasonable but I didn't figure out how to pass "papszOptions" to GDAL as used in r.external. The magic line in r.out.gdal is this:

> but in r.external GDALOpen() is used. Perhaps a GDAL expert can tell us the trick.

Note that r.external is the analogue of r.in.gdal, which doesn't accept any configuration options.

The analogue of r.out.gdal is r.external.out, which has an options= option.

If options are needed for reading, a similar option should be added to both r.in.gdal and r.external, presumably using GDALOpenEx() instead of GDALOpen(). The latter will require extending the GDAL "link" format (lib/raster/gdal.c).
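For reference, GDALOpenEx() takes its open options as a NULL-terminated array of "NAME=VALUE" strings (the papszOpenOptions argument). Note that these driver open options are a separate mechanism from global config options such as GDAL_NETCDF_BOTTOMUP, which are set with CPLSetConfigOption(). Below is a rough sketch of building such an array from a comma-separated string, as it might be stored in an extended link file. The function names and the comma-separated storage format are assumptions; GDAL's own CSLTokenizeString2() could likely do the same job.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated list produced by split_option_list(). */
void free_option_list(char **list)
{
    if (list == NULL)
        return;
    for (char **p = list; *p != NULL; p++)
        free(*p);
    free(list);
}

/* Hypothetical helper: split "A=1,B=2" into a NULL-terminated array of
 * newly allocated strings. Returns NULL on allocation failure. */
char **split_option_list(const char *csv)
{
    assert(csv != NULL);

    /* One item more than the number of commas, unless the input is empty. */
    size_t count = (*csv != '\0') ? 1 : 0;
    for (const char *p = csv; *p != '\0'; p++)
        if (*p == ',')
            count++;

    char **list = calloc(count + 1, sizeof *list);
    if (list == NULL)
        return NULL;

    const char *start = csv;
    for (size_t i = 0; i < count; i++) {
        const char *end = strchr(start, ',');
        size_t len = end ? (size_t)(end - start) : strlen(start);

        list[i] = malloc(len + 1);
        if (list[i] == NULL) {
            free_option_list(list); /* frees the items allocated so far */
            return NULL;
        }
        memcpy(list[i], start, len);
        list[i][len] = '\0';
        start += len + (end ? 1 : 0);
    }
    return list; /* list[count] is already NULL thanks to calloc */
}
```

The returned array has the shape that the papszOpenOptions parameter expects, so lib/raster/gdal.c could pass it through when opening the linked dataset and release it afterwards with free_option_list().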

in reply to:  2 ; comment:3 by neteler, 8 years ago

Replying to glynn:

> Replying to neteler:

>> The request sounds reasonable but I didn't figure out how to pass "papszOptions" to GDAL as used in r.external. The magic line in r.out.gdal is this:

>> but in r.external GDALOpen() is used. Perhaps a GDAL expert can tell us the trick.

> Note that r.external is the analogue of r.in.gdal, which doesn't accept any configuration options.

While it doesn't directly, I had added a larger cache some time ago by setting GDALSetCacheMax() to 300MB rather than the tiny 40MB default GDAL cache size. This speeds up import tremendously:

r.in.gdal ...
memory=integer
    Maximum memory to be used (in MB)
    Cache size for raster rows
    Options: 0-2047
    Default: 300

I wonder how to get that into r.external (I suppose that it would benefit as well).

> If options are needed for reading, a similar option should be added to both r.in.gdal and r.external, presumably using GDALOpenEx() instead of GDALOpen(). The latter will require extending the GDAL "link" format (lib/raster/gdal.c).

OK (no idea how to implement that).

in reply to:  3 ; comment:4 by dylan, 7 years ago

Replying to neteler:

> Replying to glynn:

>> Replying to neteler:

>>> The request sounds reasonable but I didn't figure out how to pass "papszOptions" to GDAL as used in r.external. The magic line in r.out.gdal is this:

>>> but in r.external GDALOpen() is used. Perhaps a GDAL expert can tell us the trick.

>> Note that r.external is the analogue of r.in.gdal, which doesn't accept any configuration options.

> While it doesn't directly, I had added a larger cache some time ago by setting GDALSetCacheMax() to 300MB rather than the tiny 40MB default GDAL cache size. This speeds up import tremendously:

> r.in.gdal ...
> memory=integer
>     Maximum memory to be used (in MB)
>     Cache size for raster rows
>     Options: 0-2047
>     Default: 300

> I wonder how to get that into r.external (I suppose that it would benefit as well).

>> If options are needed for reading, a similar option should be added to both r.in.gdal and r.external, presumably using GDALOpenEx() instead of GDALOpen(). The latter will require extending the GDAL "link" format (lib/raster/gdal.c).

> OK (no idea how to implement that).

Finding this thread after searching for some ways to speed-up file access to maps linked via r.external.

The adjustable cache solution in r.in.gdal appears to be:

if (parm.memory->answer && *parm.memory->answer) {
    /* TODO: GDALGetCacheMax() overflows at 2GiB, implement use of GDALSetCacheMax64() */
    GDALSetCacheMax(atol(parm.memory->answer) * 1024 * 1024);
    G_verbose_message(_("Using memory cache size: %.1f MiB"), GDALGetCacheMax()/1024.0/1024.0);
}

Could this same block of code be used within r.external? I don't fully understand how r.external works, so I suppose that it is more complicated than this.

Alternatively, is there an environment variable that could be used to control the GDAL cache size?
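On the 2 GiB limit mentioned in the TODO: GDALSetCacheMax() takes an int byte count, which is why the memory= option stops at 2047, while GDALSetCacheMax64() takes a 64-bit value. A small sketch of the arithmetic, using hypothetical helper names that are not part of r.in.gdal:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: convert a cache size in MiB to bytes using
 * 64-bit arithmetic, so sizes of 2048 MiB and above do not overflow. */
int64_t cache_mib_to_bytes(long mib)
{
    /* Reject negative sizes and sizes whose byte count exceeds int64_t. */
    assert(mib >= 0 && mib <= INT64_MAX / (1024 * 1024));
    return (int64_t)mib * 1024 * 1024;
}

/* Nonzero if the byte count still fits the int-based GDALSetCacheMax(). */
int fits_in_int(int64_t bytes)
{
    return bytes <= INT_MAX;
}
```

With a 32-bit int, 2047 MiB (2146435072 bytes) still fits, but 2048 MiB (2147483648 bytes) is one past INT_MAX, so anything larger would need the 64-bit call.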

in reply to:  4 ; comment:5 by glynn, 7 years ago

Replying to dylan:

> Could this same block of code be used within r.external? I don't fully understand how r.external works, so I suppose that it is more complicated than this.

r.external itself just sets up the "link" between GRASS and the data file. The actual I/O occurs in lib/raster (gdal.c, open.c, get_row.c, close.c) when a GRASS module reads the map.

But the data which r.external controls is per-map, while this appears to be a global setting. What happens when a module reads multiple GDAL-linked maps with different settings? It might make more sense to set this in Rast_init_gdal() from an environment variable or $GISRC variable.
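A minimal sketch of the environment-variable approach follows, assuming the lookup happens once per process in an init routine such as Rast_init_gdal(). The function names and any GRASS-specific variable name passed to them are hypothetical. GDAL itself also honours its own GDAL_CACHEMAX config option, which may already cover some use cases.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a cache size in MiB from text. Falls back to
 * default_mib when the text is missing, empty, not a plain number,
 * negative, or out of range for long. */
long parse_cache_mib(const char *text, long default_mib)
{
    assert(default_mib >= 0);

    if (text == NULL || *text == '\0')
        return default_mib;

    char *end = NULL;
    errno = 0;
    long mib = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || mib < 0)
        return default_mib;
    return mib;
}

/* Read the setting from the named environment variable; the variable
 * name is chosen by the caller and is not an existing GRASS setting. */
long cache_mib_from_env(const char *variable, long default_mib)
{
    return parse_cache_mib(getenv(variable), default_mib);
}
```

Because the value is global to the process, every GDAL-linked map read by a module would share the same cache size, which sidesteps the question of conflicting per-map settings.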

in reply to:  5 comment:6 by dylan, 7 years ago

Replying to glynn:

> Replying to dylan:

>> Could this same block of code be used within r.external? I don't fully understand how r.external works, so I suppose that it is more complicated than this.

> r.external itself just sets up the "link" between GRASS and the data file. The actual I/O occurs in lib/raster (gdal.c, open.c, get_row.c, close.c) when a GRASS module reads the map.

> But the data which r.external controls is per-map, while this appears to be a global setting. What happens when a module reads multiple GDAL-linked maps with different settings? It might make more sense to set this in Rast_init_gdal() from an environment variable or $GISRC variable.

Thank you for the clarification, Glynn. I think that a suitable environment or GRASS variable would be ideal. It would not be widely used, but it would be *very* important when working with massive, numerous, or massive and numerous files. I am unable to implement it but happy to test and document.

comment:7 by dylan, 6 years ago

Checking in: any progress?

comment:8 by neteler, 6 years ago

Milestone: 7.1.0 → 7.2.0

Milestone renamed

comment:9 by neteler, 6 years ago

Milestone: 7.2.0 → 7.2.1

Ticket retargeted after milestone closed

comment:10 by martinl, 5 years ago

Milestone: 7.2.1 → 7.2.2

comment:11 by martinl, 5 years ago

Milestone: 7.2.2 → 7.4.0

All enhancement tickets should be assigned to 7.4 milestone.

comment:12 by neteler, 5 years ago

Milestone: 7.4.0 → 7.4.1

Ticket retargeted after milestone closed

comment:13 by neteler, 4 years ago

Milestone: 7.4.1 → 7.4.2

comment:14 by martinl, 4 years ago

Milestone: 7.4.2 → 7.6.0

All enhancement tickets should be assigned to 7.6 milestone.

comment:15 by martinl, 4 years ago

Milestone: 7.6.0 → 7.6.1

Ticket retargeted after milestone closed

comment:16 by martinl, 3 years ago

Milestone: 7.6.1 → 7.6.2

Ticket retargeted after milestone closed
