Opened 7 years ago

Closed 3 years ago

#2055 closed enhancement (fixed)

r.in.gdal lacks flag "-r Limit import to the current region"

Reported by: neteler Owned by: martinl
Priority: normal Milestone: 7.4.0
Component: Raster Version: svn-trunk
Keywords: r.in.gdal, r.import Cc: grass-dev@…
CPU: Unspecified Platform: Unspecified

Description

While v.in.ogr offers "-r Limit import to the current region" the module r.in.gdal does not.

The code for such a -r flag will be like

gdal_translate
       -projwin ulx uly lrx lry:
           Selects a subwindow from the source image for
           copying (like -srcwin) but with the corners given
           in georeferenced coordinates.

as found in

http://svn.osgeo.org/gdal/trunk/gdal/apps/gdal_translate.cpp

Attachments (5)

r_in_gdal_2055.diff (7.6 KB) - added by martinl 4 years ago.
implement -r flag
region.png (91.9 KB) - added by martinl 3 years ago.
example
r_in_gdal_2055_1.diff (13.1 KB) - added by martinl 3 years ago.
improved version - check region outside window
r_in_gdal_2055_2.diff (9.6 KB) - added by martinl 3 years ago.
another version: shrink region to input grid extent
r.in.gdal_region.patch (14.5 KB) - added by mmetz 3 years ago.
patch to limit import to the current region, accounting for lat/lon

Download all attachments as: .zip

Change History (38)

comment:1 in reply to:  description ; Changed 7 years ago by mmetz

Replying to neteler:

While v.in.ogr offers "-r Limit import to the current region" the module r.in.gdal does not.

v.in.ogr -r imports all features that overlap with the current region. The result might thus have extents extending beyond the current region, which makes sense.

Translated to raster import, I see a disadvantage when limiting the import to the current region: for a given project, raw data are commonly coming from different sources with different extents and resolution. You would then need to use one of the r.resamp.* modules to resampling raw data to the current region, and r.resamp.* modules typically extend the current region by the margin required for resampling (if not nn). Limiting the import to the current region followed by resampling would thus eat away rows and columns at the region's borders.

What would be the advantage of r.in.gdal -r?

comment:2 in reply to:  1 ; Changed 7 years ago by neteler

Replying to mmetz: ...

Limiting the import to the current region followed by resampling would thus eat away rows and columns at the region's borders.

What would be the advantage of r.in.gdal -r?

That would exactly be my use case. Example: I need a tiny area from the huge Black Marble map or the gmted2010 DEM (files in GB range). Getting it in at original resolution but cut to my area would be very useful as optional flag.

comment:3 in reply to:  2 ; Changed 7 years ago by mmetz

Replying to neteler:

Replying to mmetz: ...

Limiting the import to the current region followed by resampling would thus eat away rows and columns at the region's borders.

What would be the advantage of r.in.gdal -r?

That would exactly be my use case. Example: I need a tiny area from the huge Black Marble map or the gmted2010 DEM (files in GB range). Getting it in at original resolution but cut to my area would be very useful as optional flag.

Why would it be useful, even though it would introduce resampling artefacts? Why not use r.external?

comment:4 in reply to:  1 Changed 7 years ago by mmetz

Replying to mmetz:

Limiting the import to the current region followed by resampling would thus eat away rows and columns at the region's borders.

To be precise: inside the current region, effectively shrinking the current region.

comment:5 in reply to:  3 ; Changed 7 years ago by neteler

Replying to mmetz:

Replying to neteler:

Replying to mmetz: ...

Limiting the import to the current region followed by resampling would thus eat away rows and columns at the region's borders.

I don't understand why resampling is involved. Keep pixels as-is within the "MASK", i.e. current region, if -r is used.

What would be the advantage of r.in.gdal -r?

That would exactly be my use case. Example: I need a tiny area from the huge Black Marble map or the gmted2010 DEM (files in GB range). Getting it in at original resolution but cut to my area would be very useful as optional flag.

Why would it be useful, even though it would introduce resampling artefacts?

Artifacts I would only expect at the border cells in the worst case and they could be imported even completely (these pixels) leading to a tiny extension to the current region.

Why not use r.external?

wxNIZ and other modules failed. Secondly there is always the risk that the original is deleted. Finally, why keep a big file when I only need to use a tiny fraction of it in GRASS... (think disk space on laptops).

comment:6 in reply to:  5 ; Changed 7 years ago by mmetz

Replying to neteler:

Replying to mmetz:

Replying to neteler:

Replying to mmetz: ...

Limiting the import to the current region followed by resampling would thus eat away rows and columns at the region's borders.

I don't understand why resampling is involved. Keep pixels as-is within the "MASK", i.e. current region, if -r is used.

I meant, if resampling is one of the following processing steps, it will produce artefacts, because r.resamp.* modules typically extend the current region by the margin required for resampling (if not nn).

Why not use r.external?

wxNIZ and other modules failed.

Sounds like a need for separate tickets for wxNVIZ and those other modules.

Secondly there is always the risk that the original is deleted. Finally, why keep a big file when I only need to use a tiny fraction of it in GRASS... (think disk space on laptops).

So the potential advantage would be disk space? Why not use r.in.gdal + r.resamp.* + g.remove?

comment:7 in reply to:  6 Changed 7 years ago by neteler

Replying to mmetz: ...

So the potential advantage would be disk space? Why not use r.in.gdal + r.resamp.* + g.remove?

Disk space and user time. Why three steps if it could be in a single one without overhead?

comment:8 Changed 6 years ago by veroandreo

Adding a use case here to support the addition of -r flag to r.in.gdal

I'm working with time series (2003-2013) of MODIS-Aqua L3 Chlorophyll concentration and Rrs data. Data is available globally (http://oceandata.sci.gsfc.nasa.gov/MODISA/Mapped/8Day/4km/chlor/), but I just need a small sample covering the argentinean sea. I have more than 40 images per year, 10 years of data, 5 different products, ~150 Mb each. That's a lot of space for my notebook. I cannot loop over all data set cos i fill the disk completely... So, i have to go folder by folder uncompressing (original data are .bz2 hdf files), importing into grass, subseting, and then removing the global files...

I could really use a flag to subset to the region extent on-the-fly when importing into grass and save disk space and time :)

comment:9 in reply to:  8 ; Changed 6 years ago by mmetz

Replying to veroandreo:

Adding a use case here to support the addition of -r flag to r.in.gdal

I'm working with time series (2003-2013) of MODIS-Aqua L3 Chlorophyll concentration and Rrs data. Data is available globally (http://oceandata.sci.gsfc.nasa.gov/MODISA/Mapped/8Day/4km/chlor/), but I just need a small sample covering the argentinean sea. I have more than 40 images per year, 10 years of data, 5 different products, ~150 Mb each. That's a lot of space for my notebook. I cannot loop over all data set cos i fill the disk completely... So, i have to go folder by folder uncompressing (original data are .bz2 hdf files), importing into grass, subseting, and then removing the global files...

I could really use a flag to subset to the region extent on-the-fly when importing into grass and save disk space and time :)

Why not use r.external? This requires very little disk space. If you want to clip the import to the current region and delete the external raster data afterwards, you could use

r.external in=<name_of_input> out=<name_of_input>.tmp
r.mapcalc "<name_of_input> = <name_of_input>.tmp"
g.remove rast=<name_of_input>.tmp

granted that the current region is aligned to the input raster.

Be aware of several bands in the input raster.

comment:10 Changed 6 years ago by veroandreo

Thanks, Markus! I wasn't aware of r.external. That solved the disk space issue, but it is not as straightforward as a -r flag in r.in.gdal would be :) Best, Vero

comment:11 Changed 4 years ago by martinl

Milestone: 7.0.07.0.5

comment:12 in reply to:  9 Changed 4 years ago by neteler

Keywords: r.import added
Milestone: 7.0.57.2.0

Replying to mmetz:

Replying to veroandreo:

Adding a use case here to support the addition of -r flag to r.in.gdal I'm working with time series (2003-2013) of MODIS-Aqua L3 Chlorophyll concentration and Rrs data. Data is available globally (http://oceandata.sci.gsfc.nasa.gov/MODISA/Mapped/8Day/4km/chlor/), but I just need a small sample covering the argentinean sea. I have more than 40 images per year, 10 years of data, 5 different products, ~150 Mb each. That's a lot of space for my notebook.

... now with Sentinel-2, we are in the multi-GB range per image scene (which itself can consist of up to 12 tiles, 13 channels)...

I cannot loop over all data set cos i fill the disk completely... So, i have to go folder by folder uncompressing (original data are .bz2 hdf files), importing into grass, subseting, and then removing the global files...

I could really use a flag to subset to the region extent on-the-fly when importing into grass and save disk space and time :)

Why not use r.external? This requires very little disk space. If you want to clip the import to the current region and delete the external raster data afterwards, you could use

r.external in=<name_of_input> out=<name_of_input>.tmp
r.mapcalc "<name_of_input> = <name_of_input>.tmp"
g.remove rast=<name_of_input>.tmp

granted that the current region is aligned to the input raster.

Be aware of several bands in the input raster.

Proposal: adding the approach suggested above to r.import (simplifying the problem to some Python coding).

comment:13 in reply to:  1 ; Changed 4 years ago by neteler

Replying to mmetz:

Replying to neteler:

While v.in.ogr offers "-r Limit import to the current region" the module r.in.gdal does not.

v.in.ogr -r imports all features that overlap with the current region. The result might thus have extents extending beyond the current region, which makes sense.

Translated to raster import, I see a disadvantage when limiting the import to the current region: for a given project, raw data are commonly coming from different sources with different extents and resolution. You would then need to use one of the r.resamp.* modules to resampling raw data to the current region, and r.resamp.* modules typically extend the current region by the margin required for resampling (if not nn). Limiting the import to the current region followed by resampling would thus eat away rows and columns at the region's borders.

The solution would be "lazy cutting", that is not cutting sharp to the boundary but allow for a small "extra". Essentially, like v.in.ogr behaves. Resampling is definitely undesired since we usually import as-is in r.in.gdal.

Lines to look at: https://trac.osgeo.org/grass/browser/grass/trunk/raster/r.in.gdal/main.c#L359

comment:14 in reply to:  13 ; Changed 4 years ago by martinl

Replying to neteler:

The solution would be "lazy cutting", that is not cutting sharp to the boundary but allow for a small "extra". Essentially, like v.in.ogr behaves. Resampling is definitely undesired since we usually import as-is in r.in.gdal.

Lines to look at: https://trac.osgeo.org/grass/browser/grass/trunk/raster/r.in.gdal/main.c#L359

Please try out attachment:r_in_gdal_2055.diff.

Changed 4 years ago by martinl

Attachment: r_in_gdal_2055.diff added

implement -r flag

comment:15 in reply to:  14 Changed 4 years ago by neteler

Replying to martinl:

Replying to neteler:

The solution would be "lazy cutting", that is not cutting sharp to the boundary but allow for a small "extra". Essentially, like v.in.ogr behaves. Resampling is definitely undesired since we usually import as-is in r.in.gdal.

Lines to look at: https://trac.osgeo.org/grass/browser/grass/trunk/raster/r.in.gdal/main.c#L359

Please try out attachment:r_in_gdal_2055.diff

Thank you for that!

# current region setting at 200m res
GRASS 7.2.svn (eu_laea):~ > g.region -g
projection=99
zone=0
n=3554400
s=2680400
w=4025000
e=4678400
nsres=200
ewres=200
rows=4370
cols=3267
cells=14276790

# input map is larger:
gdalinfo EUD_CP-DEMS_4500025000-AA.tif
...
Corner Coordinates:
Upper Left  ( 4000000.000, 3000000.000) (  5d31' 4.07"E, 50d 1'26.83"N)
Lower Left  ( 4000000.000, 2000000.000) (  6d11'52.97"E, 41d 1'36.49"N)
Upper Right ( 5000000.000, 3000000.000) ( 19d26'39.40"E, 49d43' 8.38"N)
Lower Right ( 5000000.000, 2000000.000) ( 18d 1'22.97"E, 40d46'27.19"N)
Center      ( 4500000.000, 2500000.000) ( 12d17'27.23"E, 45d35'15.84"N)


# import of 25m EU LAEA DEM with -r
GRASS 7.3.svn (eu_laea): > r.in.gdal -r EUD_CP-DEMS_4500025000-AA.tif out=test
...
(1000,-21180) of size 26137x1 on raster of 40000x40000.
ERROR 5: EUD_CP-DEMS_4500025000-AA.tif, band 1: Access window out of range in RasterIO().  Requested
(1000,-21179) of size 26137x1 on raster of 40000x40000.
ERROR 5: EUD_CP-DEMS_4500025000-AA.tif, band 1: Access window out of range in RasterIO().  Requested
(1000,-21178) of size 26137x1 on raster of 40000x40000.
ERROR 5: EUD_CP-DEMS_4500025000-AA.tif, band 1: Access window out of range in RasterIO().  Requested
(1000,-21177) of size 26137x1 on raster of 40000x40000.
More than 1000 errors or warnings have been reported. No more will be reported from now.
  39%
...
 100%

# resulting map is at 25m as expected
GRASS 7.2.svn (eu_laea):~ > r.info -g test
north=3554400
south=2680375
east=4678425
west=4025000
nsres=25
ewres=25
rows=34961
cols=26137
cells=913775657
datatype=FCELL
ncats=0

# verification
GRASS 7.2.svn (eu_laea): > r.external EUD_CP-DEMS_4500025000-AA.tif output=original
r.info -g original
north=3000000
south=2000000
east=5000000
west=4000000
nsres=25
ewres=25
rows=40000
cols=40000
cells=1600000000
datatype=FCELL
ncats=0

###########
# stats
r.univar original
total null and non-null cells: 913775657
total null cells: 579614112
...

r.univar test
total null and non-null cells: 913775657
total null cells: 0

Looks like a very good start!

Two issues:

  • bug: it does not write NULL but 0 if the current region is not full covered. Perhaps some NULL initialization is yet missing?
  • better suppress the GDAL messages

comment:16 in reply to:  14 ; Changed 4 years ago by mmetz

Replying to martinl:

Replying to neteler:

The solution would be "lazy cutting", that is not cutting sharp to the boundary but allow for a small "extra". Essentially, like v.in.ogr behaves. Resampling is definitely undesired since we usually import as-is in r.in.gdal.

Lines to look at: https://trac.osgeo.org/grass/browser/grass/trunk/raster/r.in.gdal/main.c#L359

Please try out attachment:r_in_gdal_2055.diff.

Tests are missing:

  • if the current region and the input raster do not overlap, fatal error, but special care for Lat/Lon locations
  • if the current region and the input raster overlap only partially, adjust the current region to avoid importing all-NULL rows/columns

For Lat/Lon, some kind of window mapping similar to Rast__create_window_mapping() in trunk/lib/raster/window_map.c would help. Otherwise, a current region with EW extents of e.g. -20,-10 and an input raster with EW extents of 0,360 will result in all NULL cells even though the current region is covered by the input raster.

comment:17 Changed 4 years ago by martinl

You are right, the patch lacks tests - it was just proof-of-concept. I will implement them in the next patch.

Last edited 4 years ago by martinl (previous) (diff)

comment:18 in reply to:  17 ; Changed 4 years ago by mmetz

Replying to martinl:

You are right, the patch lacks tests - it was just proof-of-concept. I will implement them in the next patch.

The concept is proofed when the patch is working.

With

            /* get pixel coordinates defined by current region */
            pixel_start = (int) (cellhd.west - adfGeoTransform[0]) / adfGeoTransform[1];
            line_start = (int) (cellhd.north - adfGeoTransform[3]) / adfGeoTransform[5];
            pixel_end = (int) (cellhd.east - adfGeoTransform[0]) / adfGeoTransform[1];
            line_end = (int) (cellhd.south - adfGeoTransform[3]) / adfGeoTransform[5];

you cast the numerator of the division to int while Rast_align_window() uses floor() and ceil() for the equivalent calculations. Why?

comment:19 Changed 4 years ago by neteler

Milestone: 7.2.07.2.1

I'm still quite interested in this feature :-)

comment:20 Changed 3 years ago by martinl

Cc: grass-dev@… added
Owner: changed from grass-dev@… to martinl
Status: newassigned

comment:21 in reply to:  18 Changed 3 years ago by martinl

Replying to mmetz:

you cast the numerator of the division to int while Rast_align_window() uses floor() and ceil() for the equivalent calculations. Why?

I have overlooked this issue, thanks for comment. It was changed in attachment:r_in_gdal_2055_1.diff

comment:22 in reply to:  16 ; Changed 3 years ago by martinl

Replying to mmetz:

  • if the current region and the input raster overlap only partially, adjust the current region to avoid importing all-NULL rows/columns

The new version of patch attachment:r_in_gdal_2055_1.diff still fills the whole grid of computation region. Cells outside of input raster grid are filled by NULL values. The solution you are suggesting would make probably better sense (avoid importing all-NULL rows/columns). I will take a look at this issue.

Changed 3 years ago by martinl

Attachment: region.png added

example

comment:23 in reply to:  22 Changed 3 years ago by martinl

Replying to martinl:

The new version of patch attachment:r_in_gdal_2055_1.diff still fills the whole grid of computation region. Cells outside of input raster grid are filled by NULL values. The solution you are suggesting would make probably better sense (avoid importing all-NULL rows/columns). I will take a look at this issue.

See attachment:r_in_gdal_2055_2.diff which avoids importing empty rows/cols outside of input grid. The region is shrinked to input grid extent. See example shown in attachment:region.png. First version of patch (attachment:r_in_gdal_2055_2.diff) creates raster map with same extent as red box (computational region) since the second version (attachment:r_in_gdal_2055_2.diff) creates smaller raster map overlapping red and blue box (input grid extent). Testing and comments very welcomed.

comment:24 in reply to:  16 Changed 3 years ago by martinl

Replying to mmetz:

  • if the current region and the input raster do not overlap, fatal error, but special care for Lat/Lon locations

Both versions of patch (attachment:r_in_gdal_2055_1.diff and attachment:r_in_gdal_2055_2.diff) now also check whether input raster and region overlaps. What is missing: special care for Lat/Lon locations

Last edited 3 years ago by martinl (previous) (diff)

Changed 3 years ago by martinl

Attachment: r_in_gdal_2055_1.diff added

improved version - check region outside window

Changed 3 years ago by martinl

Attachment: r_in_gdal_2055_2.diff added

another version: shrink region to input grid extent

comment:25 Changed 3 years ago by martinl

Both patches - attachment:r_in_gdal_2055_1.diff and attachment:r_in_gdal_2055_2.diff updated to reflect recent changes in r.in.gdal.

Changed 3 years ago by mmetz

Attachment: r.in.gdal_region.patch added

patch to limit import to the current region, accounting for lat/lon

comment:26 in reply to:  25 ; Changed 3 years ago by mmetz

The patch attachment:r.in.gdal_region.patch limits import to the current region and does lat/lon wrapping. The patch is automatically doing region cropping (like r.proj), i.e. only the part overlapping with the current region is imported. The patch uses window mapping equivalent to the raster lib.

Test commands:

# create a region that crosses 180E
g.region -p n=80 s=-80 w=90 e=270 res=1

# create a test map
r.mapcalc "test = 1"

# export
r.out.gdal in=test out=test.tif

# set region to a part of the test map
g.region -p n=80 s=-80 w=0 e=180 res=1

# import test map
r.in.gdal in=test.tif out=test1 -r

# check imported map, both visually and with r.info

# set region to -180, 180, should import the whole test map with a gap in the middle
g.region -p n=80 s=-80 w=-180 e=180 res=1

# import test map
r.in.gdal in=test.tif out=test2 -r

# check imported map, both visually and with r.info

comment:27 in reply to:  26 ; Changed 3 years ago by neteler

Replying to mmetz:

The patch attachment:r.in.gdal_region.patch limits import to the current region and does lat/lon wrapping. The patch is automatically doing region cropping (like r.proj), i.e. only the part overlapping with the current region is imported. The patch uses window mapping equivalent to the raster lib.

Thanks! Do you plan to submit this to trunk? This would facilitate testing.

comment:28 in reply to:  27 ; Changed 3 years ago by martinl

Replying to neteler:

Replying to mmetz:

The patch attachment:r.in.gdal_region.patch limits import to the current region and does lat/lon wrapping. The patch is automatically doing region cropping (like r.proj), i.e. only the part overlapping with the current region is imported. The patch uses window mapping equivalent to the raster lib.

Thanks! Do you plan to submit this to trunk? This would facilitate testing.

yes, +1 for submitting to trunk

comment:29 in reply to:  28 ; Changed 3 years ago by mmetz

Replying to martinl:

Replying to neteler:

Replying to mmetz:

The patch attachment:r.in.gdal_region.patch limits import to the current region and does lat/lon wrapping. The patch is automatically doing region cropping (like r.proj), i.e. only the part overlapping with the current region is imported. The patch uses window mapping equivalent to the raster lib.

Thanks! Do you plan to submit this to trunk? This would facilitate testing.

yes, +1 for submitting to trunk

The patch needs some refinement with regard to nodata handling (I'm busy with it), then it can be submitted to trunk.

comment:30 in reply to:  29 Changed 3 years ago by mmetz

Replying to mmetz:

The patch needs some refinement with regard to nodata handling (I'm busy with it), then it can be submitted to trunk.

Done with trunk r70938. Please test.

comment:31 Changed 3 years ago by martinl

Seems to work as expected. Thanks for working on this. I suggest to change milestone to 7.4.0 and close this issue.

comment:32 Changed 3 years ago by martinl

Milestone: 7.2.17.2.2

comment:33 in reply to:  31 Changed 3 years ago by martinl

Milestone: 7.2.27.4.0
Resolution: fixed
Status: assignedclosed

Replying to martinl:

Seems to work as expected. Thanks for working on this. I suggest to change milestone to 7.4.0 and close this issue.

No objections, done.

Note: See TracTickets for help on using tickets.