Opened 11 years ago
Closed 7 years ago
#2055 closed enhancement (fixed)
r.in.gdal lacks flag "-r Limit import to the current region"
Reported by: | neteler | Owned by: | martinl |
---|---|---|---|
Priority: | normal | Milestone: | 7.4.0 |
Component: | Raster | Version: | svn-trunk |
Keywords: | r.in.gdal, r.import | Cc: | grass-dev@… |
CPU: | Unspecified | Platform: | Unspecified |
Description
While v.in.ogr offers "-r Limit import to the current region" the module r.in.gdal does not.
The code for such a -r flag will be like
gdal_translate -projwin ulx uly lrx lry: Selects a subwindow from the source image for copying (like -srcwin) but with the corners given in georeferenced coordinates.
as found in
http://svn.osgeo.org/gdal/trunk/gdal/apps/gdal_translate.cpp
Attachments (5)
Change History (38)
follow-ups: 2 4 13 comment:1 by , 11 years ago
follow-up: 3 comment:2 by , 11 years ago
Replying to mmetz: ...
Limiting the import to the current region followed by resampling would thus eat away rows and columns at the region's borders.
What would be the advantage of r.in.gdal -r?
That would exactly be my use case. Example: I need a tiny area from the huge Black Marble map or the gmted2010 DEM (files in GB range). Getting it in at original resolution but cut to my area would be very useful as optional flag.
follow-up: 5 comment:3 by , 11 years ago
Replying to neteler:
Replying to mmetz: ...
Limiting the import to the current region followed by resampling would thus eat away rows and columns at the region's borders.
What would be the advantage of r.in.gdal -r?
That would exactly be my use case. Example: I need a tiny area from the huge Black Marble map or the gmted2010 DEM (files in GB range). Getting it in at original resolution but cut to my area would be very useful as optional flag.
Why would it be useful, even though it would introduce resampling artefacts? Why not use r.external?
comment:4 by , 11 years ago
Replying to mmetz:
Limiting the import to the current region followed by resampling would thus eat away rows and columns at the region's borders.
To be precise: inside the current region, effectively shrinking the current region.
follow-up: 6 comment:5 by , 11 years ago
Replying to mmetz:
Replying to neteler:
Replying to mmetz: ...
Limiting the import to the current region followed by resampling would thus eat away rows and columns at the region's borders.
I don't understand why resampling is involved. Keep pixels as-is within the "MASK", i.e. current region, if -r is used.
What would be the advantage of r.in.gdal -r?
That would exactly be my use case. Example: I need a tiny area from the huge Black Marble map or the gmted2010 DEM (files in GB range). Getting it in at original resolution but cut to my area would be very useful as optional flag.
Why would it be useful, even though it would introduce resampling artefacts?
Artifacts I would only expect at the border cells in the worst case and they could be imported even completely (these pixels) leading to a tiny extension to the current region.
Why not use r.external?
wxNIZ and other modules failed. Secondly there is always the risk that the original is deleted. Finally, why keep a big file when I only need to use a tiny fraction of it in GRASS... (think disk space on laptops).
follow-up: 7 comment:6 by , 11 years ago
Replying to neteler:
Replying to mmetz:
Replying to neteler:
Replying to mmetz: ...
Limiting the import to the current region followed by resampling would thus eat away rows and columns at the region's borders.
I don't understand why resampling is involved. Keep pixels as-is within the "MASK", i.e. current region, if -r is used.
I meant, if resampling is one of the following processing steps, it will produce artefacts, because r.resamp.* modules typically extend the current region by the margin required for resampling (if not nn).
Why not use r.external?
wxNIZ and other modules failed.
Sounds like a need for separate tickets for wxNVIZ and those other modules.
Secondly there is always the risk that the original is deleted. Finally, why keep a big file when I only need to use a tiny fraction of it in GRASS... (think disk space on laptops).
So the potential advantage would be disk space? Why not use r.in.gdal + r.resamp.* + g.remove?
comment:7 by , 11 years ago
Replying to mmetz: ...
So the potential advantage would be disk space? Why not use r.in.gdal + r.resamp.* + g.remove?
Disk space and user time. Why three steps if it could be in a single one without overhead?
follow-up: 9 comment:8 by , 11 years ago
Adding a use case here to support the addition of -r flag to r.in.gdal
I'm working with time series (2003-2013) of MODIS-Aqua L3 Chlorophyll concentration and Rrs data. Data is available globally (http://oceandata.sci.gsfc.nasa.gov/MODISA/Mapped/8Day/4km/chlor/), but I just need a small sample covering the argentinean sea. I have more than 40 images per year, 10 years of data, 5 different products, ~150 Mb each. That's a lot of space for my notebook. I cannot loop over all data set cos i fill the disk completely... So, i have to go folder by folder uncompressing (original data are .bz2 hdf files), importing into grass, subseting, and then removing the global files...
I could really use a flag to subset to the region extent on-the-fly when importing into grass and save disk space and time :)
follow-up: 12 comment:9 by , 11 years ago
Replying to veroandreo:
Adding a use case here to support the addition of -r flag to r.in.gdal
I'm working with time series (2003-2013) of MODIS-Aqua L3 Chlorophyll concentration and Rrs data. Data is available globally (http://oceandata.sci.gsfc.nasa.gov/MODISA/Mapped/8Day/4km/chlor/), but I just need a small sample covering the argentinean sea. I have more than 40 images per year, 10 years of data, 5 different products, ~150 Mb each. That's a lot of space for my notebook. I cannot loop over all data set cos i fill the disk completely... So, i have to go folder by folder uncompressing (original data are .bz2 hdf files), importing into grass, subseting, and then removing the global files...
I could really use a flag to subset to the region extent on-the-fly when importing into grass and save disk space and time :)
Why not use r.external? This requires very little disk space. If you want to clip the import to the current region and delete the external raster data afterwards, you could use
r.external in=<name_of_input> out=<name_of_input>.tmp r.mapcalc "<name_of_input> = <name_of_input>.tmp" g.remove rast=<name_of_input>.tmp
granted that the current region is aligned to the input raster.
Be aware of several bands in the input raster.
comment:10 by , 11 years ago
Thanks, Markus! I wasn't aware of r.external. That solved the disk space issue, but it is not as straightforward as a -r flag in r.in.gdal would be :) Best, Vero
comment:11 by , 9 years ago
Milestone: | 7.0.0 → 7.0.5 |
---|
comment:12 by , 9 years ago
Keywords: | r.import added |
---|---|
Milestone: | 7.0.5 → 7.2.0 |
Replying to mmetz:
Replying to veroandreo:
Adding a use case here to support the addition of -r flag to r.in.gdal I'm working with time series (2003-2013) of MODIS-Aqua L3 Chlorophyll concentration and Rrs data. Data is available globally (http://oceandata.sci.gsfc.nasa.gov/MODISA/Mapped/8Day/4km/chlor/), but I just need a small sample covering the argentinean sea. I have more than 40 images per year, 10 years of data, 5 different products, ~150 Mb each. That's a lot of space for my notebook.
... now with Sentinel-2, we are in the multi-GB range per image scene (which itself can consist of up to 12 tiles, 13 channels)...
I cannot loop over all data set cos i fill the disk completely... So, i have to go folder by folder uncompressing (original data are .bz2 hdf files), importing into grass, subseting, and then removing the global files...
I could really use a flag to subset to the region extent on-the-fly when importing into grass and save disk space and time :)
Why not use r.external? This requires very little disk space. If you want to clip the import to the current region and delete the external raster data afterwards, you could use
r.external in=<name_of_input> out=<name_of_input>.tmp r.mapcalc "<name_of_input> = <name_of_input>.tmp" g.remove rast=<name_of_input>.tmpgranted that the current region is aligned to the input raster.
Be aware of several bands in the input raster.
Proposal: adding the approach suggested above to r.import (simplifying the problem to some Python coding).
follow-up: 14 comment:13 by , 8 years ago
Replying to mmetz:
Replying to neteler:
While v.in.ogr offers "-r Limit import to the current region" the module r.in.gdal does not.
v.in.ogr -r imports all features that overlap with the current region. The result might thus have extents extending beyond the current region, which makes sense.
Translated to raster import, I see a disadvantage when limiting the import to the current region: for a given project, raw data are commonly coming from different sources with different extents and resolution. You would then need to use one of the r.resamp.* modules to resampling raw data to the current region, and r.resamp.* modules typically extend the current region by the margin required for resampling (if not nn). Limiting the import to the current region followed by resampling would thus eat away rows and columns at the region's borders.
The solution would be "lazy cutting", that is not cutting sharp to the boundary but allow for a small "extra". Essentially, like v.in.ogr behaves. Resampling is definitely undesired since we usually import as-is in r.in.gdal.
Lines to look at: https://trac.osgeo.org/grass/browser/grass/trunk/raster/r.in.gdal/main.c#L359
follow-ups: 15 16 comment:14 by , 8 years ago
Replying to neteler:
The solution would be "lazy cutting", that is not cutting sharp to the boundary but allow for a small "extra". Essentially, like v.in.ogr behaves. Resampling is definitely undesired since we usually import as-is in r.in.gdal.
Lines to look at: https://trac.osgeo.org/grass/browser/grass/trunk/raster/r.in.gdal/main.c#L359
Please try out attachment:r_in_gdal_2055.diff.
comment:15 by , 8 years ago
Replying to martinl:
Replying to neteler:
The solution would be "lazy cutting", that is not cutting sharp to the boundary but allow for a small "extra". Essentially, like v.in.ogr behaves. Resampling is definitely undesired since we usually import as-is in r.in.gdal.
Lines to look at: https://trac.osgeo.org/grass/browser/grass/trunk/raster/r.in.gdal/main.c#L359
Please try out attachment:r_in_gdal_2055.diff
Thank you for that!
# current region setting at 200m res GRASS 7.2.svn (eu_laea):~ > g.region -g projection=99 zone=0 n=3554400 s=2680400 w=4025000 e=4678400 nsres=200 ewres=200 rows=4370 cols=3267 cells=14276790 # input map is larger: gdalinfo EUD_CP-DEMS_4500025000-AA.tif ... Corner Coordinates: Upper Left ( 4000000.000, 3000000.000) ( 5d31' 4.07"E, 50d 1'26.83"N) Lower Left ( 4000000.000, 2000000.000) ( 6d11'52.97"E, 41d 1'36.49"N) Upper Right ( 5000000.000, 3000000.000) ( 19d26'39.40"E, 49d43' 8.38"N) Lower Right ( 5000000.000, 2000000.000) ( 18d 1'22.97"E, 40d46'27.19"N) Center ( 4500000.000, 2500000.000) ( 12d17'27.23"E, 45d35'15.84"N) # import of 25m EU LAEA DEM with -r GRASS 7.3.svn (eu_laea): > r.in.gdal -r EUD_CP-DEMS_4500025000-AA.tif out=test ... (1000,-21180) of size 26137x1 on raster of 40000x40000. ERROR 5: EUD_CP-DEMS_4500025000-AA.tif, band 1: Access window out of range in RasterIO(). Requested (1000,-21179) of size 26137x1 on raster of 40000x40000. ERROR 5: EUD_CP-DEMS_4500025000-AA.tif, band 1: Access window out of range in RasterIO(). Requested (1000,-21178) of size 26137x1 on raster of 40000x40000. ERROR 5: EUD_CP-DEMS_4500025000-AA.tif, band 1: Access window out of range in RasterIO(). Requested (1000,-21177) of size 26137x1 on raster of 40000x40000. More than 1000 errors or warnings have been reported. No more will be reported from now. 39% ... 100% # resulting map is at 25m as expected GRASS 7.2.svn (eu_laea):~ > r.info -g test north=3554400 south=2680375 east=4678425 west=4025000 nsres=25 ewres=25 rows=34961 cols=26137 cells=913775657 datatype=FCELL ncats=0 # verification GRASS 7.2.svn (eu_laea): > r.external EUD_CP-DEMS_4500025000-AA.tif output=original r.info -g original north=3000000 south=2000000 east=5000000 west=4000000 nsres=25 ewres=25 rows=40000 cols=40000 cells=1600000000 datatype=FCELL ncats=0 ########### # stats r.univar original total null and non-null cells: 913775657 total null cells: 579614112 ... r.univar test total null and non-null cells: 913775657 total null cells: 0
Looks like a very good start!
Two issues:
- bug: it does not write NULL but 0 if the current region is not full covered. Perhaps some NULL initialization is yet missing?
- better suppress the GDAL messages
follow-ups: 22 24 comment:16 by , 8 years ago
Replying to martinl:
Replying to neteler:
The solution would be "lazy cutting", that is not cutting sharp to the boundary but allow for a small "extra". Essentially, like v.in.ogr behaves. Resampling is definitely undesired since we usually import as-is in r.in.gdal.
Lines to look at: https://trac.osgeo.org/grass/browser/grass/trunk/raster/r.in.gdal/main.c#L359
Please try out attachment:r_in_gdal_2055.diff.
Tests are missing:
- if the current region and the input raster do not overlap, fatal error, but special care for Lat/Lon locations
- if the current region and the input raster overlap only partially, adjust the current region to avoid importing all-NULL rows/columns
For Lat/Lon, some kind of window mapping similar to Rast__create_window_mapping()
in trunk/lib/raster/window_map.c would help. Otherwise, a current region with EW extents of e.g. -20,-10 and an input raster with EW extents of 0,360 will result in all NULL cells even though the current region is covered by the input raster.
follow-up: 18 comment:17 by , 8 years ago
You are right, the patch lacks tests - it was just proof-of-concept. I will finish the patch.
follow-up: 21 comment:18 by , 8 years ago
Replying to martinl:
You are right, the patch lacks tests - it was just proof-of-concept. I will implement them in the next patch.
The concept is proofed when the patch is working.
With
/* get pixel coordinates defined by current region */ pixel_start = (int) (cellhd.west - adfGeoTransform[0]) / adfGeoTransform[1]; line_start = (int) (cellhd.north - adfGeoTransform[3]) / adfGeoTransform[5]; pixel_end = (int) (cellhd.east - adfGeoTransform[0]) / adfGeoTransform[1]; line_end = (int) (cellhd.south - adfGeoTransform[3]) / adfGeoTransform[5];
you cast the numerator of the division to int while Rast_align_window()
uses floor()
and ceil()
for the equivalent calculations. Why?
comment:20 by , 8 years ago
Cc: | added |
---|---|
Owner: | changed from | to
Status: | new → assigned |
comment:21 by , 8 years ago
Replying to mmetz:
you cast the numerator of the division to int while
Rast_align_window()
usesfloor()
andceil()
for the equivalent calculations. Why?
I have overlooked this issue, thanks for comment. It was changed in attachment:r_in_gdal_2055_1.diff
follow-up: 23 comment:22 by , 8 years ago
Replying to mmetz:
- if the current region and the input raster overlap only partially, adjust the current region to avoid importing all-NULL rows/columns
The new version of patch attachment:r_in_gdal_2055_1.diff still fills the whole grid of computation region. Cells outside of input raster grid are filled by NULL values. The solution you are suggesting would make probably better sense (avoid importing all-NULL rows/columns). I will take a look at this issue.
comment:23 by , 8 years ago
Replying to martinl:
The new version of patch attachment:r_in_gdal_2055_1.diff still fills the whole grid of computation region. Cells outside of input raster grid are filled by NULL values. The solution you are suggesting would make probably better sense (avoid importing all-NULL rows/columns). I will take a look at this issue.
See attachment:r_in_gdal_2055_2.diff which avoids importing empty rows/cols outside of input grid. The region is shrinked to input grid extent. See example shown in attachment:region.png. First version of patch (attachment:r_in_gdal_2055_2.diff) creates raster map with same extent as red box (computational region) since the second version (attachment:r_in_gdal_2055_2.diff) creates smaller raster map overlapping red and blue box (input grid extent). Testing and comments very welcomed.
comment:24 by , 8 years ago
Replying to mmetz:
- if the current region and the input raster do not overlap, fatal error, but special care for Lat/Lon locations
Both versions of patch (attachment:r_in_gdal_2055_1.diff and attachment:r_in_gdal_2055_2.diff) now also check whether input raster and region overlaps. What is missing: special care for Lat/Lon locations
by , 8 years ago
Attachment: | r_in_gdal_2055_1.diff added |
---|
improved version - check region outside window
by , 8 years ago
Attachment: | r_in_gdal_2055_2.diff added |
---|
another version: shrink region to input grid extent
follow-up: 26 comment:25 by , 8 years ago
Both patches - attachment:r_in_gdal_2055_1.diff and attachment:r_in_gdal_2055_2.diff updated to reflect recent changes in r.in.gdal
.
by , 8 years ago
Attachment: | r.in.gdal_region.patch added |
---|
patch to limit import to the current region, accounting for lat/lon
follow-up: 27 comment:26 by , 8 years ago
The patch attachment:r.in.gdal_region.patch limits import to the current region and does lat/lon wrapping. The patch is automatically doing region cropping (like r.proj), i.e. only the part overlapping with the current region is imported. The patch uses window mapping equivalent to the raster lib.
Test commands:
# create a region that crosses 180E g.region -p n=80 s=-80 w=90 e=270 res=1 # create a test map r.mapcalc "test = 1" # export r.out.gdal in=test out=test.tif # set region to a part of the test map g.region -p n=80 s=-80 w=0 e=180 res=1 # import test map r.in.gdal in=test.tif out=test1 -r # check imported map, both visually and with r.info # set region to -180, 180, should import the whole test map with a gap in the middle g.region -p n=80 s=-80 w=-180 e=180 res=1 # import test map r.in.gdal in=test.tif out=test2 -r # check imported map, both visually and with r.info
follow-up: 28 comment:27 by , 8 years ago
Replying to mmetz:
The patch attachment:r.in.gdal_region.patch limits import to the current region and does lat/lon wrapping. The patch is automatically doing region cropping (like r.proj), i.e. only the part overlapping with the current region is imported. The patch uses window mapping equivalent to the raster lib.
Thanks! Do you plan to submit this to trunk? This would facilitate testing.
follow-up: 29 comment:28 by , 8 years ago
Replying to neteler:
Replying to mmetz:
The patch attachment:r.in.gdal_region.patch limits import to the current region and does lat/lon wrapping. The patch is automatically doing region cropping (like r.proj), i.e. only the part overlapping with the current region is imported. The patch uses window mapping equivalent to the raster lib.
Thanks! Do you plan to submit this to trunk? This would facilitate testing.
yes, +1 for submitting to trunk
follow-up: 30 comment:29 by , 8 years ago
Replying to martinl:
Replying to neteler:
Replying to mmetz:
The patch attachment:r.in.gdal_region.patch limits import to the current region and does lat/lon wrapping. The patch is automatically doing region cropping (like r.proj), i.e. only the part overlapping with the current region is imported. The patch uses window mapping equivalent to the raster lib.
Thanks! Do you plan to submit this to trunk? This would facilitate testing.
yes, +1 for submitting to trunk
The patch needs some refinement with regard to nodata handling (I'm busy with it), then it can be submitted to trunk.
comment:30 by , 8 years ago
follow-up: 33 comment:31 by , 8 years ago
Seems to work as expected. Thanks for working on this. I suggest to change milestone to 7.4.0 and close this issue.
comment:32 by , 8 years ago
Milestone: | 7.2.1 → 7.2.2 |
---|
comment:33 by , 7 years ago
Milestone: | 7.2.2 → 7.4.0 |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
Replying to martinl:
Seems to work as expected. Thanks for working on this. I suggest to change milestone to 7.4.0 and close this issue.
No objections, done.
Replying to neteler:
v.in.ogr -r imports all features that overlap with the current region. The result might thus have extents extending beyond the current region, which makes sense.
Translated to raster import, I see a disadvantage when limiting the import to the current region: for a given project, raw data are commonly coming from different sources with different extents and resolution. You would then need to use one of the r.resamp.* modules to resampling raw data to the current region, and r.resamp.* modules typically extend the current region by the margin required for resampling (if not nn). Limiting the import to the current region followed by resampling would thus eat away rows and columns at the region's borders.
What would be the advantage of r.in.gdal -r?