Opened 12 years ago
Closed 12 years ago
#1694 closed defect (invalid)
r.in.lidar tries to allocate way too much memory
Reported by: | torsti | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 7.0.0 |
Component: | Raster | Version: | svn-trunk |
Keywords: | r.in.lidar | Cc: | |
CPU: | x86-64 | Platform: | Linux |
Description
Trying to import a LAS dataset containing ~11 million points with r.in.lidar, I get the following error message: "ERROR: G_calloc: unable to allocate 18446744073471563701 * 4 bytes of memory at main.c:528"
I know the dataset is large, but allocating ~64 exabytes of memory seems a bit excessive.
Importing the same dataset with v.in.lidar works, and in 6.4.2, after converting with las2txt, neither v.in.ascii nor r.in.ascii has any problem with the same dataset.
GRASS 7.0 version was revision 52573.
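For what it's worth, the reported figure is the classic signature of a 32-bit overflow: 18446744073471563701 = 2^64 - 237987915, i.e. a negative int reinterpreted as the unsigned 64-bit size_t the allocator expects. A minimal sketch of the mechanism (not the actual r.in.lidar source; the rows*(cols+1) cell count and the row/column numbers are taken from the g.region output later in this ticket):

#include <stdio.h>

int main(void)
{
    int rows = 1170611;     /* region rows, from g.region -p below */
    int cols = 671222;      /* region cols */

    /* rows * (cols + 1) = 785741027253 does not fit in a 32-bit int;
     * on a typical two's-complement machine it wraps to -237987915
     * (signed overflow is formally undefined behavior in C). */
    int ncells = rows * (cols + 1);

    /* Converting that negative int to the unsigned size_t expected
     * by calloc() yields 18446744073471563701, the reported value. */
    size_t request = (size_t)ncells;

    printf("ncells  = %d\n", ncells);
    printf("request = %zu * 4 bytes\n", request);
    return 0;
}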
Change History (9)
comment:1 by , 12 years ago
Component: | Default → Raster |
---|---|
Keywords: | r.in.lidar added |
comment:2 by , 12 years ago
That must be a recent development. I was able to use r.in.lidar on 7 x 3.3 billion point LAS files simultaneously on an 8-core computer to do point counts and calculate range using the 2012_04_21 svn snapshot, without excessive memory use. How big was the region and how many cells were you trying to process into?
comment:3 by , 12 years ago
please check your region resolution: what does g.region say about the number of rows and columns? Memory use is directly tied to the region resolution and the statistical aggregation method used (which method?). See the r.in.xyz man page for discussion about it.
On a positive note, I'm happy to see that the G_calloc() calculation for how much memory it needs seems to handle & printf into the exabyte range without overflowing... for some future time when the datasets are actually that big :)
Hamish
comment:4 by , 12 years ago
See the r.in.xyz man page for discussion about it.
(The choice of raster resolution in r.in.xyz and r.in.lidar has a profound effect on the result and must be made wisely. I typically do several iterations at different raster resolutions and run some stats on the (masked) results to find the optimal one. I have purposely avoided having the modules make any attempt to choose it for you, since it is such a dataset- and purpose-driven choice, and needs the human operator to consider factors beyond the numbers themselves.)
Hamish
follow-up: 6 comment:5 by , 12 years ago
The region the las file covers is 3000 by 3000 (meters) and the resolution was 1x1.
The command:
r.in.lidar -o --overwrite input=R4133C4.laz output=R4133C4.las method=mean
After setting the resolution to 10 by 10 (g.region res=10) it still wants almost the same amount of memory:
ERROR: G_calloc: unable to allocate 18446744072977286712 * 4 bytes of memory at main.c:528
With cellsize 20x20:
ERROR: G_calloc: unable to allocate 1964475953 * 4 bytes of memory at main.c:528
This is in the range of mortal computers; I just happen to be testing on a machine too weak for this kind of processing ;-)
With bigger cell sizes it runs, but the result is not really useful.
For small areas it runs fine on higher resolutions, e.g. a 100 by 100 area with cellsize 1 by 1.
My issue is not that r.in.lidar can't be used on large datasets on underpowered computers; I'm just wondering whether 64 exabytes can be the right amount of memory for cell sizes of 1x1 to 10x10 over a total area of 3000 m x 3000 m with an average point density a bit over 1 point per square meter (11000000 points / 9000000 m2).
comment:6 by , 12 years ago
Replying to torsti:
The region the las file covers is 3000 by 3000 (meters) and the resolution was 1x1.
Can you provide the current region settings for the 1x1 resolution, i.e. the output of g.region -p?
What matters is not the resolution alone but the number of rows and columns in the current region, which are determined by the region extents and the resolution. That is, you probably need to check and adjust the region extents.
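To make that concrete, here is an editorial sketch (not GRASS source) of the region math: the number of rows and columns follows from extent divided by resolution, so a location-wide extent at res=1 implies an enormous grid regardless of how small the LAS file is. The extents below are the ones g.region -p reports in comment 8:

#include <stdio.h>

int main(void)
{
    double n = 7776450.217, s = 6605838.902;  /* region extents */
    double w = 61686.152, e = 732907.723;
    double nsres = 1.0, ewres = 1.0;          /* 1x1 resolution */

    /* rows/cols derive from extent and resolution together
     * (the rounding detail here is an assumption) */
    long long rows = (long long)((n - s) / nsres + 0.5);
    long long cols = (long long)((e - w) / ewres + 0.5);

    printf("rows=%lld cols=%lld cells=%lld\n",
           rows, cols, rows * cols);  /* 1170611 x 671222 ~ 7.9e11 */
    return 0;
}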
The command:
r.in.lidar -o --overwrite input=R4133C4.laz output=R4133C4.las method=mean
You might try the percent option. By default the whole map is kept in memory (percent=100).
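As a rough sketch of what percent buys: assuming the module processes the region in ceil(100/percent) horizontal bands (the exact pass arithmetic is an assumption; the "pass=1/1" debug lines quoted later fit this picture), peak memory shrinks proportionally. For the 3000x3000 region from comment 5:

#include <math.h>
#include <stdio.h>

int main(void)
{
    long long rows = 3000, cols = 3000;
    int percent = 10;                  /* hypothetical setting */

    int passes = (int)ceil(100.0 / percent);
    long long band = rows / passes + (rows % passes ? 1 : 0);

    /* two float arrays (count and sum) per band for method=mean */
    double mb = band * (cols + 1) * 4.0 * 2 / 1024000.0;
    printf("%d passes, ~%.1f MB per pass\n", passes, mb);
    return 0;
}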
My issue is not that r.in.lidar can't be used on large datasets on underpowered computers; I'm just wondering whether 64 exabytes can be the right amount of memory for cell sizes of 1x1 to 10x10 over a total area of 3000 m x 3000 m with an average point density a bit over 1 point per square meter (11000000 points / 9000000 m2).
With the right region settings and making use of the percent option, it should be possible to import this dataset in no time. Instead of changing only the resolution, you can try r.in.lidar on a subregion and there figure out the resolution that provides the desired results. Then set the region to cover the full input dataset (adjust extents, align to the desired resolution) and import the full dataset.
HTH,
Markus M
comment:7 by , 12 years ago
as MarkusM asked, what does g.region -p say?
can you turn debug level to 2? (g.gisenv set="DEBUG=2", then back to 0 to turn it off)
for a 3000x3000 computational region and method=mean it should use
3000*(3000+1)*4 * 2 / 1024000 = 70.3 MB
of RAM to hold the data, and complete in just a few seconds.
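Spelled out as a quick sanity check (a sketch of the same arithmetic; the two arrays correspond to the "allocating n_array" / "allocating sum_array" debug lines that show up in comment 8):

#include <stdio.h>

int main(void)
{
    long long rows = 3000, cols = 3000;

    /* method=mean keeps a count array and a sum array of 4-byte
     * floats, each rows*(cols+1) cells, per the formula above */
    long long bytes = rows * (cols + 1) * 4 * 2;

    printf("%.1f MB\n", bytes / 1024000.0);  /* prints 70.3 MB */
    return 0;
}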
does las2txt | r.in.xyz input=- work? (see LIDAR page in the grass wiki for correct usage)
Hamish
follow-up: 9 comment:8 by , 12 years ago
So memory allocation is based on the extent of the region and not on the bounding box of the LAS data; that explains a lot. That was my mistake there! Still, to be on the safe side, I've included the more detailed information that was asked for.
g.region -p
D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
projection: 1 (UTM)
zone:       35
datum:      etrs89
ellipsoid:  grs80
north:      7776450.217
south:      6605838.902
west:       61686.152
east:       732907.723
nsres:      1.00000027
ewres:      0.99999936
rows:       1170611
cols:       671222
cells:      785739856642
lasinfo R4133C4.laz
---------------------------------------------------------
  Header Summary
---------------------------------------------------------
  Version:                     1.2
  Source ID:                   0
  Reserved:                    0
  Project ID/GUID:             '00000000-0000-0000-0000-000000000000'
  System ID:                   ''
  Generating Software:         'EspaEngine'
  File Creation Day/Year:      0/0
  Header Byte Size             227
  Data Offset:                 329
  Header Padding:              2
  Number Var. Length Records:  1
  Point Data Format:           1
  Number of Point Records:     11064863
  Compressed:                  True
  Compression Info:            LASzip Version 2.1r0 c2 50000: POINT10 2 GPSTIME11 2
  Number of Points by Return:  0 0 0 0 0
  Scale Factor X Y Z:          0.01 0.01 0.01
  Offset X Y Z:                -0.00 -0.00 -0.00
  Min X Y Z:                   389000.00 7149000.00 91.36
  Max X Y Z:                   391999.99 7151999.99 139.61
  Spatial Reference:           None None
...
I updated r.in.lidar to revision 52593.
Both r.in.lidar and r.in.xyz complain about the amount of memory because the region is too big, but the amount they ask for is no longer in the exabyte range.
> r.in.lidar -o --overwrite input=R4133C4.laz output=R4133C4.las method=mean
D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
Over-riding projection check
D2/2: region.n=7776450.217000 region.s=6605838.902000 region.ns_res=1.000000
D2/2: region.rows=1170611 [box_rows=1170611] region.cols=671222
Current region rows: 1170611, cols: 671222
ERROR: G_calloc: unable to allocate 785741027253 * 4 bytes of memory at main.c:534
> las2txt --keep-classes 2 --parse xyz --delimiter="|" --input R4133C4.las --output=/tmp/las.tmp
> r.in.xyz input=/tmp/las.tmp output=R4133C4_ground_points
D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
D2/2: region.n=7776450.217000 region.s=6605838.902000 region.ns_res=1.000000
D2/2: region.rows=1170611 [box_rows=1170611] region.cols=671222
Current region rows: 1170611, cols: 671222
ERROR: G_calloc: unable to allocate 785741027253 * 4 bytes of memory at main.c:491
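Note that the new figure matches a rows*(cols+1) array size computed without 32-bit wrapping, since 1170611 * (671222 + 1) = 785741027253. A quick check (editorial sketch):

#include <stdio.h>

int main(void)
{
    long long rows = 1170611, cols = 671222;  /* from g.region -p above */
    printf("%lld\n", rows * (cols + 1));      /* prints 785741027253 */
    return 0;
}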
After adjusting the extent to the BBOX of the LAS data:
g.region -a n=7152000 s=7149000 e=392000 w=389000 res=1
r.in.lidar
D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
Over-riding projection check
D2/2: region.n=7152000.000000 region.s=7149000.000000 region.ns_res=1.000000
D2/2: region.rows=3000 [box_rows=3000] region.cols=3000
Reading data ...
D2/2: pass=1/1 pass_n=7152000.000000 pass_s=7149000.000000 rows=3000
D2/2: allocating n_array
D2/2: allocating sum_array
100%
D2/2: pass 1 finished, 11064827 coordinates in box
Writing to map ...
100%
D1/2: close R4133C4.las compressed
D1/2: G_find_raster2(): name=R4133C4.las mapset=PERMANENT
D1/2: G_find_raster2(): name=R4133C4.las mapset=PERMANENT
D1/2: G_find_raster2(): name=R4133C4.las mapset=PERMANENT
r.in.lidar complete. 11064827 points found in region.
D1/2: Processed 11064863 points
r.in.xyz
D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
D2/2: region.n=7152000.000000 region.s=7149000.000000 region.ns_res=1.000000
D2/2: region.rows=3000 [box_rows=3000] region.cols=3000
D2/2: estimated number of lines in file: 3224929
Reading data ...
D2/2: pass=1/1 pass_n=7152000.000000 pass_s=7149000.000000 rows=3000
D2/2: allocating n_array
D2/2: allocating sum_array
100%
D2/2: pass 1 finished, 3135351 coordinates in box
Writing to map ...
100%
D1/2: close R4133C4_ground_points compressed
D1/2: G_find_raster2(): name=R4133C4_ground_points mapset=PERMANENT
D1/2: G_find_raster2(): name=R4133C4_ground_points mapset=PERMANENT
D1/2: G_find_raster2(): name=R4133C4_ground_points mapset=PERMANENT
r.in.xyz complete. 3135351 points found in region.
D1/2: Processed 3135367 lines.
So everything seems to work now.
Sorry for the most likely unfounded & inaccurate bug report.
Cheers, Torsti
comment:9 by , 12 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Replying to torsti:
So memory allocation is based on the extent of the region and not on the bounding box of the LAS data; that explains a lot.
I have added a paragraph to the manuals of r.in.lidar and r.in.xyz that emphasizes that.
Sorry for the most likely unfounded & inaccurate bug report.
Let's say it was unclear documentation.
Closing ticket.
Markus M