Opened 7 years ago

Closed 7 years ago

#1694 closed defect (invalid)

r.in.lidar tries to allocate way too much memory

Reported by: torsti
Owned by: grass-dev@…
Priority: normal
Milestone: 7.0.0
Component: Raster
Version: svn-trunk
Keywords: r.in.lidar
Cc:
CPU: x86-64
Platform: Linux

Description

Trying to import a LAS dataset containing ~11 million points with r.in.lidar I get the following error message: "ERROR: G_calloc: unable to allocate 18446744073471563701 * 4 bytes of memory at main.c:528"

I know the dataset is large, but allocating ~64 exabytes of memory seems a bit excessive.

Importing the same dataset with v.in.lidar works, and in GRASS 6.4.2, after converting with las2txt, neither v.in.ascii nor r.in.ascii has any problems with it.

GRASS 7.0 version was revision 52573.

Change History (9)

comment:1 Changed 7 years ago by martinl

Component: Default → Raster
Keywords: r.in.lidar added

comment:2 Changed 7 years ago by dnewcomb

> I know the dataset is large, but allocating ~64 exabytes of memory seems a bit excessive.

That must be a recent development. I was able to use r.in.lidar on 7 LAS files of 3.3 billion points each, simultaneously on an 8-core computer, to compute point counts and ranges with the 2012_04_21 svn snapshot without excessive memory use. How big was the region, and how many cells were you trying to process the data into?

comment:3 Changed 7 years ago by hamish

Please check your region resolution: what does g.region say about the number of rows and columns? Memory use is directly tied to the region resolution and the statistical aggregation method used (which method?). See the r.in.xyz man page for discussion about it.

On a positive note, I'm happy to see that the G_alloc() calculation of how much memory it needs seems to handle and printf values in the exabyte range without overflowing, for some future time when datasets actually are that big :)

Hamish

comment:4 Changed 7 years ago by hamish

> See the r.in.xyz man page for discussion about it.

(The choice of raster resolution in r.in.xyz and r.in.lidar has a profound effect on the result and must be made wisely. I typically do several iterations at different raster resolutions and compute some statistics on the (masked) results to find the optimal one. I have purposely avoided having the modules attempt to choose it for you, since it is such a dataset- and purpose-driven choice and needs the human operator to consider factors beyond the numbers themselves.)

Hamish

comment:5 Changed 7 years ago by torsti

The region the las file covers is 3000 by 3000 (meters) and the resolution was 1x1.

The command:

r.in.lidar -o --overwrite input=R4133C4.laz output=R4133C4.las method=mean

After setting the resolution to 10 by 10 (g.region res=10) it still wants almost the same amount of memory:

ERROR: G_calloc: unable to allocate 18446744072977286712 * 4 bytes of memory at main.c:528

With cellsize 20x20:

ERROR: G_calloc: unable to allocate 1964475953 * 4 bytes of memory at main.c:528

This is in the range of mortal computers, I just happen to be testing on a machine too weak for this kind of processing ;-)

With bigger cell sizes it runs, but the result is not really useful.

For small areas it runs fine on higher resolutions, e.g. a 100 by 100 area with cellsize 1 by 1.

My issue is not that r.in.lidar can't be used on large datasets on underpowered computers; I'm just wondering whether 64 exabytes can be the right amount of memory for cell sizes of 1x1 to 10x10 over a total area of 3000 m x 3000 m with an average point density of a bit over 1 point per square meter (11,000,000 points / 9,000,000 m²).

comment:6 in reply to: 5 Changed 7 years ago by mmetz

Replying to torsti:

> The region the las file covers is 3000 by 3000 (meters) and the resolution was 1x1.

Can you provide the current region settings for the 1x1 resolution, i.e. the output of g.region -p?

What matters is not the resolution alone but the number of rows and columns in the current region, which are determined by the region extents and the resolution. That is, you probably need to check and adjust the region extents.

> The command:
>
> r.in.lidar -o --overwrite input=R4133C4.laz output=R4133C4.las method=mean

You might try the percent option. By default, the whole map is kept in memory (percent=100).

> My issue is not that r.in.lidar can't be used on large datasets on underpowered computers; I'm just wondering whether 64 exabytes can be the right amount of memory for cell sizes of 1x1 to 10x10 over a total area of 3000 m x 3000 m with an average point density of a bit over 1 point per square meter (11,000,000 points / 9,000,000 m²).

With the right region settings and the percent option, it should be possible to import this dataset in no time. Instead of changing only the resolution, you can run r.in.lidar on a subregion and use it to find the resolution that gives the desired results. Then set the region to cover the full input dataset (adjust the extents and align them to the desired resolution) and import the full dataset.

HTH,

Markus M

comment:7 Changed 7 years ago by hamish

As MarkusM asked, what does g.region -p say?

Can you set the debug level to 2? (g.gisenv set="DEBUG=2", then set it back to 0 to turn it off)

For a 3000x3000 computational region and method=mean, it should use

3000*(3000+1)*4 * 2 / 1024000 = 70.3 MB

of RAM to hold the data, and complete in just a few seconds.

Does las2txt | r.in.xyz input=- work? (See the LIDAR page in the GRASS wiki for correct usage.)

Hamish

comment:8 Changed 7 years ago by torsti

So memory allocation is based on the extent of the region and not the bounding box of the LAS data, that explains a lot. That was my mistake there! Still, to be on the safe side I've included the more detailed information that was asked for.

g.region -p

D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
projection: 1 (UTM)
zone:       35
datum:      etrs89
ellipsoid:  grs80
north:      7776450.217
south:      6605838.902
west:       61686.152
east:       732907.723
nsres:      1.00000027
ewres:      0.99999936
rows:       1170611
cols:       671222
cells:      785739856642

lasinfo R4133C4.laz

---------------------------------------------------------
  Header Summary
---------------------------------------------------------

  Version:                     1.2
  Source ID:                   0
  Reserved:                    0
  Project ID/GUID:             '00000000-0000-0000-0000-000000000000'
  System ID:                   ''
  Generating Software:         'EspaEngine'
  File Creation Day/Year:      0/0
  Header Byte Size             227
  Data Offset:                 329
  Header Padding:              2
  Number Var. Length Records:  1
  Point Data Format:           1
  Number of Point Records:     11064863
  Compressed:                  True
  Compression Info:            LASzip Version 2.1r0 c2 50000: POINT10 2 GPSTIME11 2
  Number of Points by Return:  0 0 0 0 0 
  Scale Factor X Y Z:          0.01 0.01 0.01
  Offset X Y Z:                -0.00 -0.00 -0.00
  Min X Y Z:                   389000.00 7149000.00 91.36
  Max X Y Z:                   391999.99 7151999.99 139.61
  Spatial Reference:           
None

None

...

I updated r.in.lidar to revision 52593.

Both r.in.lidar and r.in.xyz complain about the amount of memory because the region is too big, but the amount they ask for is no longer in the exabyte range.

> r.in.lidar -o --overwrite input=R4133C4.laz output=R4133C4.las method=mean

D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
Over-riding projection check
D2/2: region.n=7776450.217000  region.s=6605838.902000  region.ns_res=1.000000
D2/2: region.rows=1170611  [box_rows=1170611]  region.cols=671222
Current region rows: 1170611, cols: 671222
ERROR: G_calloc: unable to allocate 785741027253 * 4 bytes of memory at
       main.c:534
> las2txt --keep-classes 2 --parse xyz --delimiter="|" --input R4133C4.las --output=/tmp/las.tmp
> r.in.xyz input=/tmp/las.tmp output=R4133C4_ground_points

D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
D2/2: region.n=7776450.217000  region.s=6605838.902000  region.ns_res=1.000000
D2/2: region.rows=1170611  [box_rows=1170611]  region.cols=671222
Current region rows: 1170611, cols: 671222
ERROR: G_calloc: unable to allocate 785741027253 * 4 bytes of memory at
       main.c:491

After adjusting the extent to the BBOX of the LAS data:

g.region -a n=7152000 s=7149000 e=392000 w=389000 res=1

r.in.lidar

D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
Over-riding projection check
D2/2: region.n=7152000.000000  region.s=7149000.000000  region.ns_res=1.000000
D2/2: region.rows=3000  [box_rows=3000]  region.cols=3000
Reading data ...
D2/2: pass=1/1  pass_n=7152000.000000  pass_s=7149000.000000  rows=3000
D2/2: allocating n_array
D2/2: allocating sum_array
 100%
D2/2: pass 1 finished, 11064827 coordinates in box
Writing to map ...
 100%
D1/2: close R4133C4.las compressed
D1/2: G_find_raster2(): name=R4133C4.las mapset=PERMANENT
D1/2: G_find_raster2(): name=R4133C4.las mapset=PERMANENT
D1/2: G_find_raster2(): name=R4133C4.las mapset=PERMANENT
r.in.lidar complete. 11064827 points found in region.
D1/2: Processed 11064863 points

r.in.xyz

D2/2: G__read_Cell_head
D2/2: G__read_Cell_head_array
D2/2: region.n=7152000.000000  region.s=7149000.000000  region.ns_res=1.000000
D2/2: region.rows=3000  [box_rows=3000]  region.cols=3000
D2/2: estimated number of lines in file: 3224929
Reading data ...
D2/2: pass=1/1  pass_n=7152000.000000  pass_s=7149000.000000  rows=3000
D2/2: allocating n_array
D2/2: allocating sum_array
 100%
D2/2: pass 1 finished, 3135351 coordinates in box
Writing to map ...
 100%
D1/2: close R4133C4_ground_points compressed
D1/2: G_find_raster2(): name=R4133C4_ground_points mapset=PERMANENT
D1/2: G_find_raster2(): name=R4133C4_ground_points mapset=PERMANENT
D1/2: G_find_raster2(): name=R4133C4_ground_points mapset=PERMANENT
r.in.xyz complete. 3135351 points found in region.
D1/2: Processed 3135367 lines.

So everything seems to work now.

Sorry for the most likely unfounded & inaccurate bug report.

Cheers, Torsti

comment:9 in reply to: 8 Changed 7 years ago by mmetz

Resolution: invalid
Status: new → closed

Replying to torsti:

> So memory allocation is based on the extent of the region and not the bounding box of the LAS data, that explains a lot.

I have added a paragraph to the manuals of r.in.lidar and r.in.xyz that emphasizes that.

> Sorry for the most likely unfounded & inaccurate bug report.

Let's say it was unclear documentation.

Closing ticket.

Markus M
