Opened 15 years ago

Last modified 7 years ago

#494 reopened defect

break in importing and cleaning very large vector datasets

Reported by: gisboa
Owned by: grass-dev@…
Priority: major
Milestone: 6.4.6
Component: Vector
Version: 6.4.2
Keywords: v.in.ogr, vector import clean build
Cc:
CPU: x86-64
Platform: MSWindows 7

Description (last modified by neteler)

I am trying to use GRASS with large vector files, up to 0.5 GB in size. This usually results in errors during import or, if the import succeeds, while cleaning them. It happens both on Mac (Leopard, Intel) and on Linux (openSUSE 11, 32-bit and 64-bit). It does not seem to be a problem with the data, as I can process all of it when I split the import and handle the slices separately.
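The workaround of splitting the import can be sketched as follows. This is only an illustration under stated assumptions: the reporter does not say how the data was split, so the regular grid, the helper names `tile_bboxes` and `import_commands`, the file name `parcels.shp` and the extent are all hypothetical. The `dsn=`, `layer=`, `output=` and `spatial=` parameters are those of v.in.ogr in GRASS 6.4 as far as I recall and should be checked against the manual of the installed version.

```python
def tile_bboxes(xmin, ymin, xmax, ymax, nx, ny):
    """Split a bounding box into nx * ny equal tiles, row by row from the south."""
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    tiles = []
    for j in range(ny):
        for i in range(nx):
            tiles.append((xmin + i * dx, ymin + j * dy,
                          xmin + (i + 1) * dx, ymin + (j + 1) * dy))
    return tiles


def import_commands(dsn, layer, prefix, bbox, nx, ny):
    """Build one v.in.ogr command line per tile (strings only, nothing is run)."""
    commands = []
    for n, (west, south, east, north) in enumerate(tile_bboxes(*bbox, nx, ny), start=1):
        commands.append(
            "v.in.ogr dsn={} layer={} output={}_{:02d} spatial={},{},{},{}".format(
                dsn, layer, prefix, n, west, south, east, north))
    return commands


if __name__ == "__main__":
    # Hypothetical data source and extent, chosen only for illustration.
    for cmd in import_commands("parcels.shp", "parcels", "parcels",
                               (0.0, 0.0, 100.0, 50.0), nx=2, ny=1):
        print(cmd)
```

Features that cross a tile edge are likely to be selected by more than one tile, so the slices may need de-duplication before they are merged again; the sketch does not address that.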

Files are too large to attach...

Wouter

Mac OSX
======
GRASS 6.4.0RC3 (nl-rdn):/data/grassdb/nl-rdn > v.clean --o input=top_top10vec_gebouwen output=top_top10vec_geb2 tool=bpol
--------------------------------------------------
Tool: Threshold
Break polygons: 0.000000e+00
--------------------------------------------------
Copying vector lines...
Rebuilding parts of topology...
Building topology for vector map <top_top10vec_geb2>...
Registering primitives...
6041168 primitives registered
25000849 vertices registered
Number of nodes: 6037532
Number of primitives: 6041168
Number of points: 0
Number of lines: 0
Number of boundaries: 3021055
Number of centroids: 3020113
Number of areas: -
Number of isles: -
--------------------------------------------------
Tool: Break polygons
v.clean(22861) malloc: *** mmap(size=275042304) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
ERROR: G_realloc: unable to allocate 275040036 bytes at break_polygons.c:188

Linux 64bit
=======
Rebuilding parts of topology...
Building topology for vector map <top_top10vec_geb2>...
Registering primitives...
6041168 primitives registered
25000849 vertices registered
Number of nodes: 6037532
Number of primitives: 6041168
Number of points: 0
Number of lines: 0
Number of boundaries: 3021055
Number of centroids: 3020113
Number of areas: -
Number of isles: -
--------------------------------------------------
Tool: Break polygons
ERROR: G_realloc: unable to allocate 34400040 bytes at break_polygons.c:188
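When comparing reports like these, it can help to pull the failed allocation out of the log and express it in MiB. The following is a minimal sketch; the function name `failed_allocations` is hypothetical, and the sample text only repeats error lines that appear in this ticket.

```python
import re

# Matches e.g. "G_realloc: unable to allocate 34400040 bytes at break_polygons.c:188".
# \s+ also covers the case where the log wraps the line before the file name.
REALLOC_ERROR = re.compile(
    r"G_realloc: unable to allocate (\d+) bytes at\s+(\S+):(\d+)")


def failed_allocations(log_text):
    """Return size, source file and line for every G_realloc failure in a log."""
    results = []
    for match in REALLOC_ERROR.finditer(log_text):
        size = int(match.group(1))
        results.append({
            "bytes": size,
            "mib": size / 2 ** 20,
            "file": match.group(2),
            "line": int(match.group(3)),
        })
    return results


SAMPLE_LOG = """\
ERROR: G_realloc: unable to allocate 275040036 bytes at
       break_polygons.c:188
ERROR: G_realloc: unable to allocate 34400040 bytes at break_polygons.c:188
G_realloc: unable to allocate 155760024 bytes at snap.c:155 Finished with error
"""

if __name__ == "__main__":
    for item in failed_allocations(SAMPLE_LOG):
        print("{file}:{line} {mib:.1f} MiB".format(**item))
```

The size reported is only the single request that failed, not the total memory the module was using at that point.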

Change History (10)

comment:1 by neteler, 15 years ago

Component: default → Vector

in reply to:  description comment:2 by mmetz, 15 years ago

Replying to gisboa:

I try to use GRASS with large vector files, with sizes up to 0.5 GB; this usually results in errors during import or, if not during import, then at least while cleaning them.

...

Tool: Break polygons
ERROR: G_realloc: unable to allocate 34400040 bytes at break_polygons.c:188

Cleaning vector files of this size (about 0.5 GB) in v.in.ogr or v.clean can require a lot of memory, up to 8 GB. Is it possible that you ran out of memory, i.e. that physical memory plus swap space is smaller than 8 GB? If so, increase physical memory and/or swap space, and try develbranch_6 instead of 6.4.0RC3.
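On Linux, the check suggested here (physical memory plus swap against roughly 8 GB) can be sketched by reading `/proc/meminfo`. This is an illustration only: the helper names are hypothetical, the 8 GiB threshold is just the figure quoted above rather than a measured requirement, and the sample values are invented.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def ram_plus_swap_gib(text):
    """Physical memory plus swap space in GiB, from MemTotal and SwapTotal."""
    info = parse_meminfo(text)
    return (info.get("MemTotal", 0) + info.get("SwapTotal", 0)) / 1024 ** 2


def enough_for_cleaning(text, required_gib=8.0):
    """True if RAM plus swap reaches the assumed requirement."""
    return ram_plus_swap_gib(text) >= required_gib


# Invented example: 4 GiB of RAM and 2 GiB of swap.
SAMPLE_MEMINFO = """\
MemTotal:        4194304 kB
MemFree:          524288 kB
SwapTotal:       2097152 kB
"""

if __name__ == "__main__":
    # On a real Linux system, read the live values instead of the sample:
    # with open("/proc/meminfo") as f: text = f.read()
    print(ram_plus_swap_gib(SAMPLE_MEMINFO), enough_for_cleaning(SAMPLE_MEMINFO))
```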

comment:3 by marisn, 14 years ago

Resolution: worksforme
Status: new → closed

As there has been no feedback in 16 months, I am closing this ticket; insufficient RAM is the most likely cause.

comment:4 by dido, 10 years ago

Resolution: worksforme
Status: closed → reopened

The same issue was observed with a large shapefile (282 MB) containing ~350 000 polygons. A dump from the output window:

Layer: BGM_Polygons_L0
Counting polygons for 358823 features...
Importing map 358823 features...

... 483699 primitives registered
17237143 vertices registered
Number of nodes: 414562
Number of primitives: 483699
Number of points: 0
Number of lines: 0
Number of boundaries: 483699
Number of centroids: 0
Number of areas: -
Number of isles: -

Cleaning polygons, result is not guaranteed!

Snap boundaries (threshold = 1.000e-005):
G_realloc: unable to allocate 155760024 bytes at snap.c:155
Finished with error

The system is Win7 x64 with 16 GB of RAM. Reported RAM usage peaked at ~4.3 GB and dropped to ~2.6 GB, then the error popped up. Progress was at ~70%.

This was first seen in QGIS 2.2.0 Valmiera; the same behavior was seen in 1.8.0 Lisboa.

comment:5 by dido, 10 years ago

CPU: All → x86-64
Platform: All → MSWindows 7
Version: 6.4.0 RCs → 6.4.2

comment:6 by neteler, 10 years ago

Description: modified (diff)

comment:7 by neteler, 10 years ago

Keywords: v.in.ogr added
Milestone: 6.4.0 → 6.4.5

If you still use 6.4.2, please consider upgrading.

Indeed, for large vector files please consider using the GRASS 7 release branch, which is much faster and uses less memory; see

http://trac.osgeo.org/grass/wiki/Grass7/NewFeatures#Libvector

comment:8 by martinl, 9 years ago

Milestone: 6.4.5

Ticket retargeted after milestone closed

comment:9 by martinl, 9 years ago

Milestone: 6.4.6

comment:10 by neteler, 7 years ago

BTW: Another (mainly solved) "import speed" ticket is #2185.

I suggest closing this ticket, since GRASS GIS 7.2.x is out and addresses the import speed issues along with further topological improvements. All these changes are too invasive to be backported to GRASS GIS 6.
