Opened 11 years ago
Closed 5 years ago
#2185 closed defect (fixed)
Painfully Slow 'v.in.ogr' Vector Import
Reported by: | justinzane | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 7.0.7 |
Component: | Vector | Version: | svn-trunk |
Keywords: | import, OGR, performance, v.in.ogr | Cc: | justinzane |
CPU: | x86-64 | Platform: | Linux |
Description
I'm trying to import shapefiles (in this case, OpenStreetMap data) into GRASS (grass70-svn) using v.in.ogr, and it is painfully, brutally slow. By slow, I mean between 13 and 28 primitives per second.
At first I thought this was because I was using the default sqlite database on an NFS mount. But trying sqlite on a local disk and trying MySQL on a separate server both failed to improve performance. I'm getting a consistent read/write of ~125Kbps/~550Kbps no matter which shapefile or which database I use.
I cannot see any bottlenecks anywhere, as the two python /path/to/grass-gui processes are averaging 10% of one core combined, the network utilization is below 10% consistently, and the DB server is almost quiescent. To give an example of how painful this is:
    v.in.ogr dsn=/home/justin/downloads/osm_CA/places.shp
    WARNING: All available OGR layers will be imported into vector map <places>
    Check if OGR layer <places> contains polygons...
    Importing 7020 features (OGR layer <places>)...
    -----------------------------------------------------
    Building topology for vector map <places@OSM>...
    Registering primitives...
    7020 primitives registered
    7020 vertices registered
    Building areas...
    0 areas built
    0 isles built
    Attaching islands...
    Attaching centroids...
    Number of nodes: 0
    Number of primitives: 7020
    Number of points: 7020
    Number of lines: 0
    Number of boundaries: 0
    Number of centroids: 0
    Number of areas: 0
    Number of isles: 0
    (Fri Jan 31 19:01:08 2014) Command finished (4 min 38 sec)
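As a quick sanity check on the figures in this log, the overall rate can be worked out from the feature count and the reported wall time. This is only a sketch; the helper name `features_per_second` is made up for illustration.

```python
def features_per_second(features, minutes, seconds):
    """Average import rate for a run that took minutes:seconds of wall time."""
    elapsed = minutes * 60 + seconds
    return features / elapsed


# 7020 features, "Command finished (4 min 38 sec)" as reported in the log above
rate = features_per_second(7020, 4, 38)
print(f"{rate:.1f} features per second")
```

The result falls inside the 13 to 28 primitives per second range quoted in the description.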
Note that this may belong under Database.
Attachments (3)
Change History (46)
comment:1 by , 11 years ago
Cc: | added |
---|
comment:2 by , 11 years ago
Replying to justinzane:
I'm trying to import shapefiles (in this case, OpenStreetMap data) into GRASS (grass70-svn) using v.in.ogr, and it is painfully, brutally slow. By slow, I mean between 13 and 28 primitives per second.
I cannot see any bottlenecks anywhere, as the two python /path/to/grass-gui processes are averaging 10% of one core combined, the network utilization is below 10% consistently, and the DB server is almost quiescent. To give an example of how painful this is:
    v.in.ogr dsn=/home/justin/downloads/osm_CA/places.shp
    WARNING: All available OGR layers will be imported into vector map <places>
    Check if OGR layer <places> contains polygons...
    Importing 7020 features (OGR layer <places>)...
    -----------------------------------------------------
    Building topology for vector map <places@OSM>...
    Registering primitives...
    7020 primitives registered
    7020 vertices registered
    Building areas...
    0 areas built
    0 isles built
    Attaching islands...
    Attaching centroids...
    Number of nodes: 0
    Number of primitives: 7020
    Number of points: 7020
    Number of lines: 0
    Number of boundaries: 0
    Number of centroids: 0
    Number of areas: 0
    Number of isles: 0
    (Fri Jan 31 19:01:08 2014) Command finished (4 min 38 sec)
On a standard laptop with default settings, the import of points is about 140 times faster: 7000 points from a shapefile should be imported in less than 2 seconds.
You would need to monitor the v.in.ogr process and the corresponding db driver process (sqlite for the default sqlite database in GRASS 7) instead of any Python processes and see if v.in.ogr uses close to 100% CPU. If not, sqlite should be very busy, but that should not happen because all changes to the database are committed at once after importing the primitives.
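One possible way to watch just those two processes, sketched here in Python on top of the standard `ps -eo pid,pcpu,comm` listing. The function names are made up for this example, and note that the `pcpu` column of `ps` is an average over the process lifetime, so `top` remains the better tool for instantaneous load.

```python
import subprocess


def parse_ps(output, names):
    """Pick the rows for the given command names out of `ps -eo pid,pcpu,comm` output."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        pid, pcpu, comm = parts
        comm = comm.strip()
        if comm in names:
            try:
                rows.append((int(pid), float(pcpu), comm))
            except ValueError:
                continue  # unexpected number format, skip the row
    return rows


def sample(names=("v.in.ogr", "sqlite")):
    """Run ps once and return (pid, %cpu, command) for the processes of interest."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out, names)
```

Calling `sample()` every few seconds while the import runs should show whether v.in.ogr or the sqlite driver is the busy one.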
follow-up: 4 comment:3 by , 11 years ago
CPU: | Unspecified → x86-64 |
---|
Thanks for the response. I cannot post the dot graphs because of size restrictions, and resizing them makes them near unreadable.
I did another little experiment, and it seems that the problem is not with the database driver at all. WRT MySQL and SQLite, it is the location of the 'grassdata' folder that is at issue. I had kept 'grassdata' on an NFS mount. Moving it to a local (btrfs) spinning disk meant that the import was many times faster (although this moron forgot to add 'time' before the command :( ).
I'm rebuilding an unstripped glibc and other dependencies to see exactly which calls may be problematic.
As far as monitoring the mysql and v.in.ogr processes, do you have any recommended way besides running grass like:
    perf record -e cycles:u -o grass.prof -g -- grass64 -text /home/justin/maps/grassdata/osm_data/OSM;
    ####
    # In grass shell [zsh]:
    # v.in.ogr dsn=./path/to/shape.shp output=shape
    # exit
    ####
    perf script -i grass.prof | gprof2dot -f perf | dot -Tpng -o output.png;
    gwenview output.png
comment:4 by , 11 years ago
Replying to justinzane:
Thanks for the response. I cannot post the dot graphs because of size restrictions, and resizing them makes them near unreadable.
Curiosity: Can you post the .dot file? Is it small enough, e.g. 7-zipped or after using gprof2dot's --node-thres or --edge-thres?
comment:5 by , 11 years ago
I'll work on it. I'm rebuilding a few things now [and getting wedding pics ready to send to my niece, so dcraw is hogging the cpu and net] so as soon as I can do a couple of comparisons for you I will. I'll post them to my webserver if they are too large, and just add the links.
follow-up: 7 comment:6 by , 11 years ago
Please see http://www.justinzane.com/grass-perf.html for some more detailed data. I'm greatly interested in guidance on narrowing down the bottleneck with "grassdata" on an NFS mount, but I may need some assistance in the process. If anyone is on IRC, I'd be more than happy to work there. Thanks.
comment:7 by , 11 years ago
Replying to justinzane:
I'm greatly interested in guidance on narrowing down the bottleneck with "grassdata" on an NFS mount, but I may need some assistance in the process.
I don't know much about perf, but I do have some experience with accessing a GRASS database over NFS (multiple clients accessing the same GRASS database over NFS at the same time). The problem seems to be not the GRASS + NFS combination, but the connection to the NFS server (should be at least a Gigabit connection and static IP addresses for the server and all its clients) and other traffic simultaneously going to the NFS server from other clients.
You could
- check the IO traffic on the NFS server without using GRASS
- copy some stuff, e.g. an unpacked Linux kernel, from the local disk over to the NFS mount and check the performance
- investigate the actually used NFS mount options in /proc/mounts
- try to tune NFS mount options, particularly rsize,wsize, bigger seems to be better, more in man nfs
comment:8 by , 11 years ago
[quote]check the IO traffic on the NFS server without using GRASS[/quote]
Did, using iotop while running v.in.ogr. No noticeable IO besides my GRASS import, which is expected since it is a personal server.
[quote]copy some stuff, e.g. an unpacked Linux kernel, from the local disk over to the NFS mount and check the performance[/quote]
Works swimmingly for large files. I updated the post at [www.justinzane.com/grass-perf.html] to show a simplistic test. Basically, though my Dell laptop has a castrated NIC -- the 4 gigabit pins are not connected! -- NFS works at almost line speed. Throughput is not the problem; rather, it seems to be excessive opens/syncs/similar that are the issue.
Using nfsstat -Z10, I get about 27 writes per second, which seems to equate to one write per primitive! I'm a GIS novice, but it seems that writing for every primitive is not efficient from a programming perspective. Am I missing something?
[quote]investigate the actually used NFS mount options in /proc/mounts[/quote]
nfs4 rw,relatime,vers=4.0,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.182,local_lock=none,addr=192.168.0.10 0 0
[quote]try to tune NFS mount options, particularly rsize,wsize, bigger seems to be better, more in man nfs[/quote]
With writes of a few bytes to a few kB, this does not matter.
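To make the "one write per primitive" concern concrete, here is a minimal sketch that compares many tiny open-write-sync-close cycles against a single batched write of the same bytes. It is not how GRASS or the vector library actually writes data (that is exactly what is still unknown here); the record size, record count, and function names are all arbitrary choices for the illustration.

```python
import os
import tempfile
import time


def write_per_record(path, records):
    """One open/write/fsync/close cycle per record (the suspected slow pattern)."""
    for rec in records:
        with open(path, "ab") as f:
            f.write(rec)
            f.flush()
            os.fsync(f.fileno())


def write_batched(path, records):
    """All records joined and written with a single open/write/fsync/close."""
    with open(path, "wb") as f:
        f.write(b"".join(records))
        f.flush()
        os.fsync(f.fileno())


def compare(directory, n=200, size=32):
    """Time both patterns in `directory`; return (t_per_record, t_batched, identical)."""
    records = [bytes([i % 256]) * size for i in range(n)]
    per_path = os.path.join(directory, "per_record.bin")
    batch_path = os.path.join(directory, "batched.bin")
    t0 = time.perf_counter()
    write_per_record(per_path, records)
    t1 = time.perf_counter()
    write_batched(batch_path, records)
    t2 = time.perf_counter()
    with open(per_path, "rb") as a, open(batch_path, "rb") as b:
        identical = a.read() == b.read()
    return t1 - t0, t2 - t1, identical


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        t_per, t_batch, same = compare(d)
        print(f"per-record: {t_per:.4f}s  batched: {t_batch:.4f}s  identical: {same}")
```

Pointing `compare()` at a directory on the NFS mount and then at a local directory would show how much of the penalty comes from the number of round trips rather than from the amount of data.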
comment:9 by , 11 years ago
I did some tests with and without NFS, importing a shapefile with 50000 points and 19 attribute fields:
- shapefile on local disk, grassdata on local disk: 12000 points / sec
- shapefile on local disk, grassdata on NFS: 2200 points / sec
- shapefile on NFS, grassdata on NFS: 2100 points / sec
- shapefile on NFS, grassdata on local disk: 11000 points / sec
Here, writing to grassdata on NFS is also much slower, but always much faster than about 20 points / sec. The NFS server here is not at all a high-performance server, instead designed for power-saving.
Your minifstest gives here
    NumChars: Local (s)  NFS (s)   NFS/Local
    00000002: 000.0100, 000.0200, 002.0000
    00000004: 000.0100, 000.0200, 002.0000
    00000008: 000.0200, 000.0300, 001.5000
    00000016: 000.0200, 000.0400, 002.0000
    00000032: 000.0400, 000.0400, 001.0000
    00000128: 000.1300, 000.1600, 001.2308
    00000512: 000.5200, 000.5300, 001.0192
The penalty for doing lots of tiny open-write-close cycles on NFS is here at most a factor of 2, not 7.
IMHO, something is strange with your NFS server, but I have no idea what.
comment:10 by , 11 years ago
I noticed that. There are a bunch of nfs/nfs4 perf events; but, they require root privileges when running perf. I'm trying to get advice (elsewhere) on how to properly use them without running GRASS as root. As soon as I am able to examine the NFS behaviour, I'll post the results. I'm also pulling the code so I can speak with less ignorance. :)
comment:11 by , 11 years ago
Keywords: | v.in.ogr added |
---|
@justinzane: if you get a few moments could you add a quick example on our wiki of how end users can run perf? tx.
http://grasswiki.osgeo.org/wiki/Bugs#Using_a_profiling_tool
@mmetz: some sample OSM data to test is available here:
I still don't understand whether this trouble is CPU bound, local IO bound, or network bound. Some combo of top, iotop, iftop, and gkrellm would hopefully shed some light on that without having to push everything through a profiling tool. What does 'top -i' show about the v.in.ogr and sqlite processes and the kernel %use state? Is it deadlocked? Spending all its time in the kernel? In libogr?
regards, Hamish
comment:12 by , 11 years ago
btw, you might also try osm2pgsql to load the OSM data into a PostGIS Postgres DB, then live-connect to the PostGIS db with grass 7 with v.external. for example:
https://trac.osgeo.org/osgeo/browser/livedvd/gisvm/trunk/bin/load_postgis.sh#L85
otherwise map coordinates are stored in $MAPSET/vector in grass's vector format, and only feature attributes are stored in the DB. Map features are connected to attributes via category number.
comment:13 by , 10 years ago
I am experiencing this issue too. It seems to be a problem with all modules that write simple vector geometries in a loop. I am going to investigate some sqlite tweaks [1].
follow-up: 15 comment:14 by , 10 years ago
In my situation, I have my whole home directory, including the GRASS data dir, on an NFS 4 mount.
- After a short investigation, it seems that setting PRAGMA synchronous=OFF on the SQLite db has no effect.
- What made a big difference was moving the '.tmp' directory from my mapset outside of NFS (by using a symlink). v.in.ogr was 8 times faster.
So my idea is: is it possible to have some environment variable which could define a global alternative directory for temporary files? This could improve the situation for all modules that use temporary files (v.random, for example, does not use them).
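For anyone who wants to try the symlink workaround described above, a rough sketch follows. The function name, the `grass_tmp_` prefix, and both directory arguments are placeholders invented for this example, and it assumes no GRASS session is using the mapset while the directory is swapped.

```python
import os
import shutil


def relocate_tmp(mapset_dir, fast_dir):
    """Replace <mapset>/.tmp with a symlink to a directory on a faster file system.

    Any existing temporary files in <mapset>/.tmp are discarded.
    Returns the path of the new target directory.
    """
    tmp_link = os.path.join(mapset_dir, ".tmp")
    mapset_name = os.path.basename(os.path.normpath(mapset_dir))
    target = os.path.join(fast_dir, "grass_tmp_" + mapset_name)  # hypothetical naming
    os.makedirs(target, exist_ok=True)
    if os.path.islink(tmp_link):
        os.unlink(tmp_link)
    elif os.path.isdir(tmp_link):
        shutil.rmtree(tmp_link)
    os.symlink(target, tmp_link)
    return target
```

Here `mapset_dir` would be the mapset on the NFS mount and `fast_dir` a local disk or tmpfs location.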
follow-ups: 16 19 comment:15 by , 10 years ago
Replying to imincik1: ...
- What made a big difference was moving the '.tmp' directory from my mapset outside of NFS (by using a symlink). v.in.ogr was 8 times faster.
So my idea is: is it possible to have some environment variable which could define a global alternative directory for temporary files? This could improve the situation for all modules that use temporary files (v.random, for example, does not use them).
Can you please try this variable? http://grass.osgeo.org/grass70/manuals/variables.html
--> "TMPDIR, TEMP, TMP" (name depends on the operating system)
follow-up: 17 comment:16 by , 10 years ago
Replying to neteler:
Replying to imincik1: ...
- What made a big difference was moving the '.tmp' directory from my mapset outside of NFS (by using a symlink). v.in.ogr was 8 times faster.
So my idea is: is it possible to have some environment variable which could define a global alternative directory for temporary files? This could improve the situation for all modules that use temporary files (v.random, for example, does not use them).
Can you please try this variable? http://grass.osgeo.org/grass70/manuals/variables.html
--> "TMPDIR, TEMP, TMP" (name depends on the operating system)
Thanks for this tip, I will try.
To be sure, I mean the temporary directory located in the mapset: $MAPSET/.tmp/$HOSTNAME/vector . There is already a variable called GRASS_VECTOR_TEMPORARY but, according to its description, it only enables or disables the temporary dir. I will play with both and report back.
comment:17 by , 10 years ago
Replying to imincik1:
To be sure, I mean the temporary directory located in the mapset: $MAPSET/.tmp/$HOSTNAME/vector . There is already a variable called GRASS_VECTOR_TEMPORARY but, according to its description, it only enables or disables the temporary dir. I will play with both and report back.
Yes, if GRASS_VECTOR_TEMPORARY is defined, then a vector map is created in $MAPSET/.tmp/$HOSTNAME/vector.
follow-up: 20 comment:18 by , 10 years ago
TMPDIR, TEMP, TMP have no effect on the temporary directory path created by the vector library.
I did one more test, importing some vector data with v.in.ogr:
- temporary directory located on NFS = 16 seconds
- temporary directory located on NBD = 2 seconds
- temporary directory located in memory (tmpfs on /mnt/booster type tmpfs): 0.8 seconds!
It seems that the possibility to point the temporary directory created by the vector library at some quicker file system could be beneficial for all users.
comment:19 by , 10 years ago
Replying to neteler:
Replying to imincik1: ...
- What made a big difference was moving the '.tmp' directory from my mapset outside of NFS (by using a symlink). v.in.ogr was 8 times faster.
So my idea is: is it possible to have some environment variable which could define a global alternative directory for temporary files? This could improve the situation for all modules that use temporary files (v.random, for example, does not use them).
Can you please try this variable? http://grass.osgeo.org/grass70/manuals/variables.html
--> "TMPDIR, TEMP, TMP" (name depends on the operating system)
The manual is not completely clear about what is driven by those, but it says that /tmp (or perhaps its platform-dependent equivalent, i.e. the system tmp dir) is the default, which suggests that it is a different directory than /path/to/mapset/.tmp (i.e. the GRASS-specific dir).
TMPDIR, TEMP, TMP
[Various GRASS GIS commands and wxGUI] The default wxGUI temporary directory is chosen from a platform-dependent list, but the user can control the selection of this directory by setting one of the TMPDIR, TEMP or TMP environment variables. Hence the wxGUI uses $TMPDIR if it is set, then $TEMP, otherwise /tmp.
I'm not sure about the other things, but for wxGUI it seems that if your grassdata dir is on NFS, it might be more advantageous if the tmp dir for the GUI were the system one, because it would/could work much faster (supposing the case of generating images for map display). On the other hand, if you are creating something really big, e.g. an animation with g.gui.animation, it might be better to use the bigger space (which might be the .tmp dir in grassdata on NFS).
follow-up: 21 comment:20 by , 9 years ago
comment:21 by , 9 years ago
Replying to martinl:
Replying to imincik1:
TMPDIR, TEMP, TMP have no effect on the temporary directory path created by the vector library.
Starting with r65348 when GRASS_TMPDIR_MAPSET is set to '0', the GRASS data TMPDIR is not $LOCATION/$MAPSET/.tmp/$HOSTNAME, but $TMPDIR/.tmp/$HOSTNAME.
Sounds good. I'm curious how this will work for the mounted disc use case.
Just a few clarifications. I was trying to read the source code a few days ago and I think that there is no actual usage of the HOSTNAME variable; instead, some C functions are used to retrieve this value (which probably is the value of the HOSTNAME variable). Also, you say "GRASS data TMPDIR", but the actual TMPDIR is a variable pointing to GRASS's directory in the system temporary directory. Is this meant in a symbolic way, as $LOCATION and $MAPSET are, or am I missing something?
Do I understand correctly that you can change it anytime (inside a GRASS session), so grass.py always has to clear both places? (Or is it not even worth attempting to care for this case in grass.py?)
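To check my reading of the rule quoted from r65348, here is how I would write it down. This is only a sketch of the rule as stated in this ticket, not the actual library code; the function name is made up, the `/tmp` fallback for an unset TMPDIR is my own assumption, and the paths in the usage lines are placeholders.

```python
import posixpath


def grass_tmp_dir(env, location_path, mapset, hostname):
    """Directory for GRASS temporary data according to the rule quoted above.

    env is a dict of environment variables; hostname stands for whatever
    the library resolves as the host name.
    """
    if env.get("GRASS_TMPDIR_MAPSET") == "0":
        # Assumption: fall back to /tmp when TMPDIR is not set.
        base = env.get("TMPDIR", "/tmp")
    else:
        base = posixpath.join(location_path, mapset)
    return posixpath.join(base, ".tmp", hostname)


# Default behaviour: temporary data stays inside the mapset
print(grass_tmp_dir({}, "/nfs/grassdata/osm", "OSM", "laptop"))
# With GRASS_TMPDIR_MAPSET=0 it moves under $TMPDIR
print(grass_tmp_dir({"GRASS_TMPDIR_MAPSET": "0", "TMPDIR": "/mnt/booster"},
                    "/nfs/grassdata/osm", "OSM", "laptop"))
```

If that is right, then both locations can hold leftovers whenever the variable is changed mid-session, which is why I am asking about cleanup.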
comment:22 by , 8 years ago
Milestone: | 7.0.0 → 7.0.5 |
---|
comment:23 by , 8 years ago
Milestone: | 7.0.5 → 7.0.6 |
---|
follow-up: 25 comment:24 by , 7 years ago
Not sure if this is the right ticket. Attached a GeoJSON geometry covering most of the world (digitized in a Web GIS in Web Mercator in a lousy way which is reflecting reality and reprojected to Latlong with ogr2ogr).
It takes "forever" on the "Breaking boundaries..." step while the geometry is a single polygon. I wonder why?
comment:25 by , 7 years ago
Replying to neteler:
Not sure if this is the right ticket. Attached a GeoJSON geometry covering most of the world (digitized in a Web GIS in Web Mercator in a lousy way which is reflecting reality and reprojected to Latlong with ogr2ogr).
It takes "forever" on the "Breaking boundaries..." step while the geometry is a single polygon. I wonder why?
With trunk, import happens instantly (0.08 seconds). What GRASS version are you using? Do you have changes in your local copy?
comment:27 by , 7 years ago
Replying to neteler:
Since it is a production system, I'm on 7.2.svn
With 7.2, I get the same instant (0.08 seconds) import with
v.in.ogr in=world_AOI_latlong.geojson out=world_AOI_latlong
comment:28 by , 7 years ago
Strange, here it is not fast at all:
    GRASS 7.2.1svn (latlong_wgs84):~ > time -p v.in.ogr in=world_AOI_latlong.geojson out=world_AOI_latlong
    Check if OGR layer <OGRGeoJSON> contains polygons...
    100%
    WARNING: Width for column properties set to 255 (was not specified by OGR), some strings may be truncated!
    Importing 1 features (OGR layer <OGRGeoJSON>)...
    100%
    -----------------------------------------------------
    Registering primitives...
    One primitive registered
    7 vertices registered
    Number of nodes: 1
    Number of primitives: 1
    Number of points: 0
    Number of lines: 0
    Number of boundaries: 1
    Number of centroids: 0
    Number of areas: -
    Number of isles: -
    -----------------------------------------------------
    Cleaning polygons
    -----------------------------------------------------
    Breaking polygons...
    Breaking polygons (pass 1: select break points)...
    100%
    Breaking polygons (pass 2: break at selected points)...
    100%
    -----------------------------------------------------
    Removing duplicates...
    100%
    -----------------------------------------------------
    Breaking boundaries...
    98%
    ^C
    real 373.04
    user 371.13
    sys 0.97
    GRASS 7.2.1svn (latlong_wgs84):~ > g.version -g
    version=7.2.1svn
    date=2017
    revision=r70769
    build_date=2017-03-20
    build_platform=x86_64-pc-linux-gnu
    build_off_t_size=8
    GRASS 7.2.1svn (latlong_wgs84):~ > cs2cs
    Rel. 4.9.2, 08 September 2015
    usage: cs2cs [ -eEfIlrstvwW [args] ] [ +opts[=arg] ] [+to [+opts[=arg] [ files ]
    GRASS 7.2.1svn (latlong_wgs84):~ > gdal-config --version
    2.1.2
    db.connect -p
    driver: sqlite
    database: /home/mundialis/grassdata/latlong_wgs84/user1/sqlite/sqlite.db
    ...
"svn status" does not show a single local modification.
The disk is a SSD drive. The system is a Fedora 25 installation:
    rpm -qa | sort | grep 'geos\|gdal\|proj'
    gdal-2.1.2-5.fc25.x86_64
    gdal-devel-2.1.2-5.fc25.x86_64
    gdal-grass-2.1.2-1.fc25.x86_64
    gdal-libs-2.1.2-5.fc25.x86_64
    gdal-python-2.1.2-5.fc25.x86_64
    geos-3.5.0-3.fc25.x86_64
    geos-devel-3.5.0-3.fc25.x86_64
    proj-4.9.2-2.fc24.x86_64
    proj-devel-4.9.2-2.fc24.x86_64
    proj-epsg-4.9.2-2.fc24.x86_64
    proj-nad-4.9.2-2.fc24.x86_64
    pyproj-1.9.5.1-3.fc25.x86_64
strace shows a lot of lseek() if that matters:
... lseek(6, 814250, SEEK_SET) = 814250 lseek(6, 814250, SEEK_SET) = 814250 lseek(6, 814250, SEEK_SET) = 814250 lseek(6, 814250, SEEK_SET) = 814250 lseek(6, -138, SEEK_CUR) = 814112 write(6, "\f", 1) = 1 fstat(6, {st_mode=S_IFREG|0664, st_size=814250, ...}) = 0 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 3242) = 3242 write(6, "\r\6\0\0\0\246n\300\212l]e\300\305\240g\372\204\344e\300d\216\25\215\1ve@\357B\276"..., 101) = 101 fstat(6, {st_mode=S_IFREG|0664, st_size=814351, ...}) = 0 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 3343) = 3343 write(6, "\r\2\0\0\0\246n\300\212l]e\300\246n\300\212l]e\300\23_\363\0\254\350T@\24_\363"..., 37) = 37 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 3205) = 3205 read(6, "\r\2\0\0\0\246n\300\212l]e\300\246n\300\212l]e\300\24_\363\0\254\350T@\23_\363"..., 4096) = 175 lseek(6, 814388, SEEK_SET) = 814388 lseek(6, 814388, SEEK_SET) = 814388 lseek(6, 814388, SEEK_SET) = 814388 lseek(6, 814388, SEEK_SET) = 814388 lseek(6, 814388, SEEK_SET) = 814388 lseek(6, 814388, SEEK_SET) = 814388 lseek(6, 814388, SEEK_SET) = 814388 lseek(6, 814388, SEEK_SET) = 814388 lseek(6, -138, SEEK_CUR) = 814250 write(6, "\f", 1) = 1 fstat(6, {st_mode=S_IFREG|0664, st_size=814388, ...}) = 0 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 3380) = 3380 write(6, "\r\6\0\0\0\246n\300\212l]e\300\305\240g\372\204\344e\300d\216\25\215\1ve@\357B\276"..., 101) = 101 fstat(6, {st_mode=S_IFREG|0664, st_size=814489, ...}) = 0 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 3481) = 3481 write(6, "\r\2\0\0\0\246n\300\212l]e\300\246n\300\212l]e\300\24_\363\0\254\350T@\23_\363"..., 37) 
= 37 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 3343) = 3343 read(6, "\r\2\0\0\0\246n\300\212l]e\300\246n\300\212l]e\300\23_\363\0\254\350T@\24_\363"..., 4096) = 175 lseek(6, 814526, SEEK_SET) = 814526 lseek(6, 814526, SEEK_SET) = 814526 lseek(6, 814526, SEEK_SET) = 814526 lseek(6, 814526, SEEK_SET) = 814526 lseek(6, -138, SEEK_CUR) = 814388 write(6, "\f", 1) = 1 fstat(6, {st_mode=S_IFREG|0664, st_size=814526, ...}) = 0 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 3518) = 3518 write(6, "\r\6\0\0\0\246n\300\212l]e\300\305\240g\372\204\344e\300d\216\25\215\1ve@\357B\276"..., 101) = 101 fstat(6, {st_mode=S_IFREG|0664, st_size=814627, ...}) = 0 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 3619) = 3619 write(6, "\r\2\0\0\0\246n\300\212l]e\300\246n\300\212l]e\300\23_\363\0\254\350T@\24_\363"..., 37) = 37 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 3481) = 3481 read(6, "\r\2\0\0\0\246n\300\212l]e\300\246n\300\212l]e\300\24_\363\0\254\350T@\23_\363"..., 4096) = 175 lseek(6, 814664, SEEK_SET) = 814664 lseek(6, 814664, SEEK_SET) = 814664 lseek(6, 814664, SEEK_SET) = 814664 lseek(6, 814664, SEEK_SET) = 814664 lseek(6, 814664, SEEK_SET) = 814664 lseek(6, 814664, SEEK_SET) = 814664 lseek(6, 814664, SEEK_SET) = 814664 lseek(6, 814664, SEEK_SET) = 814664 lseek(6, -138, SEEK_CUR) = 814526 write(6, "\f", 1) = 1 fstat(6, {st_mode=S_IFREG|0664, st_size=814664, ...}) = 0 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 3656) = 3656 write(6, "\r\6\0\0\0\246n\300\212l]e\300\305\240g\372\204\344e\300d\216\25\215\1ve@\357B\276"..., 101) = 101 fstat(6, {st_mode=S_IFREG|0664, st_size=814765, 
...}) = 0 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 3757) = 3757 write(6, "\r\2\0\0\0\246n\300\212l]e\300\246n\300\212l]e\300\24_\363\0\254\350T@\23_\363"..., 37) = 37 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 3619) = 3619 read(6, "\r\2\0\0\0\246n\300\212l]e\300\246n\300\212l]e\300\23_\363\0\254\350T@\24_\363"..., 4096) = 175 lseek(6, 814802, SEEK_SET) = 814802 lseek(6, 814802, SEEK_SET) = 814802 lseek(6, 814802, SEEK_SET) = 814802 lseek(6, 814802, SEEK_SET) = 814802 lseek(6, -138, SEEK_CUR) = 814664 write(6, "\f", 1) = 1 fstat(6, {st_mode=S_IFREG|0664, st_size=814802, ...}) = 0 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 3794) = 3794 write(6, "\r\6\0\0\0\246n\300\212l]e\300\305\240g\372\204\344e\300d\216\25\215\1ve@\357B\276"..., 101) = 101 fstat(6, {st_mode=S_IFREG|0664, st_size=814903, ...}) = 0 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 3895) = 3895 write(6, "\r\2\0\0\0\246n\300\212l]e\300\246n\300\212l]e\300\23_\363\0\254\350T@\24_\363"..., 37) = 37 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 3757) = 3757 read(6, "\r\2\0\0\0\246n\300\212l]e\300\246n\300\212l]e\300\24_\363\0\254\350T@\23_\363"..., 4096) = 175 lseek(6, 814940, SEEK_SET) = 814940 lseek(6, 814940, SEEK_SET) = 814940 lseek(6, 814940, SEEK_SET) = 814940 lseek(6, 814940, SEEK_SET) = 814940 lseek(6, 814940, SEEK_SET) = 814940 lseek(6, 814940, SEEK_SET) = 814940 lseek(6, 814940, SEEK_SET) = 814940 lseek(6, 814940, SEEK_SET) = 814940 lseek(6, -138, SEEK_CUR) = 814802 write(6, "\f", 1) = 1 fstat(6, {st_mode=S_IFREG|0664, st_size=814940, ...}) = 0 lseek(6, 811008, SEEK_SET) = 811008 read(6, 
"l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 3932) = 3932 write(6, "\r\6\0\0\0\246n\300\212l]e\300\305\240g\372\204\344e\300d\216\25\215\1ve@\357B\276"..., 101) = 101 fstat(6, {st_mode=S_IFREG|0664, st_size=815041, ...}) = 0 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 4033) = 4033 write(6, "\r\2\0\0\0\246n\300\212l]e\300\246n\300\212l]e\300\24_\363\0\254\350T@\23_\363"..., 37) = 37 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 3895) = 3895 read(6, "\r\2\0\0\0\246n\300\212l]e\300\246n\300\212l]e\300\23_\363\0\254\350T@\24_\363"..., 4096) = 175 lseek(6, 815078, SEEK_SET) = 815078 lseek(6, 815078, SEEK_SET) = 815078 lseek(6, 815078, SEEK_SET) = 815078 lseek(6, 815078, SEEK_SET) = 815078 lseek(6, -138, SEEK_CUR) = 814940 write(6, "\f", 1) = 1 fstat(6, {st_mode=S_IFREG|0664, st_size=815078, ...}) = 0 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 4070) = 4070 write(6, "\r\6\0\0\0\246n\300\212l]e\300\305\240g\372\204\344e\300d\216\25\215\1", 26) = 26 write(6, "ve@\357B\276I\231$e@\305\240g\372\204\344e\300\246n\300\212l]e\300\23_\363\0\254"..., 75) = 75 fstat(6, {st_mode=S_IFREG|0664, st_size=815179, ...}) = 0 lseek(6, 815104, SEEK_SET) = 815104 read(6, "ve@\357B\276I\231$e@\305\240g\372\204\344e\300\246n\300\212l]e\300\23_\363\0\254"..., 75) = 75 write(6, "\r\2\0\0\0\246n\300\212l]e\300\246n\300\212l]e\300\23_\363\0\254\350T@\24_\363"..., 37) = 37 lseek(6, 811008, SEEK_SET) = 811008 read(6, "l\fK\300&U@\304p\327\354y1U\300L\205\334i\322\27U\300\23_\363\0\254\350T@\r"..., 4033) = 4033 read(6, "\r\2\0\0\0\246n\300\212l]e\300\246n\300\212l]e\300\24_\363\0\254\350T@\23_\363"..., 4096) = 175 lseek(6, 815216, SEEK_SET) = 815216 lseek(6, 815216, SEEK_SET) = 815216 lseek(6, 815216, SEEK_SET) = 815216 
lseek(6, 815216, SEEK_SET) = 815216 lseek(6, 815216, SEEK_SET) = 815216 lseek(6, 815216, SEEK_SET) = 815216 lseek(6, 815216, SEEK_SET) = 815216 lseek(6, 815216, SEEK_SET) = 815216 lseek(6, -138, SEEK_CUR) = 815078 write(6, "\f", 1) = 1 fstat(6, {st_mode=S_IFREG|0664, st_size=815216, ...}) = 0 lseek(6, 815104, SEEK_SET) = 815104 read(6, "ve@\357B\276I\231$e@\305\240g\372\204\344e\300\246n\300\212l]e\300\23_\363\0\254"..., 112) = 112 write(6, "\r\6\0\0\0\246n\300\212l]e\300\305\240g\372\204\344e\300d\216\25\215\1ve@\357B\276"..., 101) = 101 fstat(6, {st_mode=S_IFREG|0664, st_size=815317, ...}) = 0 lseek(6, 815104, SEEK_SET) = 815104 read(6, "ve@\357B\276I\231$e@\305\240g\372\204\344e\300\246n\300\212l]e\300\23_\363\0\254"..., 213) = 213 write(6, "\r\2\0\0\0\246n\300\212l]e\300\246n\300\212l]e\300\24_\363\0\254\350T@\23_\363"..., 37) = 37 lseek(6, 815104, SEEK_SET) = 815104 read(6, "ve@\357B\276I\231$e@\305\240g\372\204\344e\300\246n\300\212l]e\300\23_\363\0\254"..., 75) = 75 read(6, "\r\2\0\0\0\246n\300\212l]e\300\246n\300\212l]e\300\23_\363\0\254\350T@\24_\363"..., 4096) = 175 ...
I'm a bit clueless... do you have uncommitted improvements? :-)
Can anyone else please test this tiny GeoJSON in a latlong location? thanks.
comment:29 by , 7 years ago
Markus, I'm having exactly the same issue. Takes forever at that stage...
comment:30 by , 7 years ago
Takes a long time on my computer as well, gets stuck at breaking boundaries. I am running 7.3.svn r70733 on Ubuntu 16.04. I don't normally have problems with importing much larger shapefiles.
comment:31 by , 7 years ago
Can you try this patch? The issue was that Vect_line_intersection would create breaks on first and last line vertices because there was no tolerance for comparing points. There are more such comparisons (potential issue?)...
comment:33 by , 7 years ago
In addition, here my compiler settings, maybe that makes a difference?
    sh -x conf_grass7.sh
    + INTEL='-march=native -std=gnu99 -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector -m64'
    + MYGCC=-fdiagnostics-color
    + MYCFLAGS='-Wall -fopenmp -lgomp -Ofast -fno-fast-math -march=core-avx-i -fno-common -fexceptions -march=native -std=gnu99 -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector -m64 MYGCC'
    + MYLDFLAGS='-Wl,--no-undefined -fopenmp -lgomp'
    + MYCXXFLAGS=-Ofast
    + CC=clang
    + CXX=clang++
    + CFLAGS=-O2
    + CXXFLAGS=-O2
    + LDFLAGS=-s
    + ./configure --with-cxx --enable-largefile --with-proj --with-proj-share=/usr/share/proj --with-gdal=/usr/bin/gdal-config --with-python --with-geos --with-liblas --with-sqlite --with-nls --with-blas --with-blas-includes=/usr/include/atlas-x86_64-base/ --with-lapack --with-lapack-includes=/usr/include/atlas-x86_64-base/ --with-cairo --with-cairo-ldflags=-lfontconfig --with-freetype --with-freetype-includes=/usr/include/freetype2 --with-wxwidgets=/usr/bin/wx-config --with-fftw --with-motif --with-postgres --with-netcdf --without-mysql --without-odbc --without-openmp --without-ffmpeg
    + tee config_log.txt
    checking host system type... x86_64-pc-linux-gnu
    checking for gcc... clang
    checking whether the C compiler (clang -O2 -s) works... yes
    checking whether the C compiler (clang -O2 -s) is a cross-compiler... no
    checking whether we are using GNU C... yes
    ...
    # CPU:
    cat /proc/cpuinfo | grep 'model name'
    model name : Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
    ...
I'll switch to gcc and test again.
follow-up: 36 comment:34 by , 7 years ago
Wow. After full recompilation, with gcc it does NOT "hang":
    ## gcc production
    INTEL="-march=native"
    MYGCC="-fdiagnostics-color"
    MYCFLAGS="-O2 $INTEL $MYGCC"
    MYCXXFLAGS="-O2"
    MYLDFLAGS="-Wl,--no-undefined"
    MYLDFLAGS="-s $MYLDFLAGS"
    # clang
    #CC=clang CXX=clang++ CFLAGS="-O2" CXXFLAGS="-O2" LDFLAGS="-s" ./configure \
    # gcc
    LDFLAGS="$MYLDFLAGS" CFLAGS="$MYCFLAGS" CXXFLAGS="$MYCXXFLAGS" ./configure \
      --with-cxx \
      --enable-largefile \
      ...
Now:
GRASS 7.2.1svn (latlong_wgs84):~ > time -p v.in.ogr in=world_AOI_latlong.geojson out=world_AOI_latlong --o
...
-----------------------------------------------------
1 input polygons
Total area: 2.37675E+13 (2 areas)
Area without category: 2.04916E+11 (1 areas)
-----------------------------------------------------
Copying features...
100%
Building topology for vector map <world_AOI_latlong@user1>...
Registering primitives...
3 primitives registered
19 vertices registered
Building areas...
100%
2 areas built
One isle built
Attaching islands...
100%
Attaching centroids...
100%
Number of nodes: 1
Number of primitives: 3
Number of points: 0
Number of lines: 0
Number of boundaries: 2
Number of centroids: 1
Number of areas: 2
Number of isles: 1
real 0.09
user 0.04
sys 0.01
So, unfortunate clang compiler settings before or gcc-is-better-than-clang here?
follow-up: 39 comment:35 by , 7 years ago
Markus, you mean it does not hang with or without my patch?
comment:36 by , 7 years ago
Replying to neteler:
Wow. After full recompilation, with gcc it does NOT "hang":
[...]
So, unfortunate clang compiler settings before or gcc-is-better-than-clang here?
No, the difference is -O2 which you used with gcc but not with clang. I could reproduce the problem by not using any optimization.
The problem is that Vect_break_lines() enters an infinite loop because it continuously generates two new lines that are again intersecting each other, always at the same locations.
The attached patch (hcho) helps, but as mentioned in comment:31 there are more comparisons and it does not seem to be a good idea to allow tolerance at some places but not at other places when comparing a cross with a vertex.
What also helps is using Vect_line_intersection2() instead of Vect_line_intersection(). Interestingly, Vect_line_intersection2() does not need the patch, even though that part of the code is identical.
follow-ups: 38 40 comment:37 by , 7 years ago
Yes, that's what I found too. Vect_line_intersection2 doesn't have this issue, but it still creates a single point intersection at the first vertex of the second new line (line ID 3) in the geojson example.
Line ID 1: original unbroken line
1st iteration
Line ID 2-4: new broken lines
2nd iteration
Line ID 5: identical to line 3
Line ID 6: start node of line 3
Lines 5 & 6 shouldn't be returned at all from the intersection routine, I think. The patch fixes this.
comment:38 by , 7 years ago
Replying to hcho:
Yes, that's what I found too. Vect_line_intersection2 doesn't have this issue, but it still creates a single point intersection at the first vertex of the second new line (line ID 3) in the geojson example.
Line ID 1: original unbroken line
1st iteration
Line ID 2-4: new broken lines
2nd iteration
Line ID 5: identical to line 3
Line ID 6: start node of line 3
Lines 5 & 6 shouldn't be returned at all from the intersection routine, I think. The patch fixes this.
I found another issue in Vect_segment_intersection(): the order of the segments matters, i.e. the intersection point of a with b can be slightly different from the intersection point of b with a. The second attached patch fixes that and also avoids that infinite loop. I opt to apply both patches.
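To illustrate the order dependence described above, here is a minimal Python sketch (not the GRASS C code and not the attached patch). It assumes one possible remedy: put the two segments and their endpoints into a canonical order before doing the floating-point arithmetic, so that swapping a and b cannot change the result. The function names `_intersect` and `intersect_symmetric` are hypothetical.

```python
def _intersect(p1, p2, p3, p4):
    """Intersection point of segments p1-p2 and p3-p4.

    Returns None if the segments are parallel or do not touch.
    """
    d1x, d1y = p2[0] - p1[0], p2[1] - p1[1]
    d2x, d2y = p4[0] - p3[0], p4[1] - p3[1]
    denom = d1x * d2y - d1y * d2x  # cross product of the two directions
    if denom == 0:
        return None  # parallel or collinear: not handled in this sketch
    t = ((p3[0] - p1[0]) * d2y - (p3[1] - p1[1]) * d2x) / denom
    u = ((p3[0] - p1[0]) * d1y - (p3[1] - p1[1]) * d1x) / denom
    if not (0 <= t <= 1 and 0 <= u <= 1):
        return None  # the infinite lines cross outside the segments
    return (p1[0] + t * d1x, p1[1] + t * d1y)


def intersect_symmetric(seg_a, seg_b):
    """Same result for (a, b) and (b, a), whatever the endpoint order."""
    a = tuple(sorted(seg_a))  # canonical endpoint order within a segment
    b = tuple(sorted(seg_b))
    if b < a:                 # canonical order of the two segments
        a, b = b, a
    return _intersect(a[0], a[1], b[0], b[1])


a = ((0.0, 0.0), (10.0, 10.0))
b = ((0.0, 10.0), (10.0, 0.0))
print(intersect_symmetric(a, b) == intersect_symmetric(b, a))
```

Because both call orders run exactly the same arithmetic on exactly the same operands, the two results are bit-identical, which is the property a line-breaking loop needs in order to terminate.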
comment:39 by , 7 years ago
Replying to hcho:
Markus, you mean it does not hang with or without my patch?
Due to lack of time, I have only tried without your patch so far.
comment:40 by , 7 years ago
Replying to hcho:
Yes, that's what I found too. Vect_line_intersection2 doesn't have this issue, but it still creates a single point intersection at the first vertex of the second new line (line ID 3) in the geojson example.
Line ID 1: original unbroken line
1st iteration
Line ID 2-4: new broken lines
2nd iteration
Line ID 5: identical to line 3
Line ID 6: start node of line 3
Lines 5 & 6 shouldn't be returned at all from the intersection routine, I think. The patch fixes this.
Such an infinite loop in Vect_break_lines() has been observed previously and fixed back then in trunk r55796, r55813, r55848. Apparently, these commits did not solve all issues. It is most important that the intersection point of the same two segments is always the same, no matter which segment is a and which is b.
Your patch might cause problems in special cases where an intersection point is not added to a line even though no vertex of that line is identical to the intersection point. That is, the line would not be broken even though it should have been. Therefore I would rather not apply your patch.
comment:41 by , 7 years ago
Milestone: | 7.0.6 → 7.0.7 |
---|
comment:43 by , 5 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Replying to martinl:
What is the state of this ticket?
The original issue was caused by a buggy NFS version that makes sqlite very slow. This won't be fixed in GRASS; the solution is to use a bug-free NFS version.
The second issue with an infinite loop when breaking lines has been fixed in trunk and relbr76.
Running under perf several times, and using -Og -g builds of grass7 and gdal, it seems that somewhere between 30-40% of the execution time is happening in Python or the shell. That seems odd, but I'm ignorant... Additionally, it seems that grass is sending a query for every write to the DB, synchronously, instead of doing the DB I/O in another thread. That seems a poor use of multicore systems.
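For readers unfamiliar with the write pattern being described, the following is a small Python sketch using the standard `sqlite3` module. It is not how GRASS talks to its database drivers; it only contrasts one synchronous commit per feature with a single transaction for the whole import. Batching is an alternative introduced here for illustration, not something proposed in the ticket. The table name `places`, its columns, and the helper names are made up for the example; the row count matches the 7020 features from the report.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical attribute table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (cat INTEGER PRIMARY KEY, name TEXT)")
    return conn


def insert_per_row(conn, rows):
    """One INSERT and one COMMIT per feature.

    On a file-backed database each commit waits for the storage layer,
    so a slow filesystem is paid for once per row.
    """
    for cat, name in rows:
        conn.execute("INSERT INTO places (cat, name) VALUES (?, ?)", (cat, name))
        conn.commit()


def insert_batched(conn, rows):
    """All rows inside a single transaction: one commit for the whole import."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO places (cat, name) VALUES (?, ?)", rows)


rows = [(i, f"place_{i}") for i in range(1, 7021)]

per_row_db = make_db()
insert_per_row(per_row_db, rows)

batched_db = make_db()
insert_batched(batched_db, rows)

print(per_row_db.execute("SELECT COUNT(*) FROM places").fetchone()[0])
print(batched_db.execute("SELECT COUNT(*) FROM places").fetchone()[0])
```

Both variants store the same rows. The difference only becomes visible in wall-clock time on a file-backed database, where every commit has to reach the disk or the network filesystem.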