Opened 12 months ago

Last modified 12 months ago

#4009 new defect

'where' option in v.to.rast can select wrong feature for raster attribute, in areas where features overlap

Reported by: florisvdh Owned by: grass-dev@…
Priority: normal Milestone: 7.8.3
Component: Vector Version: svn-releasebranch76
Keywords: v.to.rast sql where Cc:
CPU: x86-64 Platform: Linux

Description

I use GRASS 7.6.1 on Linux Mint 18.1, i.e. Ubuntu Xenial (16.04) based. (Note, this is my first post here, I'm a beginning GRASS user, mostly working with R)

I applied something like:

v.to.rast input=polygon_layer output=output_x1 where="field1 LIKE 'x1'" \
         use=attr attribute_column=field2 memory=800 --overwrite

Importantly, field1 in polygon_layer has several possible values such as x1, x2, x3 and so on (81 different values in my usecase).

Moreover, in my usecase several features (polygons) have identical geometry, i.e. many sets of 2 or more polygons exist with 100% overlap among their own polygons (i.e. identical polygons), while each of these polygons has its own specific attributes: different values of field1 and so on. The problem that I met occurs for those features; possibly the same problem will occur for overlapping areas between polygons in general.

While the where option in v.to.rast is effective in localizing the correct areas, i.e. where field1 is x1, the problem is that the attribute value (field2) may come from one of the other (overlapping) features at that place, e.g. where field1 is x2. Which feature of an overlapping set is used for the attribute probably depends on the order of those overlapping features.

From what I've seen, it appears that v.to.rast will select the field2 value from the same feature regardless of the value for field1 in the above where option, as long as one of those overlapping features meets the where condition.

If this effectively is a bug in the program, maybe it happens in other modules as well, where the where option is available.

Note: I should also mention that I used this in a simple parallelization approach in a loop (setting field1 to different values), by starting the commands as background processes until a certain number of them is running (following an example I found in the GRASS wiki). That's why the memory option was used explicitly.

Currently I worked around this problem by splitting polygon_layer according to the value of field1, using v.extract (using just a simple loop here as the simple parallelization doesn't work with this one).

Attachments (1)

polygon_layer_debug.gpkg (236.0 KB) - added by florisvdh 12 months ago.
Clipped the original polygon_layer for a small region, for debugging

Download all attachments as: .zip

Change History (9)

comment:1 in reply to:  description Changed 12 months ago by mlennert

Replying to florisvdh:

I use GRASS 7.6.1 on Linux Mint 18.1, i.e. Ubuntu Xenial (16.04) based. (Note, this is my first post here, I'm a beginning GRASS user, mostly working with R)

I applied something like:

v.to.rast input=polygon_layer output=output_x1 where="field1 LIKE 'x1'" \
         use=attr attribute_column=field2 memory=800 --overwrite

Importantly, field1 in polygon_layer has several possible values such as x1, x2, x3 and so on (81 different values in my usecase).

Moreover, in my usecase several features (polygons) have identical geometry, i.e. many sets of 2 or more polygons exist with 100% overlap among their own polygons (i.e. identical polygons), while each of these polygons has its own specific attributes: different values of field1 and so on. The problem that I met occurs for those features; possibly the same problem will occur for overlapping areas between polygons in general.

While the where option in v.to.rast is effective in localizing the correct areas, i.e. where field1 is x1, the problem is that the attribute value (field2) may come from one of the other (overlapping) features at that place, e.g. where field1 is x2. Which feature of an overlapping set is used for the attribute probably depends on the order of those overlapping features.

From what I've seen, it appears that v.to.rast will select the field2 value from the same feature regardless of the value for field1 in the above where option, as long as one of those overlapping features meets the where condition.

If this effectively is a bug in the program, maybe it happens in other modules as well, where the where option is available.

This definitely should not happen. Could you provide the output of:

v.db.select map=polygon_layer where="field1 LIKE 'x1'" \
          column=cat,field1,field2

?

comment:2 Changed 12 months ago by florisvdh

Below are a few characteristics of the map. Note, the actual values of field 1 are not 'x1', 'x2' etc. but specific codes. Given the size of the data, I truncated the output.

$ v.info polygon_layer -c
Displaying column types/names for database connection of layer <1>:
INTEGER|cat
TEXT|field1
DOUBLE PRECISION|field2

$ v.db.select map=polygon_layer | wc -l
85051

$ v.db.select map=polygon_layer where="field1 LIKE 'rbbhc'" | wc -l
4720

$ v.db.select map=polygon_layer where="field1 LIKE 'rbbhc'" | head
cat|field1|field2
132|rbbhc|20
146|rbbhc|70
151|rbbhc|30
152|rbbhc|30
153|rbbhc|30
154|rbbhc|30
155|rbbhc|30
156|rbbhc|30
157|rbbhc|30

comment:3 Changed 12 months ago by neteler

Milestone: 7.8.3

comment:4 Changed 12 months ago by mmetz

I think I have a fix for this bug, but I would like to test the fix first with your data. Can you provide a small spatial extract of your data that can be used to reproduce the problem? Thanks!

Changed 12 months ago by florisvdh

Attachment: polygon_layer_debug.gpkg added

Clipped the original polygon_layer for a small region, for debugging

comment:5 Changed 12 months ago by florisvdh

@mmetz: see the attachment.

A few examples of sets of polygons which have the same polygon but different attributes:

  • cat 21936 and 21937
  • cat 21829 and 21830
  • cat 22114 and 22115

The original data set from which it was derived (involved more steps apart from clipping), is at https://doi.org/10.5281/zenodo.3540740 .

Some additional output to show region and a few steps:

$ g.region -p
projection: 99 (Belge 1972 / Belgian Lambert 72)
zone:       0
datum:      bel72
ellipsoid:  international
north:      244030.1
south:      153054.1
west:       22029.6
east:       258861.6
nsres:      32
ewres:      32
rows:       2843
cols:       7401
cells:      21041043

$ g.region w=196065 e=199009 s=188877 n=190989 save=debug

$ g.region -p
projection: 99 (Belge 1972 / Belgian Lambert 72)
zone:       0
datum:      bel72
ellipsoid:  international
north:      190989
south:      188877
west:       196065
east:       199009
nsres:      32
ewres:      32
rows:       66
cols:       92
cells:      6072

$ v.clip -r input=polygon_layer output=polygon_layer_debug

$ v.out.ogr input=polygon_layer_debug output=polygon_layer_debug.gpkg

comment:6 Changed 12 months ago by mmetz

Fixed in master b4f79f2 and relbr78 48e807e

comment:7 Changed 12 months ago by florisvdh

Interesting, thanks!

I wonder how this relates to ticket 1798, which initiated the 'where' option in quite a number of GRASS modules. Hence, are other modules affected as well?

comment:8 in reply to:  7 Changed 12 months ago by mmetz

Replying to florisvdh:

Interesting, thanks!

I wonder how this relates to ticket 1798, which initiated the 'where' option in quite a number of GRASS modules. Hence, are other modules affected as well?

All modules using Vect_cats_in_constraint() need to be checked, I am on it.

Note: See TracTickets for help on using tickets.