Opened 8 years ago

Closed 8 years ago

#3742 closed defect (fixed)

mvt regress failure on debbie and winnie

Reported by: robe Owned by: Björn Harrtell
Priority: blocker Milestone: PostGIS 2.4.0
Component: postgis Version: master
Keywords: Cc:

Description

Both debbie and winnie are showing this regress error on mvt regress

 mvt .. failed (diff expected obtained: /projects/postgis/tmp/2.4.0dev_pg9.5_geos3.6.1_gdal2.1.3w64/test_120_diff)
-----------------------------------------------------------------------------
--- mvt_expected	2017-04-21 13:42:19 -0400
+++ /projects/postgis/tmp/2.4.0dev_pg9.5_geos3.6.1_gdal2.1.3w64/test_120_out	2017-04-21 15:56:31 -0400
@@ -25,8 +25,8 @@
 TA6|GisKBHRlc3QSDhIEAAABARgBIgQJMt4/GgJjMRoCYzIiAigBIgIwASiAIHgC
 TA7|Gk4KBHRlc3QSDBICAAAYASIECTTcPxIMEgIAARgBIgQJMt4/EgwSAgABGAEiBAk03D8aAmMxIgsK
 CW90aGVydGVzdCIGCgR0ZXN0KIAgeAI=
-TA8|GkUKBHRlc3QSDBICAAAYASIECTLePxIMEgIAARgBIgQJNNw/EgwSAgACGAEiBAk03D8aAmMxIgIo
-ASICKAEiAigCKIAgeAI=
+TA8|GkEKBHRlc3QSDBICAAAYASIECTLePxIMEgIAABgBIgQJNNw/EgwSAgABGAEiBAk03D8aAmMxIgIo
+ASICKAIogCB4Ag==
 TU2
 ERROR:  pgis_asmvt_transfn: parameter row cannot be other than a rowtype
 TU3
-----------------------------------------------------------------------------

Seems to have started happening as result of r15358 commit

Commit c11837ee6a26f6e42c33a96fe8ae019da93986cc by bjorn

Expand tests to verify redundant value encoding fix Closes #3741
git-svn-id: http://svn.osgeo.org/postgis/trunk@15359
b70326c6-7e19-0410-871a-916f4a2858ee

Change History (19)

comment:1 by robe, 8 years ago

On closer inspection doesn't seem to be for all versions of PostgreSQL.

On debbie - seems like only 9.3 failed. I'm going to run again just to confirm:

 mvt .. failed (diff expected obtained: /var/lib/jenkins/workspace/postgis/tmp/2_4_pg9.3w64/test_113_diff)
-----------------------------------------------------------------------------
--- mvt_expected	2017-04-21 17:41:38.547468171 +0000
+++ /var/lib/jenkins/workspace/postgis/tmp/2_4_pg9.3w64/test_113_out	2017-04-21 18:00:18.839535683 +0000
@@ -25,8 +25,8 @@
 TA6|GisKBHRlc3QSDhIEAAABARgBIgQJMt4/GgJjMRoCYzIiAigBIgIwASiAIHgC
 TA7|Gk4KBHRlc3QSDBICAAAYASIECTTcPxIMEgIAARgBIgQJMt4/EgwSAgABGAEiBAk03D8aAmMxIgsK
 CW90aGVydGVzdCIGCgR0ZXN0KIAgeAI=
-TA8|GkUKBHRlc3QSDBICAAAYASIECTLePxIMEgIAARgBIgQJNNw/EgwSAgACGAEiBAk03D8aAmMxIgIo
-ASICKAEiAigCKIAgeAI=
+TA8|GkEKBHRlc3QSDBICAAAYASIECTLePxIMEgIAABgBIgQJNNw/EgwSAgABGAEiBAk03D8aAmMxIgIo
+ASICKAIogCB4Ag==
 TU2
 ERROR:  pgis_asmvt_transfn: parameter row cannot be other than a rowtype
 TU3
-----------------------------------------------------------------------------

comment:2 by robe, 8 years ago

okay just ran without 9.3 debbie and all passed. So for debbie it's just 9.3. For winnie 9.6 is okay but then 9.5 and 9.4 fail.

comment:3 by robe, 8 years ago

forgot to add that debbie runs 64-bit 9.6, 9.5, 9.4, and 9.3

comment:4 by Björn Harrtell, 8 years ago

This will surely prove to have an interesting explanation. :)

comment:5 by robe, 8 years ago

well I think debbie's 9.6 and 9.5 I read from head weekly. winnie I'm a bit lazy about updating her PostgreSQL. It could possibly be an issue with micro fixes. I'll upgrade debbie's 9.3 to latest micro and see if the issue goes away.

comment:6 by robe, 8 years ago

Sadly issue did not go away after the upgrade:

For the record before I did 9.3 upgrade debbie was running 9.3.10 (dronie is running PostgreSQL 9.3.14 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4, 64-bit and seemed to breeze thru mvt without a problem).

Now she's running:

PostgreSQL 9.3.16 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 5.2.1-22) 5.2.1 20151010, 64-bit

and still get

-----------------------------------------------------------------------------
--- mvt_expected	2017-04-21 17:41:38.547468171 +0000
+++ /var/lib/jenkins/workspace/postgis/tmp/2_4_pg9.3w64/test_113_out	2017-04-23 16:28:57.204235634 +0000
@@ -25,8 +25,8 @@
 TA6|GisKBHRlc3QSDhIEAAABARgBIgQJMt4/GgJjMRoCYzIiAigBIgIwASiAIHgC
 TA7|Gk4KBHRlc3QSDBICAAAYASIECTTcPxIMEgIAARgBIgQJMt4/EgwSAgABGAEiBAk03D8aAmMxIgsK
 CW90aGVydGVzdCIGCgR0ZXN0KIAgeAI=
-TA8|GkUKBHRlc3QSDBICAAAYASIECTLePxIMEgIAARgBIgQJNNw/EgwSAgACGAEiBAk03D8aAmMxIgIo
-ASICKAEiAigCKIAgeAI=
+TA8|GkEKBHRlc3QSDBICAAAYASIECTLePxIMEgIAABgBIgQJNNw/EgwSAgABGAEiBAk03D8aAmMxIgIo
+ASICKAIogCB4Ag==
 TU2
 ERROR:  pgis_asmvt_transfn: parameter row cannot be other than a rowtype
 TU3
-----------------------------------------------------------------------------

seems like cassert isn't the issue since I would expect 9.4, 9.5, and 9.6 to fail too. This will be very very interesting.

You see anything special about the output that might give a clue.

comment:7 by robe, 8 years ago

The good news is mvt is doing okay against debbie's PostgreSQL 10 runs too :)

comment:8 by robe, 8 years ago

okay I upgraded debbie's 9.4 as well which was at 9.4.5 before and was passing. still passes after the upgrade to

PostgreSQL 9.4.11 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 5.2.1-22) 5.2.1 20151010, 64-bit

since I used the same script to compile both the new 9.3 and 9.4, I think that rules out how I'm compiling and all that seems to also rule out microversion.

But here's another interesting twist. I just compiled postgis on my windows 7 64-bit against 9.6, and it fails mvt test as well same difference.

Creating database 'postgis_reg'
Loading PostGIS into 'postgis_reg'
Loading SFCGAL into 'postgis_reg'
PostgreSQL 9.6.1, compiled by Visual C++ build 1800, 64-bit
  Postgis 2.4.0dev - r15360 - 2017-04-23 17:41:51
  scripts 2.4.0dev r15360
  GEOS: 3.6.1-CAPI-1.10.1 r4317
  PROJ: Rel. 4.9.1, 04 March 2015
  SFCGAL: 1.3.0


+++ /projects/postgis/tmp/2.4_pg9.6w64/test_120_out	2017-04-23 13:45:34.273093800 -0400
@@ -25,8 +25,8 @@
 TA6|GisKBHRlc3QSDhIEAAABARgBIgQJMt4/GgJjMRoCYzIiAigBIgIwASiAIHgC
 TA7|Gk4KBHRlc3QSDBICAAAYASIECTTcPxIMEgIAARgBIgQJMt4/EgwSAgABGAEiBAk03D8aAmMxIgsK
 CW90aGVydGVzdCIGCgR0ZXN0KIAgeAI=
-TA8|GkUKBHRlc3QSDBICAAAYASIECTLePxIMEgIAARgBIgQJNNw/EgwSAgACGAEiBAk03D8aAmMxIgIo
-ASICKAEiAigCKIAgeAI=
+TA8|GkEKBHRlc3QSDBICAAAYASIECTLePxIMEgIAABgBIgQJNNw/EgwSAgABGAEiBAk03D8aAmMxIgIo
+ASICKAIogCB4Ag==
 TU2
 ERROR:  pgis_asmvt_transfn: parameter row cannot be other than a rowtype
 TU3

winnie's 9.6 64-bit run:

PostgreSQL 9.6.1, compiled by Visual C++ build 1800, 64-bit
  Postgis 2.4.0dev - r15360 - 2017-04-23 17:53:51
  scripts 2.4.0dev r15360
  GEOS: 3.6.1-CAPI-1.10.1 r4317
  PROJ: Rel. 4.9.1, 04 March 2015
  SFCGAL: 1.3.0

Running tests

:
mvt .. ok 
geobuf .. ok 

So clearly PostgreSQL version (even micro) does not seem to be issue. Why winnie's 9.6 passes and mine fails — when we are setup mostly the same is very puzzling.

I'm suspecting mischievous spirits haunting things :)

comment:9 by Björn Harrtell, 8 years ago

Thanks for investigating, very valuable as I still haven't mustered up the guts to make a move on Debbie (or Winnie, I'm not that picky).

Decoding the output vector tile and compare the expected to the failing in detail will perhaps give some more info. As of yet I've only done naive inspection of decoded vector tiles using vector-tile-js but I've just found a more detailed way to inspect using protoc-c —decode.

The failing test case is:

SELECT 'TA8', encode(ST_AsMVT('test', 4096, 'geom', q), 'base64') FROM (
    SELECT 1 AS c1, ST_AsMVTGeom(ST_GeomFromText('POINT(25 17)'),
    ST_MakeBox2D(ST_Point(0, 0), ST_Point(4096, 4096)), 4096, 0, false) AS geom
    UNION
    SELECT 1 AS c1, ST_AsMVTGeom(ST_GeomFromText('POINT(26 18)'),
    ST_MakeBox2D(ST_Point(0, 0), ST_Point(4096, 4096)), 4096, 0, false) AS geom
    UNION
    SELECT 2 AS c1, ST_AsMVTGeom(ST_GeomFromText('POINT(26 18)'),
    ST_MakeBox2D(ST_Point(0, 0), ST_Point(4096, 4096)), 4096, 0, false) AS geom) AS q;

It creates a tile with three point features. Duplicate values for c1 should only be encoded once and reused.

Decode of the expected binary:

layers {
  name: "test"
  features {
    tags: 0
    tags: 0
    type: POINT
    geometry: 9
    geometry: 50
    geometry: 8158
  }
  features {
    tags: 0
    tags: 1
    type: POINT
    geometry: 9
    geometry: 52
    geometry: 8156
  }
  features {
    tags: 0
    tags: 2
    type: POINT
    geometry: 9
    geometry: 52
    geometry: 8156
  }
  keys: "c1"
  values {
    uint_value: 1
  }
  values {
    uint_value: 1
  }
  values {
    uint_value: 2
  }
  extent: 4096
  version: 2
}

Decode of the unexpected binary:

layers {
  name: "test"
  features {
    tags: 0
    tags: 0
    type: POINT
    geometry: 9
    geometry: 50
    geometry: 8158
  }
  features {
    tags: 0
    tags: 0
    type: POINT
    geometry: 9
    geometry: 52
    geometry: 8156
  }
  features {
    tags: 0
    tags: 1
    type: POINT
    geometry: 9
    geometry: 52
    geometry: 8156
  }
  keys: "c1"
  values {
    uint_value: 1
  }
  values {
    uint_value: 2
  }
  extent: 4096
  version: 2
}

On first glance it seems that the expected output is incorrectly containing redundant encoding of the c1 value. Both are valid tiles but the unexpected encode is more correct. I think I might have a lead on the problem here. I fixed it for string values recently but it appears I have similar issues with numeric values. Will look for a solution when time permits.

comment:10 by robe, 8 years ago

So does that mean the failing ones are the right behavior, not the wrong behavior?

comment:11 by robe, 8 years ago

I took out 9.3 testing for 2.4 for now on debbie until we resolve this. I'll keep winnie broken so that she catches when it's fixed.

comment:12 by Björn Harrtell, 8 years ago

Both outputs are valid tiles and will render the same (which is why I did not see anything wrong at first) but the unexpected output is the most correct encoding.

It must be related to the use of uthash (1) where I must be doing something wrong. Still can't explain at all though why it's affected by environment details.

1) http://troydhanson.github.io/uthash/

comment:13 by Björn Harrtell, 8 years ago

When I run the TA8 test case manually in pgadmin I get: "GkEKBHRlc3QSDBICAAAYASIECTLePxIMEgIAABgBIgQJNNw/EgwSAgABGAEiBAk03D8aAmMxIgIo ASICKAIogCB4Ag==" (which is the correct encoding)

When I run the test case using run_test.pl or make check I get: "GkUKBHRlc3QSDBICAAAYASIECTLePxIMEgIAARgBIgQJNNw/EgwSAgACGAEiBAk03D8aAmMxIgIo ASICKAEiAigCKIAgeAI=" (which is the incorrect encoding)

It does not matter if I reorder the tests so that it tries this case before any other.

comment:14 by Björn Harrtell, 8 years ago

There be dragons in this one. Running the TA8 test case in pgadmin at my home computer (previous run was on work laptop, both systems at Debian Stretch) results in… "GkUKBHRlc3QSDBICAAAYASIECTLePxIMEgIAARgBIgQJNNw/EgwSAgACGAEiBAk03D8aAmMxIgIo ASICKAEiAigCKIAgeAI=", the incorrect encoding.

comment:15 by strk, 8 years ago

Valgrind is a great tool to fight dragons

comment:16 by Björn Harrtell, 8 years ago

But it needs cunit-testable code right? i.e no postgresql dependencies?

comment:17 by Björn Harrtell, 8 years ago

If I set debug message levels to the following:

  • SET client_min_messages TO DEBUG2; - I get incorrect encoding
  • SET client_min_messages TO DEBUG3; - I get correct encoding
Last edited 8 years ago by Björn Harrtell (previous) (diff)

comment:18 by robe, 8 years ago

I think you can use valgrind with postgres, it's just a tad noisier so I hear.

comment:19 by Björn Harrtell, 8 years ago

Resolution: fixed
Status: assignedclosed

In 15372:

Cast smaller integer types to the destination type
Fixes #3741 and #3742

Note: See TracTickets for help on using tickets.