Opened 6 years ago

Closed 5 years ago

#3700 closed defect (fixed)

test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs, and travis sometimes

Reported by: robe Owned by: komzpa
Priority: high Milestone: PostGIS 2.4.4
Component: postgis Version: 2.3.x
Keywords: Cc:

Description

This is beginning to annoy me. I thought I had this in a ticket already but couldn't find it.

On occasion, especially under high load, winnie's 32-bit runs fail on this test:

  Test: test_kmeans ...Makefile:85: recipe for target `check' failed

It's always that test, and I think I've only seen the 32-bit runs fail. They fail about once every 3-5 runs.

Could be Windows, or there is something wrong with kmeans that shows up more often on 32-bit systems.

Change History (23)

comment:1 by robe, 6 years ago

Component: changed from buildbots to postgis
Owner: changed from robe to pramsey

comment:2 by strk, 6 years ago

Do you run with RUNTESTFLAGS=-v? If there's no output but a non-success return, then maybe it's a missing "return" somewhere, leaving the return code to phase-of-the-moon matters.

comment:3 by robe, 6 years ago

Yes, she runs with RUNTESTFLAGS=-v, it looks like:

https://git.osgeo.org/gogs/postgis/postgis/src/svn-trunk/ci/winnie/regress_postgis.sh#L143

But that doesn't explain why it only fails on 32-bit and not 64-bit does it?

comment:4 by strk, 6 years ago

No, but the lack of diff output suggests to me that there's no difference between expected and obtained output, thus the error must be in the run_test script itself. Can you try to run it in isolation on that machine, against the specific offending testcase?

comment:5 by robe, 6 years ago

This is in cunit. I thought run_test is just for the SQL.

comment:6 by robe, 6 years ago

Summary: changed from "test_kmeans fails on winnie often on 32-bit runs" to "test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs"

Okay, this just happened on winnie's 64-bit trunk run, so I guess it's not limited to 32-bit. This is the first time I recall it happening on 64-bit.

 Test: test_kmeans ...Makefile:85: recipe for target `check' failed
make[2]: *** [check] Error 255
make[2]: Leaving directory `/projects/postgis/branches/2.4/liblwgeom/cunit'
Makefile:205: recipe for target `check' failed
make[1]: *** [check] Error 2
make[1]: Leaving directory `/projects/postgis/branches/2.4/liblwgeom'
GNUmakefile:16: recipe for target `check' failed
make: *** [check] Error 1

comment:7 by dbaston, 5 years ago

Milestone: changed from PostGIS 2.4.0 to PostGIS 2.5.0
Priority: changed from medium to high

comment:8 by pramsey, 5 years ago

Resolution: set to worksforme
Status: changed from new to closed

I just ran the cunit tests on 4 cores simultaneously in a big loop looking for this failure, but didn't get it. Maybe it's gone? ha ha.
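A loop like the one described above can be sketched as follows. This is only an illustration and not the script that was actually used: it runs the command sequentially rather than on several cores at once, and the command in the usage section is a hypothetical stand-in (a child Python process that always exits 0), not the real cunit binary.

```python
import subprocess
import sys
from collections import Counter


def run_repeatedly(cmd, runs):
    """Run cmd `runs` times and count how often each exit status occurs."""
    outcomes = Counter()
    for _ in range(runs):
        # Discard output; only the exit status matters for this tally.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        outcomes[result.returncode] += 1
    return outcomes


if __name__ == "__main__":
    # Hypothetical stand-in for the real test command.
    stand_in = [sys.executable, "-c", "raise SystemExit(0)"]
    print(run_repeatedly(stand_in, runs=5))
```

For a failure that shows up once every 3-5 runs on one machine and never on another, a tally like this at least makes the failure rate comparable between hosts.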

comment:9 by robe, 5 years ago

Resolution: worksforme deleted
Status: changed from closed to reopened

Nice try buddy. Keeping this for 2.5. I actually haven't been testing 32-bit for a while cause I have a more pressing issue with it failing on shp2pgsql-gui that I haven't figured out. So I turned off testing on 32-bit until I've squared that away.

Anyway, like I said, I think I've only seen this on Windows, so it might be that, because I compile with mingw and test against a VC++ build, it's seeing something you aren't. I'll reassign to myself and try to nail down the issue in 2.5.

comment:10 by robe, 5 years ago

Owner: changed from pramsey to robe
Status: changed from reopened to new

comment:11 by robe, 5 years ago

Hah, guess it's still a problem. It just happened to me when testing r15671 on my mingw gcc 4.8.3 64-bit, though the error is a little different, so perhaps it's not quite the same thing.

     CUnit - A unit testing framework for C - Version 2.1-2
     http://cunit.sourceforge.net/


Suite: computational_geometry
  Test: test_lw_segment_side ...passed
  Test: test_lw_segment_intersects ...passed
  Test: test_lwline_crossing_short_lines ...passed
  Test: test_lwline_crossing_long_lines ...passed
  Test: test_lwline_crossing_bugs ...passed
  Test: test_lwpoint_set_ordinate ...passed
  Test: test_lwpoint_get_ordinate ...passed
  Test: test_point_interpolate ...passed
  Test: test_lwline_clip ...passed
  Test: test_lwline_clip_big ...passed
  Test: test_lwmline_clip ...passed
  Test: test_geohash_point ...passed
  Test: test_geohash_precision ...passed
  Test: test_geohash ...passed
  Test: test_geohash_point_as_int ...passed
  Test: test_isclosed ...passed
  Test: test_lwgeom_simplify ...passed
  Test: test_lw_arc_center ...passed
  Test: test_point_density ...passed
  Test: test_kmeans ...Makefile:86: recipe for target 'check' failed
make[2]: *** [check] Segmentation fault
make[2]: Leaving directory '/projects/postgis/branches/2.4/liblwgeom/cunit'
Makefile:205: recipe for target 'check' failed
make[1]: *** [check] Error 2
make[1]: Leaving directory '/projects/postgis/branches/2.4/liblwgeom'
GNUmakefile:16: recipe for target 'check' failed
make: *** [check] Error 1

I should add it's not repeatable. I did another make check exactly the same way and it was fine this time around. This is pure mingw gcc 4.8.3 64-bit compiled PostgreSQL 10 with cassert on. No VC++ in the mix, since EDB hasn't come out with PostgreSQL 10 for me to test with. At any rate, when it fails it's in the cunit layer, so that shouldn't have anything to do with it anyway.
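The two failure shapes seen so far differ in kind: "Error 255" is make reporting a non-zero exit code, while "Segmentation fault" is make reporting that the process was killed by a signal. A minimal sketch of telling the two apart when running a test binary from a script, assuming Python's `subprocess` convention that a negative return code means death by signal (this applies on POSIX; Windows reports crashes differently):

```python
import signal


def describe_status(returncode):
    """Turn a subprocess-style return code into a readable description."""
    if returncode == 0:
        return "passed"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as -signum.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-returncode} ({name})"
    return f"exited with code {returncode}"
```

The function name and message wording are illustrative only.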

Last edited 5 years ago by robe

comment:12 by robe, 5 years ago

Damn, I wish this happened consistently. I got the error again but then couldn't repeat it in 4 more tries. A true Heisenbug. I'll try throwing in some debug notices to see if I can at least catch where it's happening.

comment:13 by robe, 5 years ago

Milestone: changed from PostGIS 2.5.0 to PostGIS 2.4.1

Still failing randomly, usually on 32-bit runs.

comment:14 by pramsey, 5 years ago

Milestone: changed from PostGIS 2.4.1 to PostGIS 2.4.2

comment:15 by pramsey, 5 years ago

Milestone: changed from PostGIS 2.4.2 to PostGIS 2.4.3

comment:16 by robe, 5 years ago

Summary: changed from "test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs" to "test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs, and travis sometimes"

Yeah, travis crashed on the kmeans test as well recently (so it's not all in winnie's head; something is fishy in these mean waters).

https://travis-ci.org/postgis/postgis/jobs/322217076

This was run against trunk r16189

PostGIS is now configured for x86_64-unknown-linux-gnu
 -------------- Compiler Info ------------- 
  C compiler:           gcc -O3 -march=native -mtune=native
  SQL preprocessor:     /usr/bin/cpp -traditional-cpp -w -P
 -------------- Additional Info ------------- 
  Interrupt Tests:   DISABLED use: --with-interrupt-tests to enable
 -------------- Dependencies -------------- 
  GEOS config:          /usr/bin/geos-config
  GEOS version:         3.5.0
  GDAL config:          /usr/bin/gdal-config
  GDAL version:         2.2.2
  SFCGAL config:        /usr/bin/sfcgal-config
  SFCGAL version:       1.2.2
  PostgreSQL config:    /usr/lib/postgresql/9.6/bin/pg_config
  PostgreSQL version:   PostgreSQL 9.6.6
  PROJ4 version:        49
  Libxml2 config:       /usr/bin/xml2-config
  Libxml2 version:      2.9.1
  JSON-C support:       yes
  protobuf-c support:   no
  PCRE support:         yes
  Perl:                 /usr/bin/perl
 --------------- Extensions --------------- 
  PostGIS Raster:       enabled
  PostGIS Topology:     enabled
  SFCGAL support:       enabled
  Address Standardizer support:       enabled
 -------- Documentation Generation -------- 
  xsltproc:             /usr/bin/xsltproc
  xsl style sheets:     /usr/share/xml/docbook/stylesheet/docbook-xsl
  dblatex:              /usr/bin/dblatex
  convert:              /usr/bin/convert
  mathml2.dtd:          /usr/share/xml/schema/w3c/mathml/dtd/mathml2.dtd
Test: test_kmeans ...make[2]: *** [check] Illegal instruction (core dumped)
make[2]: Leaving directory `/home/travis/build/postgis/postgis/liblwgeom/cunit'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/travis/build/postgis/postgis/liblwgeom'
make: *** [check] Error 1

comment:17 by komzpa, 5 years ago

How about we adopt logbt for cunit as non-temporary measure?

It will just print backtrace for anything running under it if it dumps core.

I've used it like this (full path to cunit was also needed): https://github.com/postgis/postgis/pull/176/commits/f2f06a11572bb25168fe375c9236d3b351f4607e

Likely much more templating is needed to detect presence of logbt and run under it if it's there.
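The "run under logbt if it's there" idea can be sketched as below. This is an illustration under assumptions, not the templating that ended up in the build: it assumes logbt is invoked as `logbt -- <program> [args...]`, and the function name and the `./cu_tester` path in the usage note are hypothetical.

```python
import shutil


def wrap_with_logbt(cmd, which=shutil.which):
    """Prefix cmd with logbt when it is on PATH; otherwise return cmd as is."""
    logbt = which("logbt")
    if logbt is None:
        return list(cmd)
    # Assumed invocation form: `logbt -- <program> [args...]`.
    return [logbt, "--", *cmd]
```

A caller would then do something like `subprocess.run(wrap_with_logbt(["./cu_tester"]))`, so machines without logbt keep running the tests exactly as before. The `which` parameter exists only so the lookup can be replaced in tests.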

comment:18 by robe, 5 years ago

yah that would be great. Not sure how to move forward with that.

BTW winnie's 64-bit on 2.5.0 failed

  Test: test_kmeans ...Makefile:86: recipe for target 'check' failed
make[2]: *** [check] Segmentation fault
make[2]: Leaving directory '/projects/postgis/branches/2.4/liblwgeom/cunit'
Makefile:207: recipe for target 'check' failed
make[1]: *** [check] Error 2
make[1]: Leaving directory '/projects/postgis/branches/2.4/liblwgeom'
GNUmakefile:16: recipe for target 'check' failed
make: *** [check] Error 1

comment:19 by komzpa, 5 years ago

logbt is now enabled on travis. If the crash ever reproduces there it will be logged, although the likely reason for the Illegal instruction failure was -march=native combined with travis faking the CPU ID.
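For context, `-march=native` asks gcc to target whatever CPU the build host reports, so a binary can contain instructions that the machine actually executing it does not support. One way a CI script could avoid that is to drop the host-specific flags before configuring. The helper below is a hypothetical sketch of that idea, not a description of what was changed on travis:

```python
def portable_cflags(cflags):
    """Drop host-specific tuning flags from a CFLAGS string."""
    host_specific = {"-march=native", "-mtune=native"}
    kept = [flag for flag in cflags.split() if flag not in host_specific]
    return " ".join(kept)
```

Applied to the flags from the travis log above, `-O3 -march=native -mtune=native` would be reduced to `-O3`.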

comment:20 by robe, 5 years ago

Milestone: changed from PostGIS 2.4.3 to PostGIS 2.4.4

After all your changes this might not be an issue anymore, but I'll keep it open until we confirm.

comment:21 by pramsey, 5 years ago

Couple weeks, looking OK, @robe?

comment:22 by komzpa, 5 years ago

Owner: changed from robe to komzpa
Status: changed from new to assigned

comment:23 by komzpa, 5 years ago

Resolution: set to fixed
Status: changed from assigned to closed