Opened 7 years ago

Closed 6 years ago

#3700 closed defect (fixed)

test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs, and travis sometimes

Reported by: robe Owned by: komzpa
Priority: high Milestone: PostGIS 2.4.4
Component: postgis Version: 2.3.x
Keywords: Cc:

Description

This is beginning to annoy me. I thought I had this in a ticket already but couldn't find it.

On occasion, especially under high load, winnie's 32-bit runs fail on this test:

  Test: test_kmeans ...Makefile:85: recipe for target `check' failed

It's always that test, and I think I've only seen the 32-bit runs fail. They fail about once every 3-5 runs.

It could be Windows, or there is something wrong with kmeans that shows up more often on 32-bit systems.

Change History (23)

comment:1 by robe, 7 years ago

Component: changed from buildbots to postgis
Owner: changed from robe to pramsey

comment:2 by strk, 7 years ago

Do you run with RUNTESTFLAGS=-v? If there's no output but a non-success exit code, then maybe a "return" is missing somewhere, leaving the return code to phase-of-the-moon matters.

comment:3 by robe, 7 years ago

Yes, she runs with RUNTESTFLAGS=-v, it looks like:

https://git.osgeo.org/gogs/postgis/postgis/src/svn-trunk/ci/winnie/regress_postgis.sh#L143

But that doesn't explain why it only fails on 32-bit and not 64-bit does it?

comment:4 by strk, 7 years ago

No, but the lack of diff output suggests to me that there's no difference between the expected and obtained output, so the error must be in the run_test script itself. Can you try to run it in isolation on that machine, against the specific offending test case?

comment:5 by robe, 7 years ago

This is in CUnit. I thought run_test was just for the SQL tests.

comment:6 by robe, 7 years ago

Summary: changed from test_kmeans fails on winnie often on 32-bit runs to test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs

Okay, this just happened on winnie's 64-bit trunk run, so I guess it's not limited to 32-bit. This is the first time I recall it happening on 64-bit.

 Test: test_kmeans ...Makefile:85: recipe for target `check' failed
make[2]: *** [check] Error 255
make[2]: Leaving directory `/projects/postgis/branches/2.4/liblwgeom/cunit'
Makefile:205: recipe for target `check' failed
make[1]: *** [check] Error 2
make[1]: Leaving directory `/projects/postgis/branches/2.4/liblwgeom'
GNUmakefile:16: recipe for target `check' failed
make: *** [check] Error 1

comment:7 by dbaston, 7 years ago

Milestone: changed from PostGIS 2.4.0 to PostGIS 2.5.0
Priority: changed from medium to high

comment:8 by pramsey, 7 years ago

Resolution: worksforme
Status: changed from new to closed

I just ran the cunit tests on 4 cores simultaneously in a big loop looking for this failure, but didn't hit it. Maybe it's gone? Ha ha.

comment:9 by robe, 7 years ago

Resolution: worksforme deleted
Status: changed from closed to reopened

Nice try, buddy. Keeping this for 2.5. I actually haven't been testing 32-bit for a while because I have a more pressing issue with it failing on shp2pgsql-gui that I haven't figured out, so I turned off 32-bit testing until I've squared that away.

Anyway, as I said, I think I've only seen this on Windows, so since I compile with MinGW and test against a VC++ build, it may be seeing something you aren't. I'll reassign it to myself and try to nail down the issue in 2.5.

comment:10 by robe, 7 years ago

Owner: changed from pramsey to robe
Status: changed from reopened to new

comment:11 by robe, 7 years ago

Hah, I guess it's still a problem. It just happened to me when testing r15671 with my MinGW gcc 4.8.3 64-bit build, though the error is a little different (a segmentation fault), so perhaps it's not quite the same thing.

     CUnit - A unit testing framework for C - Version 2.1-2
     http://cunit.sourceforge.net/


Suite: computational_geometry
  Test: test_lw_segment_side ...passed
  Test: test_lw_segment_intersects ...passed
  Test: test_lwline_crossing_short_lines ...passed
  Test: test_lwline_crossing_long_lines ...passed
  Test: test_lwline_crossing_bugs ...passed
  Test: test_lwpoint_set_ordinate ...passed
  Test: test_lwpoint_get_ordinate ...passed
  Test: test_point_interpolate ...passed
  Test: test_lwline_clip ...passed
  Test: test_lwline_clip_big ...passed
  Test: test_lwmline_clip ...passed
  Test: test_geohash_point ...passed
  Test: test_geohash_precision ...passed
  Test: test_geohash ...passed
  Test: test_geohash_point_as_int ...passed
  Test: test_isclosed ...passed
  Test: test_lwgeom_simplify ...passed
  Test: test_lw_arc_center ...passed
  Test: test_point_density ...passed
  Test: test_kmeans ...Makefile:86: recipe for target 'check' failed
make[2]: *** [check] Segmentation fault
make[2]: Leaving directory '/projects/postgis/branches/2.4/liblwgeom/cunit'
Makefile:205: recipe for target 'check' failed
make[1]: *** [check] Error 2
make[1]: Leaving directory '/projects/postgis/branches/2.4/liblwgeom'
GNUmakefile:16: recipe for target 'check' failed
make: *** [check] Error 1

I should add that it's not repeatable: I ran make check again exactly the same way and it passed this time around.

Version 1, edited 7 years ago by robe

comment:12 by robe, 7 years ago

Damn, I wish this happened consistently. I got the error again, but then couldn't repeat it in 4 tries afterwards. A true Heisenbug. I'll try throwing in some debug notices to see if I can at least catch where it's happening.

comment:13 by robe, 7 years ago

Milestone: changed from PostGIS 2.5.0 to PostGIS 2.4.1

Still failing randomly, usually on 32-bit runs.

comment:14 by pramsey, 7 years ago

Milestone: changed from PostGIS 2.4.1 to PostGIS 2.4.2

comment:15 by pramsey, 6 years ago

Milestone: changed from PostGIS 2.4.2 to PostGIS 2.4.3

comment:16 by robe, 6 years ago

Summary: changed from test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs to test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs, and travis sometimes

Yeah, Travis recently crashed on the kmeans test as well, so it's not all in winnie's head. Something is fishy in these mean waters.

https://travis-ci.org/postgis/postgis/jobs/322217076

This was run against trunk r16189

PostGIS is now configured for x86_64-unknown-linux-gnu
 -------------- Compiler Info ------------- 
  C compiler:           gcc -O3 -march=native -mtune=native
  SQL preprocessor:     /usr/bin/cpp -traditional-cpp -w -P
 -------------- Additional Info ------------- 
  Interrupt Tests:   DISABLED use: --with-interrupt-tests to enable
 -------------- Dependencies -------------- 
  GEOS config:          /usr/bin/geos-config
  GEOS version:         3.5.0
  GDAL config:          /usr/bin/gdal-config
  GDAL version:         2.2.2
  SFCGAL config:        /usr/bin/sfcgal-config
  SFCGAL version:       1.2.2
  PostgreSQL config:    /usr/lib/postgresql/9.6/bin/pg_config
  PostgreSQL version:   PostgreSQL 9.6.6
  PROJ4 version:        49
  Libxml2 config:       /usr/bin/xml2-config
  Libxml2 version:      2.9.1
  JSON-C support:       yes
  protobuf-c support:   no
  PCRE support:         yes
  Perl:                 /usr/bin/perl
 --------------- Extensions --------------- 
  PostGIS Raster:       enabled
  PostGIS Topology:     enabled
  SFCGAL support:       enabled
  Address Standardizer support:       enabled
 -------- Documentation Generation -------- 
  xsltproc:             /usr/bin/xsltproc
  xsl style sheets:     /usr/share/xml/docbook/stylesheet/docbook-xsl
  dblatex:              /usr/bin/dblatex
  convert:              /usr/bin/convert
  mathml2.dtd:          /usr/share/xml/schema/w3c/mathml/dtd/mathml2.dtd
Test: test_kmeans ...make[2]: *** [check] Illegal instruction (core dumped)
make[2]: Leaving directory `/home/travis/build/postgis/postgis/liblwgeom/cunit'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/travis/build/postgis/postgis/liblwgeom'
make: *** [check] Error 1

comment:17 by komzpa, 6 years ago

How about we adopt logbt for cunit as a permanent measure?

It simply prints a backtrace for anything running under it that dumps core.

I've used it like this (full path to cunit was also needed): https://github.com/postgis/postgis/pull/176/commits/f2f06a11572bb25168fe375c9236d3b351f4607e

Likely much more templating is needed to detect whether logbt is present and run under it only if it is.

comment:18 by robe, 6 years ago

Yeah, that would be great. I'm not sure how to move forward with that.

BTW, winnie's 64-bit run on 2.5.0 failed:

  Test: test_kmeans ...Makefile:86: recipe for target 'check' failed
make[2]: *** [check] Segmentation fault
make[2]: Leaving directory '/projects/postgis/branches/2.4/liblwgeom/cunit'
Makefile:207: recipe for target 'check' failed
make[1]: *** [check] Error 2
make[1]: Leaving directory '/projects/postgis/branches/2.4/liblwgeom'
GNUmakefile:16: recipe for target 'check' failed
make: *** [check] Error 1

comment:19 by komzpa, 6 years ago

logbt is enabled on Travis now. If the crash ever reproduces there it will be logged, although the likely reason for the Illegal Instruction failure was -march=native combined with Travis faking the CPU ID.

comment:20 by robe, 6 years ago

Milestone: changed from PostGIS 2.4.3 to PostGIS 2.4.4

After all your changes this might not be an issue anymore, but I'll keep it open until we confirm.

comment:21 by pramsey, 6 years ago

It's been a couple of weeks; is it looking OK, @robe?

comment:22 by komzpa, 6 years ago

Owner: changed from robe to komzpa
Status: changed from new to assigned

comment:23 by komzpa, 6 years ago

Resolution: fixed
Status: changed from assigned to closed