Opened 8 years ago
Closed 7 years ago
#3700 closed defect (fixed)
test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs, and travis sometimes
Reported by: | robe | Owned by: | komzpa |
---|---|---|---|
Priority: | high | Milestone: | PostGIS 2.4.4 |
Component: | postgis | Version: | 2.3.x |
Keywords: | Cc: |
Description
This is beginning to annoy me. I thought I had this in a ticket already but couldn't find it.
On occasion, especially under high load, winnie's 32-bit runs fail on this test:
Test: test_kmeans ...Makefile:85: recipe for target `check' failed
It's always that test, and I think I've only seen the 32-bit runs fail. They fail about once every 3-5 runs.
Could be Windows, or there is something wrong with kmeans that shows up more often on 32-bit systems.
Change History (23)
comment:1 by , 8 years ago
Component: | buildbots → postgis |
---|---|
Owner: | changed from | to
comment:2 by , 8 years ago
comment:3 by , 8 years ago
Yes, it looks like she runs with RUNTESTFLAGS=-v:
https://git.osgeo.org/gogs/postgis/postgis/src/svn-trunk/ci/winnie/regress_postgis.sh#L143
But that doesn't explain why it only fails on 32-bit and not 64-bit, does it?
comment:4 by , 8 years ago
No, but the lack of diff output suggests to me that there's no difference between the expected and obtained output, so the error must be in the run_test script itself. Can you try to run it in isolation on that machine, against the specific offending test case?
comment:6 by , 8 years ago
Summary: | test_kmeans fails on winnie often on 32-bit runs → test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs |
---|
Okay, this just happened on winnie's 64-bit trunk run, so I guess it's not limited to 32-bit. This is the first time I recall it happening on 64-bit.
Test: test_kmeans ...Makefile:85: recipe for target `check' failed
make[2]: *** [check] Error 255
make[2]: Leaving directory `/projects/postgis/branches/2.4/liblwgeom/cunit'
Makefile:205: recipe for target `check' failed
make[1]: *** [check] Error 2
make[1]: Leaving directory `/projects/postgis/branches/2.4/liblwgeom'
GNUmakefile:16: recipe for target `check' failed
make: *** [check] Error 1
comment:7 by , 7 years ago
Milestone: | PostGIS 2.4.0 → PostGIS 2.5.0 |
---|---|
Priority: | medium → high |
comment:8 by , 7 years ago
Resolution: | → worksforme |
---|---|
Status: | new → closed |
I just ran the cunit tests on 4 cores simultaneously in a big loop looking for this failure, but didn't get it. Maybe it's gone? ha ha.
comment:9 by , 7 years ago
Resolution: | worksforme |
---|---|
Status: | closed → reopened |
Nice try, buddy. Keeping this for 2.5. I actually haven't been testing 32-bit for a while because I have a more pressing issue with it failing on shp2pgsql-gui that I haven't figured out. So I turned off testing on 32-bit until I've squared that away.
Anyway, like I said, I think I've only seen this on Windows, so it may be that, because I compile with mingw and test against a VC++ build, it's hitting something you aren't. I'll reassign to myself and try to nail down the issue in 2.5.
comment:10 by , 7 years ago
Owner: | changed from | to
---|---|
Status: | reopened → new |
comment:11 by , 7 years ago
Hah, guess it's still a problem. Just happened to me when testing r15671 on my mingw gcc 4.8.3 64-bit, though the error is a little different, so perhaps it's not quite the same thing.
CUnit - A unit testing framework for C - Version 2.1-2
http://cunit.sourceforge.net/
Suite: computational_geometry
Test: test_lw_segment_side ...passed
Test: test_lw_segment_intersects ...passed
Test: test_lwline_crossing_short_lines ...passed
Test: test_lwline_crossing_long_lines ...passed
Test: test_lwline_crossing_bugs ...passed
Test: test_lwpoint_set_ordinate ...passed
Test: test_lwpoint_get_ordinate ...passed
Test: test_point_interpolate ...passed
Test: test_lwline_clip ...passed
Test: test_lwline_clip_big ...passed
Test: test_lwmline_clip ...passed
Test: test_geohash_point ...passed
Test: test_geohash_precision ...passed
Test: test_geohash ...passed
Test: test_geohash_point_as_int ...passed
Test: test_isclosed ...passed
Test: test_lwgeom_simplify ...passed
Test: test_lw_arc_center ...passed
Test: test_point_density ...passed
Test: test_kmeans ...Makefile:86: recipe for target 'check' failed
make[2]: *** [check] Segmentation fault
make[2]: Leaving directory '/projects/postgis/branches/2.4/liblwgeom/cunit'
Makefile:205: recipe for target 'check' failed
make[1]: *** [check] Error 2
make[1]: Leaving directory '/projects/postgis/branches/2.4/liblwgeom'
GNUmakefile:16: recipe for target 'check' failed
make: *** [check] Error 1
I should add it's not repeatable. I did another make check exactly the same way and it was fine this time around.
comment:12 by , 7 years ago
Damn, I wish this happened consistently. I got the error again, but then couldn't repeat it in 4 more tries. A true Heisenbug. I'll try throwing in some debug notices to see if I can at least catch where it's happening.
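For a failure this intermittent, rerunning the test binary in a loop and recording how it died can help tell a plain non-zero exit (like the earlier "Error 255") apart from a crash (like the "Segmentation fault" above). The following is only a sketch: the function names are made up for illustration, the command to run is whatever the local cunit binary happens to be, and the signal decoding relies on POSIX behaviour (on Windows a crash typically shows up as a large positive status instead).

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a readable label."""
    if returncode == 0:
        return "passed"
    if returncode < 0:
        # POSIX only: a negative return code means the child was killed by a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exit status %d" % returncode


def run_until_failure(cmd, max_runs=100):
    """Run cmd up to max_runs times.

    Returns (run_number, label, captured_output) for the first failing run,
    or None if every run succeeded.
    """
    for run in range(1, max_runs + 1):
        result = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
        if result.returncode != 0:
            return run, describe_exit(result.returncode), result.stdout
    return None


# Hypothetical usage; the binary path is an assumption, not taken from this ticket:
#   outcome = run_until_failure(["/path/to/cunit/test/binary"], max_runs=500)
#   print(outcome[:2] if outcome else "no failure seen")
```

Keeping the captured output of the failing run matters here because the failure cannot be reproduced on demand.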
comment:13 by , 7 years ago
Milestone: | PostGIS 2.5.0 → PostGIS 2.4.1 |
---|
Still failing randomly, usually on 32-bit runs.
comment:14 by , 7 years ago
Milestone: | PostGIS 2.4.1 → PostGIS 2.4.2 |
---|
comment:15 by , 7 years ago
Milestone: | PostGIS 2.4.2 → PostGIS 2.4.3 |
---|
comment:16 by , 7 years ago
Summary: | test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs → test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs, and travis sometimes |
---|
Yeh, travis crashed on the kmeans test as well recently (so it's not all in winnie's head; something is fishy in these mean waters).
https://travis-ci.org/postgis/postgis/jobs/322217076
This was run against trunk r16189
PostGIS is now configured for x86_64-unknown-linux-gnu
-------------- Compiler Info -------------
C compiler: gcc -O3 -march=native -mtune=native
SQL preprocessor: /usr/bin/cpp -traditional-cpp -w -P
-------------- Additional Info -------------
Interrupt Tests: DISABLED use: --with-interrupt-tests to enable
-------------- Dependencies --------------
GEOS config: /usr/bin/geos-config
GEOS version: 3.5.0
GDAL config: /usr/bin/gdal-config
GDAL version: 2.2.2
SFCGAL config: /usr/bin/sfcgal-config
SFCGAL version: 1.2.2
PostgreSQL config: /usr/lib/postgresql/9.6/bin/pg_config
PostgreSQL version: PostgreSQL 9.6.6
PROJ4 version: 49
Libxml2 config: /usr/bin/xml2-config
Libxml2 version: 2.9.1
JSON-C support: yes
protobuf-c support: no
PCRE support: yes
Perl: /usr/bin/perl
--------------- Extensions ---------------
PostGIS Raster: enabled
PostGIS Topology: enabled
SFCGAL support: enabled
Address Standardizer support: enabled
-------- Documentation Generation --------
xsltproc: /usr/bin/xsltproc
xsl style sheets: /usr/share/xml/docbook/stylesheet/docbook-xsl
dblatex: /usr/bin/dblatex
convert: /usr/bin/convert
mathml2.dtd: /usr/share/xml/schema/w3c/mathml/dtd/mathml2.dtd
Test: test_kmeans ...make[2]: *** [check] Illegal instruction (core dumped)
make[2]: Leaving directory `/home/travis/build/postgis/postgis/liblwgeom/cunit'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/travis/build/postgis/postgis/liblwgeom'
make: *** [check] Error 1
comment:17 by , 7 years ago
How about we adopt logbt for cunit as a permanent measure?
It simply prints a backtrace for anything running under it that dumps core.
I've used it like this (the full path to cunit was also needed): https://github.com/postgis/postgis/pull/176/commits/f2f06a11572bb25168fe375c9236d3b351f4607e
Much more templating is likely needed to detect the presence of logbt and run under it if it's there.
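The "detect logbt and run under it if it's there" idea could look roughly like the sketch below. It is not what the linked commit does, just an illustration: `wrap_with_logbt` is a made-up helper name, and the `logbt -- <program>` invocation form is an assumption about the logbt command line that should be checked against the installed version. The program path is made absolute because the comment above notes that the full path to cunit was needed.

```python
import os
import shutil


def wrap_with_logbt(cmd, which=shutil.which):
    """Return cmd prefixed with logbt when logbt is on PATH, else cmd unchanged.

    `which` is injectable so the lookup can be faked in tests.
    """
    cmd = list(cmd)
    logbt = which("logbt")
    if logbt is None:
        # logbt not installed: run the tests exactly as before.
        return cmd
    # A full path to the test program was reported as necessary under logbt.
    program = os.path.abspath(cmd[0])
    # Assumption: logbt is invoked as `logbt -- <program> [args...]`.
    return [logbt, "--", program] + cmd[1:]
```

With this shape the fallback is silent, so machines without logbt keep behaving as they do today.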
comment:18 by , 7 years ago
Yah, that would be great. Not sure how to move forward with that.
BTW winnie's 64-bit on 2.5.0 failed
Test: test_kmeans ...Makefile:86: recipe for target 'check' failed
make[2]: *** [check] Segmentation fault
make[2]: Leaving directory '/projects/postgis/branches/2.4/liblwgeom/cunit'
Makefile:207: recipe for target 'check' failed
make[1]: *** [check] Error 2
make[1]: Leaving directory '/projects/postgis/branches/2.4/liblwgeom'
GNUmakefile:16: recipe for target 'check' failed
make: *** [check] Error 1
comment:19 by , 7 years ago
logbt is now enabled on travis. If the crash ever reproduces there, the backtrace will be logged, although the Illegal instruction failure was most likely caused by -march=native combined with travis faking the CPU ID.
comment:20 by , 7 years ago
Milestone: | PostGIS 2.4.3 → PostGIS 2.4.4 |
---|
After all your changes this might not be an issue anymore, but I'll keep it open until we confirm.
comment:22 by , 7 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:23 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Do you run with RUNTESTFLAGS=-v? If there's no output but a non-success return code, then maybe there's a missing "return" somewhere, leaving the return code to phase-of-the-moon matters.