Opened 6 years ago
Closed 5 years ago
#3700 closed defect (fixed)
test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs, and travis sometimes
|Reported by:||robe||Owned by:||komzpa|
This is beginning to annoy me. I thought I had this in a ticket already but couldn't find it.
On occasion, especially under high load, winnie's 32-bit runs fail on this test:
Test: test_kmeans ...Makefile:85: recipe for target `check' failed
It's always that test, and I think I've only seen the 32-bit runs fail. They fail about once every 3-5 runs.
Could be Windows, or there is something wrong with kmeans that shows up more often on 32-bit systems.
Change History (23)
comment:1 by , 6 years ago
|Component:||buildbots → postgis|
comment:2 by , 6 years ago
comment:3 by , 6 years ago
Yes, it looks like she runs with RUNTESTFLAGS=-v:
But that doesn't explain why it only fails on 32-bit and not 64-bit, does it?
comment:4 by , 6 years ago
No, but the lack of diff output suggests to me that there's no difference between expected and obtained output, so the error must be in the run_test script itself. Can you try to run it in isolation on that machine, against the specific offending test case?
comment:5 by , 6 years ago
This is in cunit. I thought run_test is just for the SQL.
comment:6 by , 6 years ago
|Summary:||test_kmeans fails on winnie often on 32-bit runs → test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs|
Okay, this just happened on winnie's 64-bit trunk run, so I guess it's not limited to 32-bit. This is the first time I recall it happening on 64-bit.
Test: test_kmeans ...Makefile:85: recipe for target `check' failed
make: *** [check] Error 255
make: Leaving directory `/projects/postgis/branches/2.4/liblwgeom/cunit'
Makefile:205: recipe for target `check' failed
make: *** [check] Error 2
make: Leaving directory `/projects/postgis/branches/2.4/liblwgeom'
GNUmakefile:16: recipe for target `check' failed
make: *** [check] Error 1
comment:7 by , 6 years ago
|Milestone:||PostGIS 2.4.0 → PostGIS 2.5.0|
|Priority:||medium → high|
comment:8 by , 6 years ago
|Status:||new → closed|
I just ran the cunit tests on 4 cores simultaneously in a big loop looking for this failure, but didn't get it. Maybe it's gone? ha ha.
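For anyone wanting to repeat that kind of hunt, here is a minimal sketch of a stress loop in Python. It is not the script used above: the function names and the defaults (100 runs, 4 at a time) are made up for illustration, and the command to run is whatever test binary you pass in. On POSIX a negative return code means the process was killed by a signal; on Windows a crash shows up as a large exit status instead.

```python
import signal
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def describe(returncode):
    """Turn a subprocess return code into a readable string."""
    if returncode < 0:
        # On POSIX, a negative value means the child was killed by that signal.
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    return "exit status %d" % returncode


def run_once(cmd):
    """Run cmd once, discarding its output, and return its return code."""
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode


def find_failures(cmd, runs=100, workers=4):
    """Run cmd `runs` times, `workers` at a time.

    Returns a list of (run number, return code) for every failing run.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(run_once, [cmd] * runs))
    return [(i, rc) for i, rc in enumerate(codes, start=1) if rc != 0]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the test command as arguments.
    for run, rc in find_failures(sys.argv[1:]):
        print("run %d: %s" % (run, describe(rc)))
```

Saved under a name of your choosing and given the cunit test binary as its arguments, this prints one line per failing run. An empty result only means the failure did not show up in that batch, not that it is gone.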
comment:9 by , 6 years ago
|Status:||closed → reopened|
Nice try, buddy. Keeping this for 2.5. I actually haven't been testing 32-bit for a while because I have a more pressing issue with it failing on shp2pgsql-gui that I haven't figured out. So I turned off testing on 32-bit until I've squared that away.
Anyway, like I said, I think I've only seen this on Windows, so it might be that, because I compile with mingw and test against a VC++ build, it's seeing something you aren't. I'll reassign to myself and try to nail down the issue in 2.5.
comment:10 by , 6 years ago
|Status:||reopened → new|
comment:11 by , 6 years ago
Hah, guess it's still a problem. It just happened to me when testing r15671 on my mingw gcc 4.8.3 64-bit, though the error is a little different, so perhaps it's not quite the same thing.
CUnit - A unit testing framework for C - Version 2.1-2
http://cunit.sourceforge.net/
Suite: computational_geometry
Test: test_lw_segment_side ...passed
Test: test_lw_segment_intersects ...passed
Test: test_lwline_crossing_short_lines ...passed
Test: test_lwline_crossing_long_lines ...passed
Test: test_lwline_crossing_bugs ...passed
Test: test_lwpoint_set_ordinate ...passed
Test: test_lwpoint_get_ordinate ...passed
Test: test_point_interpolate ...passed
Test: test_lwline_clip ...passed
Test: test_lwline_clip_big ...passed
Test: test_lwmline_clip ...passed
Test: test_geohash_point ...passed
Test: test_geohash_precision ...passed
Test: test_geohash ...passed
Test: test_geohash_point_as_int ...passed
Test: test_isclosed ...passed
Test: test_lwgeom_simplify ...passed
Test: test_lw_arc_center ...passed
Test: test_point_density ...passed
Test: test_kmeans ...Makefile:86: recipe for target 'check' failed
make: *** [check] Segmentation fault
make: Leaving directory '/projects/postgis/branches/2.4/liblwgeom/cunit'
Makefile:205: recipe for target 'check' failed
make: *** [check] Error 2
make: Leaving directory '/projects/postgis/branches/2.4/liblwgeom'
GNUmakefile:16: recipe for target 'check' failed
make: *** [check] Error 1
I should add it's not repeatable. I did another make check exactly the same way and it was fine this time around.
comment:12 by , 6 years ago
Damn, I wish this happened consistently. I got the error again but then couldn't repeat it in 4 more tries. A true heisenbug. I'll try throwing in some debug notices to see if I can at least catch where it's happening.
comment:13 by , 5 years ago
|Milestone:||PostGIS 2.5.0 → PostGIS 2.4.1|
Still failing randomly, usually on 32-bit runs.
comment:14 by , 5 years ago
|Milestone:||PostGIS 2.4.1 → PostGIS 2.4.2|
comment:15 by , 5 years ago
|Milestone:||PostGIS 2.4.2 → PostGIS 2.4.3|
comment:16 by , 5 years ago
|Summary:||test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs → test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs, and travis sometimes|
Yeah, travis crashed on the kmeans test as well recently (so it's not all in winnie's head; something is fishy in these mean waters).
This was run against trunk r16189
PostGIS is now configured for x86_64-unknown-linux-gnu
-------------- Compiler Info -------------
C compiler: gcc -O3 -march=native -mtune=native
SQL preprocessor: /usr/bin/cpp -traditional-cpp -w -P
-------------- Additional Info -------------
Interrupt Tests: DISABLED use: --with-interrupt-tests to enable
-------------- Dependencies --------------
GEOS config: /usr/bin/geos-config
GEOS version: 3.5.0
GDAL config: /usr/bin/gdal-config
GDAL version: 2.2.2
SFCGAL config: /usr/bin/sfcgal-config
SFCGAL version: 1.2.2
PostgreSQL config: /usr/lib/postgresql/9.6/bin/pg_config
PostgreSQL version: PostgreSQL 9.6.6
PROJ4 version: 49
Libxml2 config: /usr/bin/xml2-config
Libxml2 version: 2.9.1
JSON-C support: yes
protobuf-c support: no
PCRE support: yes
Perl: /usr/bin/perl
--------------- Extensions ---------------
PostGIS Raster: enabled
PostGIS Topology: enabled
SFCGAL support: enabled
Address Standardizer support: enabled
-------- Documentation Generation --------
xsltproc: /usr/bin/xsltproc
xsl style sheets: /usr/share/xml/docbook/stylesheet/docbook-xsl
dblatex: /usr/bin/dblatex
convert: /usr/bin/convert
mathml2.dtd: /usr/share/xml/schema/w3c/mathml/dtd/mathml2.dtd
Test: test_kmeans ...make: *** [check] Illegal instruction (core dumped)
make: Leaving directory `/home/travis/build/postgis/postgis/liblwgeom/cunit'
make: *** [check] Error 2
make: Leaving directory `/home/travis/build/postgis/postgis/liblwgeom'
make: *** [check] Error 1
comment:17 by , 5 years ago
How about we adopt logbt for cunit as a non-temporary measure?
It just prints a backtrace for anything running under it that dumps core.
I've used it like this (full path to cunit was also needed): https://github.com/postgis/postgis/pull/176/commits/f2f06a11572bb25168fe375c9236d3b351f4607e
Likely much more templating is needed to detect presence of logbt and run under it if it's there.
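The "run under it if it's there" idea can be sketched outside the Makefile. The snippet below is Python rather than Makefile templating, purely for illustration: `wrap_with_logbt` is a hypothetical helper, and the `logbt -- <command>` invocation form is an assumption about logbt's command line, not something confirmed in this ticket.

```python
import shutil


def wrap_with_logbt(cmd, wrapper="logbt"):
    """Return cmd prefixed with the wrapper if it is on PATH, else cmd unchanged.

    Assumption: the wrapper is invoked as `logbt -- <command> [args...]`.
    """
    path = shutil.which(wrapper)
    if path is None:
        # Wrapper not installed: run the tests directly, as before.
        return list(cmd)
    return [path, "--"] + list(cmd)
```

The returned list can be handed straight to `subprocess.run`, so machines without logbt keep working as they do today.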
comment:18 by , 5 years ago
Yeah, that would be great. Not sure how to move forward with that.
BTW winnie's 64-bit on 2.5.0 failed:
Test: test_kmeans ...Makefile:86: recipe for target 'check' failed
make: *** [check] Segmentation fault
make: Leaving directory '/projects/postgis/branches/2.4/liblwgeom/cunit'
Makefile:207: recipe for target 'check' failed
make: *** [check] Error 2
make: Leaving directory '/projects/postgis/branches/2.4/liblwgeom'
GNUmakefile:16: recipe for target 'check' failed
make: *** [check] Error 1
comment:19 by , 5 years ago
logbt is now enabled on travis. If it ever reproduces there, it will be logged, although the likely reason for the Illegal instruction failure was -march=native combined with travis faking the CPU ID.
comment:20 by , 5 years ago
|Milestone:||PostGIS 2.4.3 → PostGIS 2.4.4|
After all your changes this might not be an issue anymore, but I'll keep it open until we confirm.
comment:21 by , 5 years ago
Couple weeks, looking OK, @robe?
comment:22 by , 5 years ago
|Status:||new → assigned|
comment:23 by , 5 years ago
|Status:||assigned → closed|
Do you run with RUNTESTFLAGS=-v? If there's no output but a non-success return code, then maybe there's a missing "return" somewhere, leaving the return code to phase-of-the-moon matters.