#5172 closed defect (fixed)
GHA ci main/main is failing
Reported by: | robe | Owned by: | robe |
---|---|---|---|
Priority: | blocker | Milestone: | PostGIS 3.3.0 |
Component: | raster | Version: | master |
Keywords: | Cc: |
Description
Our GitHub Actions job that tests GEOS main, GDAL master, and PostgreSQL master is failing. The Docker image was last built 7 hours ago, so I assume something changed in one of those projects. I suspect it's either GDAL or PostgreSQL.
https://github.com/postgis/postgis/runs/6938853618?check_suite_focus=true
PostgreSQL 15beta1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
Postgis 3.3.0dev - (6cd8406) - 2022-06-17 15:39:34
scripts 3.3.0dev 6cd8406
raster scripts 3.3.0dev 6cd8406
GEOS: 3.11.0beta2-CAPI-1.16.0
PROJ: 9.1.0
SFCGAL: 1.4.1
GDAL: GDAL 3.6.0dev-fed5a54, released 2022/06/16
It croaks at this point:
Died at ./regress/run_test.pl line 778.
./raster/test/regress/check_gdal .. failed (psql exited with an error: /tmp/pgis_reg/test_223_out)
-----------------------------------------------------------------------------
invalid_path
psql:check_gdal.sql:20: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:check_gdal.sql:20: error: connection to server was lost
-----------------------------------------------------------------------------
make: *** [regress/runtest.mk:24: check-regress] Error 2
[logbt] saw 'make' exit with code:2 (INT)
[logbt] Found corefile (non-tracked) at /tmp/logbt-coredumps/core.12126.!usr!local!pgsql!bin!postgres
[logbt] Processing cores...
warning: Can't open file /dev/shm/PostgreSQL.2296011912 during file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.1620786782 during file-backed mapping note processing
warning: Can't open file /dev/zero (deleted) during file-backed mapping note processing
warning: Can't open file /SYSV002fa66a (deleted) during file-backed mapping note processing
[New LWP 12126]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
48 iofclose.c: No such file or directory.
Core was generated by `postgres: postgres postgis_reg-3.'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f4bcc71d217 in _IO_new_fclose (fp=0x0) at iofclose.c:48
Thread 1 (Thread 0x7f4bcc6a6740 (LWP 12126)):
#0 0x00007f4bcc71d217 in _IO_new_fclose (fp=0x0) at iofclose.c:48
status = <optimized out>
#1 0x00007f4bc30eb29a in VSIStdinFilesystemHandler::~VSIStdinFilesystemHandler (this=<optimized out>, __in_chrg=<optimized out>) at cpl_vsil_stdin.cpp:399
No locals.
#2 0x00007f4bc30eb319 in VSIStdinFilesystemHandler::~VSIStdinFilesystemHandler (this=0x558fcf7023b0, __in_chrg=<optimized out>) at cpl_vsil_stdin.cpp:408
No locals.
#3 0x00007f4bc308f6a7 in VSIFileManager::~VSIFileManager (this=0x558fcfa7c800, __in_chrg=<optimized out>) at cpl_vsil.cpp:2795
iter = {first = "/vsistdin/", second = 0x558fcf7023b0}
oSetAlreadyDeleted = std::set with 16 elements = {[0] = 0x558fcf7023b0, [1] = 0x558fcf703be0, [2] = 0x558fcf70f650, [3] = 0x558fcf7255a0, [4] = 0x558fcf725830, [5] = 0x558fcf725ac0, [6] = 0x558fcf72ab10, [7] = 0x558fcf749460, [8] = 0x558fcf753120, [9] = 0x558fcf76e850, [10] = 0x558fcf76fce0, [11] = 0x558fcf8251f0, [12] = 0x558fcfa69c40, [13] = 0x558fcfa82120, [14] = 0x558fcfa9c060, [15] = 0x558fcfae85d0}
oSetAlreadyDeleted = Python Exception <class 'gdb.error'> value has been optimized out:
iter = Python Exception <class 'gdb.error'> value has been optimized out:
#4 0x00007f4bc308f775 in VSICleanupFileManager () at cpl_vsil.cpp:2928
No locals.
#5 0x00007f4bc2bf0f3e in GDALDriverManager::~GDALDriverManager (this=0x558fcf728600, __in_chrg=<optimized out>) at gdaldrivermanager.cpp:273
bHasDroppedRef = <optimized out>
nDSCount = 0
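To read the backtrace at a glance, the frames can be pulled out with a small script. This is only a minimal sketch and not part of the PostGIS or logbt tooling: the `parse_frames` helper and its regular expression are hypothetical, written against the gdb frame lines quoted in this ticket.

```python
import re

# Matches gdb frame lines such as:
#   #0 0x00007f4bcc71d217 in _IO_new_fclose (fp=0x0) at iofclose.c:48
# Assumption: function names contain no spaces and arguments contain no ")".
FRAME_RE = re.compile(
    r"#(?P<num>\d+)\s+0x[0-9a-f]+ in (?P<func>\S+) "
    r"\((?P<args>[^)]*)\) at (?P<file>[^\s:]+):(?P<line>\d+)"
)


def parse_frames(backtrace):
    """Return one dict per gdb frame found in the text (hypothetical helper)."""
    frames = []
    for m in FRAME_RE.finditer(backtrace):
        frames.append({
            "num": int(m.group("num")),
            "func": m.group("func"),
            "args": m.group("args"),
            "file": m.group("file"),
            "line": int(m.group("line")),
        })
    return frames


# A few frames copied from the backtrace in this ticket.
SAMPLE = """
#0 0x00007f4bcc71d217 in _IO_new_fclose (fp=0x0) at iofclose.c:48
#1 0x00007f4bc30eb29a in VSIStdinFilesystemHandler::~VSIStdinFilesystemHandler (this=<optimized out>, __in_chrg=<optimized out>) at cpl_vsil_stdin.cpp:399
#3 0x00007f4bc308f6a7 in VSIFileManager::~VSIFileManager (this=0x558fcfa7c800, __in_chrg=<optimized out>) at cpl_vsil.cpp:2795
#4 0x00007f4bc308f775 in VSICleanupFileManager () at cpl_vsil.cpp:2928
"""

for frame in parse_frames(SAMPLE):
    print(f"#{frame['num']} {frame['func']} at {frame['file']}:{frame['line']} args=({frame['args']})")
```

Read this way, the innermost frame is `_IO_new_fclose` called with `fp=0x0`, a null file pointer, from the `VSIStdinFilesystemHandler` destructor in `cpl_vsil_stdin.cpp`, which is GDAL code running during driver manager cleanup.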
This is the first failure, and no code changes have been made between the last successful build and now, aside from the Docker image rebuild.
Change History (7)
follow-up: 2 comment:1 by , 3 years ago
comment:2 by , 3 years ago
Replying to pramsey:
I just did a local build with the latest GDAL, latest GEOS, and latest PostGIS, and there was no crash. I'm inclined to wonder whether something is wrong with the image, or with how it is being created.
It's also possible something changed in the past 4 days since the image was built. I'll do a rebuild to see if it continues and if so I'll try next on debbie.
comment:3 by , 3 years ago
I forgot that I do build GDAL master on debbie. I tested with master on debbie and there was no issue. The GHA Docker image is still building; I'll trigger a run once it is done.
comment:4 by , 3 years ago
Okay, sadly, even after a rebuild of the Docker image, it's still erroring out:
./raster/test/regress/check_gdal .. failed (psql exited with an error: /tmp/pgis_reg/test_223_out)
-----------------------------------------------------------------------------
invalid_path
psql:check_gdal.sql:20: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:check_gdal.sql:20: error: connection to server was lost
-----------------------------------------------------------------------------
make: *** [regress/runtest.mk:24: check-regress] Error 2
[logbt] saw 'make' exit with code:2 (INT)
[logbt] Found corefile (non-tracked) at /tmp/logbt-coredumps/core.12126.!usr!local!pgsql!bin!postgres
[logbt] Processing cores...
If something were wrong with the base image, I would expect to see issues with pg14-clang-geosmain-gdal34-proj71 as well, since it is based on the same image; the only difference is the compiled versions of PostgreSQL, PROJ, and GDAL.
Could it be some interaction with PROJ? This job is also running bleeding-edge PROJ 9.1, and that is one thing I'm not testing on debbie, as she's running with the system-installed PROJ.
I'm going to create an image with PROJ 7.1 (like the pg14 one) to rule PROJ out as the culprit.
comment:5 by , 3 years ago
I think the culprit is PROJ 9.1.
I swapped out the latest image (which had GDAL 3.6, PROJ 9.1, PostgreSQL 15, GEOS 3.11) for one with GDAL 3.6, PROJ 9.0, PostgreSQL 15, GEOS 3.11, and there is no error.
https://github.com/postgis/postgis/runs/7058314172?check_suite_focus=true
PostgreSQL 15beta1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
Postgis 3.3.0dev - (0bc5aa5) - 2022-06-26 04:15:30
scripts 3.3.0dev 0bc5aa5
raster scripts 3.3.0dev 0bc5aa5
GEOS: 3.11.0beta3-CAPI-1.16.0
PROJ: 9.0.1
SFCGAL: 1.4.1
GDAL: GDAL 3.6.0dev-125fafc, released 2022/06/25
as opposed to the latest, which crashes on the GDAL driver test:
PostgreSQL 15beta1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
Postgis 3.3.0dev - (76a92a5) - 2022-06-24 19:04:03
scripts 3.3.0dev 76a92a5
raster scripts 3.3.0dev 76a92a5
GEOS: 3.11.0beta2-CAPI-1.16.0
PROJ: 9.1.0
SFCGAL: 1.4.1
GDAL: GDAL 3.6.0dev-c5ffcd7, released 2022/06/19
Although I suppose it could still be GDAL: it looks like GHA may have the Docker latest image cached, since the last run is still not using the latest build of GDAL.
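Comparing the two version banners by eye is error-prone, so here is a minimal sketch that extracts the library versions from each banner and lists what differs. The `parse_banner` and `diff_versions` helpers are hypothetical and assume the banner layout shown in this ticket (`NAME: version` pairs, with GDAL followed by `released YYYY/MM/DD`).

```python
import re

COMPONENTS = ("GEOS", "PROJ", "SFCGAL", "GDAL")


def parse_banner(banner):
    """Extract library versions from a PostGIS regress banner (hypothetical helper)."""
    versions = {}
    for name in COMPONENTS:
        # GDAL repeats its own name ("GDAL: GDAL 3.6.0dev-..."), so allow that.
        m = re.search(rf"\b{name}: (?:{name} )?([^\s,]+)", banner)
        if m:
            versions[name] = m.group(1)
    m = re.search(r"released (\d{4}/\d{2}/\d{2})", banner)
    if m:
        versions["GDAL released"] = m.group(1)
    return versions


def diff_versions(a, b):
    """Return {component: (value_in_a, value_in_b)} for entries that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Library portion of the two banners quoted in this comment.
CRASHING = ("GEOS: 3.11.0beta2-CAPI-1.16.0 PROJ: 9.1.0 SFCGAL: 1.4.1 "
            "GDAL: GDAL 3.6.0dev-c5ffcd7, released 2022/06/19")
PASSING = ("GEOS: 3.11.0beta3-CAPI-1.16.0 PROJ: 9.0.1 SFCGAL: 1.4.1 "
           "GDAL: GDAL 3.6.0dev-125fafc, released 2022/06/25")

changed = diff_versions(parse_banner(CRASHING), parse_banner(PASSING))
for component, (crashing, passing) in changed.items():
    print(f"{component}: {crashing} -> {passing}")
```

On these two banners the differing entries are GEOS, PROJ, GDAL, and the GDAL release date, so PROJ was not the only thing that changed between the crashing and the passing run.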
comment:7 by , 3 years ago
Okay, it looks like the latest is working now, so it wasn't PROJ either. There must have been a fix in GDAL between 6/19 and 6/23. The last PostGIS build with latest was on 6/24, so it must have just missed the latest publish by a hair, because this one:
https://github.com/postgis/postgis/runs/7062500186?check_suite_focus=true
PostgreSQL 15beta1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
Postgis 3.3.0dev - (5bcf72d) - 2022-06-26 18:08:58
scripts 3.3.0dev 5bcf72d
raster scripts 3.3.0dev 5bcf72d
GEOS: 3.11.0beta3-CAPI-1.16.0
PROJ: 9.1.0
SFCGAL: 1.4.1
GDAL: GDAL 3.6.0dev-2d29343, released 2022/06/24
Ran successfully
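For the record, the GDAL builds seen across the runs in this ticket can be used to bracket when the behaviour changed. The sketch below is only an illustration: `bracket_fix` is a hypothetical helper, and it assumes the crash depends on the GDAL build alone, which this ticket infers but does not prove.

```python
from datetime import date

# (GDAL "released" date, GDAL commit, test passed?) taken from the banners in this ticket.
OBSERVATIONS = [
    (date(2022, 6, 16), "fed5a54", False),
    (date(2022, 6, 19), "c5ffcd7", False),
    (date(2022, 6, 24), "2d29343", True),
    # Note: this run used PROJ 9.0.1, so it is weaker evidence about GDAL.
    (date(2022, 6, 25), "125fafc", True),
]


def bracket_fix(observations):
    """Return (last failing date, first passing date after it). Hypothetical helper."""
    failing = [day for day, _, ok in observations if not ok]
    passing = [day for day, _, ok in observations if ok]
    last_bad = max(failing)
    first_good = min(day for day in passing if day > last_bad)
    return last_bad, first_good


last_bad, first_good = bracket_fix(OBSERVATIONS)
print(f"last crashing GDAL build: {last_bad}, first passing GDAL build: {first_good}")
```

With these observations the window runs from the 2022-06-19 build (crashing) to the 2022-06-24 build (passing).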