Opened 2 years ago

Closed 2 years ago

#5121 closed defect (fixed)

LTO enabled causes windows, freebsd and some github actions to fail

Reported by: robe Owned by: robe
Priority: blocker Milestone: PostGIS 3.3.0
Component: QA/buildbots Version: master
Keywords: Cc:

Description

export-all-symbols -Wl,--out-implib=libpostgis-3.3.a
lto1.exe: internal compiler error: in gen_subprogram_die, at dwarf2out.c:22668
libbacktrace could not find executable to open
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://sourceforge.net/projects/mingw-w64> for instructions.
lto-wrapper.exe: fatal error: C:\ming64gcc81\mingw64\bin\gcc.exe returned 1 exit status
compilation terminated.
C:/ming64gcc81/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.1.0/../../../../x86_64-w64-mingw32/bin/ld.exe: error: lto-wrapper failed
collect2.exe: error: ld returned 1 exit status
make[1]: *** [E:/jenkins/postgresql/rel/pg14w64gcc81/lib/pgxs/src/makefiles/../../src/Makefile.shlib:374: postgis-3.3.dll] Error 1
make[1]: Leaving directory '/projects/postgis/branches/3.3/postgis'
make: *** [GNUmakefile:24: all] Error 1

Searching for this on the internet led me back to my old ticket from 2 years ago, #4583.

Raul kindly pointed out that it was caused by #4754.

@komzpa I recall a recent LTO commit of yours. Admittedly I have not been paying much attention and have been ignoring the problem, hoping it would go away.

bessie32 is also failing, but that could be a different issue:

15:52:22 libtool: link: gcc8 -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-trunc -fno-math-errno -fno-signed-zeros -Wall -flto -fPIC -DPIC -I./../rt_core -I./.. -I. -I../.. -I../../liblwgeom -I../../liblwgeom -I/usr/local/include -I/usr/local/include -I/usr/local/include -I/usr/local/include raster2pgsql.o -flto -o raster2pgsql  ../rt_core/librtcore.a ../../liblwgeom/.libs/liblwgeom.a -L/usr/local/lib -lm -lproj -ljson-c -lSFCGAL -lgdal -lgeos_c -lintl -liconv
15:52:23 /usr/local/bin/ld: /tmp//cczKohy3.ltrans0.ltrans.o: undefined reference to symbol 'rtrealloc'
15:52:23 /usr/local/bin/ld: /usr/local/lib/librttopo.so.1: error adding symbols: DSO missing from command line
15:52:23 collect2: error: ld returned 1 exit status
15:52:23 gmake[3]: *** [Makefile:86: raster2pgsql] Error 1
15:52:23 gmake[3]: Leaving directory '/usr/home/jenkins/workspace/PostGIS_Worker_Run/label/bessie32/b0741830443c896ebbf15b51486a2b23787b7485/raster/loader'
15:52:23 gmake[2]: *** [Makefile:35: rtloader] Error 2
15:52:23 gmake[2]: Leaving directory '/usr/home/jenkins/workspace/PostGIS_Worker_Run/label/bessie32/b0741830443c896ebbf15b51486a2b23787b7485/raster'
15:52:23 gmake[1]: *** [GNUmakefile:24: all] Error 1
15:52:23 gmake[1]: Leaving directory '/usr/home/jenkins/workspace/PostGIS_Worker_Run/label/bessie32/b0741830443c896ebbf15b51486a2b23787b7485'
15:52:23 *** Error code 2
15:52:23 

But bessie (64-bit FreeBSD) seems fine.

I still need to confirm I have the same issue on my dev setup.

Attachments (3)

debbie-bessie32-consoleText.log (268.9 KB) - added by sergeish 2 years ago.
disable_lto.png (47.4 KB) - added by robe 2 years ago.
after disabling LTO for all by accident
disable_lto_just_for_mingw.png (53.4 KB) - added by robe 2 years ago.
re-enabled LTO for all except MinGW


Change History (20)

by sergeish, 2 years ago

comment:1 by sergeish, 2 years ago

Hi,

I made the breaking commit; link to the GitHub PR: https://github.com/postgis/postgis/pull/678

I managed to replicate the issue on x86 FreeBSD 12 by installing gcc8 and gcc10 and postgresql13-client (plus the required libraries), and configuring with:

../configure CC=gcc8 CXX=g++8 AR=gcc-ar8 RANLIB=gcc-ranlib8 CXXFLAGS='-O2 -pipe -fstack-protector-strong -Wl,-rpath=/usr/local/lib/gcc8 -nostdinc++ -isystem /usr/include/c++/v1 -Wl,-rpath=/usr/local/lib/gcc8' CFLAGS='-Wall -Wmissing-prototypes -Wpointer-arith -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-trunc' '--with-libiconv=/usr/local' --without-interrupt-tests

(The FreeBSD test explicitly sets CC and CXX, but configure fails to select the correct ar/ranlib, since they are named gcc-ar8/gcc-ranlib8 rather than gcc-ar/gcc-ranlib (see the attached log), so I set them explicitly.)
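The naming scheme described above (gcc8 pairs with gcc-ar8 and gcc-ranlib8) can be derived mechanically from the CC value. This is a minimal sketch: gcc_lto_tool is a hypothetical helper, not part of PostGIS's configure, and it assumes the LTO-aware wrappers follow the usual "<prefix>gcc-<tool><suffix>" naming.

```shell
#!/bin/sh
# Hypothetical helper (not part of PostGIS's configure): derive the LTO-aware
# archiver wrapper that matches a given GCC driver name, e.g.
#   gcc8                   -> gcc-ar8 / gcc-ranlib8
#   x86_64-w64-mingw32-gcc -> x86_64-w64-mingw32-gcc-ar / x86_64-w64-mingw32-gcc-ranlib
# Assumes the wrappers follow the "<prefix>gcc-<tool><suffix>" naming scheme.
gcc_lto_tool() {
  cc=$1     # compiler name as passed in CC
  tool=$2   # "ar", "ranlib" or "nm"
  case $cc in
    *gcc*)
      prefix=${cc%gcc*}    # everything before the last "gcc", e.g. "x86_64-w64-mingw32-"
      suffix=${cc##*gcc}   # everything after the last "gcc", e.g. "8"
      echo "${prefix}gcc-${tool}${suffix}"
      ;;
    *)
      # Not a GCC driver name; fall back to the plain tool name.
      echo "$tool"
      ;;
  esac
}
```

For example, passing AR="$(gcc_lto_tool gcc8 ar)" and RANLIB="$(gcc_lto_tool gcc8 ranlib)" to configure expands to the gcc-ar8/gcc-ranlib8 pair that was set by hand here.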

The problem is a compiler version (LTO version) mismatch between the selected gcc8 and the gcc10 used in the PGXS Makefile.global (/usr/local/lib/postgresql/pgxs/src/Makefile.global). I tried to fix this by setting CUSTOM_CC before including pgxs.mk (/usr/local/lib/postgresql/pgxs/src/makefiles/pgxs.mk), which allowed the extensions to build. Draft PR: https://github.com/postgis/postgis/pull/679

This is not quite a solution: in this case CFLAGS in Makefile.global still contains -Wl,-rpath=/usr/local/lib/gcc10 and cannot be overridden (flags can only be appended, by setting CUSTOM_COPTS or PG_CFLAGS), so postgis-3.so ends up with an incorrect /usr/local/lib/gcc10 runpath.
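Both symptoms in this comment (the gcc8/gcc10 mismatch and the stray gcc10 runpath) can be checked from a shell. This is a minimal sketch assuming a POSIX shell; check_same_cc and runpath_dirs are hypothetical helpers. It relies only on pg_config --cc, which prints the CC that PostgreSQL was built with (the compiler PGXS picks up via Makefile.global), and on the "Library runpath" / "Library rpath" lines that readelf -d prints.

```shell
#!/bin/sh
# Hypothetical helpers for spotting the PGXS compiler mismatch; not part of PostGIS.

# Compare the compiler given to ./configure with the one PostgreSQL was built
# with. Returns 0 when the program names match; otherwise prints a warning to
# stderr and returns 1.
# Usage: check_same_cc "$CC" "$(pg_config --cc)"
check_same_cc() {
  ours=${1%% *}          # keep only the first word, dropping any flags
  theirs=${2%% *}
  ours=${ours##*/}       # compare program names, not install directories
  theirs=${theirs##*/}
  if [ "$ours" = "$theirs" ]; then
    return 0
  fi
  echo "warning: configure uses '$ours' but PGXS will use '$theirs';" \
    "LTO objects from different GCC versions cannot be linked together" >&2
  return 1
}

# Print the run-time library search path of a shared object, one directory
# per line, from `readelf -d` output read on stdin (DT_RUNPATH or DT_RPATH).
# Usage: readelf -d postgis/postgis-3.so | runpath_dirs
runpath_dirs() {
  sed -n 's/.*Library r[a-z]*path: \[\(.*\)].*/\1/p' | tr ':' '\n'
}
```

Running check_same_cc before ./configure flags the mismatch early; after the build, a /usr/local/lib/gcc10 line in the runpath_dirs output for postgis-3.so reproduces the runpath problem.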

I haven't tried building with MinGW yet.

That's as far as I could get for now; I would appreciate any help.

comment:2 by robe, 2 years ago

@sergeish,

Thanks for the quick response. I'll test it out on my mingw setup and commit if it works.

comment:3 by robe, 2 years ago

Okay, I tested on my MinGW setup (it's old BTW, gcc 8.1, but that is another story).

At any rate, the patch seems to break the ability to find CC:

cc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -O2  -I../liblwgeom -I../liblwgeom -std=gnu99 -g -O2 -fno-math-errno -fno-signed-zeros -Wall -flto -I../libpgcommon -I../deps/flatgeobuf -I../deps/wagyu -I../deps/uthash/include  -I/projects/geos/rel-3.11w64gcc81/include -IC:/ming64gcc81/projects/proj/rel-7.2.1w64gcc81/include   -IC:/ming64gcc81/projects/protobuf/rel-3.2.0w64gcc81/include   -I/projects/libxml/rel-libxml2-2.9.9w64gcc81/include/libxml2 -I/projects/CGAL/rel-sfcgal-1.4.0w64gcc81/include -IC:/ming64gcc81/projects/json-c/rel-0.12w64gcc81/include/json-c   -IC:/ming64gcc81/projects/pcre/rel-8.33w64gcc81/include   -DNDEBUG -I/projects/postgresql/rel/pg15w64gcc81/include -I/projects/rel-libiconv-1.16w64gcc81/include  -DDLL_EXPORT -DPIC -I. -I./ -IC:/MING64~1/projects/POSTGR~1/rel/PG15W6~1/include/server -IC:/MING64~1/projects/POSTGR~1/rel/PG15W6~1/include/internal  -I/projects/zlib/rel-zlib-1.2.11w64gcc81/include  -I/projects/libxml/rel-libxml2-2.9.9w64gcc81/include -I./src/include/port/win32 -I/projects/libxml/rel-libxml2-2.9.9w64gcc81/include/libxml2 -IC:/ming64gcc81/projects/lz4/rel-lz4-1.9.3w64gcc81/include  -IC:/MING64~1/projects/POSTGR~1/rel/PG15W6~1/include/server/port/win32 -DWIN32_STACK_RLIMIT=4194304  -c -o postgis_module.o postgis_module.c
/bin/sh: line 1: cc: command not found

Looking at the generated postgis/Makefile, it has

CUSTOM_CC := $(CC)

I assume that is the culprit. By comparison, the generated liblwgeom/Makefile has

CC = x86_64-w64-mingw32-gcc

Changing Makefile.in to the following gets me back to the original error:

CUSTOM_CC := @CC@

FWIW, I think @strk was saying we should get rid of PGXS, as it's causing more problems than it solves.

comment:4 by sergeish, 2 years ago

Thanks @robe.

My conclusion was that PGXS has to be replaced, since it introduces another set of compilation options that is not fully controlled by the user. It seems that LTO cannot be reliably enabled by default before doing that, especially if we need to allow selecting CC in ./configure as in these failing tests.

Thank you for mentioning @strk's opinion. It would be great if he could comment on this issue.

CUSTOM_CC := $(CC) was my attempt to make PGXS use the same compiler. It seemed to work in my case, but later I noticed cc being used as the compiler in the log.

comment:5 by strk, 2 years ago

I never liked delegating control to PGXS; the first victim of this was --prefix support, which is still an issue after over 11 years: #635

comment:6 by Regina Obe <lr@…>, 2 years ago

In e76f1d8/git:

Disable LTO on mingw. References #5121 for PostGIS 3.3

comment:7 by sergeish, 2 years ago

The commit above just disables LTO everywhere.

if test "MINGWBUILD" = "0"; then

should be

if test "$MINGWBUILD" = "0"; then
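The quoting slip is easy to reproduce in isolation. In this minimal sketch the helper names lto_buggy and lto_fixed are hypothetical, and printing "yes" stands in for whatever configure does when it adds the LTO flags; only the two test expressions come from the comment.

```shell
#!/bin/sh
# Without the "$", test compares the literal string "MINGWBUILD" with "0",
# which is always false, so LTO is skipped on every platform.
lto_buggy() {
  MINGWBUILD=$1
  if test "MINGWBUILD" = "0"; then echo yes; else echo no; fi
}

# With the "$", the variable's value is compared, so only non-MinGW builds
# (MINGWBUILD=0) get LTO.
lto_fixed() {
  MINGWBUILD=$1
  if test "$MINGWBUILD" = "0"; then echo yes; else echo no; fi
}

for m in 0 1; do
  echo "MINGWBUILD=$m buggy=$(lto_buggy "$m") fixed=$(lto_fixed "$m")"
done
```

This prints "MINGWBUILD=0 buggy=no fixed=yes" and "MINGWBUILD=1 buggy=no fixed=no": the buggy form never enables LTO, which matches the all-green GitHub Actions run after the first commit.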

comment:8 by Regina Obe <lr@…>, 2 years ago

In 9b582b34/git:

Disable LTO on mingw instead of everywhere (fixes last committ). References #5121 for PostGIS 3.3

comment:9 by robe, 2 years ago

@sergeish

I just noticed that with the commit where I accidentally disabled LTO everywhere, we got all green lights on GitHub Actions:

https://github.com/postgis/postgis/actions/runs/2093940177

So I guess LTO is causing the errors on GitHub too. Do you know whether your pull request solves the github issues?

comment:10 by robe, 2 years ago

Summary: winnie is broken with this strange error lto1.exe: internal compiler error: in gen_subprogram_die, at dwarf2out.c:22668 → LTO enabled causes windows, freebsd and some github actions to fail

Changing the title, since this seems more involved than just MinGW.

comment:11 by sergeish, 2 years ago

@robe, sorry, I don't understand your question about github issues. Do you mean whether there is a ticket requesting LTO?

by robe, 2 years ago

Attachment: disable_lto.png added

after disabling LTO for all by accident

by robe, 2 years ago

Attachment: disable_lto_just_for_mingw.png added

re-enabled LTO for all except MinGW

comment:12 by robe, 2 years ago

@sergeish,

I meant GitHub Actions (not issues). I've added screenshots to show what I mean. The ticket-like item there is a GH pull request, which is fine.

When I accidentally disabled LTO for all systems, all GH Actions jobs turned green. A couple had been red for a while.

[Screenshot disable_lto.png: Regina accidentally disabling LTO entirely, after disabling LTO for all by accident]

When I changed it to disable LTO just for MinGW, those went red again, though winnie was still happy :)

[Screenshot disable_lto_just_for_mingw.png: Regina changing to just disable for MinGW Windows, re-enabled LTO for all except MinGW]

I was baffled by the errors on the GH Actions because they are each different, so I thought they were caused by bad Docker builds or a change in GDAL.

1) CI (pg14-clang-geosmain-gdal34-proj71, usan_clang) and (pg13-clang-geos39-gdal31-proj71, usan_clang)

couldn't find GDALALL:

checking for library containing GDALAllRegister... no

Error: Process completed with exit code 1.

2) CI (pg13-geos39-gdal31-proj71, usan_gcc)

psql:/src/postgis/regress/00-regress-install/share/contrib/postgis/sfcgal.sql:52: ERROR: could not load library "/src/postgis/regress/00-regress-install/lib/postgis_sfcgal-3.so": /src/postgis/regress/00-regress-install/lib/postgis_sfcgal-3.so: undefined symbol: ubsan_handle_mul_overflow

comment:13 by sergeish, 2 years ago

Yes, at first glance the GitHub Actions errors don't seem related. That is why I decided to take the breaking PR out of draft.

Unfortunately I still don't have a fix. I'm going to proceed as if the plan is to get rid of PGXS and hopefully find some kind of solution in the process.

Adding LTO flags automatically should probably be disabled for now.

comment:14 by robe, 2 years ago

Okay, so maybe we can define a --with-lto configure option to enable them?

comment:15 by sergeish, 2 years ago

PR replacing the MINGWBUILD check with an --enable-lto option: https://github.com/postgis/postgis/pull/681

comment:16 by pramsey, 2 years ago

I come from the opposite side on PGXS. I feel like it cleared up a lot of other problems by anal-retentively enforcing a "build your extension just like your server" rule. That probably saved us from a lot of really obscure mixed-compiler, fun-platform bugs, which we are not counting in our calculation of the "cost of PGXS" because we never saw them, because they didn't exist.

comment:17 by Regina Obe <lr@…>, 2 years ago

Resolution: fixed
Status: new → closed

In 6a6cc54/git:

Merge remote-tracking branch 'konturio/configure-enable-lto-option'
Require --with-lto config to enable LTO.
Closes #4574 for PostGIS 3.3.0
Closes #5121 for PostGIS 3.3.0
