Opened 4 months ago

Closed 3 months ago

Last modified 3 months ago

#5666 closed defect (fixed)

Build reproducibility: timestamps in extension upgrade SQL scripts

Reported by: James Addison Owned by: strk
Priority: medium Milestone: PostGIS 3.4.3
Component: build Version: 3.4.x
Keywords: Cc: James Addison

Description

Problem

When PostGIS builds extensions, the upgrade-paths-rules.mk makefile emits a few comments into templated SQL files, including a 'Built on' line that includes a timestamp.

-- Just tag extension testing version as "ANY"
-- Installed by testing 
-- Built on 2024-02-12 12:47:27

The timestamp is generated by invoking the date command in a subshell (code ref), and this currently causes the output to differ on each build of the extensions (even from the same source code), making the build non-reproducible.

The expected/recommended behaviour is either to remove the timestamp completely (removing the source of divergence) or to derive a timestamp deterministically - typically by using the last-modified time of the source code or a timestamp of the release.

As far as I am currently aware, this is the only source of non-determinism when building the PostGIS extensions from the v3.4.2 code as Debian-packaged. I can't guarantee that there aren't others, but this is the only case I've encountered.

Additional Context

Related to ticket: #4148 (introduction of reproducible build time handling in PostGIS)

Originally reported downstream at: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1063724

More information about timestamps in relation to reproducible builds: https://reproducible-builds.org/docs/timestamps/

Notes about a similar solution

There is an existing pattern in the configure.ac file that extracts a POSTGIS_BUILD_DATE string variable from a SOURCE_DATE_EPOCH environment variable (the recommended way to read a reproducible time value for use during builds).

To handle the possibility that it could be confusing or disruptive to remove the 'Built on' line completely, I've attached a patch that applies this same pattern to the upgrade-paths-rules.mk file. I've tested this using a checkout from v3.4.2 using:

postgis $ cd extensions
postgis/extensions $ rm -rf sql;
postgis/extensions $ mkdir -p sql;
postgis/extensions $ SOURCE_DATE_EPOCH=0 EXTENSION=testing make -f upgrade-paths-rules.mk sql/testing--TEMPLATED--TO--ANY.sql

Attachments (1)

extension-upgrade-sql-reproducible.patch (796 bytes ) - added by James Addison 4 months ago.
Patch to read SOURCE_DATE_EPOCH during extension upgrade script templating. Should have the value f1ca910dfcf382c01e0024a0419c9c49224b169bcff32ca09692f7de27c54f76 as sha256sum.

Download all attachments as: .zip

Change History (13)

by James Addison, 4 months ago

Patch to read SOURCE_DATE_EPOCH during extension upgrade script templating. Should have the value f1ca910dfcf382c01e0024a0419c9c49224b169bcff32ca09692f7de27c54f76 as sha256sum.

comment:1 by James Addison, 4 months ago

Please note: the patch as-written specifies an explicit format-string for the output date (+%Y-%m-%d %H:%M:%S), copied from the POSTGIS_BUILD_DATE format string (code ref).

The example SQL snippet in my description uses that format too - but in practice existing strings will have been formatted according to the build system's date program (-- Built on Mon Nov 20 20:03:08 UTC 2023, for example).

comment:2 by James Addison, 3 months ago

Component: postgisbuild
Owner: changed from pramsey to strk

Apologies for the edit noise; this appears to be a build-related issue, so I'm reassigning the component (and that auto-adjusts the owner, by the looks of it).

I'll also see whether I can propose a fix for this using gitea.

comment:3 by James Addison, 3 months ago

I'll also see whether I can propose a fix for this using gitea.

Pull request opened with the patch applied here: https://git.osgeo.org/gitea/postgis/postgis/pulls/186

(changeset created using the gitea web editor)

comment:4 by robe, 3 months ago

I think I'd prefer the date removed over trying to pull what configure gives us and relying on any format string.

comment:5 by James Addison, 3 months ago

Thanks robe - removing the timestamp would be a good (probably the best) solution from my point of view too, but I could understand if there are reasons/requirements to retain a timestamp within the file (either in a specific format or generally).

comment:6 by robe, 3 months ago

Seems strk thinks he needs it so I'll commit the patch in a bit. Thanks for the patch. I tested on my windows system and seems to be okay. The failure on gitea is caused by a different issue not related to your patch (about it treating warnings as errors).

I think this patch is safe enough to backport to at least 3.4 so I'll put in that version as well. Not sure if it's worth while to backport further than that. I'd be concerned cause I know our autoconf requirements changed somewhere between 3.2 and 3.4 and can't remember which.

comment:7 by Regina Obe <lr@…>, 3 months ago

In 94e382f0/git:

Add Jay Addison to credits
References #5666 for PostGIS 3.5.0

comment:8 by Regina Obe <lr@…>, 3 months ago

Resolution: fixed
Status: newclosed

In af10c4a/git:

Apply extension-upgrade-sql-reproducible.patch from trac ticket #5666

This patch is intended to remove a source of non-deterministic build
output, by reading the SOURCE_DATE_EPOCH[1] build-time-indicator
variable and using the resulting value instead of the build-host's
clock as a timestamp in extension upgrade script comments.

The approach here borrows from a similar pattern[2] used in the
'configure.ac' script.

[1] - https://reproducible-builds.org/docs/source-date-epoch/

[2] - https://git.osgeo.org/gitea/postgis/postgis/src/commit/0418cc0ddfda973ba10aa8cc981b07f25a464bdb/configure.ac#L1120

Signed-off-by: jayaddison <jay+osgeo@…>

Closes #5666 for PostGIS 3.4.3

comment:9 by robe, 3 months ago

Milestone: PostGIS 3.4.3

comment:10 by robe, 3 months ago

Sorry forgot to ask how you wanted to be credited. I had put Jay Addison, but realize now you have James Addison to. Let me know if I should change to James Addison.

comment:11 by James Addison, 3 months ago

Ah, thanks for checking! Yep, a small preference for James Addison instead of Jay Addison - either is fine though.

comment:12 by Regina Obe <lr@…>, 3 months ago

In ce93454/git:

Fix James Addison credits. References #5666

Note: See TracTickets for help on using tickets.