Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#7032 closed defect (fixed)

PostgreSQL primary key sequence is not updated when -preserve_fid is used

Reported by: cmartinez Owned by: warmerdam
Priority: normal Milestone: 2.3.0
Component: default Version: unspecified
Severity: normal Keywords:
Cc:

Description (last modified by cmartinez)

When loading a layer on PostGIS using ogr2ogr, a sequence is created for the serial primary key. If -preserve_fid paramenter is provided, the sequence start is not properly initialized (i. e. sequence starts on 1). This leads to problems if the layer is later edited.

Note that I am using GDAL version 1.11.3.

For instance, using --preserve_fid

ogr2ogr -preserve_fid -update -f PostgreSQL "PG:host='localhost' dbname='test'" myshp.shp  myshp -nln "test1"

We get the following sequence status:

# SELECT last_value from test1_ogc_fid_seq ;
 last_value
------------
          1
(1 row)

Instead, if we don't use -preserve_fid, the sequence is properly initialized to 102:

# Not using -preserve_fid: sequence properly initialized to 102
ogr2ogr -update -f PostgreSQL "PG:host='localhost' dbname='test'" myshp.shp  myshp  -nln "test2"
# SELECT last_value from test2_ogc_fid_seq ;
 last_value
------------
        102
(1 row)

What are the problems you face when editing such layer?

If you later try to add a new record to the layer without providing an id, it will raise an error, defeating the purpose of having a serial PK. For instance:

INSERT INTO test1 (wkb_geometry)
VALUES (ST_GeomFromText('MULTIPOINT(-5.34510248270791 36.1506091987899)', 4258)) ;

ERROR: duplicate key value violates unique constraint "test1_pkey"
SQL state: 23505
Detail: Key (ogc_fid)=(1) already exists.

They you must provide the ogc_fid by hand:

INSERT INTO test1 (ogc_fid, wkb_geometry)
VALUES (102, ST_GeomFromText('MULTIPOINT(-5.34510248270791 36.1506091987899)', 4258)) ;

Do you have an alternative suggestion about what to do for the sequence?

The sequence should be initialized in the same way as it is done when -preserve_fid is not provided. For the layer & sequence in the example, the following SQL statement would work:

SELECT set_val('test1_ogc_fid_seq', max(ogc_fid)) FROM test1 ;

Change History (8)

comment:1 by cmartinez, 7 years ago

Description: modified (diff)

comment:2 by Jukka Rahkonen, 7 years ago

I believe that sequence is already initialized in the same way. The difference is that when inserts are made with "preserve_fid" the sequnce does not advance. If you want to study the code it seems to be in https://trac.osgeo.org/gdal/browser/trunk/gdal/ogr/ogrsf_frmts/pg/ogrpgtablelayer.cpp.

You suggest that after the transaction the sequence should be set into max(fid). Or at least when max(fid) is bigger than current sequence value for handling deletes right.

Suggestion may make sense but I am not totally sure. If user creates the table with "preserve_fid" they somehow say that they know better what the fid value should be. Why don't they know the fid then when they add new data?

I can imagine that data comes ready from a third party or from an existing system and fid are keys to other valueble data. That't why to use preserve_fid initially. And then user wants to continue the work with QGIS or something. I am not sure if it is a good plan, perhaps it would be better to have a normal, unique column for attributes which are needed for foreign keys.

But yeah, I would also be puzzled by the "duplicate key value violates unique constraint "test1_pkey"" error.

comment:3 by Even Rouault, 7 years ago

The sequence is always initialized to 1. It is only advanced when doing insertion without explicit value for the PK column.

I think it's reasonable on layer creation that the sequence is advanced to the max(ogc_fid), so that later insertions work. Note that with GDAL 2.something, the -preserve_fid behaviour is the default when the source layer has itself a FID column

comment:4 by Jukka Rahkonen, 7 years ago

It may not be enough to limit it to layer creation.

I started to think that if -preserve_fid is the default and it lets the fid sequence to 1, then how new inserts with -append are handled at the moment. And actually they are not handled well.

I made table "fidtest" for testing and tried to append same data again but it seems to be impossible.

ogr2ogr -append -update -f postgresql PG:"dbname=gis host=localhost port=5432 user=user password=password" -nln fidtest states.shp -nlt  promote_to_multi

ERROR 1: ERROR:  duplicate key value violates unique constraint "fidtest_pkey"
DETAIL:  Key (ogc_fid)=(2) already exists.

If user appends data with -preserve_fid it should success if fids are unique but the sequence is still left behind and cause troubles later.

So perhaps the last thing after each insert transaction would be to advance the primary key sequence to max(fid). And perhaps also after each update because fids can be updated as well.

ogrinfo PG:"dbname=gis host=localhost port=5432 user=user password=password" -sql "update fidtest set ogc_fid=1000 where ogc_fid=1"

comment:5 by Even Rouault, 7 years ago

Resolution: fixed
Status: newclosed

In 40043:

PG/PGDump: make sure serial sequence is updated at layer closing/end-of-copy if we inserted features with fixed ids (fixes #7032)

comment:6 by Even Rouault, 7 years ago

Milestone: 2.3.0

I've committed a trunk an improvement that should fix the various use cases exposed here. Given the non-triviality of the PG driver, I prefer to refrain from backporting to the 2.2 branch.

Version 0, edited 7 years ago by Even Rouault (next)

comment:7 by Even Rouault, 7 years ago

In 40045:

PG: fix int vs bool isues (refs #7032)

comment:8 by Even Rouault, 7 years ago

In 40049:

PGDump: fix use after free issue introduced per refs #7032

Note: See TracTickets for help on using tickets.