Ticket #1476 (closed defect: fixed)

Opened 8 years ago

Last modified 6 years ago

Improper handling of Postgis tables with multiple geometry columns

Reported by: aaime@… Owned by: rouault
Priority: normal Milestone: 1.6.0
Component: OGR_SF Version: 1.4.0
Severity: minor Keywords: PostgreSQL PostGIS
Cc: warmerdam, ajolma

Description (last modified by mloskot) (diff)

OGR is not able to handle properly feature types with multiple geometries. In the specific case of a Postgis table with 4 geometry columns we do have the following output:

ogrinfo -so PG:"host=localhost port=5432 user=myuser
 password=mystery dbname=tiger2005fe" major_roads

INFO: Open of `PG:host=localhost port=5432 user=myuser password=mystery dbname=tiger2005fe' using driver `PostgreSQL' successful.

Layer name: major_roads
Geometry: Unknown (any)
Feature Count: 105628
Extent: (-170.837387, -14.377902) - (144.914001, 66.922287)
Layer SRS WKT:
(unknown)
Geometry Column = gen_full
fid: Integer (0.0)
interstate: Integer (0.0)
othername: String (0.0)
state: String (0.0)
statehighway: Integer (0.0)
ushighway: Integer (0.0)

Layer name: major_roads
Geometry: Unknown (any)
Feature Count: 105628
Extent: (-170.837387, -14.377902) - (144.914001, 66.922287)
Layer SRS WKT:
(unknown)
Geometry Column = gen_full
fid: Integer (0.0)
interstate: Integer (0.0)
othername: String (0.0)
state: String (0.0)
statehighway: Integer (0.0)
ushighway: Integer (0.0)

Layer name: major_roads
Geometry: Unknown (any)
Feature Count: 105628
Extent: (-170.837387, -14.377902) - (144.914001, 66.922287)
Layer SRS WKT:
(unknown)
Geometry Column = gen_full
fid: Integer (0.0)
interstate: Integer (0.0)
othername: String (0.0)
state: String (0.0)
statehighway: Integer (0.0)
ushighway: Integer (0.0)

Layer name: major_roads
Geometry: Unknown (any)
Feature Count: 105628
Extent: (-170.837387, -14.377902) - (144.914001, 66.922287)
Layer SRS WKT:
(unknown)
Geometry Column = gen_full
fid: Integer (0.0)
interstate: Integer (0.0)
othername: String (0.0)
state: String (0.0)
statehighway: Integer (0.0)
ushighway: Integer (0.0)

whilst the table definition is:

tiger2005fe=# \d major_roads
                             Table "public.major_roads"
    Column    |   Type   |                         Modifiers
--------------+----------+-----------------------------------------------------------
 state        | text     |
 gen_full     | geometry |
 gen_1        | geometry |
 gen_2        | geometry |
 gen_3        | geometry |
 interstate   | integer  |
 ushighway    | integer  |
 statehighway | integer  |
 othername    | text     |
 fid          | integer  | not null default nextval('major_roads_fid_seq'::regclass)
Indexes:
    "fid_pkey" PRIMARY KEY, btree (fid)
    "major_roads_spatial_ind" gist (gen_full)
    "major_roads_spatial_ind1" gist (gen_1)
    "major_roads_spatial_ind2" gist (gen_2)
    "major_roads_spatial_ind3" gist (gen_3)

Attachments

pg_columns.diff Download (13.2 KB) - added by ajolma 6 years ago.
A quick and dirty patch to add column names to pg table layers

Change History

Changed 8 years ago by warmerdam

Assigning to Mateusz.

Please discuss any planned solution with me.  If it is not very disruptive 
if would be nice to fix things in 1.4 branch, but not if the changes are
very substantial.

Changed 8 years ago by mloskot

Frank,

Current data model used by OGR is based on assumption that cardinality of the 
relation between a feature and associated geometry is 1:1.

feature -|----------|- geometry

If we want, in OGR, to support PostGIS ability to handle multiple geometry
columns will likely change the cardinality to 1:N

feature -|----------|< geometry

This change of relation cardinality will require substantial changes to
interface of some of OGR classes (as well as corresponding OGR C API).

Currently, OGRFeature provides access to single instance of associated geometry.
This will need to be changed to provide access to collection of geometries.

Here is list of public interfaces that will require changes to support
multiple geometries per feature:

1.
OGRFeature::GetGeometryRef
OGRFeature::StealGeometry
OGRFeature::SetGeometryDirectly
OGRFeature::SetGeometry

- access to geometries by index
- access default geometry (ie. with index 0, first column)

2. 
OGRFeatureDefn::GetGeomType
OGRFeatureDefn::SetGeomType

- specify type of which geometry is requested/set

3. 
OGRLayer::GetGeometryColumn

- return geometry by index, with default index = 0

4.
OGRLayer::GetSpatialRef

- specify SRS of which geometry is requested


Here are operations of OGRLayer that might require some changes:

OGRLayer::GetSpatialFilter
OGRLayer::SetSpatialFilter
OGRLayer::SetSpatialFilterRect

- these 3 operations listed above might require changes of their semantic:
-- how to specify to which geometry a filter should be applied?
-- do we want to provide all-by-one filtering where a common filter is
   applied to all geometries?


Certainly, this is a very first analysis and it is only supposed to help
us to decide if we want to support multiple geometries sooner or later
(I assume we need to support it at some time).

Changed 8 years ago by warmerdam

Mateusz, 

In the short to medium term we can't practically reengineer OGR to support
multiple geometry columns properly.  What we might consider are special ways 
of selecting the geometry column to use.

I can quickly imagine 3 scenarios that might apply. 

1) We actually offer distinct appearing layers for each geometry column.  So 
if the table roads has two geometry columns, one called centerline, and one
called boundaries we might show the application a layer called
roads_centerline, and roads_boundaries.  

2) We could provide a mechanism in the datasource connection string where
a particular geometry column could be selected for a given table.  For
instance I think we already support listing the tables to be represented
as layers in the PG driver connection string.  We could allow adding a
geometry column in brackets after the table name as a way of selecting the
geometry column to use.  If unspecified, the first geometry column would
be used. 

3) We could extend the OGR API with a way of listing the geometry columns
available for a layer, and a method for selecting a particular one.  Note,
we already have the informative GetGeometryColumn() method which for some
RDBMS providers returns the name of the currently used geometry column. 

I'm inclined to pursue option (1) in the postgis driver for the time being
as it requires no changes to public apis and applications.  I think option 
(3) should be a longer term goal. 

Changed 7 years ago by warmerdam

  • priority changed from highest to normal
  • severity changed from blocker to minor
  • description modified (diff)
  • milestone set to 1.5.0

Re-prioritizing. Target 1.5.0.

Changed 7 years ago by warmerdam

  • cc warmerdam added

Changed 7 years ago by mloskot

  • status changed from new to assigned
  • description modified (diff)

Changed 7 years ago by mloskot

  • keywords PostgreSQL PostGIS added

Changed 7 years ago by warmerdam

  • milestone changed from 1.5.0 to 1.6.0

Changed 6 years ago by ajolma

  • cc ajolma added

Changed 6 years ago by ajolma

I guess my suggestion is to add column name to the constructor of a OGRPgTableLayer. It could be NULL, and then the behavior would be the same as now, but if it is not NULL, it would be used, maybe with some caution. The PGDataSource::Open could pick up the column names if possible (when it has geometry_columns available) and use those in OpenTable?. PGDataSource::GetLayerByName? could detect a column name in its argument.

QGIS uses, at least in its add postgis table interface, the syntax table_name(column_name). Maybe similar syntax could be used in PGDataSource::GetLayerByName?.

Changed 6 years ago by ajolma

A quick and dirty patch to add column names to pg table layers

Changed 6 years ago by rouault

  • owner changed from mloskot to rouault
  • status changed from assigned to new

I'm working on Ari's patch to make it work in a fully backward compatible way

The idea is that if a table 'foo' has only one geometry column (or zero), we continue to report it as 'foo'. If 'foo' has the default 'wkb_geometry' column and a new one called 'new_geometry_column', then 2 layers will be reported : 'foo' and 'foo(new_geometry_column)'.

Changed 6 years ago by rouault

  • status changed from new to closed
  • resolution set to fixed

Support added in r15721, test added in r15722

Changed 6 years ago by ajolma

  • status changed from closed to reopened
  • resolution fixed deleted

I think OGRPGTableLayer::GetSpatialRef? is failing in the case where a table has a wkb_geometry field and other geometry fields. Then the select srid from geometry_columns does not return unique value because now f_geometry_column is not specified. Maybe add

osCommand += CPLString().Printf(" AND f_geometry_column = 'wkb_geometry'", pszGeomColumn);

as the default case?

I can't test just now but I'm observing this.

Changed 6 years ago by ajolma

One more thing: "DELETE FROM geometry_columns.." in OGRPGDataSource::CreateLayer? is a bit too brutal now that we understand multiple geometry columns.

Changed 6 years ago by warmerdam

Ari,

I'm not sure why you think the:

"DELETE FROM geometry_columns WHERE f_table_name = '%s' AND f_table_schema = '%s'"

is inappropriate. If we are creating a new table we don't want any old entries in the geometry_columns from past incarnations of that table. Likewise, DeleteLayer?() needs to wipe all the entries in geometry_columns. It occurs to me that the existing DropGeometryColumn?() invocation may not be sufficient for tables with multiple geometry column entries. I would say that DeleteLayer?() is not brutal enough.

Changed 6 years ago by rouault

For OGRPGTableLayer::GetSpatialRef?(), I think we'd need an example where it fails because if pszGeomColumn == NULL, it means that the table had only one geometry column (see the end of OGRPGDataSource::Open).

For OGRPGDataSource::CreateLayer?() and DeleteLayer?(), I generally concur with Frank's analysis. However the semantics of CreateLayer?() and DeleteLayer?() on a table with multiple geometry columns is not clear because when you delete layer "foo(bar)" (or override it), you also indirectly delete "foo(baz)"... I suppose that multiple geometry column support should be considered as a read-only feature for the moment.

Changed 6 years ago by ajolma

The first report (table with a wkb_geometry and an another geometry column) is invalid. Perhaps I was using an incomplete build or something, but testing now on a different build the problem does not exist. Sorry.

The second point is a genuine problem I believe. Assume you create a new layer -- note: a layer is not necessarily a table -- from a table which already defines a layer. In that case you don't want to remove the existing layer. I'd say that the semantics issue arises from confusion between a layer and a table. A table with multiple geometries defines multiple layers and thus multiple entries in geometry_columns.

I suggest removing the DELETE FROM... command from CreateLayer? and adding a column name constraint to SELECT DropGeometryColumn?.. in DeleteLayer?. But I assume this needs more thought and testing.

Interestingly this issue exists also in other FOSS4G:

 http://jira.codehaus.org/browse/UDIG-1134  https://lists.berlios.de/pipermail/mapnik-users/2008-May/000902.html

Changed 6 years ago by warmerdam

Ari,

I agree there may be differing reasonable interpretations of whether deleting a layer means actually destroying the table, or just cleaning away the one geometry column if there are others.

However, I lean towards destroying a layer means destroying the table, and ideally cleaning up any other layers also dependent on that table. Our using two layers to represent a table with two geometry columns is a work around for OGR's inability to handle multiple geometry properties for features. I don't think we should overcomplicate this. But nevertheless, the issue is not entirely clear.

Changed 6 years ago by ajolma

  • status changed from reopened to closed
  • resolution set to fixed

This issue will remain a bit open since OGR will not be able to handle features with multiple geometry attributes any time soon. Such features are, by the way, still "simple features" in the OGC sense (I hadn't realized this).

It is perfectly ok to use multiple rows in geometry_columns for a single table. I found such usage in OGC specifications. The users - or middleware - just need to exercise care when creating new layers or deleting layers with the Pg driver.

I'm closing this as it was me who reopened it.

Note: See TracTickets for help on using tickets.