Opened 16 years ago

Last modified 16 years ago

#40 closed defect (wontfix)

ST_Union on nationalatlas.gov state boundary geometries loses details

Reported by: da...@… Owned by:
Priority: medium Milestone:
Component: postgis Version:
Keywords: Cc:

Description

What steps will reproduce the problem?

  1. Load the shapefile here into a table 'atlas_states':

http://edcftp.cr.usgs.gov/pub/data/nationalatlas/statesp020.tar.gz. The dataset has multiple MultiPolygons for each state.

  1. Create a new table with 'create table my_states(id serial primary key,

name varchar(50)); select addGeometryColumn('public','my_states','the_geom',4269,'MULTIPOLYGON',2);'

  1. Group by state and ST_Union the geometries to create single geometry

records for each state. 'insert into my_states(name, the_geom) select state, ST_Multi(ST_Union(the_geom)) as singlegeom from atlas_states where state = 'Washington' group by state'.

What is the expected output? What do you see instead?

I expect a single row with the geometry of Washington state correctly preserved. Instead, the geometry of Puget Sound and the Strait of Juan de Fuca are missing. Great Lake borders are also missing in the resulting geometries. Also, performing this operation on all 53 distinct states takes about 10 minutes on Core 2 Duo processor.

What version of the product are you using? On what operating system? 1.3.3 Debian Lenny/Sid PostgreSQL 8.3.1

Please provide any additional information below.

Original mailing list exchange: http://postgis.refractions.net/pipermail/postgis-users/2008-June/020314.html

Change History (6)

comment:1 by kneufeld, 16 years ago

I don't see any problem here. Both JTS in JUMP and GEOS in PostGIS report the same thing: a single multipolygon, the result of unioning the input 50 polygon attributed with 'Washington'. I can see no feature of the 50 that is missing.

As a test to see which of the 50 are reportedly missing: select a.gid from atlas_states a, my_states b where a.state = 'Washington' and not st_contains(b.the_geom, st_pointonsurface(a.the_geom));

gid


(0 rows)

A note on the performance: A single query in PostgreSQL will only ever use a single CPU. Even if you have a dual quad core, only a single CPU will be dedicated to processing the query … currently (hopefully Postgres will change this some day:) )

If you want to process things faster, split your dataset into two and run both at the same time, thereby utilizing both CPUs.

(This was tested on PostGIS1.3.3 and GEOS3.1.0-CAPI-1.5.0)

comment:2 by da...@…, 16 years ago

I'm attaching a screenshot to show what I'm seeing. Puget Sound is gone, only the outer border of Washington state can be seen.

I mention performance because the ST_union query was *so* much slower than using ST_dump and ST_collect (10 minutes versus 5 seconds) to achieve the desired result that there may be something truly wrong occurring during ST_union processing.

I suppose the other possibility is that QGis is having problems displaying the new geometry, however no errors are reported it doesn't have any trouble with the geometries constructed from ST_dump/ST_collect.

comment:3 by kneufeld, 16 years ago

This is exactly what you should be seeing.

Puget Sound is a polygon labeled 'Washington'. The land-based polygon next to it is also labeled 'Washington'. Your query is taking all polygons labeled 'Washington' (Puget Sound and the 49 others) and dissolving them together. When you dissolve (aka ST_Union) all polygons labeled 'Washington' together, the dividing line between the Puget Sound polygon and the polygon next to it disappear. This is no different than merging two adjacent land-based polygons together that share the same attribute.

If you don't want Puget Sound to get merged with the others of the same attribute, exclude it from your input query:

select state, ST_Multi(ST_Union(the_geom)) from atlas_states where state = 'Washington' and label != 'Puget Sound' group by state

I hope this clarifies things, Cheers, Kevin

comment:4 by da...@…, 16 years ago

Thank you for the explanation Kevin.

I guess then that my ST_dump/ST_Collect solution is doing what I want because it isn't trying to merge polygons in the sense of dissolving lines, but representing the full diversity of data in a single multipolygon.

Best, David

comment:5 by kneufeld, 16 years ago

<i>(No comment was entered for this change.)</i>

Note: See TracTickets for help on using tickets.