Opened 10 years ago

Closed 7 years ago

#2951 closed enhancement (fixed)

ST_Centroid for geography

Reported by: mwtoews Owned by: robe
Priority: medium Milestone: PostGIS 2.4.0
Component: liblwgeom Version:
Keywords: Cc:

Description

The idea is to determine the centroid of a geography with this signature:

CREATE OR REPLACE FUNCTION ST_Centroid(geog geography, use_spheroid boolean DEFAULT true)
  RETURNS geography AS
'$libdir/postgis-2.2', 'geography_centroid'
  LANGUAGE c IMMUTABLE STRICT
  COST 100;

Currently, GEOS computes the geometry centroid, so some of this would need to be recreated in liblwgeom using lengths and areas on either a sphere or spheroid. This enhancement would require a reliable geodesic area calculation using GeographicLib (#2918).

Single or multipart Point, Line, and Polygon geography objects will be considered.

Attachments (4)

centroid_meter_projection.png (8.8 KB ) - added by robe 7 years ago.
line curve example
geography_centroid.patch (17.5 KB ) - added by dangerground 7 years ago.
geog_cent_poly_prob.sql (47.3 KB ) - added by robe 7 years ago.
geog_cent_poly_prob problem geometries
geography_centroid_fix.patch (1.2 KB ) - added by dangerground 7 years ago.
fix polygon calculation

Download all attachments as: .zip

Change History (26)

comment:1 by robe, 10 years ago

Milestone: PostGIS FuturePostGIS 2.2.0

comment:2 by pramsey, 10 years ago

Milestone: PostGIS 2.2.0PostGIS Future

comment:3 by mwtoews, 9 years ago

I have an idea for this:

  1. Get the bounding box for the geography
  2. Make an initial guess of the centroid using the ½ distance along one of the diagonal lines from the bbox
  3. Project the data to a dynamic equal-area projection that is centred at the centroid guess
  4. Calculate the Cartesian centroid of the projected geometry
  5. Compare the distance between the distance estimated in 2. and 4., and if greater then a convergence criterion, then repeat from 3. using a new estimate

A few issues with this approach would be that I'm not sure exactly which projection would be best; +proj=laea? The iterative process would build projection strings for PROJ, which will need to be robust. Furthermore, I'm not sure how the convergence criteria would work, or how many projections would be required, as this could get expensive.

comment:4 by dangerground, 7 years ago

I implemented a version of ST_Centroid based on intermediate conversion to 3D-cartesian coordinates. Tests included.

comment:5 by robe, 7 years ago

Milestone: PostGIS FuturePostGIS 2.4.0
Owner: changed from pramsey to robe

comment:6 by robe, 7 years ago

Sorry the example for docs you followed sadly doesn't conform to the template.xml spec. I'll have to fix those.

Can you change so it is picked up by our what's new xsl script

New in 2.4.0

to

Availability: 2.4.0

More importantly doesn't compile for me - get these errors.

MING64~1/projects/POSTGR~1/rel/PG10W6~2/include/internal -I./src/include/port/win32 -DEXEC_BACKEND -I/projects/libxml/rel-libxml2-2.7.8w64gcc48/include/libxml2   -IC:/MING64~1/projects/POSTGR~1/rel/PG10W6~2/include/server/port/win32  -c -o geography_centroid.o geography_centroid.c
geography_centroid.c:1:0: warning: -fPIC ignored for target (all code is position independent) [enabled by default]
 /**********************************************************************
 ^
geography_centroid.c: In function 'geography_centroid':
geography_centroid.c:109:9: error: 'for' loop initial declarations are only allowed in C99 mode
         for (int i = 0; i < size; i++) {
         ^
geography_centroid.c:109:9: note: use option -std=c99 or -std=gnu99 to compile your code
geography_centroid.c: In function 'geography_centroid_from_wpoints':
geography_centroid.c:187:5: error: 'for' loop initial declarations are only allowed in C99 mode
     for (int i = 0; i < size; i++ )
     ^
geography_centroid.c: In function 'geography_centroid_from_mline':
geography_centroid.c:254:5: error: 'for' loop initial declarations are only allowed in C99 mode
     for (int i = 0; i < mline->ngeoms; i++) {
     ^
geography_centroid.c:261:14: error: redefinition of 'i'
     for (int i = 0; i < mline->ngeoms; i++) {
              ^
geography_centroid.c:254:14: note: previous definition of 'i' was here
     for (int i = 0; i < mline->ngeoms; i++) {
              ^
geography_centroid.c:261:5: error: 'for' loop initial declarations are only allowed in C99 mode
     for (int i = 0; i < mline->ngeoms; i++) {
     ^
geography_centroid.c:265:9: error: 'for' loop initial declarations are only allowed in C99 mode
         for (int i = 0; i < line->points->npoints - 1; i++) {
         ^
geography_centroid.c: In function 'geography_centroid_from_mpoly':
geography_centroid.c:306:5: error: 'for' loop initial declarations are only allowed in C99 mode
     for (int i = 0; i < mpoly->ngeoms; i++) {
     ^
geography_centroid.c:318:5: error: 'for' loop initial declarations are only allowed in C99 mode
     for (int ip = 0; ip < mpoly->ngeoms; ip++) {
     ^
geography_centroid.c:321:9: error: 'for' loop initial declarations are only allowed in C99 mode
         for (int ir = 0; ir < poly->nrings; ir++) {
         ^
geography_centroid.c:325:13: error: 'for' loop initial declarations are only allowed in C99 mode
             for (int i = 0; i < ring->npoints - 1; i++) {
             ^
<builtin>: recipe for target 'geography_centroid.o' failed
make[1]: *** [geography_centroid.o] Error 1
make[1]: Leaving directory '/projects/postgis/branches/2.4/postgis'
GNUmakefile:16: recipe for target 'all' failed

comment:7 by robe, 7 years ago

dangerground,

I'll do some tests with this and commit this week if it looks okay. Minor issue at moment with the docs because you have an extra def, but I can fix that.

Can you give me your full name as you'd like to appear in credits?

Thanks, Regina

comment:8 by robe, 7 years ago

okay get a crash with this:

CREATE TABLE crash_geog AS SELECT 'MULTIPOLYGON(((-71.137596 42.3487872,-71.1383097 42.3490151,-71.1382789 42.3490681,-71.1382501 42.3490589,-71.1382019 42.3491419,-71.1383391 42.3491856,-71.1383875 42.3491023,-71.1383548 42.3490918,-71.138382 42.3490449,-71.1386865 42.3491331,-71.1390201 42.3491748,-71.13933 42.3491671,-71.1396678 42.3491065,-71.1397578 42.3493047,-71.1396044 42.3493469,-71.1395729 42.3492775,-71.1396152 42.3492669,-71.1395796 42.3491887,-71.1394566 42.3492194,-71.1394924 42.3492982,-71.1395196 42.3492914,-71.1395503 42.3493589,-71.1393759 42.3493876,-71.1393659 42.3493398,-71.1393906 42.349337,-71.1393673 42.3492253,-71.1392714 42.3492362,-71.1392912 42.349332,-71.139319 42.3493289,-71.1393323 42.3493926,-71.1391527 42.3494039,-71.139152 42.3493581,-71.139185 42.3493579,-71.1391833 42.3492461,-71.1390796 42.349247,-71.1390812 42.3493579,-71.1391122 42.3493576,-71.1391129 42.3494045,-71.1389428 42.3493988,-71.1389508 42.3493271,-71.1389808 42.3493289,-71.13899 42.3492461,-71.1388876 42.3492397,-71.1388784 42.3493222,-71.1389117 42.3493242,-71.1389038 42.3493957,-71.1387259 42.3493725,-71.138733 42.3493419,-71.1387611 42.3493454,-71.1387873 42.3492323,-71.1386944 42.3492206,-71.1386685 42.3493329,-71.138693 42.349336,-71.1386861 42.3493653,-71.1384964 42.3493204,-71.1385299 42.349248,-71.138582 42.3492613,-71.138611 42.3491982,-71.1385178 42.3491684,-71.1384841 42.3492263,-71.1385087 42.3492341,-71.1384722 42.349297,-71.1380043 42.3491477,-71.1380401 42.349086,-71.1380635 42.3490934,-71.1381007 42.3490295,-71.1380334 42.349008,-71.1379706 42.3491162,-71.1380034 42.3491267,-71.1379583 42.3492045,-71.1374483 42.3490416,-71.137596 42.3487872),(-71.1378595 42.3489515,-71.1377924 42.3490673,-71.1378626 42.3490897,-71.1379298 42.3489739,-71.1378595 42.3489515),(-71.1376915 42.3488978,-71.1376243 42.3490136,-71.1376946 42.349036,-71.1377618 42.3489203,-71.1376915 42.3488978),(-71.1384173 42.3491847,-71.1383997 42.3492152,-71.1384353 42.3492266,-71.138453 42.349196,-71.1384173 42.3491847),(-71.1381277 42.3490922,-71.13811 42.3491228,-71.1381455 42.3491341,-71.1381633 42.3491036,-71.1381277 42.3490922)))'::geography AS geog;


SELECT ST_Centroid(geog)
FROM crash_geog;

backtrace shows:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 8460.0x1d80]
lwpoint_get_y (point=0xc051c8d73f5ce5f6) at lwpoint.c:76
76              if ( lwpoint_is_empty(point) )

regarding the docs,

you need to get rid of the function proto that has no use_spheroid and change the one with use_spheroid this:

        <funcprototype>
          <funcdef>geography <function>ST_Centroid</function></funcdef>

          <paramdef><type>geography </type>
		  <parameter>g1</parameter></paramdef>
          <paramdef choice="opt"><type>boolean </type>
		  <parameter>use_spheroid=true</parameter></paramdef>
	</funcprototype>

comment:9 by robe, 7 years ago

here is the bt where it comes from centroid:

[Switching to Thread 8460.0x1d80]
lwpoint_get_y (point=0xc051c8d73f5ce5f6) at lwpoint.c:76
76              if ( lwpoint_is_empty(point) )
(gdb) bt
#0  lwpoint_get_y (point=0xc051c8d73f5ce5f6) at lwpoint.c:76
#1  0x00000000633f90e0 in geography_centroid_from_mpoly ()
   from C:\ming64gcc48\projects\postgresql\rel\pg10w64gcc48\lib\postgis-2.4.dll
#2  0x00000000633f83ca in geography_centroid ()
   from C:\ming64gcc48\projects\postgresql\rel\pg10w64gcc48\lib\postgis-2.4.dll
#3  0x00000000005aa1ed in ExecInterpExpr (state=0x55998f0,
    econtext=0x55994a0, isnull=<optimized out>) at execExprInterp.c:672
#4  0x00000000005b3f77 in ExecEvalExprSwitchContext (isNull=0x2e3ec2f "",
    econtext=0x55994a0, state=0x55998f0)
    at ../../../src/include/executor/executor.h:290
#5  ExecProject (projInfo=0x55998e8)
    at ../../../src/include/executor/executor.h:324
#6  ExecScan (node=node@entry=0x5599388,
    accessMtd=accessMtd@entry=0x5cffa0 <SeqNext>,
    recheckMtd=recheckMtd@entry=0x5cff90 <SeqRecheck>) at execScan.c:201
#7  0x00000000005d0033 in ExecSeqScan (node=node@entry=0x5599388)
    at nodeSeqscan.c:130
#8  0x00000000005b2a78 in ExecProcNode (node=node@entry=0x5599388)
    at execProcnode.c:447
#9  0x00000000005adb63 in ExecutePlan (execute_once=<optimized out>,
    dest=0x5586590, direction=<optimized out>, numberTuples=0,

comment:10 by dangerground, 7 years ago

Thanks for testing, I fixed that segfault and replaced the prototype in the docs. Using "Danny Götte" as in the file header is fine for credits

comment:11 by robe, 7 years ago

Okay crash is gone, but something is not quite right about these calculations. Seems to work okay when I do buffers, answers are as expected.

But here is one that is glaringly wrong.

SELECT ST_AsText(ST_Centroid(geog))
FROM ST_GeogFromText('LINESTRING(-71.0846005 42.3971753,-71.0846078 42.3971502,-71.0847687 42.396707,-71.0847746 42.3966641,-71.0847638 42.3966281,-71.0847404 42.3965966,-71.0847056 42.3965679,-71.0846586 42.3965386)') As f(geog);

Output is:
POINT(-0.000955750185501768 0.00761666977447508)

note how the centroid is no where near the line. I would expect the answer to be similar to what I get when I transform to state plane meters to do the computation and then transform back.

SELECT ST_AsText(ST_Transform(ST_Centroid(ST_Transform(geog::geometry,26986)),4326))
FROM ST_GeogFromText('LINESTRING(-71.0846005 42.3971753,-71.0846078 42.3971502,-71.0847687 42.396707,-71.0847746 42.3966641,-71.0847638 42.3966281,-71.0847404 42.3965966,-71.0847056 42.3965679,-71.0846586 42.3965386)') As f(geog);

-- outputs
POINT(-71.084699611775 42.3968457764674)

I've attached screen shot.

by robe, 7 years ago

line curve example

comment:13 by robe, 7 years ago

Danny I'm assuming you are still troubleshooting this issue. We're planning to release a tagged 2.4.0 alpha end of this week. If you don't think you'll have this sorted out by then, what I could do is commit this as is now with the plan we'll have these issues sorted out before September when we need to do final release to catch the PostgreSQL 10 train.

If you think issue is more involved, then we could wait till 2.5 or I guess worsed case if we put it in now we can stub it with a geometry wrapper and have enhanced in PostGIS 2.4.1 since the SQL signature would not require changing.

Thanks, Regina

by dangerground, 7 years ago

Attachment: geography_centroid.patch added

comment:14 by robe, 7 years ago

In 15518:

ST_Centroid for geography (Danny Götte)
References #2951

comment:15 by robe, 7 years ago

Danny,

Much better, so I committed. I'm keeping this open since I think I'm still seeing something, but I might be mistaken cause the issue only shows when I have in table of records and not when I try to test the geometry as a single record. Also only shows for multipolygons at moment.

comment:16 by robe, 7 years ago

In 15519:

missed file commit for ST_Centroid Geography support
references #2951

comment:17 by robe, 7 years ago

In 15520:

Missed test commit for ST_Centroid geography support
References #2951

comment:18 by robe, 7 years ago

I think there is some memory pollution issue left at least with the multipolygon centroid checks. I also think the calc for multipolygon may be a little off, but I'd have to do further spot checks to be sure.

Attached is a table dump of geometries that centroid seems enough from what I would expect.

I'm testing this on PostgreSQL 10beta2 and haven't ruled out it's not a 10 specific bug.

The thing that most troubles me is the memory pollution.

SELECT id, ST_AsText(ST_Centroid(geog)) AS geog_cent, ST_AsText(ST_Transform(ST_Centroid(ST_Transform(geog::geometry,26986)),4326)) AS sm_cent, ST_Distance(ST_Transform(ST_Centroid(ST_Transform(geog::geometry,26986)),4326)::geography, ST_Centroid(geog)) As dist_disp
FROM geog_cent_poly_prob
WHERE id IN( 49);

yields this:

 id |                 geog_cent                 |                  sm_cent                  |    dist_disp
----+-------------------------------------------+-------------------------------------------+-----------------
 49 | POINT(-71.0758016600248 42.4106304684328) | POINT(-71.0757460620397 42.4107333642771) | 8466479.12566751
(1 row)

I'm guessing the first call to ST_Centroid yields the right answer (those the as text looks okay), but the second call where I stuff it in a distance check gives some huge number presumably because it computed something crazy for the centroid.

Now something different happens if I run my test with more than one record:

SELECT id, ST_AsText(ST_Centroid(geog)) AS geog_cent, ST_AsText(ST_Transform(ST_Centroid(ST_Transform(geog::geometry,26986)),4326)) AS sm_cent, ST_Distance(ST_Transform(ST_Centroid(ST_Transform(geog::geometry,26986)),4326)::geography, ST_Centroid(geog)) As dist_disp
FROM geog_cent_poly_prob;
 id |                 geog_cent                 |                  sm_cent                  |  dist_disp
----+-------------------------------------------+-------------------------------------------+--------------
  1 | POINT(-71.0703129986404 42.3452999176663) | POINT(-71.0703048387431 42.3453017868431) |    0.7037137
  5 | POINT(-71.037584398837 42.336384333772)   | POINT(-71.0428341568715 42.3388586422701) | 512.55960119
 49 | POINT(-71.0758046647449 42.4106309813902) | POINT(-71.0757460620397 42.4107333642771) |          NaN
(3 rows)

I get NaN. If I run in pgAdmin instead of PSSQL, I've seen geog_cent return a POINT(0…).

To prove it's the ST_Centroid,

SELECT id, ST_AsText(ST_Centroid(geog)) AS geog_cent,  ST_AsText(ST_Centroid(geog)) As geog_cent2
FROM geog_cent_poly_prob;
 id |                 geog_cent                 |                geog_cent2
----+-------------------------------------------+-------------------------------------------
  1 | POINT(-71.0703129986404 42.3452999176663) | POINT(-71.0703129986404 42.3452999176663)
  5 | POINT(-71.037584398837 42.336384333772)   | POINT(-71.037584398837 42.336384333772)
 49 | POINT(-71.1593947384227 42.3629042590461) | POINT(-71.0945591847874 42.3518693374157)
(3 rows)

Note how the same exact query ST_AsText(ST_Centroid(…)) gives two different answers for the id=49 case.

I've attached the sample table.

by robe, 7 years ago

Attachment: geog_cent_poly_prob.sql added

geog_cent_poly_prob problem geometries

comment:19 by robe, 7 years ago

BTW when I run in pgAdmin3 I get:

SELECT id, ST_AsText(ST_Centroid(geog)) AS geog_cent,  ST_AsText(ST_Centroid(geog)) As geog_cent2
FROM geog_cent_poly_prob
WHERE id IN( 49);
49	POINT(-1.51370977193642e-226 0)	; POINT(-1.51370977193642e-226 0)

so I suspect that's where that huge number is coming from when computing distance to computed based on MA state plane meters.

comment:20 by dangerground, 7 years ago

yes, there where unitialized points used for calculating the end result, which may have the results to be a "little" off or NaN, new patch file contains the fix as well as updated expected results

by dangerground, 7 years ago

fix polygon calculation

comment:21 by robe, 7 years ago

In 15526:

ST_Centroid for geography fix uninitialized points issue
References #2951

comment:22 by pramsey, 7 years ago

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.