Opened 14 years ago

Closed 12 years ago

#547 closed defect (fixed)

ST_contains memory problem when used between polygon and mixed geometry

Reported by: twain Owned by: strk
Priority: blocker Milestone: PostGIS 1.5.6
Component: postgis Version: 1.5.X
Keywords: ST_contains memory Cc:

Description

Tested with postgis 1.5.1:

The following query uses large amounts of memory (multiple GB depending on the number of rows)

select ST_Contains(source.geometry, placex.geometry) from placex, placex as source where source.place_id = 59001169 and source.geometry && placex.geometry;

where source.geometry is a polygon or multipolygon.

if the query is restricted to:

select ST_Contains(source.geometry, placex.geometry) from placex, placex as source where source.place_id = 59001169 and source.geometry && placex.geometry and ST_geometrytype(placex.geometry) = 'ST_Point';

OR

select ST_Contains(source.geometry, placex.geometry) from placex, placex as source where source.place_id = 59001169 and source.geometry && placex.geometry and ST_geometrytype(placex.geometry) != 'ST_Point';

neither of these queries has high memory usage.

In the original case all memory is freed when the query completed but given a large number of rows (10000+) it can use up all the memory in the machine causing it to swap and eventually kill the process.

As far as I can tell there is nothing special about the source polygons / multi polygon - any polygon shows this behaviour.

Attachments (1)

prepared-geom.patch (3.8 KB ) - added by pramsey 13 years ago.
Clean up incorrect cache objects

Download all attachments as: .zip

Change History (27)

comment:1 by pramsey, 14 years ago

I think I see where this could come from, as the prepared geometry cache system expects a query to be *either* a p-i-p situation or a p-and-g situation, but not both at the same time. One fix could be just ripping out the postgis-native implementation (for the p-i-p case) of the caching system.

by pramsey, 13 years ago

Attachment: prepared-geom.patch added

Clean up incorrect cache objects

comment:2 by pramsey, 13 years ago

If you're still there I'd be interested if the attached patch resolves the problem.

comment:3 by chodgson, 13 years ago

I don't have a test case for this, but I think it's clear that this patch should be applied. Without it, we're just leaking the cache objects whenever we encounter a different type of query.

Ideally, we could be even smarter, and either fit both cache objects into fn_extra (perhaps with a super-cache struct) or, make our processing dependent on what object with have cached. eg. If we are doing a point in poly, and we already have a geos struct, use geos instead of the postgis PIP code. So in the case of mixed geometry types, we would "upgrade" from the rtree cache to the prepared geos cache, but not downgrade (as this patch would do) - this only makes sense to do if the geos PIP using a prepared geom is comparable in speed to the builtin postgis PIP code (which I would think it should be?). Holding on to both cache objects should be pretty easy to do though.

comment:4 by chodgson, 13 years ago

Owner: changed from pramsey to chodgson

A quick test - ESRI's standard world shapefiles, tested running PIP on cities in countries, ~2500 cities and ~250 countries.

select st_intersects(co.the_geom, ci.the_geom) from country co, cities ci;

This takes 8 seconds with the rtree code in place. If I removed the rtree shortcut in the intersection function by replacing the "if (one is a point and the other is poly)" with "if(0)" - it takes 28 seconds. That is the second run timings and it doesn't get any different with more runs. This is enough of a difference that I think it is worth keeping both implementations around for now.

So I think I'd like to do the "super-cache" struct to hold both the geos prepared_geom and the rtree index in fn_extra.

comment:5 by robe, 13 years ago

Milestone: PostGIS 1.5.3PostGIS 1.5.4

comment:6 by pramsey, 12 years ago

chodgson, did this get done? I have memories of you committing work on a unified cache…

comment:7 by chodgson, 12 years ago

IIRC, I tried to apply Paul's original patch, and it didn't seem to fix my test case entirely. I didn't have enough time to try the super-cache struct approach, but I still think it is the "best" way.

comment:8 by strk, 12 years ago

Owner: changed from chodgson to strk
Status: newassigned

I'm working on the grand unified cache (for 2.0)

comment:9 by strk, 12 years ago

r8877 introduced an unified cache. I _belive_ it is safe to backport, but please give a word of encouragement for that :)

comment:10 by chodgson, 12 years ago

Strk, this looks about like what I was expecting, thanks for doing it. I can't see why it shouldn't be backported to 1.5 as that's where I was originally planning to fix it.

comment:11 by strk, 12 years ago

Resolution: fixed
Status: assignedclosed

r8888 fixes a missing return in trunk, r8887 backports the fixed version into 1.5

comment:12 by pramsey, 12 years ago

Resolution: fixed
Status: closedreopened

comment:13 by pramsey, 12 years ago

Priority: highblocker

There is a big memory leak which has been bisected back to r8877.

http://lists.refractions.net/pipermail/postgis-users/2012-February/032306.html

comment:14 by pramsey, 12 years ago

Milestone: PostGIS 1.5.4PostGIS 2.0.0

If the grand unified cache was introduced into 1.5 branch it'll need to be fixed there too (reading the comments I guess it was not).

comment:15 by strk, 12 years ago

Do you have a valgrind report for the leak ? By reading the rtree cache preparer it doesnt' look like having issues, but the GetPrepGeomCache function is pretty confusing to me.

comment:16 by strk, 12 years ago

Ok, I could reproduce by running ST_Intersects(A, A) OR ST_Intersects(A, P).

Interesting enough the leak remains in the A-A case, doesn't matter if A-P is encountered, so the problem is fully with the prepared geometry cache, not the RTREE one. Which is the confusing function, btw. This is with ST_Intersects though (as the reporter for r8877 regression uses), not ST_Contains (which does _not_ seem to leak for me).

comment:17 by pramsey, 12 years ago

The PrepGeom stuff is confusing because it has to account for the memory being allocated by GEOS outside the PgSQL memory pools and clean it up on shutdown. It does this with the crazy-making memory context stuff you're squinting at. The proj cache has the same stuff. The memory leak might be because your unified cache work has not taken those issues into account?

comment:18 by strk, 12 years ago

Could be. I'm working on making it easier to enable debugging in a single .c file, then I'll know more about what's going on. It strikes me that with that commit I was really careful about not changing things much.

comment:19 by strk, 12 years ago

Alright, here's what happens:

NOTICE:  [lwgeom_geos_prepared.c:GetPrepGeomCache:329] GetPrepGeomCache: creating cache: 0x7fc5735eddb0
NOTICE:  [lwgeom_geos_prepared.c:GetPrepGeomCache:338] GetPrepGeomCache: adding context to hash: 0x7fc5735eddb0
NOTICE:  [lwgeom_geos_prepared.c:GetPrepGeomCache:436] GetPrepGeomCache: copying pg_geom1 into cache

Over and over again.

comment:20 by strk, 12 years ago

Uhm, fn_extra keeps getting back as NULL during operations. Can't get a cache hit. Both with prepared and rtree caches

comment:21 by strk, 12 years ago

Resolution: fixed
Status: reopenedclosed

Silly one, fixed by r9136

comment:22 by RustyMuppet, 12 years ago

Milestone: PostGIS 2.0.0PostGIS 1.5.5
Resolution: fixed
Status: closedreopened

I had this problem using postgis 1.5.5 (postgresql 9.1.4) with st_intersects in the where clause.

Can we get this fix backported to 1.5 please.

Thanks Darren

comment:23 by strk, 12 years ago

The 1.5 branch is in horrible conditions at this very moment, see ticket #1954 — need a succeeding testsuite before adding anything can be committed (any commit in 1.5 needs careful checking)

comment:24 by strk, 12 years ago

Ok, could you please test r10186, Darren ?

comment:25 by strk, 12 years ago

Milestone: PostGIS 1.5.5PostGIS 1.5.6

comment:26 by strk, 12 years ago

Resolution: fixed
Status: reopenedclosed
Note: See TracTickets for help on using tickets.