Opened 14 years ago

Closed 14 years ago

Last modified 14 years ago

#577 closed defect (invalid)

possible memory leak with buffer()

Reported by: davideps Owned by: robe
Priority: medium Milestone:
Component: postgis Version: 1.3.X
Keywords: memory buffer intersect dwithin Cc: pramsey@…

Description

VERSIONS PostgrSQL 8.3.11 on x86_64-pc-linux-gnu postgis_version() = 1.3 USE_GEOS=1 USE_PROJ=1 USE_STATS=1

DATA Table "parcels09_d3_v2" has about 320,000 polygon rows.

SYMPTOMS The original query ran for 2.5 hours before being terminated manually. It used 3.8gb of RAM & 1gb of swap. The modified query ran for 23 minutes and used about 600mb of memory. Memory estimates are from simply looking at the system monitor, so include basic operating system (UBUNTU) also—but nothing else was running at the time.

QUESTION: why did using buffer & intersect use so much more memory instead of just running more slowly?

—ORIGINAL QUERY CREATE TABLE bad_housing_neighbors AS SELECT p1.ogc_fid, avg(case when p2.vod='1' or p2.fire='1' or p2.condition='3' or p2.condition='4' then 1 else 0 end) FROM parcels09_d3_v2 as p1, parcels09_d3_v2 as p2 WHERE st_intersects(buffer(p1.wkb_geometry,528), p2.wkb_geometry) GROUP BY p1.ogc_fid ORDER BY p1.ogc_fid

—MODIFIED QUERY CREATE TABLE bad_housing_neighbors_528ft AS SELECT p1.ogc_fid, count(p2.ogc_fid) as num_neigh_528ft, sum(case when p2.vod='1' or p2.fire='1' or p2.condition='3' or p2.condition='4' then 1 else 0 end) as bad_cond_528ft,

avg(case when p2.vod='1' or p2.fire='1' or p2.condition='3' or p2.condition='4' then 1 else 0 end) as prc_bad_cond_528ft

—ok to include "self" (same ogc_fid) in calculation FROM parcels09_d3_v2 as p1, parcels09_d3_v2 as p2 WHERE st_dwithin(p1.wkb_geometry,p2.wkb_geometry,528) GROUP BY p1.ogc_fid ORDER BY p1.ogc_fid

Change History (6)

comment:1 by robe, 14 years ago

Owner: changed from pramsey to robe

David,

I don't think this is a bug, and if it is its probably in an already patched GEOS. Before I dismiss this as a non-issue, can you tell us which version of GEOS you are running? The below query will tell you.

SELECT postgis_full_version();

Secondly doing a buffer takes a lot of memory because it has to create a derivative geometry. The more points in your geometry the more memory it will take.

Thirdly there have been memory leaks in prior GEOS versions that have since been patched. Some having to do with buffer. If you are running GEOS below 3.2, you might want to upgrade your GEOS.

Forthly do you have spatial indexes in place as Nicklas had mentioned? 22 minutes is still quite a bit of time, though may be valid for your query set.

Thanks, Regina

comment:2 by robe, 14 years ago

Forgot to mention that ST_DWithin does not need to create a derivative geometry since it uses distance and short-circuit distance calcs. This is why the memory of that would be much less for what you are doing.

comment:3 by davideps, 14 years ago

Hi Regina,

VERSION INFO:

"POSTGIS="1.3.3" GEOS="3.1.1-CAPI-1.6.0" PROJ="Rel. 4.7.1, 23 September 2009" USE_STATS"

Do I upgrade GEOS separately or do I need to upgrade postgis as well?

I have a GIST index on the geometry. How quickly would you expect a decent laptop to calculate dwithin on 320,000 polygons? It is difficult for me to judge performance.

thank you, -david

comment:4 by robe, 14 years ago

Resolution: invalid
Status: newclosed

David,

Those particular questions should be asked on PostGIS users list since they are not bugs.

For upgrading GEOS 3.1 is more or less fine when using 1.3. I don't think you'll gain much. In theory you can get away with just upgrading GEOS without PostGIS as long as your symlinks are appropriately set (since GEOS 3.2 (superset) is a drop in replace for 3.1)

As with most things, it depends. It depends on the complexity of your geometries, the area of the polygons, how close they are together etc. It could be anywhere from milliseconds to minutes depending on those factors and not knowing the details we can't give you an answer.

I've run queries (as most people have) on much larger polygon sets and they finish in a fraction of a second (then again the number of records in those cases is probably about 100-1000 for results or so and I chop up my polygons if they get too large or simplify them so they have fewer points when speed becomes an issue)

You really need to do an EXPLAIN (faster but less detailed) or an EXPLAIN ANALYZE (takes about same time as your query minus time to output, but gives more detail) your query.

comment:5 by nicklas, 14 years ago

I just want to give a few comments even if the ticket is dead.

David

Your query doesn't call st_dwithin only 320000 times. Because you test every possible combination of polygons you actually calls the function 320000*320000 times. So it might be understandable. But it sounds quite long anyway.

PostGIS 1.5 should handle this better too. Especially if the polygons is a little bit complex with many vertxes upgrading might give quite a big difference.

comment:6 by davideps, 14 years ago

Nicklas,

Thank you for your comments. EXPLAIN shows p1 is a seq scan & p2 uses gist index. I will try to upgrade when I have time.

Note: See TracTickets for help on using tickets.