Opened 11 years ago

Closed 10 years ago

Last modified 7 years ago

#2385 closed defect (invalid)

ST_MakeLine causing out of memory error

Reported by: ris Owned by: pramsey
Priority: medium Milestone: PostGIS Fund Me
Component: postgis Version: 2.0.x
Keywords: ST_MakeLine memory aggregate Cc:

Description

The following snippet:

SELECT ST_MakeLine ( ST_Point ( a , b ) ) FROM generate_series ( 1 , 1000000 ) AS a , generate_series ( 1 , 10 ) AS b GROUP BY a;

uses a huge amount of memory.

This has been tested on Debian postgres 9.1.9-1/postgis 1.5.3-2 and also MacOS with postgres 9.2/postgis 2.0.

If you don't have massive amounts of memory this will continue until it fails with

ERROR:  out of memory
DETAIL:  Failed on request of size 28.

but not before filling the log with thousands of lines the likes of

accumArrayResult: 8192 total in 1 blocks; 7400 free (0 chunks); 792 used

along with a lot more diagnostic output I won't include here unless you want it as I'm 100% sure you'll be able to reproduce this.

On the surface this looks a lot like http://www.postgresql.org/message-id/E1Sgmoz-0002gU-No@wrigleys.postgresql.org

Change History (13)

comment:1 by robe, 11 years ago

I tested on my 9.3beta2 PostGIS 2.2 windows 7 32-bit instance fails with:

ERROR:  out of memory
DETAIL:  Failed on request of size 28.

and my 64-bit instance is still chugging after 30 minutes.

given your link I suspect it is nothing we can do about and needs to be solved upstream. I'll leave it to pramsey to make that judgement call.

comment:2 by robe, 11 years ago

Resolution: worksforme
Status: newclosed

Okay my windows 64-bit 2008 R2 PostGIS 2.1 production instance finished and IT DID NOT FAIL.

POSTGIS="2.1.0rc2 r11663" GEOS="3.4.0dev-CAPI-1.8.0 r0" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.10.0, released 2013/04/24" LIBXML="2.7.8" LIBJSON="UNKNOWN" TOPOLOGY RASTER PostgreSQL 9.2.4, compiled by Visual C++ build 1600, 64-bit

So given this, I'm dismissing this as a non-issue and propose you make sure you are running the latest PostgreSQL micros.

comment:3 by ris, 11 years ago

Resolution: worksforme
Status: closedreopened

I respectfully disagree.

Whether your machine actually runs out of memory or not, this is still a memory leak of some sort. One million 10-point linestrings is not that big a deal, and shouldn't be using gigs and gigs of memory.

Alternatively, I could re-submit the bug with an extra zero or two on the 1 million figure and see if peoples machines cope with it then.

comment:4 by robe, 11 years ago

Well I wasn't monitoring my memory utilization. So I can retest with that and check.

What micro versions of PostgreSQL are you running though? You only gave a minor, but a lot of these issues are fixed in micro releases?

comment:5 by ris, 11 years ago

Looking further into postgres bug 6698, it doesn't seem to have been fixed at all. (This investigation is not aided by postgres not having a real bug tracker)

So you may well be right and this could just be the same bug. I was assuming that a bug reported in 9.1.2 over a year ago would have had a fix make its way into 9.1.9 and so this would be a similar-but-not-the-same bug caused by ST_MakeLine doing things based on the way array() was doing things and thus picking up some of its bugs. But apparently not.

Not really sure what to do here - I might prod the postgres mailing list over 6698 but as this bug is surfacing in postgis it may not be welcome. But I feel at least the existence of this bug should be documented somewhere for now hence its existence here.

comment:6 by robe, 11 years ago

Milestone: PostGIS Future

comment:7 by pramsey, 11 years ago

Does abusing ST_Collect in a similar way lead to similar results? Question is whether the problem is in the accumulation stage of the aggregate or in the final function of the aggregate, and collect/makeline share accumulation code.

comment:8 by pramsey, 11 years ago

Also, to isolate whether it's a PgSQL subquery problem or a makeline problem turn your generate_series calls into a simple table and makeline on the table instead of the subquery.

comment:9 by pramsey, 11 years ago

My computer is really big and it doesn't crash on your example, but it does show a pretty big resident size on the process when the query in run. The thing it, this query show basically the same resident size, and doesn't have any spatial stuff in it at all:

create table foo as 
select count(*) FROM 
  generate_series ( 1 , 1000000 ) AS a, 
  generate_series ( 1 , 10 ) AS b 
GROUP BY a;

comment:10 by ris, 11 years ago

Hmm. That example doesn't fail for me though on my Debian/9.1.9/1.5.3 machine, even if I raise the 1m to 10m.

However, doing a

CREATE TABLE bar AS
SELECT * FROM
   generate_series ( 1 , 1000000 ) AS a ,
   generate_series ( 1 , 10 ) AS b;

SELECT ST_MakeLine ( ST_Point ( a , b ) ) FROM bar GROUP BY a;

does indeed raise the same out of memory error.

As does exchanging ST_MakeLine for ST_Collect in both cases, though in the generate_series example the error is "…size 16" and, interestingly, in the materialized table example I have to up it to 10m for it to fail and it instead gives me the slightly different

out of memory for query result

comment:11 by pramsey, 11 years ago

Well, forming a 10M point line from 10M points can be reasonably expected to take about 500Mb (about 250M for each set of 10M points, and you will have both the line and points in memory at the time you finish). And since you're crossing a relation of 10m with a relation of 10 in your increased example, you're looking at 5Gb, which seems legitimately out of memory, no leaks or bad practice required.

comment:12 by strk, 10 years ago

Resolution: invalid
Status: reopenedclosed

closing as invalid (not really a bug)

comment:13 by robe, 7 years ago

Milestone: PostGIS FuturePostGIS Fund Me

Milestone renamed

Note: See TracTickets for help on using tickets.