Opened 16 years ago

Closed 15 years ago

#179 closed defect (fixed)

ST_MakeLine and ST_MakeLine_Garry crash server with null arrays

Reported by: robe Owned by: pramsey
Priority: medium Milestone: PostGIS 1.4.1
Component: postgis Version: 1.4.X
Keywords: Cc:

Description

SELECT ST_MakeLine_GArray(ARRAY[NULL,NULL,NULL,NULL])

server crashes. I traced one of my clients constantly crashing server to this. Their building process sometimes emits this which then kills the Postgres backend thus requiring manual restart of their geoserver app.

Note that the new PostGIS 1.4 equivalent

SELECT ST_MakeLine(ARRAY[NULL,NULL,NULL,NULL]) achieves the same destruction.

If we have a 1.3.7 I guess we should target for that too.

I think this is a long standing issue since I tested on 1.3.5 (and actually theya re running 1.3.5) and was able to do this.

Change History (24)

comment:1 by nicklas, 16 years ago

This seems to be a problem in postgresql. postgresql8.4beta / postgis1.4 doesn't crash the server, just returns an empty cell. I tried "SELECT ST_MakeLine(ARRAY[NULL,NULL,NULL,NULL]);"

But the same crashes both 1.36 and 1.4 with postgresql8.3

I guess it is the file array.h that has been changed.

/Nicklas

comment:2 by nicklas, 16 years ago

yes, it have to be postgres because even select ARRAY[NULL,NULL,NULL,NULL]; crashes postgresql 8.3

but returns "{NULL,NULL,NULL,NULL}" in 8.4.

comment:3 by robe, 16 years ago

Resolution: wontfix
Status: newclosed

Nicklas, Which version of PostgreSQL 8.3 are you running? I suspect they must have introduced this bug in a newer version of 8.3.

I tried

ARRAY[NULL, NULL,NULL, NULL] on my 8.3 and didn't crash. Though I'm running an old one here 8.3.1

Returns {NULL,NULL,NULL,NULL}

ST_MakeLine(ARRAY[NULL, NULL,NULL, NULL])

Also doesn't crash either — so I guess I'll close this one out and report a PostgreSQL bug if there isn't one in place. I'm pretty sure the 8.3 I was testing this on was a 8.3.7

Thanks for hunting this down.

comment:4 by nicklas, 16 years ago

I tried on 8.3.6 and it crashed, but 8.4 didn't crash.

in reply to:  4 comment:5 by nicklas, 16 years ago

Replying to nicklas:

I tried on 8.3.6 and it crashed, but 8.4 didn't crash.

Sorry I was too fast again. even 8.3.6 returned "{NULL,NULL,NULL,NULL}" I had the same query-window open as from last crash so when I switched to 8.4 it worked but 8.3.6 returned error and no connection to the server.

Grr, I don't want to do more mistakes.

comment:6 by robe, 16 years ago

Okay I tried on an 8.3.7 here and it didn't crash (the array or the MakeLine) and doesn't seem to crash on my 8.3.1 either. So perhaps my Linux box is running 8.3.6. I'll have to double-check. I thought the other windows server I had tried it on it crashed - but that could be running 8.3.6.

comment:7 by nicklas, 16 years ago

I'm really confused here. Now it crashes my 8.4 and i'm sure it worked earlier. I've tried to understand the source-code, but I'm to bad in c-programming. But since I really want too learn, I would like to understand what's happening here.

I think the problem has to do with: if ( TYPE_GETTYPE(geom→type) != POINTTYPE ) continue; at line 2177 in lwgeom_functions_basic.

or more precise #define TYPE_GETTYPE(t) ((t)&0x0F) at line 620 in libwgeom.h

My reason for beliving this is that "select st_makeline(array[null,null])" now works and returns an empty cell and a notice "No points in input array" just as expected but "select st_makeline(array[null,null,null])" crashes the server.

I think something here is very unstable, but I don't understand anything of the line #define TYPE_GETTYPE(t) ((t)&0x0F), as mentioned above, so then the easiest way is to blame what I don't understand as always in life :-) Is there any other occasion when this TYPE_GETTYPE is feeded with calls this close after eachother where it would be possible to provoke it.

Regina, from my view you have to reopen the ticket, but I'm not sure of anything anymore.

/Nicklas

comment:8 by robe, 16 years ago

Resolution: wontfix
Status: closedreopened

Can't say I do either. C is definitely not one of my strengths. Okay I'm going to reopen this since it still seems unclear where the fault lies.

comment:9 by robe, 15 years ago

Milestone: postgis 1.4.0postgis 1.4.1

I'm pushing this to 1.4.1 since it does the same on 1.3.6 and its not clear where the issue lies since Nicklas and I were both getting unpredicatable behavior on similar PostgreSQL installs.

comment:10 by robe, 15 years ago

Milestone: postgis 1.4.1postgis 1.3.7

comment:11 by nicklas, 15 years ago

I have made a new ry to understand this. I think I know where the problem is.

I added a notice under the array-handler offset at line 2174 in lwgeom_functions_basic:

offset += INTALIGN(VARSIZE(geom));
lwnotice("offset%d",offset);
continue;

By studing the offset it showed that the NULL-values is counted as elements in the array in ndim of the array structure

but:

INTALIGN(VARSIZE(geom))

don't notice the NULL-values and just gets the size of the next element. This causes the loop to continue beyond the array in the memory when there is iterations to be done according to nelems but the actual geometries is already used. That makes the rest of the offsets randomly depending on what is comming after in memory.

I think Regina, that's why we didn't always had crashes because it did no harm if the memory after the array wasnot used.

I don't know how to handle it.

As I understand it we have to sort away the NULL-values before they reach that internal array-structure and gets counted. Is that a postgresql-thing. How do they handle it?

/Nicklas

comment:12 by robe, 15 years ago

Nicklas,

I'm pretty sure its the same problem as what Mark fixed for PostGIS 1.4 in ticket #210. He just needs to get his butt in gear and backport the fix to 1.3 or if you want to take a stab at backporting his changes, feel free to and then I can quickly verify the changes and apply the patch.

I'm not sure if the codebase is different enough that the same change may not work.

comment:13 by nicklas, 15 years ago

Regina

I think too that it is almost the same problem, but I'm not sure it is exactly the same. I don't think it is solved for any of the versions. At least trunk still crashes, but maybe not every time. The more Null-values the higher risk of crash.

But I can't say if Mark has just missed to include makeline in the solution or if it needs a different approach.

I found yesterday that Postgresql is using doxygen too, I didn't know. Is it possible to have links inthe the code from Postgis doxyen to postgresql to easier follow what's happening?

/Nicklas

comment:14 by mcayland, 15 years ago

As Regina points out, this should have been fixed in #210. If it still crashes on 1.4/trunk, please submit a test case and I'll take a look at it.

My feeling is that this patch is too invasive to backport to 1.3 - people would save themselves lots more phantom crashes on bad data by simply upgrading to 1.4.

ATB,

Mark.

comment:15 by robe, 15 years ago

Yes we can have links. Actually I think there may be a simpler solution to all of this. Why can't we mark the aggregates as STRICT. Not sure if that is backward compatible or not, but I have a feeling that is how this problem started when postgreSQL version started allowing nulls in aggregates — but I see they have a STRICT option which I believe works for 8.3 and 8.4 (8.2 and 8.1 I suspect are non-issues since I don't think the NULLs were ever allowed).

See this note extracted from PostgreSQL 8.4 create agg.

"If the state transition function is declared "strict", then it cannot be called with null inputs. With such a transition function, aggregate execution behaves as follows. Rows with any null input values are ignored (the function is not called and the previous state value is retained). If the initial state value is null, then at the first row with all-nonnull input values, the first argument value replaces the state value, and the transition function is invoked at subsequent rows with all-nonnull input values. This is handy for implementing aggregates like max. Note that this behavior is only available when state_data_type is the same as the first input_data_type. When these types are different, you must supply a nonnull initial condition or use a nonstrict transition function. "

http://www.postgresql.org/docs/8.4/interactive/sql-createaggregate.html

I would like to see this fixed in 1.3. Unfortuantely because of the new box3d_extent thing in 1.4, some of my apps break if I just convert over to 1.4 so I can't without retesting and some recoding. Which I am a bit in the process of doing.

comment:16 by robe, 15 years ago

Just a slight clarification on why 1.4 is at the moment not always a clean upgrade

Refer to this ticket I submitted #223

Actually I had 2 separate unrelated applications where I had to revise my stored procs to work with PostGIS 1.4. One of them was using ST_X,ST_Y as I recall) — I'll have to double-check how I had to change that. since like I said the fact that our postgis_upgrade is not completely right results in having to change different things depending on if you do a restore untop of a clean postgis 1.4 versus using the upgrade script we have. Sorry to bring this up in an unrelated ticket.

comment:17 by nicklas, 15 years ago

Mark
At least the trunk of today in windows crashes for me with

SELECT ST_MakeLine(ARRAY[NULL,NULL,NULL,NULL]);



Regina
about marking aggregates as STRICT: Will that effect this array-issue. Is arrays handled the same ways as aggregates over rowes?

/Nicklas

comment:18 by robe, 15 years ago

Nicklas, Ah indeed you are right. Its still broken in 1.4 and 1.5.

Well the funny thing is my 1.4.1SVN I can't get it to break, but I was able to get it to break on 1.4.0 RTM and my 1.5 compiled from 8/5. So maybe Mark fixed for the aggs (or forgot about ST_MakeLine array version)

Can you get it to break against 1.4 branch?

I presume I must have missed this because it doesn't always break. Just sometimes. and probably the times I was testing it it was behaving.

Regarding the note about STRICT. I don't think its the aggregate we mark as strict, but the state function. pgis_geometry_accum_transfn (for 1.4+) and for (1.3+) — st_geom_accum and that should unless I am missing something fix the aggregate issue.

To fix the ST_Unite… array version — we would mark that as STRICT as well. That however I think would just prevent people from using it with NULL arrays (or hmm arrays that contain NULL). Not sure which. Haven't tested to tel. It wouldn't be ideal, but I presume if 1.3 is a hard fix its better to get an error and the correct answer than to have a crashing server. For 1.3 those array functions are not documented. So would just be abusive people like me looking behind the hood using it. The world needs to be protected from people like me of course :).

comment:19 by nicklas, 15 years ago

Regina

Yes, this combination crashes 3 times out of 3 for me

"POSTGIS="1.4.1SVN" GEOS="3.2.0-CAPI-1.6.0" PROJ="Rel. 4.6.1, 21 August 2008" USE_STATS"

/Nicklas

comment:20 by mcayland, 15 years ago

Okay - I stand corrected. I was sure I had this working before, but it does crash for me when running 1.4 release :( I shall have a look at this later…

ATB,

Mark.

comment:21 by robe, 15 years ago

This is still broken

SELECT ST_MakeLine(ARRAY[NULL,NULL,NULL,NULL]);

I'm checking to see why I'm not picking up this particular one. could be because I'm not testing anything other than ST_MakeLine(ARRAY[NULL])

comment:22 by robe, 15 years ago

Yap this particular one only exhibits itself if you have more than one NULL in the table. Now that I have added a multi null table to garden of geometries. It crashes with this. Still broken in trunk.

psql:torturetest150_subset.sql:46749: ERROR: function 68F0C770 returned NULL psql:torturetest150_subset.sql:46760: NOTICE: No points in input array psql:torturetest150_subset.sql:46760: ERROR: function 68F0C770 returned NULL psql:torturetest150_subset.sql:46771: server closed the connection unexpectedly

The final killer is.

	SELECT ST_MakeLine(foo1.the_geom)
			  
			FROM ((SELECT CAST(Null As geometry) As the_geom FROM generate_series(1,4) As foo)) As foo1
			LIMIT 3;

comment:23 by mcayland, 15 years ago

Milestone: postgis 1.3.7postgis 1.4.1
Version: 1.3.X1.4.X

Not anymore as of r4618 trunk and r4619 for 1.4 branch ;)

It was another non-NULL safe array iterator in the PostGIS code - perhaps we need another acronym for this (YANNSAI)?

I guess we got lucky before in that another change altered the memory alignment temporarily so that going off the end of the array didn't break things, and so the problem wasn't really fixed before. The applied patch should resolve this for good though - please test and feed back.

ATB,

Mark.

comment:24 by robe, 15 years ago

Resolution: fixed
Status: reopenedclosed

Seems okay on my 8.4 (1.4, 1.5)

Note: See TracTickets for help on using tickets.