Opened 10 years ago

Closed 9 years ago

Last modified 9 years ago

#1066 closed defect (fixed)

[raster] raster crashes server with arbitrary tests

Reported by: robe
Owned by: Bborie Park
Priority: blocker
Milestone: PostGIS 2.0.0
Component: raster
Version: master
Keywords:
Cc:

Description

I finally got around to dusting off my raster garden tests, and was able to consistently crash my postgres server process with this sequence of tests. It doesn't really seem to be one test in particular, since each one run individually doesn't crash.

But if I run them all, then as Paul would say, "BOOM! - This should never happen" :)

Tests attached.

Attachments (2)

crash_raster.sql (5.3 KB) - added by robe 10 years ago.
raster_garden20.zip (104.7 KB) - added by robe 9 years ago.


Change History (23)

Changed 10 years ago by robe

Attachment: crash_raster.sql added

comment:1 Changed 10 years ago by mcayland

Has anyone tried running the complete set of raster tests on a copy of PostgreSQL configured with --enable-cassert --enable-debug?

This adds barriers around each memory region and therefore will throw a stacktrace immediately an attempt is made to write into an unallocated location.

comment:2 Changed 10 years ago by Bborie Park

I just ran the attached crash_raster test on 8.4.8 with postgis r7530 and did not experience a crash. I did notice that the calls to ST_Quantile contain invalid values for the quantile parameter (the value needs to be a fraction between 0 and 1).

I'll test also on 9.0 as well as through valgrind to hunt down memory leaks.

comment:3 Changed 10 years ago by robe

Bborie,

You might want to try running the test a couple of times. I had to add a couple of entries to make mine crash, so your computer might just have a higher tolerance. I also was not running on a fresh restart -- it was run on a battery of tests extracted from the raster garden tests.

I suggest running the garden tests to completion for raster to see if you get a crash. You might also have fixed the issue.

To generate the tests, do

make garden

That command will generate raster_garden20.sql and postgis_garden20.sql.

Then just run raster_garden20.sql on a new postgis database. The tests write to a log table in the database, so if it crashes midway, the last logged test is the last one that ran.

You can then play back the sql statements or output them from the log table. That was just a portion I outputted.

General tips on how to play back the logs:

http://trac.osgeo.org/postgis/wiki/DevWikiGardenTest

Yah the garden tests aren't supposed to be that bright. They are supposed to simulate a user with sticky fingers, speed reading the manual and just stuffing stuff that can be stuffed into the functions based on the types the manual says are allowed. I'll make some of them smarter with real valid expressions later, but haven't gotten to that with raster yet.

comment:4 Changed 10 years ago by Bborie Park

Can you test r7597? I've run the raster garden tests and experience no crashing in a 32-bit PostgreSQL 9.0.4. I'll be testing 8.4.8 next.

comment:5 Changed 10 years ago by Bborie Park

32-bit PostgreSQL 8.4.8 no longer crashes on me using the raster garden tests.

comment:6 Changed 10 years ago by pracine

Can we close this one?

comment:7 Changed 10 years ago by Bborie Park

I'd say yes but hopefully Regina will comment.

comment:8 Changed 9 years ago by robe

Resolution: fixed
Status: changed from new to closed

Yah, it seems to depend on how the gdal environment is configured. I think we have a lot of tickets like that, so this one is probably redundant.

comment:9 Changed 9 years ago by robe

Resolution: fixed
Status: changed from closed to reopened

I'm still able to crash the server even on my 32-bit box. I'm wondering if it's just a sheer case of endurance, because as the tests go I can feel my box getting slower and slower till I can barely log into it anymore.

Attached is the full test. Bborie, can you run the full sql tests on your dev box and see if it eventually crashes? A lot of tests won't complete because I haven't revised the script to handle things like regprocedure etc., so don't worry about it.

Changed 9 years ago by robe

Attachment: raster_garden20.zip added

comment:10 Changed 9 years ago by Bborie Park

It doesn't crash on me with your provided garden test on my 32-bit Linux box with 2GB RAM. I see a bunch of GDAL-related errors though. An example would be...

ERROR:  rt_raster_gdal_warp: Unable to get GDAL suggested warp output for output dataset creation

I haven't had time to see if that error message is valid or not.

comment:11 Changed 9 years ago by Bborie Park

Priority: changed from critical to blocker

comment:12 Changed 9 years ago by Bborie Park

Owner: changed from pracine to Bborie Park
Status: changed from reopened to new

comment:13 Changed 9 years ago by Bborie Park

Status: changed from new to assigned

comment:14 Changed 9 years ago by Bborie Park

robe,

Can you try your garden tests using PostgreSQL 9.1 after creating a "crashdumps" directory in your cluster data directory? I'm hoping that a dumpfile is generated. Refer to section 15.7.5.1 of the following link.

http://www.postgresql.org/docs/9.1/interactive/installation-platform-notes.html#INSTALLATION-NOTES-MINGW

If a dumpfile is generated, please attach it to the ticket. If this works, it might be of significant help in debugging Windows crashes.

comment:15 Changed 9 years ago by robe

I haven't done the crash dump thing yet. That requires me to test with my mingw install, which seems to be missing some dependency file. I assume it's just some dll I forgot to copy.

At any rate, I'm getting a totally different error now, and one I have never seen before.

CONTEXT:  PL/pgSQL function "st_asraster" line 26 at RETURN
psql:raster_gardentest_20.sql:16154: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:raster_gardentest_20.sql:16176: lost synchronization with server: got message type "D", length 288025282

When I run the offending line in isolation -- offending query is:

{{{SELECT ST_AsRaster(foo2.the_geom, rast1.rast, '1BB', 1.5, 1.5, false), ST_AsEWKT(rast1.rast::geometry) As ref1_geom, ST_AsEWKT(foo2.the_geom) As ref2_geom

FROM (

(SELECT ST_SetSRID(ST_SetValue(ST_AddBand(ST_MakeEmptyRaster( 100, 100, (i-1)*100, (i-1)*100, 0.0005, -0.0005, 0*i, 0*i), '1BB'), i, (i+1),0),4326) As rast

FROM generate_series(1,10) As i)

) As rast1 CROSS JOIN ((SELECT ST_Buffer(ST_SetSRID(ST_Point(i,j),4326), j*0.05) As the_geom

FROM (SELECT a*1.11111111 FROM generate_series(-10,50,10) As a) As i(i)

CROSS JOIN generate_series(40,70, 20) As j ORDER BY i, i*j, j)) As foo2

LIMIT 2;

}}}

It generates this error:

ERROR:  out of memory
DETAIL:  String of 288024126 bytes is too long for encoding conversion.

comment:16 Changed 9 years ago by robe

Forgot to mention: this is testing with the latest Windows experimental build -- r8697, GDAL 1.9.0 rc2

and my sql got mangled in the last post

SELECT ST_AsRaster(foo2.the_geom, rast1.rast, '1BB', 1.5, 1.5, false), ST_AsEWKT(rast1.rast::geometry) As ref1_geom, ST_AsEWKT(foo2.the_geom) As ref2_geom

    FROM (

    (SELECT ST_SetSRID(ST_SetValue(ST_AddBand(ST_MakeEmptyRaster( 100, 100, (i-1)*100, (i-1)*100, 0.0005, -0.0005, 0*i, 0*i), '1BB'), i, (i+1),0),4326) As rast

        FROM generate_series(1,10) As i)

    ) As rast1 CROSS JOIN ((SELECT ST_Buffer(ST_SetSRID(ST_Point(i,j),4326), j*0.05) As the_geom

    FROM (SELECT a*1.11111111 FROM generate_series(-10,50,10) As a) As i(i)

        CROSS JOIN generate_series(40,70, 20) As j ORDER BY i, i*j, j)) As foo2

            LIMIT 2;


comment:17 Changed 9 years ago by robe

Here is a slightly shorter version, without all that extra fluff, that still generates the same error:

SELECT ST_AsRaster(foo2.the_geom, rast1.rast, '1BB', 1.5, 1.5, false)

    FROM (

    (SELECT ST_SetSRID(ST_SetValue(ST_AddBand(ST_MakeEmptyRaster( 100, 100, (i-1)*100, (i-1)*100, 0.0005, -0.0005, 0*i, 0*i), '1BB'), i, (i+1),0),4326) As rast

        FROM generate_series(1,10) As i)

    ) As rast1 CROSS JOIN ((SELECT ST_Buffer(ST_SetSRID(ST_Point(i,j),4326), j*0.05) As the_geom

    FROM (SELECT a*1.11111111 FROM generate_series(-10,50,10) As a) As i(i)

        CROSS JOIN generate_series(40,70, 20) As j ORDER BY i, i*j, j)) As foo2

            LIMIT 1;

Let me know if you need me to reduce it down even further.

comment:18 Changed 9 years ago by robe

I tested on my prior build (I forget how long ago -- probably not more than a week) on my Windows 2008 64-bit box, and if I run from pgAdmin III I get this error:

ERROR:  out of memory
DETAIL:  Failed on request of size 536870912.

So my guess is that the last bit about encoding is just some conversion psql is trying to do and can probably be ignored. That leaves just the out of memory error and the failed request of size 536870912.

comment:19 Changed 9 years ago by robe

Hmm this might be a false call

If I do this:

SELECT ST_Width(ST_AsRaster(foo2.the_geom, rast1.rast, '1BB', 1.5, 1.5, false))

    FROM (

    (SELECT ST_SetSRID(ST_SetValue(ST_AddBand(ST_MakeEmptyRaster( 100, 100, (i-1)*100, (i-1)*100, 0.0005, -0.0005, 0*i, 0*i), '1BB'), i, (i+1),0),4326) As rast

        FROM generate_series(1,10) As i)

    ) As rast1 CROSS JOIN ((SELECT ST_Buffer(ST_SetSRID(ST_Point(i,j),4326), j*0.05) As the_geom

    FROM (SELECT a*1.11111111 FROM generate_series(-10,50,10) As a) As i(i)

        CROSS JOIN generate_series(40,70, 20) As j ORDER BY i, i*j, j)) As foo2

            LIMIT 1;

I don't get an error, and it returns 12001 for the width, so I'm not sure anything is actually wrong. I might just have to change my tests: they may be affected because they try to output the raster, and some of the rasters being generated are huge.
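Editor's note: the numbers in the comments above are at least consistent with a very large raster being written out as hex text. The following is a rough sketch in Python, not something taken from the PostGIS source. It assumes a height of 12000 pixels (only the width of 12001 is reported in this ticket), one stored byte per '1BB' pixel, a 61-byte raster header and a 2-byte band header. Under those assumptions the hex-encoded size comes out to the 288024126 bytes quoted in the encoding conversion error.

```python
# Back-of-the-envelope size of the raster returned by the ST_AsRaster query.
# Every constant except WIDTH is an assumption, not a value from the ticket.

WIDTH = 12001          # reported by ST_Width in the ticket
HEIGHT = 12000         # assumed; chosen because it reproduces the reported size
BYTES_PER_PIXEL = 1    # assumed storage cost of one '1BB' pixel
RASTER_HEADER = 61     # assumed size of the serialized raster header, in bytes
BAND_HEADER = 2        # assumed: pixel-type byte plus a 1-byte nodata value


def binary_size(width: int, height: int) -> int:
    """Bytes needed for a single-band raster of the given dimensions."""
    return RASTER_HEADER + BAND_HEADER + width * height * BYTES_PER_PIXEL


def hex_text_size(width: int, height: int) -> int:
    """Characters needed when every byte is written as two hex digits."""
    return 2 * binary_size(width, height)


if __name__ == "__main__":
    print(binary_size(WIDTH, HEIGHT))    # 144012063
    print(hex_text_size(WIDTH, HEIGHT))  # 288024126
    print(2 ** 29)                       # 536870912, i.e. exactly 512 MiB
```

The failed request of 536870912 bytes is exactly 2^29, the next power of two above the hex text size, which would fit a buffer that grows by doubling. That is only a guess about where the allocation comes from.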

comment:20 Changed 9 years ago by robe

Resolution: fixed
Status: changed from assigned to closed

I'm closing this out. The tests are up to ST_Transform and haven't crashed yet, but the testing of ST_Transform is taking an exceedingly long time and then errors out with:

ERROR:  rt_raster_gdal_warp: Unable to get GDAL suggested warp output for output dataset creation

When it comes across a test like:

SELECT ST_AsEWKT(ST_ConvexHull(ST_Transform(rast1.rast, 3395, 1.5, 1.5, 'Lanczos', 1.5)))

    FROM (

    (SELECT ST_SetSRID(ST_SetValue(ST_AddBand(ST_MakeEmptyRaster( 100, 100, (i-1)*100, (i-1)*100, 0.0005, -0.0005, 0*i, 0*i), '2BUI'), i, (i+1),1),4326) As rast

        FROM generate_series(1,10) As i)

    ) As rast1 LIMIT 3;

That may be expected, though, when you feed pseudo-garbage into that function.

I think the weird error I was getting (lost synchronization with server: got message type "D", length 288025282) was because my db was set to log and I was outputting the rasters, some of which were huge, and the logging probably couldn't keep up. I've changed the script to just output the convex hull if the output type is a raster or geometry.
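Editor's note: the script change described above can be pictured with a small Python sketch. This is not the actual garden test generator. The helper name wrap_for_output and the way return types are passed in are hypothetical, used only to illustrate wrapping raster or geometry results in ST_AsEWKT(ST_ConvexHull(...)), as in the ST_Transform test quoted earlier in this comment.

```python
# Hypothetical helper for a test generator: keep huge results out of the
# client output by reducing raster/geometry results to their convex hull.

BULKY_TYPES = {"raster", "geometry"}  # output types named in the comment above


def wrap_for_output(expression: str, return_type: str) -> str:
    """Return the SQL expression a generated test should SELECT."""
    if return_type.strip().lower() in BULKY_TYPES:
        # Output only a compact text outline instead of the full object.
        return f"ST_AsEWKT(ST_ConvexHull({expression}))"
    # Scalars and other small results are selected unchanged.
    return expression


if __name__ == "__main__":
    print(wrap_for_output("ST_Transform(rast1.rast, 3395)", "raster"))
    print(wrap_for_output("ST_Width(rast1.rast)", "integer"))
```

With this shape, a test whose function returns a raster sends back only a short EWKT polygon, while tests that return numbers are left as they are.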

comment:21 Changed 9 years ago by Bborie Park

I am getting the same answer on my Linux 64-bit dev box. I'll see if I can isolate the raster that is causing it. It may be completely valid for all we know but I'd like to double-check.
