Ticket #1066 (closed defect: fixed)

Opened 2 years ago

Last modified 17 months ago

[raster] raster crashes server with arbitrary tests

Reported by: robe Owned by: dustymugs
Priority: blocker Milestone: PostGIS 2.0.0
Component: raster Version: trunk
Keywords: Cc:

Description

I finally got around to dusting off my raster garden tests, and was able to consistently crash my postgres server process with this sequence of tests. No single test seems to be responsible, since none of them crashes when run individually.

But if I run them all, as Paul would say, "BOOM! - This should never happen" :)

Tests attached.

Attachments

crash_raster.sql (5.3 KB) - added by robe 2 years ago.
raster_garden20.zip (104.7 KB) - added by robe 18 months ago.

Change History

Changed 2 years ago by robe

Changed 2 years ago by mcayland

Has anyone tried running the complete set of raster tests on a copy of PostgreSQL configured with --enable-cassert --enable-debug?

This adds barriers around each memory region and will therefore throw a stack trace as soon as an attempt is made to write into an unallocated location.

Changed 23 months ago by dustymugs

I just ran the attached crash_raster test on 8.4.8 with postgis r7530 and did not experience a crash. I did notice that the calls to ST_Quantile contain invalid values for the quantile parameter (the value needs to be a fraction between 0 and 1).

I'll test also on 9.0 as well as through valgrind to hunt down memory leaks.

Changed 23 months ago by robe

Bborie,

You might want to try running the test a couple of times. I had to add a couple of entries to make mine crash, so your computer might just have a higher tolerance. I also was not running on a fresh restart -- it was run on a battery of tests extracted from the raster_gardentests.

I suggest running the garden tests to completion for raster to see if you get a crash. You might also have fixed the issue.

To generate the tests, do

make garden

That command will generate raster_garden20.sql, postgis_garden20.sql

Then just run raster_garden20.sql on a new postgis database. The tests write to a log table in the database, so if the server crashes midway, the last logged entry is the last test that ran.

You can then play back the sql statements or output them from the log table. That was just a portion I outputted.

General tips on how to play back the logs:

http://trac.osgeo.org/postgis/wiki/DevWikiGardenTest

Yah the garden tests aren't supposed to be that bright. They are supposed to simulate a user with sticky fingers, speed reading the manual and just stuffing stuff that can be stuffed into the functions based on the types the manual says are allowed. I'll make some of them smarter with real valid expressions later, but haven't gotten to that with raster yet.

Changed 23 months ago by dustymugs

Can you test r7597? I've run the raster garden tests and experience no crashing in a 32-bit PostgreSQL 9.0.4. I'll be testing 8.4.8 next.

Changed 23 months ago by dustymugs

32-bit PostgreSQL 8.4.8 no longer crashes on me using the raster garden tests.

Changed 22 months ago by pracine

Can we close this one?

Changed 22 months ago by dustymugs

I'd say yes but hopefully Regina will comment.

Changed 20 months ago by robe

  • status changed from new to closed
  • resolution set to fixed

Yah it seems probably dependent on how gdal environment is configured. I think we have a lot of tickets like that so this one is probably redundant

Changed 18 months ago by robe

  • status changed from closed to reopened
  • resolution fixed deleted

I'm still able to crash the server even on my 32-bit box. I'm wondering if it's just a sheer case of endurance, because as the tests go on I can feel my box getting slower and slower until I can barely log into it anymore.

Attached is the full test. Bborie, can you run the full sql tests on your dev box and see if it eventually crashes? A lot of tests won't complete because I haven't revised the script to handle things like regprocedure etc., so don't worry about those.

Changed 18 months ago by robe

Changed 18 months ago by dustymugs

It doesn't crash on me with your provided garden test on my 32-bit linux box with 2GB RAM. I see a bunch of GDAL related errors though. An example would be...

ERROR:  rt_raster_gdal_warp: Unable to get GDAL suggested warp output for output dataset creation

I haven't had time to see if that error message is valid or not.

Changed 17 months ago by dustymugs

  • priority changed from critical to blocker

Changed 17 months ago by dustymugs

  • owner changed from pracine to dustymugs
  • status changed from reopened to new

Changed 17 months ago by dustymugs

  • status changed from new to assigned

Changed 17 months ago by dustymugs

robe,

Can you try your garden tests using PostgreSQL 9.1 after creating a "crashdumps" directory in your cluster data directory? I'm hoping that a dumpfile is generated. Refer to section 15.7.5.1 of the following link.

 http://www.postgresql.org/docs/9.1/interactive/installation-platform-notes.html#INSTALLATION-NOTES-MINGW

Assuming a dumpfile is generated, please attach to ticket. Assuming this works, this might be of significant help debugging Windows crashes.

Changed 17 months ago by robe

I haven't done the crash dump thing yet. That requires me to test with my mingw install, which seems to be missing some dependency file. I assume it's just some dll I forgot to copy.

At any rate, I'm getting a totally different error now, one I have never seen before.

CONTEXT:  PL/pgSQL function "st_asraster" line 26 at RETURN
psql:raster_gardentest_20.sql:16154: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:raster_gardentest_20.sql:16176: lost synchronization with server: got message type "D", length 288025282

When I run the offending line in isolation -- offending query is:

{{{SELECT ST_AsRaster(foo2.the_geom, rast1.rast, '1BB', 1.5, 1.5, false), ST_AsEWKT(rast1.rast::geometry) As ref1_geom, ST_AsEWKT(foo2.the_geom) As ref2_geom

FROM (

(SELECT ST_SetSRID(ST_SetValue(ST_AddBand(ST_MakeEmptyRaster( 100, 100, (i-1)*100, (i-1)*100, 0.0005, -0.0005, 0*i, 0*i), '1BB'), i, (i+1),0),4326) As rast

FROM generate_series(1,10) As i)

) As rast1 CROSS JOIN ((SELECT ST_Buffer(ST_SetSRID(ST_Point(i,j),4326), j*0.05) As the_geom

FROM (SELECT a*1.11111111 FROM generate_series(-10,50,10) As a) As i(i)

CROSS JOIN generate_series(40,70, 20) As j ORDER BY i, i*j, j)) As foo2

LIMIT 2;

}}}

It generates this error:

ERROR:  out of memory
DETAIL:  String of 288024126 bytes is too long for encoding conversion.

Changed 17 months ago by robe

Forgot this is testing with the latest windows experimental -- r8697, gdal 9.0.0 rc2

and my sql got mangled in last post

SELECT ST_AsRaster(foo2.the_geom, rast1.rast, '1BB', 1.5, 1.5, false), ST_AsEWKT(rast1.rast::geometry) As ref1_geom, ST_AsEWKT(foo2.the_geom) As ref2_geom

    FROM (

    (SELECT ST_SetSRID(ST_SetValue(ST_AddBand(ST_MakeEmptyRaster( 100, 100, (i-1)*100, (i-1)*100, 0.0005, -0.0005, 0*i, 0*i), '1BB'), i, (i+1),0),4326) As rast

        FROM generate_series(1,10) As i)

    ) As rast1 CROSS JOIN ((SELECT ST_Buffer(ST_SetSRID(ST_Point(i,j),4326), j*0.05) As the_geom

    FROM (SELECT a*1.11111111 FROM generate_series(-10,50,10) As a) As i(i)

        CROSS JOIN generate_series(40,70, 20) As j ORDER BY i, i*j, j)) As foo2

            LIMIT 2;


Changed 17 months ago by robe

Here is a slightly shorter version, without all that extra fluff, that still generates the same error:

SELECT ST_AsRaster(foo2.the_geom, rast1.rast, '1BB', 1.5, 1.5, false)

    FROM (

    (SELECT ST_SetSRID(ST_SetValue(ST_AddBand(ST_MakeEmptyRaster( 100, 100, (i-1)*100, (i-1)*100, 0.0005, -0.0005, 0*i, 0*i), '1BB'), i, (i+1),0),4326) As rast

        FROM generate_series(1,10) As i)

    ) As rast1 CROSS JOIN ((SELECT ST_Buffer(ST_SetSRID(ST_Point(i,j),4326), j*0.05) As the_geom

    FROM (SELECT a*1.11111111 FROM generate_series(-10,50,10) As a) As i(i)

        CROSS JOIN generate_series(40,70, 20) As j ORDER BY i, i*j, j)) As foo2

            LIMIT 1;

Let me know if you need me to reduce it down even further.

Changed 17 months ago by robe

I tested on my prior build (I forget how long ago -- probably not more than a week) on my windows 2008 64-bit box, and if I run from pgAdmin III I get this error:

ERROR:  out of memory
DETAIL:  Failed on request of size 536870912.

So the last bit about encoding might just be some conversion psql is trying to do and can probably be ignored, is my guess. So it's just the out of memory and the failed request of that size.

Changed 17 months ago by robe

Hmm this might be a false call

If I do this:

SELECT ST_Width(ST_AsRaster(foo2.the_geom, rast1.rast, '1BB', 1.5, 1.5, false))

    FROM (

    (SELECT ST_SetSRID(ST_SetValue(ST_AddBand(ST_MakeEmptyRaster( 100, 100, (i-1)*100, (i-1)*100, 0.0005, -0.0005, 0*i, 0*i), '1BB'), i, (i+1),0),4326) As rast

        FROM generate_series(1,10) As i)

    ) As rast1 CROSS JOIN ((SELECT ST_Buffer(ST_SetSRID(ST_Point(i,j),4326), j*0.05) As the_geom

    FROM (SELECT a*1.11111111 FROM generate_series(-10,50,10) As a) As i(i)

        CROSS JOIN generate_series(40,70, 20) As j ORDER BY i, i*j, j)) As foo2

            LIMIT 1;

I don't get an error, and it returns 12001 for width, so I'm not sure anything is actually wrong. I might just have to change my tests: they may be affected because they try to output the raster, and some of the rasters being generated are huge.
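A rough back-of-the-envelope check of why the output gets so large, as a Python sketch. It assumes ST_AsRaster takes its pixel size (0.0005) from the reference raster, that a '1BB' band occupies one byte per pixel, and that the raster is printed as hex text at two characters per byte. The function names are made up for illustration.

```python
def pixels_per_side(buffer_radius, pixel_size):
    """Approximate pixel count across the bounding box of a buffered point."""
    return round(2 * buffer_radius / pixel_size)


def approx_hex_output_bytes(side, bytes_per_pixel=1):
    """Approximate size of a square raster printed as hex text.

    Assumes two output characters per stored byte and ignores the small header.
    """
    return side * side * bytes_per_pixel * 2


PIXEL_SIZE = 0.0005  # scale passed to ST_MakeEmptyRaster in the test query

for j in (40, 60):  # values produced by generate_series(40, 70, 20)
    radius = j * 0.05  # buffer radius used in the test query
    side = pixels_per_side(radius, PIXEL_SIZE)
    print(j, side, approx_hex_output_bytes(side))
```

For j = 60 this gives about 12000 pixels per side, in line with the reported width of 12001, and roughly 288 million characters of output, the same order as the 288024126-byte string in the encoding error earlier in this ticket.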

Changed 17 months ago by robe

  • status changed from assigned to closed
  • resolution set to fixed

I'm closing this out. The tests are up to ST_Transform and haven't crashed yet, but the testing of ST_Transform is taking an exceedingly long time and then errors out with:

ERROR:  rt_raster_gdal_warp: Unable to get GDAL suggested warp output for output dataset creation

When it comes across a test like:

SELECT ST_AsEWKT(ST_ConvexHull(ST_Transform(rast1.rast, 3395, 1.5, 1.5, 'Lanczos', 1.5)))

    FROM (

    (SELECT ST_SetSRID(ST_SetValue(ST_AddBand(ST_MakeEmptyRaster( 100, 100, (i-1)*100, (i-1)*100, 0.0005, -0.0005, 0*i, 0*i), '2BUI'), i, (i+1),1),4326) As rast

        FROM generate_series(1,10) As i)

    ) As rast1 LIMIT 3;

That may be expected, though, when you feed pseudo garbage into that function.

I think the weird error I was getting: lost synchronization with server: got message type "D", length 288025282

Was because my db was set to log and I was outputting the rasters some of which were huge, and the logging probably couldn't keep up. I've changed the script to just output the convex hull if the output type is a raster or geometry.
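The change described above could look something like the following Python sketch. The function name and the way return types are passed in are hypothetical, not the actual garden test generator. It only shows the idea of printing a small convex hull instead of the full raster or geometry.

```python
def wrap_for_output(expr, return_type):
    """Return a SQL expression that is cheap to print.

    Raster and geometry results are reduced to the EWKT of their convex hull;
    every other return type is passed through unchanged.
    """
    if return_type in ("raster", "geometry"):
        return f"ST_AsEWKT(ST_ConvexHull({expr}))"
    return expr


# Example: a raster-returning call gets wrapped, a scalar one does not.
print(wrap_for_output("ST_Transform(rast1.rast, 3395)", "raster"))
print(wrap_for_output("ST_Width(rast1.rast)", "integer"))
```

This matches the shape of the ST_Transform test quoted above, where the raster result is already wrapped in ST_AsEWKT(ST_ConvexHull(...)).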

Changed 17 months ago by dustymugs

I am getting the same answer on my Linux 64-bit dev box. I'll see if I can isolate the raster that is causing it. It may be completely valid for all we know but I'd like to double-check.
