Opened 12 years ago

Closed 11 years ago

Last modified 10 years ago

#1802 closed enhancement (fixed)

Ensure interruptability of long-running calls

Reported by: strk Owned by: strk
Priority: medium Milestone: PostGIS 2.1.0
Component: postgis Version: 2.0.x
Keywords: history Cc: zenitram

Description

Some functions take a lot of time or memory, but aren't interruptible. See #1799 for an example of function within postgis, but most notably GEOS implemented functions have this problem.

This ticket is to find a way to ensure interruptability of the functions.

So far it looks that calls to elog (thus lwnotice, for example) is one such chance of interruptability.

Attachments (1)

interrupt.sql (220 bytes ) - added by strk 12 years ago.
a test script

Download all attachments as: .zip

Change History (36)

comment:1 by strk, 12 years ago

I verified that palloc, repalloc and pfree do _not_ give an interruption chance. But we could, in our wrappers, by explicitly calling the CHECK_FOR_INTERRUPTS macro.

Effect of such macro on performance should be reviewed.

Also, for GEOS to take advantage of interruptability we'd need to use the custom allocators from within GEOS itself.

comment:2 by strk, 12 years ago

r9715 in trunk adds the interrupt check into custom allocators. next would be giving GEOS a way to use the custom allocators…

comment:3 by strk, 12 years ago

Also, there may also be the case of long loops within liblwgeom not doing any allocation in their body. Such loops would not be interruptible.

comment:4 by strk, 12 years ago

Enhancement ticket filed on the GEOS side: http://trac.osgeo.org/geos/ticket/540

comment:5 by mcayland, 12 years ago

What was the resulting effect on performance by adding CHECK_FOR_INTERRUPTS to the allocators? Memory allocation is already known to be a PostgreSQL performance bottleneck, so it strikes me that this could have a significant effect on queries with large datasets/long query times. Can you provide the before and after numbers for a test query on a reasonably-sized dataset (several million records)?

comment:6 by strk, 12 years ago

I didn't realize a formal test. Can you think of one ? Performance will be highly dependent on the number of allocation calls, so not every function is the same. Maybe string-output functions ? I do have a synthesized dataset of ~80 milion points once we figure the effect.

comment:7 by strk, 12 years ago

In any case, it'd be interesting to expose a "breath" function in the liblwgeom API to avoid this coupling with allocation functions. The same would need to be done for GEOS, if we go that way (a "check_for_interrupts" callback).

comment:8 by strk, 12 years ago

In any case, the macro is just a variable lookup:

#define CHECK_FOR_INTERRUPTS() \
do { \
        if (InterruptPending) \
                ProcessInterrupts(); \
} while(0)

comment:9 by robe, 12 years ago

Owner: changed from pramsey to strk

comment:10 by mcayland, 12 years ago

Well on the plus side, hopefully the check for InterruptPending should be reasonably lightweight. I'd just go for an index && join with ST_Contains() returning around 100,000 points and see what results you get before and after.

comment:11 by strk, 12 years ago

With r9824 interruptibility of GEOS implemented calls is also provided. Requires GEOS-3.4.0, and lots of testing…

comment:12 by strk, 12 years ago

So I think it is safe to have this in 2.0.1. Please scream if you have problems with it.

comment:13 by mcayland, 12 years ago

-1 for 2.0.1. This is a new feature and not a bug-fix, therefore it needs to go in trunk.

comment:14 by strk, 12 years ago

Tecnhically it is not a feature. I mean… all queries can be interrupted, except the ones implemented in GEOS. Interruptibility of the ST_Segmentize call was committed in 2.0 branch, for example. There's no API change (in PostGIS).

comment:15 by jatorre, 12 years ago

Actually to me is a critical bug. Without PostGIS honoring PostgreSQL time outs, the database become quite sensitive. In some of our tests actually we could only take back PostgreSQL after a full restart when a nasty non interruptable function gets to spin. Somehow even PostgreSQL should even fire a warning alerting that a function is not behaving taken more time that what the system has give to him.

So, I vote to have it as 2.0.1 and ensure that in future functions we can warranty that our functions are interruptable.

comment:16 by strk, 12 years ago

Mark: maybe you don't feel the problem as you never had it. Try running this query:

set statement_timeout to 1000;
select st_buffer(st_collect(geom), 1e5) from ( select (st_dumppoints(st_buffer(st_point(0,0), 1e4, 150))).geom ) as foo; 

And see if it stops, as requested, after one second. Oh, you might need to hard-kill the backend to take back control of your machine, so be prepared…

comment:17 by mcayland, 12 years ago

Oh I'm not doubting that the problem exists, but my personal take is this:

1) The problem has existed as long as PostGIS has used GEOS. The mailing list isn't flooded with hundreds of emails complaining about being unable to interrupt queries so it's hardly a pressing problem that needs to be solved right now. I'd much rather have a shorter 2.1 cycle (3-4 months) and put out something that's been through rigorous beta testing to squeeze out any potential memory corruption bugs.

jatorre: in order to implement this, strk has had to add code to core parts of both PostGIS and GEOS. It's not just correcting a logic bug or similar, it really is a new feature. New code generally means new bugs.

2) In your own words from comment 11: "Requires GEOS-3.4.0, and lots of testing…". The only way to test it properly will be to get people using it for real, and 2.0 branch is a stable branch, not the place for testing new code.

3) The PSC agreed that point releases would be only be bugfix releases (mainly at the request of the PostgreSQL developers who were having problems with tracking bugs and API changes within point releases)

4) This is contrary to your own position earlier within PostGIS development. At your request, I did not commit my initial implementation of the PROJ.4 cache to the latest 1.0 point release but to the 1.1 branch (see http://postgis.refractions.net/pipermail/postgis-devel/2005-November/001645.html), as you were concerned it would affect the stability of the 1.0 series despite the fact it was a live issue for us and we had been running the patches on live servers for several weeks. To make yourself an exception to the same rule is unfair upon all the other developers.

comment:18 by strk, 12 years ago

If I read the message from 2005 correctly your change removed an SQL entry point (transform_geom). SQL is our API so it was really a problem of backward compatibility. There's no backward compatibility problem in this case. The SQL interface doesn't change. There is _no_ API change.

The only point I see as valid is the testing one. What about putting the code in but conditionally enable it with a compile time define ? Would give people the opportunity to test within the framework of the 2.0 branch w/out them being forced to do so (it will be off by deafult).

PS: I agree about shorter release cycles

comment:19 by strk, 12 years ago

So in r9850 we do have the signal handling code protected by a POSTGIS_ENABLE_GEOS_INTERRUPTIBILITY macro, undefined by default.

If you are not building against GEOS-3.4.0+ you can't define the macro (it's undefined later in this case).

This should make it very easy for anyone to test.

comment:20 by mcayland, 12 years ago

The part about the transform_geom() function was an aside in the last paragraph of that message - it was not part of the patch at that time, and does nothing to detract from your clear statement that the 1.0 branch was for bugfixes only.

The aim of a stable branch is to introduce only the *minimal* changes required in order to produce correct functionality before the new point release - anything else is a feature. If we do this, where do we stop? Paul has some ideas about internally indexing large geometries - do we put that in 2.0.1? What about some of Bborie's changes, where he waited until we branched before committing them as per our normal development process? Should we backport those too?

And now I see that rather than wait to discuss this, you've just gone ahead and committed anyway. I repeat what I said before - this is not fair on the other developers. I'm sorry, but it comes across that you've committed the code to the stable branch to meet either a personal/business requirement in violation of our normal guidelines, and this is completely unacceptable.

+1 for reverting for this commit.

comment:21 by strk, 12 years ago

Mark, the code is _disabled_ ! Probably better to move this discussion to the list…

by strk, 12 years ago

Attachment: interrupt.sql added

a test script

comment:22 by strk, 12 years ago

The attached interrupt.sql script may be used for testing purposes. Run with: $ time psql -f interrupt.sql It is expected that the query ends with something like:

ERROR:  GEOSBuffer: InterruptedException: Interrupted!

and it takes no much longer than 1 second to finish.

I tested with with PostgreSQL 8.4.10 and 9.1.2, GEOS-3.4.0dev r3663, postgis 2.1.0svn r9851 and postgis 2.0.1svn r9849 with the macro defined.

postgis 2.0.1svn r9849 without the macro defined completes (interrupting) in about 43 seconds instead.

comment:23 by strk, 12 years ago

Milestone: PostGIS 2.0.1PostGIS 2.1.0

pushing milestone back to 2.1.0…

comment:24 by strk, 12 years ago

So it turns out windows builds don't get the SIGINT delivered immediately, defeating the purpose of the handler. Indeed the PostgreSQL CHECK_FOR_INTERRUPTS call is more complex for windows, in that it explicitly flushes all non-ignored pending signals as well.

We could replicate the complexity within the GEOS equivalent of the CHECK_FOR_INTERRUPTS function, or allow GEOS clients to pass a callback to invoke right before interrupt request checking to allow clients to do any flushing they might need.

The second is likely to be more expensive as it'd mean a function call per check, rather than a simple variable lookup. Even worst it'd be a cross-library function call. Nothing dramatic, and only affecting windows, but worth mentioning.

comment:25 by strk, 12 years ago

Alright, more changes, more testing needed …

Robe: could you please try building r9857 of PostGIS after installing r 3672 of GEOS ? Better if you do with WIN32 as the change is using a WIN32 macro (as found in postgresql source code) which may or may not be defined with win64.

Since we check for GEOS version to enable or not interruption anyone with a non up-to-date GEOS-3.4SVN wont be able to build for windows.

comment:26 by robe, 12 years ago

strk,

Not having much luck compiling. After compiling geos r3672 and installing that, I still can't compile PostGIS under win32.

I get this error:

postgis_module.c: In function `_PG_init':
postgis_module.c:54: error: too few arguments to function `GEOS_interruptRegisterCallback'
postgis_module.c: At top level:
postgis_module.c:35: warning: 'geosInterruptCallback' defined but not used
make[1]: *** [postgis_module.o] Error 1
make[1]: Leaving directory `/c/projects/PostGIS/trunk/postgis'
make: *** [all] Error 1

BTW my GEOS make install error looks like this and then I have to manually copy the geos-config to complete install.

I'm pretty sure its this ticket issue: http://trac.osgeo.org/geos/ticket/472 which is caused by this thing you did: http://trac.osgeo.org/geos/ticket/319 I think you said you were going to revert?

XMLTester-XMLTester.o: In function `ZN9XMLTester19parsePrecisionModelEPK12TiXmlE                                                         lement':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:334:                                                          undefined reference to `geos::geom::PrecisionModel::~PrecisionModel()'
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:334:                                                          undefined reference to `geos::geom::PrecisionModel::~PrecisionModel()'
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:334:                                                          undefined reference to `geos::geom::PrecisionModel::~PrecisionModel()'
XMLTester-XMLTester.o: In function `ZN9XMLTesterD1Ev':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/ext/new_allo                                                         cator.h:69: undefined reference to `geos::io::ByteOrderDataInStream::~ByteOrderD                                                         ataInStream()'
XMLTester-XMLTester.o: In function `ZN9XMLTesterD1Ev':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:260:                                                          undefined reference to `geos::io::WKTReader::~WKTReader()'
XMLTester-XMLTester.o: In function `ZN9XMLTesterD1Ev':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/ext/new_allo                                                         cator.h:69: undefined reference to `geos::io::ByteOrderDataInStream::~ByteOrderD                                                         ataInStream()'
XMLTester-XMLTester.o: In function `ZN9XMLTesterD1Ev':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:260:                                                          undefined reference to `geos::geom::PrecisionModel::~PrecisionModel()'
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:260:                                                          undefined reference to `geos::io::WKTReader::~WKTReader()'
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:260:                                                          undefined reference to `geos::geom::PrecisionModel::~PrecisionModel()'
XMLTester-XMLTester.o: In function `ZN9XMLTesterD2Ev':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/ext/new_allo                                                         cator.h:69: undefined reference to `geos::io::ByteOrderDataInStream::~ByteOrderD                                                         ataInStream()'
XMLTester-XMLTester.o: In function `ZN9XMLTesterD2Ev':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:260:                                                          undefined reference to `geos::io::WKTReader::~WKTReader()'
XMLTester-XMLTester.o: In function `ZN9XMLTesterD2Ev':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/ext/new_allo                                                         cator.h:69: undefined reference to `geos::io::ByteOrderDataInStream::~ByteOrderD                                                         ataInStream()'
XMLTester-XMLTester.o: In function `ZN9XMLTesterD2Ev':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:260:                                                          undefined reference to `geos::geom::PrecisionModel::~PrecisionModel()'
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:260:                                                          undefined reference to `geos::io::WKTReader::~WKTReader()'
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:260:                                                          undefined reference to `geos::geom::PrecisionModel::~PrecisionModel()'
XMLTester-XMLTester.o: In function `ZN9XMLTesterC1Ev':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/ext/new_allo                                                         cator.h:69: undefined reference to `geos::io::ByteOrderDataInStream::~ByteOrderD                                                         ataInStream()'
XMLTester-XMLTester.o: In function `ZN9XMLTesterC1Ev':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:260:                                                          undefined reference to `geos::io::WKTReader::~WKTReader()'
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:260:                                                          undefined reference to `geos::geom::PrecisionModel::~PrecisionModel()'
XMLTester-XMLTester.o: In function `ZN9XMLTesterC2Ev':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/ext/new_allo                                                         cator.h:69: undefined reference to `geos::io::ByteOrderDataInStream::~ByteOrderD                                                         ataInStream()'
XMLTester-XMLTester.o: In function `ZN9XMLTesterC2Ev':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:260:                                                          undefined reference to `geos::io::WInfo: resolving vtable for geos::geomgraph::E                                                         dgeListby linking to __imp___ZTVN4geos9geomgraph8EdgeListE (auto-import)
KTReader::~WKTReader()'
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:260:                                                          undefined reference to `geos::geom::PrecisionModel::~PrecisionModel()'
XMLTester-XMLTester.o: In function `ZN9XMLTester9parseTestEPK9TiXmlNode':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/limits:1058:                                                          undefined reference to `geos::geom::Coordinate::Coordinate(double, double, doub                                                         le)'
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/limits:1058:                                                          undefined reference to `geos::geom::Coordinate::Coordinate(double, double, doub                                                         le)'
XMLTester-XMLTester.o: In function `ZN9XMLTester8parseRunEPK9TiXmlNode':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:301:                                                          undefined reference to `geos::io::WKTReader::WKTReader(geos::geom::GeometryFacto                                                         ry const*)'
XMLTester-XMLTester.o: In function `ZN9XMLTester8parseRunEPK9TiXmlNode':
c:/projects/geos/branches/3.4/tests/xmltester/../../include/geos/io/WKBReader.h:                                                         83: undefined reference to `geos::io::ByteOrderDataInStream::ByteOrderDataInStre                                                         am(std::istream*)'
XMLTester-XMLTester.o: In function `ZN9XMLTester8parseRunEPK9TiXmlNode':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/ext/new_allo                                                         cator.h:69: undefined reference to `geos::io::ByteOrderDataInStream::~ByteOrderD                                                         ataInStream()'
XMLTester-XMLTester.o: In function `ZN9XMLTester8parseRunEPK9TiXmlNode':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:334:                                                          undefined reference to `geos::geom::PrecisionModel::~PrecisionModel()'
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/memory:334:                                                          undefined reference to `geos::io::WKTReader::~WKTReader()'
XMLTester-BufferResultMatcher.o: In function `ZN4geos9xmltester19BufferResultMat                                                         cher38isBoundaryHausdorffDistanceInToleranceERKNS_4geom8GeometryES5_d':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/limits:1058:                                                          undefined reference to `geos::geom::Coordinate::Coordinate(double, double, doub                                                         le)'
XMLTester-BufferResultMatcher.o: In function `ZN4geos9xmltester19BufferResultMat                                                         cher24isSymDiffAreaInToleranceERKNS_4geom8GeometryES5_':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/bits/ostream                                                         .tcc:63: undefined reference to `geos::geom::Coordinate::Coordinate(double, doub                                                         le, double)'
XMLTester-SingleSidedBufferResultMatcher.o: In function `ZN4geos9xmltester30Sing                                                         leSidedBufferResultMatcher38isBoundaryHausdorffDistanceInToleranceERKNS_4geom8Ge                                                         ometryES5_d':
C:/MinGW/bin/../lib/gcc/mingw32/3.4.5/../../../../include/c++/3.4.5/limits:1058:                                                          undefined reference to `geos::geom::Coordinate::Coordinate(double, double, doub                                                         le)'

in reply to:  26 comment:27 by strk, 12 years ago

Replying to robe:

postgis_module.c:54: error: too few arguments to function `GEOS_interruptRegisterCallback'

You must have the old geos_c.h installed. The signature of that function changed. Remember to also copy the header, not just the library.

BTW my GEOS make install error looks like this

Please let's manage GEOS issues on its own tracker.

comment:28 by strk, 12 years ago

FYI: r9874 reverts the support in 2.0 branch.

Still interested in tests in trunk before closing this out.

comment:29 by strk, 12 years ago

Oops, the build error was a real one. Robe: please try r9902. This is the problem of having different code for different systems.. oh well.

Maybe I could really change this again to _always_ call a function and use the function return code as a sign of whether or not to interrupt. Slower but would be about the same for both windows and unix. Haven't heard from Mac testers, btw.

comment:30 by robe, 11 years ago

Do we consider this done. I don't recall how to test this. If it was just an issue on windows, i say close out.

comment:31 by strk, 11 years ago

I belive this is open as it's missing testcases. A testcase could do something like setting a very short statement timeout and invoking a very expensive GEOS query. It'd be a pretty fuzzy test in that it's success would depend by the timeout being long enough not to kick before GEOS starts and short enough not to kick after GEOS ends, and it will be also dependent on GEOS, and on the machine. Tricky one.

Maybe the test should be independent from the others and first run the expensive query once to time it, then set timeout to half of that and run again, taking in consideration also cold vs. warm runs…

Did anyone get excited enough to work on it ? :)

comment:32 by robe, 11 years ago

Not me. I don't have patience for queries that take longer than 20 seconds. At that point I start rewriting my queries or convincing whoever it was who asked it was a stupid question to pose :)

Can we push this unless you want are excited to write such a test.

comment:33 by robe, 11 years ago

Keywords: history added
Resolution: fixed
Status: newclosed
Type: taskenhancement

Actually I have a better idea. Close this one call it done. Open another ticket in future if you really want to test this or just wait for the first person to complain it doesn't work and use their test. Its too time consuming of a test to put in our standard regression testing.

comment:34 by zenitram, 10 years ago

Cc: zenitram added

I'm experiencing uninterruptability queries on the following version, compiled against GEOS 3.4:

POSTGIS="2.1.3 r12547" GEOS="3.4.2-CAPI-1.8.2 r3921" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.10.1, released 2013/08/26" LIBXML="2.7.8" LIBJSON="UNKNOWN" RASTER

with the following query:

SELECT ST_Segmentize(ST_MakeLine(ST_MakePoint(4,39), ST_MakePoint(1,41)), 1e-100);

It doesn't get killed by the statement_timeout or manual cancellation requests.

comment:35 by strk, 10 years ago

ST_Segmentize is not implemented by GEOS but by PostGIS itself. Probably liblwgeom. Could you please file a separate ticket specific to ST_Segmentize ? It might take interruptability support in liblwgeom …

Note: See TracTickets for help on using tickets.