Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#1697 closed defect (fixed)

Crash in GiST index with empty MultiPolygons

Reported by: realityexists Owned by: pramsey
Priority: critical Milestone: PostGIS 2.0.0
Component: postgis Version: trunk
Keywords: Cc:

Description

2.0.0 rev 9517, 32-bit, PostgreSQL 9.1.3

I have a table with a geography(MultiPolygon?) column and a GiST index on that column. For each new row my application initially inserts 'MULTIPOLYGON EMPTY' into that column, then updates it to the real (non-empty) value. After some number of successful inserts any further inserts crash with the attached stack trace. If I insert a different dummy polygon (not empty) everything works fine.

I wasn't able to create a simple repro for this, so I suspect it takes some amount of data for the index to get into a bad state. The stack trace is from Linux, but this also happens on Windows (beta 3 binaries).

The client gets messages like this when trying to insert another value (once a crash has occurred):

NOTICE:  [geography_inout.c:gserialized_geography_from_lwgeom:117] typmod was -1
LINE 22: ...(E'SA')), ((E'FL')), ((245)), ((E'MULTIPOL...
                                                              ^
NOTICE:  [gserialized_typmod.c:postgis_valid_typmod:115] Entered function
NOTICE:  [gserialized_typmod.c:postgis_valid_typmod:120] Got geom(type = 6, srid = 4326, hasz = 0, hasm = 0)
NOTICE:  [gserialized_typmod.c:postgis_valid_typmod:121] Got typmod(type = 6, srid = 4326, hasz = 0, hasm = 0)
NOTICE:  [gserialized_gist_nd.c:gserialized_gist_compress:488] [GIST] 'compress' function called
NOTICE:  [gserialized_gist_nd.c:gserialized_gist_compress:500] [GIST] processing leafkey input
NOTICE:  [gserialized_gist.c:gserialized_datum_get_gidx_p:241] entered function
NOTICE:  [gserialized_gist.c:gserialized_datum_get_gidx_p:250] got flags 8
NOTICE:  [gserialized_gist.c:gserialized_datum_get_gidx_p:270] could not calculate bbox, returning failure
NOTICE:  [gserialized_gist_nd.c:gserialized_gist_compress:522] [GIST] empty geometry!
NOTICE:  [gserialized_gist_nd.c:gserialized_gist_picksplit:1016] [GIST] 'picksplit' function called
********** Error **********

Attachments (3)

Empty MultiPolygon crash stack.txt (3.7 KB) - added by realityexists 7 years ago.
gidx.patch (7.0 KB) - added by pramsey 7 years ago.
Fixes crash. Any behaviour regressions?
GiST insert crash.txt (3.2 KB) - added by realityexists 7 years ago.
Crash in r9575 - stack trace

Download all attachments as: .zip

Change History (28)

Changed 7 years ago by realityexists

comment:1 Changed 7 years ago by pramsey

You're dying in picksplit, so the number of inserts necessary is related to having to get the the node size up to the maximum before split is forced. You should be able to generate it repeatedly with a workload... could you attach the workload or post it online? You allude to it, but a more precise description would be useful. Thanks.

comment:2 Changed 7 years ago by pramsey

This didn't generate any problem, even run 1000s of times:

insert into test (g) values ('MULTIPOLYGON EMPTY');
update test set g = 'MULTIPOLYGON(((0 0, 0 1, 1 1, 1 0, 0 0)))' where i = currval('test_i_seq');

comment:3 Changed 7 years ago by pramsey

OK, this is fully replicable, just create a table with a geography column and insert empty geometries until it crashes when it gets to the first split.

comment:4 Changed 7 years ago by pramsey

No bug for geometry columns with a 2D index.

comment:5 Changed 7 years ago by pramsey

No bug for geometry columns with an ND index? Huh? that's the same code as geography uses...

comment:6 Changed 7 years ago by pramsey

But geography definitely crashes...

create table test (g geography);
create index gt on test using gist (g);
insert into test (g) select 'POLYGON EMPTY'::geography from generate_series(1,300);

comment:7 Changed 7 years ago by pramsey

So, several hours of debugging later, no real change. The problem happens in picksplit, it happens right at the start, as PostgreSQL hands in the user code a bogus box_current. The box contains a crazy size header which leads to a failed palloc in the gidx_copy function, and finally the failed memcpy that the stacktrace reports. None of this happens if you use geometry instead of geography but still use an nd index. Same code, same input objects, very different results.

I thought perhaps the gist compress function (which all the objects traverse as they are added to the node) might do something different between geometry and geography, but it doesn't. In both cases, compress finds no calculateable bounding box (everything is empty) so it returns back the input.

comment:8 Changed 7 years ago by pramsey

Aha, I can now make the geometry ND index crash too, by putting an SRID on the geometry.

create table test (g geometry);
create index gt on test using gist (g gist_geometry_ops_nd);
insert into test (g) select st_setsrid('POLYGON EMPTY'::geometry,4326) from generate_series(1,300);

I still don't know why the crash happens, but the geography-vs-geometry part is gone, this is a problem with the ND index and empty gserialized that have srids on them.

comment:9 Changed 7 years ago by pramsey

Problem understood.

When a key is passed into gist compress, you are expected to compress it and return a key suitable for insertion into the index. For null, the gist superstructure actually bypasses the user function, so our null handling never even gets called. For non-null, the expectation is that we'll return something sensible, that is to say, a key we will be able to handle correctly later on.

Instead, when we hit the empty case, we're just echoing back the input key datum, so it's a garbage key. Any sensible behavior we have gotten with empties so far is blind luck. Probably because we don't care about them much, and don't seem to insert them in an index often or at all.

We need to either (a) figure out how to handle them like nulls (just blindly copying the old user-function null handling (which as noted above, doesn't actually get called) over resulted in crashes, so more research there is required) or (b) actually add an "empty box" semantic and deal with how to search over empty boxes properly. For (b) one possibility might be to generate a (Inf, Inf, Inf, Inf) box, that won't intersect anything but itself (I think), however, that could also end up unbalancing trees nastily.

comment:10 Changed 7 years ago by robe

Is there a simple solution that won't screw up something bigger or can we punt this to 2.0.1 and just make a note of it in 2.0.0?

comment:11 Changed 7 years ago by pramsey

I think the "right" solution is to get the index to treat the empties as nulls, but I'm currently uncertain as to how to do that and wlll be doing opengeo and conference stuff for the next couple weeks. This problem should also exist in 1.5, so it's not a new exciting crasher, it's been around for a while and just get excavated recently, so I'd say there's no shame in pushing it forward.

comment:12 Changed 7 years ago by strk

Sounds useful: NOTICE: [gserialized_gist_nd.c:gidx_copy:89] copying gidx (0x7f8d276adfa0) of size 964952070 to gidx (0x7f8cec988048)

Aren't we returning memory allocated in a local variable, btw ???!!!

        char gidxmem[GIDX_MAX_SIZE];
        GIDX *bbox_out = (GIDX*)gidxmem;

You can't safely return bbox_out then

comment:13 Changed 7 years ago by strk

I think a non-crashing fix would be useful for 2.0. Don't understand why you can't crash the 2d index. Maybe it's because GBOX is varsize.

comment:14 Changed 7 years ago by pramsey

Regarding memory and so on, bbox_out isn't returned, a gist entry is returned, and in cases where a box is embedded in the gist entry as a key, the box is copied before returning. I carefully set this up to avoid global allocations, don't worry about that stuff. The problem is not there, it's in how we handle empties in the compress routines.

Regarding fixing, now that I know where the problem is, fixing becomes quite possible, but will take some time to try different things and test. As noted before, this problem has existed in 1.5 for a long time in the geography type, so I don't think we should necessarily block on it unless we really really want to.

What surprises me is when I try to return an empty gist entry, aping the null handling, I get a crash, so I must be doing something wrong in my aping.

comment:15 Changed 7 years ago by strk

Do you understand why it doesn't happen in the 2d index ?

comment:16 Changed 7 years ago by pramsey

Dumb luck I think. The first fixed number of bytes are being copied in as a key, it's a garbage key, but it's rarely intersects with other stuff. We could probably find non-crashing bugs related to it, like EMPTY geometries being returned in index searches when they shouldn't because the garbage key happens to intersect the search key. The reason we get a crash in the varlena keys is because the garbage in the varsize slot causes insane memory allocations to occur in the gidx_copy.

Changed 7 years ago by pramsey

Attachment: gidx.patch added

Fixes crash. Any behaviour regressions?

comment:17 Changed 7 years ago by pramsey

The fix was pretty invasive, it touches a bunch of box relationship functions in the index code. The 2D index should be fixed too (#1725) and the 1.5 branch should be patched (#1726).

comment:18 Changed 7 years ago by strk

I like what I read of the patch. Will test shortly. Only, could you please clean up the patch to avoid mixing style and functional changes ? It'd make reading the patch easier.

Nitpicking: I see a duplicated memset on lines 539, 545 of the patch

Hey, where's your public git repo to follow these changes ! :)

comment:19 Changed 7 years ago by strk

Tested, nice! Looking forward to see the 2D index fixed too, so I can do: explain select st_asewkt(g) from test where g ~= 'POINT EMPTY'::geometry;

And get all empty (or otherwise infinite) elements in the table!

More on the memset: is it needed at all ? I see all uses of bbox_out protected by gidx_set_unknown already. Sounds enough.

comment:20 Changed 7 years ago by strk

I've committed your patch to trunk as r9563, after removing the btree changes (unrelated) and dropping the memset. We should then get better testing now, and more easily further improve it.

So, this one is ready for test. @realityexist let us know.

comment:21 Changed 7 years ago by strk

r9564 commits the fix to 1.5 branch, now adding regression tests for both branches.

comment:22 Changed 7 years ago by strk

Resolution: fixed
Status: newclosed

r9567 adds regression testing in trunk, and r9566 in 1.5 branch

comment:23 Changed 7 years ago by realityexists

Resolution: fixed
Status: closedreopened

Tested my application on r9575. It now crashes when inserting a perfectly ordinary geography Point (with a GiST index defined on the column). The point is

ST_GeomFromEWKT('SRID=4326;POINT(-2.7546111 8.0365000)')::geography

Changed 7 years ago by realityexists

Attachment: GiST insert crash.txt added

Crash in r9575 - stack trace

comment:24 Changed 7 years ago by strk

Resolution: fixed
Status: reopenedclosed

Your existing index may be corrupted, drop it and recreate it. If the problem persist please file another bug (this one is about _empty_)

comment:25 Changed 7 years ago by realityexists

Verified fixed in r9580, thanks (#1734 is also fixed).

Note: See TracTickets for help on using tickets.