Opened 11 years ago
Closed 11 years ago
Last modified 11 years ago
#1697 closed defect (fixed)
Crash in GiST index with empty MultiPolygons
|Reported by:||realityexists||Owned by:||pramsey|
2.0.0 rev 9517, 32-bit, PostgreSQL 9.1.3
I have a table with a geography(MultiPolygon) column and a GiST index on that column. For each new row my application initially inserts 'MULTIPOLYGON EMPTY' into that column, then updates it to the real (non-empty) value. After some number of successful inserts any further inserts crash with the attached stack trace. If I insert a different dummy polygon (not empty) everything works fine.
I wasn't able to create a simple repro for this, so I suspect it takes some amount of data for the index to get into a bad state. The stack trace is from Linux, but this also happens on Windows (beta 3 binaries).
The client gets messages like this when trying to insert another value (once a crash has occurred):
NOTICE: [geography_inout.c:gserialized_geography_from_lwgeom:117] typmod was -1 LINE 22: ...(E'SA')), ((E'FL')), ((245)), ((E'MULTIPOL... ^ NOTICE: [gserialized_typmod.c:postgis_valid_typmod:115] Entered function NOTICE: [gserialized_typmod.c:postgis_valid_typmod:120] Got geom(type = 6, srid = 4326, hasz = 0, hasm = 0) NOTICE: [gserialized_typmod.c:postgis_valid_typmod:121] Got typmod(type = 6, srid = 4326, hasz = 0, hasm = 0) NOTICE: [gserialized_gist_nd.c:gserialized_gist_compress:488] [GIST] 'compress' function called NOTICE: [gserialized_gist_nd.c:gserialized_gist_compress:500] [GIST] processing leafkey input NOTICE: [gserialized_gist.c:gserialized_datum_get_gidx_p:241] entered function NOTICE: [gserialized_gist.c:gserialized_datum_get_gidx_p:250] got flags 8 NOTICE: [gserialized_gist.c:gserialized_datum_get_gidx_p:270] could not calculate bbox, returning failure NOTICE: [gserialized_gist_nd.c:gserialized_gist_compress:522] [GIST] empty geometry! NOTICE: [gserialized_gist_nd.c:gserialized_gist_picksplit:1016] [GIST] 'picksplit' function called ********** Error **********
Change History (28)
by , 11 years ago
|Attachment:||Empty MultiPolygon crash stack.txt added|
comment:1 by , 11 years ago
comment:2 by , 11 years ago
This didn't generate any problem, even run 1000s of times:
insert into test (g) values ('MULTIPOLYGON EMPTY'); update test set g = 'MULTIPOLYGON(((0 0, 0 1, 1 1, 1 0, 0 0)))' where i = currval('test_i_seq');
comment:3 by , 11 years ago
OK, this is fully replicable, just create a table with a geography column and insert empty geometries until it crashes when it gets to the first split.
comment:4 by , 11 years ago
No bug for geometry columns with a 2D index.
comment:5 by , 11 years ago
No bug for geometry columns with an ND index? Huh? that's the same code as geography uses…
comment:6 by , 11 years ago
But geography definitely crashes…
create table test (g geography); create index gt on test using gist (g); insert into test (g) select 'POLYGON EMPTY'::geography from generate_series(1,300);
comment:7 by , 11 years ago
So, several hours of debugging later, no real change. The problem happens in picksplit, it happens right at the start, as PostgreSQL hands in the user code a bogus box_current. The box contains a crazy size header which leads to a failed palloc in the gidx_copy function, and finally the failed memcpy that the stacktrace reports. None of this happens if you use geometry instead of geography but still use an nd index. Same code, same input objects, very different results.
I thought perhaps the gist compress function (which all the objects traverse as they are added to the node) might do something different between geometry and geography, but it doesn't. In both cases, compress finds no calculateable bounding box (everything is empty) so it returns back the input.
comment:8 by , 11 years ago
Aha, I can now make the geometry ND index crash too, by putting an SRID on the geometry.
create table test (g geometry); create index gt on test using gist (g gist_geometry_ops_nd); insert into test (g) select st_setsrid('POLYGON EMPTY'::geometry,4326) from generate_series(1,300);
I still don't know why the crash happens, but the geography-vs-geometry part is gone, this is a problem with the ND index and empty gserialized that have srids on them.
comment:9 by , 11 years ago
When a key is passed into gist compress, you are expected to compress it and return a key suitable for insertion into the index. For null, the gist superstructure actually bypasses the user function, so our null handling never even gets called. For non-null, the expectation is that we'll return something sensible, that is to say, a key we will be able to handle correctly later on.
Instead, when we hit the empty case, we're just echoing back the input key datum, so it's a garbage key. Any sensible behavior we have gotten with empties so far is blind luck. Probably because we don't care about them much, and don't seem to insert them in an index often or at all.
We need to either (a) figure out how to handle them like nulls (just blindly copying the old user-function null handling (which as noted above, doesn't actually get called) over resulted in crashes, so more research there is required) or (b) actually add an "empty box" semantic and deal with how to search over empty boxes properly. For (b) one possibility might be to generate a (Inf, Inf, Inf, Inf) box, that won't intersect anything but itself (I think), however, that could also end up unbalancing trees nastily.
comment:10 by , 11 years ago
Is there a simple solution that won't screw up something bigger or can we punt this to 2.0.1 and just make a note of it in 2.0.0?
comment:11 by , 11 years ago
I think the "right" solution is to get the index to treat the empties as nulls, but I'm currently uncertain as to how to do that and wlll be doing opengeo and conference stuff for the next couple weeks. This problem should also exist in 1.5, so it's not a new exciting crasher, it's been around for a while and just get excavated recently, so I'd say there's no shame in pushing it forward.
comment:12 by , 11 years ago
Sounds useful: NOTICE: [gserialized_gist_nd.c:gidx_copy:89] copying gidx (0x7f8d276adfa0) of size 964952070 to gidx (0x7f8cec988048)
Aren't we returning memory allocated in a local variable, btw ???!!!
char gidxmem[GIDX_MAX_SIZE]; GIDX *bbox_out = (GIDX*)gidxmem;
You can't safely return bbox_out then
comment:13 by , 11 years ago
I think a non-crashing fix would be useful for 2.0. Don't understand why you can't crash the 2d index. Maybe it's because GBOX is varsize.
comment:14 by , 11 years ago
Regarding memory and so on, bbox_out isn't returned, a gist entry is returned, and in cases where a box is embedded in the gist entry as a key, the box is copied before returning. I carefully set this up to avoid global allocations, don't worry about that stuff. The problem is not there, it's in how we handle empties in the compress routines.
Regarding fixing, now that I know where the problem is, fixing becomes quite possible, but will take some time to try different things and test. As noted before, this problem has existed in 1.5 for a long time in the geography type, so I don't think we should necessarily block on it unless we really really want to.
What surprises me is when I try to return an empty gist entry, aping the null handling, I get a crash, so I must be doing something wrong in my aping.
comment:15 by , 11 years ago
Do you understand why it doesn't happen in the 2d index ?
comment:16 by , 11 years ago
Dumb luck I think. The first fixed number of bytes are being copied in as a key, it's a garbage key, but it's rarely intersects with other stuff. We could probably find non-crashing bugs related to it, like EMPTY geometries being returned in index searches when they shouldn't because the garbage key happens to intersect the search key. The reason we get a crash in the varlena keys is because the garbage in the varsize slot causes insane memory allocations to occur in the gidx_copy.
by , 11 years ago
Fixes crash. Any behaviour regressions?
comment:17 by , 11 years ago
comment:18 by , 11 years ago
I like what I read of the patch. Will test shortly. Only, could you please clean up the patch to avoid mixing style and functional changes ? It'd make reading the patch easier.
Nitpicking: I see a duplicated memset on lines 539, 545 of the patch
Hey, where's your public git repo to follow these changes !
comment:19 by , 11 years ago
Tested, nice! Looking forward to see the 2D index fixed too, so I can do: explain select st_asewkt(g) from test where g ~= 'POINT EMPTY'::geometry;
And get all empty (or otherwise infinite) elements in the table!
More on the memset: is it needed at all ? I see all uses of bbox_out protected by gidx_set_unknown already. Sounds enough.
comment:20 by , 11 years ago
I've committed your patch to trunk as r9563, after removing the btree changes (unrelated) and dropping the memset. We should then get better testing now, and more easily further improve it.
So, this one is ready for test. @realityexist let us know.
comment:21 by , 11 years ago
r9564 commits the fix to 1.5 branch, now adding regression tests for both branches.
comment:22 by , 11 years ago
|Status:||new → closed|
comment:23 by , 11 years ago
|Status:||closed → reopened|
Tested my application on r9575. It now crashes when inserting a perfectly ordinary geography Point (with a GiST index defined on the column). The point is
by , 11 years ago
|Attachment:||GiST insert crash.txt added|
Crash in r9575 - stack trace
comment:24 by , 11 years ago
|Status:||reopened → closed|
Your existing index may be corrupted, drop it and recreate it. If the problem persist please file another bug (this one is about _empty_)
You're dying in picksplit, so the number of inserts necessary is related to having to get the the node size up to the maximum before split is forced. You should be able to generate it repeatedly with a workload… could you attach the workload or post it online? You allude to it, but a more precise description would be useful. Thanks.