Opened 10 years ago
Closed 10 years ago
#3058 closed defect (fixed)
picksplit method for column 1 of index "nd_geometry_idx" failed
Reported by: | strk | Owned by: | strk |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 2.0.7 |
Component: | postgis | Version: | 2.0.x |
Keywords: | Cc: |
Description
Spotted during ND index testing:
DEBUG: switched to buffered GiST build; level step = 1, pagesPerBuffer = 305 DEBUG: picksplit method for column 1 of index "tweets_g_idx" failed HINT: The index is not optimal. To optimize it, contact a developer, or try to use the column as the second one in the CREATE INDEX command.
Sounds like a bug, is it ?
Index is being created on a table with 35,128,772 4D points and extent (we need an ND box type to output this…):
-- X: ~ -18e6 to 18e6 -- Y: ~ 0 to 1.5e6 -- Z: ~ 0 to 100 -- M ~ 2 to 315e6
Change History (12)
comment:1 by , 10 years ago
comment:2 by , 10 years ago
comment:3 by , 10 years ago
It may be related an unexpected timing I'm getting with a test query over a 35 millions rows table where 4 different filters, each selecting under 10 rows from the set, all take less than 15ms but one that takes ~650ms. Sounds like the possible effect of an unbalanced tree, doesn't it ?
comment:4 by , 10 years ago
I've added more detail to the PostgreSQL log:
DEBUG: picksplit method for column 1 of index "tweets_g_nd" failed (141 elems on right, none on the other side)
comment:5 by , 10 years ago
The message is always the same (141 elems all going on the right side), GiST gets 4 levels deep
comment:6 by , 10 years ago
I'd check and see if that condition is being caused at our end (we're sending back a split w/ 0 on one side) or their side (when we fall back to them, they split and end up w/ a 0 entry split)
comment:8 by , 10 years ago
I believe it is _us_ setting 0 entries in one side:
DEBUG: switched to buffered GiST build; level step = 1, pagesPerBuffer = 305 NOTICE: YYY below:2, above:0 NOTICE: XXX FOR d:2, below:0, above:141 ERROR: XXX direction:2, below: 0, above: 141, posmax:141
The gserialized_gist_picksplit_badratios does not return TRUE for such configuration (maybe because it only does so if _all_ of the dimensions have that problem?). But then when looking for the plan giving us the most "even ratio" the code looks suspicious, really only checking for the max number of items in any of the two sides. Evidently 141 happens to be the biggest number, and that plane is picked, which has 0 on the other side:
/* ** Now, what splitting plane gives us the most even ratio of ** entries in our child pages? Since each split region has been apportioned entries ** against the same number of total entries, the axis that has the smallest maximum ** number of entries in its regions is the most evenly distributed. ** TODO: what if the distributions are equal in two or more axes? */ for ( d = 0; d < ndims_pageunion; d++ ) { int posd = Max(pos[ABOVE(d)],pos[BELOW(d)]); if ( posd > posmax ) { direction = d; posmax = posd; } if ( pos[BELOW(d)] == 0 || pos[ABOVE(d)] == 0 ) { lwnotice("XXX FOR d:%d, below:%d, above:%d", d, pos[BELOW(d)], pos[ABOVE(d)]); } }
Should the difference between ABOVE and BELOW be considered instead, and the _minimum_ difference be taken as the best split ?
comment:9 by , 10 years ago
I confirm looking for the _minimum_ among the dimensional-maxes ( posd < posmax ) fixes the issue. The result in my case seems to be also producing a slightly smaller index, but other than that I'm not sure how to compare the efficiency of it.
comment:10 by , 10 years ago
Actually, I could also test performance as I have the old index built and could compare with the old and with the new.
Same queries are run against the same data, in one case the old index is used (removing the new) and in the other case the new index is used (removing the old). Indexes removal happens in a transaction which is rolled back afterwards, so nothing is lost. Plans have been checked to all use an index scan.
Runs with old index:
Total runtime: 18.324 ms Total runtime: 16.792 ms Total runtime: 635.775 ms Total runtime: 11.026 ms Total runtime: 15.693 ms
Runs with new index:
Total runtime: 16.001 ms Total runtime: 31.227 ms Total runtime: 192.585 ms Total runtime: 11.787 ms Total runtime: 26.770 ms
comment:11 by , 10 years ago
Milestone: | → PostGIS 2.0.7 |
---|---|
Owner: | changed from | to
Status: | new → assigned |
Version: | trunk → 2.0.x |
comment:12 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Reading the PostgreSQL code, that message is printed when the picksplit method puts 0 elements on either sides of the split.