Opened 5 years ago

Closed 5 years ago

Last modified 4 years ago

#2953 closed defect (fixed)

Unable to compute stats for some features

Reported by: pramsey Owned by: pramsey
Priority: high Milestone: PostGIS 2.1.5
Component: postgis Version: 2.1.x
Keywords: stats, history Cc:

Description

It looks like some input data might include extremely small (i.e., hugely negative) M values as a default value:

 POINT ZM (-124.22007 41.75967 0 -1.79769313486232e+308)

Running ANALYZE on a table of these points results in the following:

NOTICE: no non-null/empty features, unable to compute statistics

The problem is that the M value is so large in magnitude that when it is converted to a float for storage in the gserialized box, it overflows to -Inf. As a result, when the box is retrieved from the serialization, it looks like this:

(GBOX) $7 = (flags = '\x03', 
 xmin = -124.22007751464844, xmax = -124.22006988525391, 
 ymin = 41.759666442871094, ymax = 41.759670257568359, 
 zmin = 0, zmax = 0, 
 mmin = -Inf, mmax = -Inf)
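A minimal C sketch of the overflow, assuming IEEE 754 arithmetic (C Annex F), under which narrowing an out-of-range double to float rounds to infinity. Plain ISO C without Annex F leaves that conversion undefined. The printed M value appears to be -DBL_MAX rounded to 15 significant digits, so the sketch uses -DBL_MAX directly. The function names are hypothetical, not PostGIS code.

```c
#include <float.h>

/* Hypothetical helper: narrow a double to float, as happens when a box
 * coordinate is stored as a float. Under IEEE 754 (C Annex F) a value
 * beyond the float range rounds to +/-Inf here. */
float narrow_to_float(double d)
{
    return (float) d;
}

/* The ticket's M value, taken as -DBL_MAX; narrowing it yields -Inf,
 * matching the mmin/mmax seen in the retrieved box. */
float narrowed_ticket_m(void)
{
    return narrow_to_float(-DBL_MAX);
}
```

Since no finite float is below -FLT_MAX, narrowed_ticket_m() comparing less than -FLT_MAX means it is -Inf, while narrowing an ordinary coordinate such as -124.22007 stays finite.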

Attachments (1)

2953.patch (628 bytes) - added by pramsey 5 years ago.
fix based on pushing huge double values to FLT_MAX/FLT_MIN


Change History (5)

Changed 5 years ago by pramsey

Attachment: 2953.patch added

fix based on pushing huge double values to FLT_MAX/FLT_MIN

comment:1 Changed 5 years ago by pramsey

So, my fix just catches cases where a big double gets converted into an Inf float and instead converts it to FLT_MAX/FLT_MIN (depending on whether it's positive or negative). I feel like that's "OK" in a practical sense, since we're talking about values with magnitude > 10^38 here.
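The clamping approach can be sketched as follows, assuming IEEE 754 floats. The function name is hypothetical, and this is not the contents of 2953.patch. One caution for anyone reimplementing it: in C's <float.h>, FLT_MIN is the smallest positive normalized float (about 1.18e-38), not the most negative float, so this sketch clamps negative overflow to -FLT_MAX.

```c
#include <float.h>

/* Hypothetical helper, not taken from 2953.patch: convert a double to a
 * float, but map values whose magnitude exceeds the float range to the
 * largest finite float of the same sign instead of letting them become
 * +/-Inf. NaN fails both comparisons and is passed through as NaN. */
float double_to_float_clamped(double d)
{
    if (d > FLT_MAX)
        return FLT_MAX;   /* would otherwise overflow to +Inf */
    if (d < -FLT_MAX)
        return -FLT_MAX;  /* would otherwise overflow to -Inf */
    return (float) d;     /* in range: ordinary rounding to float */
}
```

Clamping keeps the stored box finite, but for such values the box no longer contains the true coordinate (-DBL_MAX lies outside [-FLT_MAX, -FLT_MAX]), which is the trade-off that makes this fix only "OK" in a practical sense.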

Another solution would be to adjust the estimation code so that it still fails on Inf boxes, but more selectively: in this example it would be possible to compute a 2D stats histogram, just not a 4D one. Unfortunately, a lot of the code works in the N-D case, and the 2D case is just a small specialization of it.

comment:2 Changed 5 years ago by pramsey

Resolution: fixed
Status: new → closed

I put in a different, more "correct" solution at r13030 on trunk, r13031 on 2.1.

Rather than force the box into a "valid" space in all cases, I just strip the higher dimensions when in 2D mode. This results in good stats for 2D and bad stats for N-D, which fixes 95% of the use cases. If you're building an N-D index on this crazy Z/M data, then you can deal with the badness.
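The idea of only looking at X and Y when deciding whether a box is usable for 2D stats can be sketched like this. The struct and function names are hypothetical, not PostGIS's GBOX API; the field names mirror the GBOX dump in the description. Treating a range as valid when both ends are finite and ordered is an assumption of this sketch. It relies on C99 <math.h> (isfinite, INFINITY).

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical stand-in for a 4D box; not the PostGIS GBOX struct. */
typedef struct {
    double xmin, xmax, ymin, ymax, zmin, zmax, mmin, mmax;
} box4d;

/* Assumed rule: a range is usable if both ends are finite and ordered. */
static bool range_ok(double lo, double hi)
{
    return isfinite(lo) && isfinite(hi) && lo <= hi;
}

/* N-D check: every dimension must be usable, so an Inf M range fails. */
bool box_is_valid_nd(const box4d *b)
{
    return range_ok(b->xmin, b->xmax) && range_ok(b->ymin, b->ymax) &&
           range_ok(b->zmin, b->zmax) && range_ok(b->mmin, b->mmax);
}

/* 2D check: only X and Y are inspected; Z and M are ignored entirely. */
bool box_is_valid_2d(const box4d *b)
{
    return range_ok(b->xmin, b->xmax) && range_ok(b->ymin, b->ymax);
}
```

With the box from the description (mmin = mmax = -Inf), box_is_valid_2d() accepts it while box_is_valid_nd() rejects it, which mirrors "good stats for 2D, bad stats for N-D".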

comment:3 Changed 5 years ago by strk

Reviewed. Sounds good. Maybe the comment could be more specific about what "safety" it's after. It looks to me like it is just a way to consider only 2D when checking for validity (i.e., a replacement for a missing gbox_is_valid2d). Is that correct?

comment:4 Changed 4 years ago by robe

Keywords: history added