Opened 13 years ago

Closed 12 years ago

#701 closed enhancement (fixed)

Support for 9.1 KNN GIST enhancements

Reported by: robe Owned by: pramsey
Priority: medium Milestone: PostGIS 2.0.0
Component: postgis Version: master
Keywords: Cc:

Description

KNN GIST support has now been integrated in PostgreSQL GIT repository. Need to make sure we can support it for speedier KNN type queries.

Change History (10)

comment:1 by robe, 13 years ago

Speaking of Hackers — excerpt from recent hacker notes. Hope this inspires you.

On 12/27/10 7:35 PM, Josh Berkus wrote:

On 12/27/10 1:45 PM, Peter Eisentraut wrote:

I'm unable to produce any really "exciting" release notes for alpha3. I have produced a draft here: http://wiki.postgresql.org/wiki/Alpha_release_notes_draft Please edit the bullet points if you have some idea.

I'll see what I can do.

Expanded with my edits on the wiki.

BTW, we really ought to push the PostGIS folks to get coding for, and testing, KNNGiST immediately.

comment:2 by markstos, 13 years ago

Some recent discussion on the Performance mailing list is here:

http://thread.gmane.org/gmane.comp.db.postgresql.performance/28897

It highlights a specific missing feature that would make zipcode-distance searches easier.

comment:3 by nicklas, 13 years ago

Hmm, If I read this part of the conversation right: http://article.gmane.org/gmane.comp.db.postgresql.performance/28905

… it is as I have feared. There seems to be no way to use it for anything but points. If we use it against anything else (except a few exceptions) we will need to do a recheck to see if we really have found the closest geometry and not only the closest bounding box. But what we can get from the KNN gist index is an ordered list. The number of items in the list we have to tell beforehand. We can not come back from our recheck and say, We have reasons to believe that there still might be some candidate out there for being closest, give us some more.

To decide if we really know if we have the closest is no problem if we have found a real mindistance (between the actual geometries, not the bboxes) that is closer than the next bbox in the ordered list.

The problem is if the last bbox is closer than the rechecked closest, the we don't know if the next one (outside our limit) in the ordered list is a candidate.

Does it make sense.

I think that is very bad news, but maybe there is some way to find a workaround.

/Nicklas

comment:4 by darkblueb, 13 years ago

I find this thread misleading.. Points are important.. personally I am not disappointed that GIST with knn internals will work only for points.. in the example of zip codes, take centroids and store them. KnnGIST is a fine addition and I look forward to it when it's ready

comment:5 by pramsey, 13 years ago

First cut is in at r7894.

The ←> operator is now a distance operator. It returns the squared distance between the centroids of the objects.

Example getting 10 nearest named points from a names table.

  SELECT name,ST_AsText(geom) 
  FROM geonames 
  ORDER BY geom <-> ST_SetSRID(ST_MakePoint(-90,40),4326) 
  LIMIT 10;

comment:6 by pramsey, 13 years ago

The <#> operator how provides a box-wise distance operator. It returns the… hm. Unsquared distance. I think I'm going to use the flat distance in both.

comment:7 by markstos, 12 years ago

Thanks for your work on this, pramsey.

I currently use PostgreSQL to run over a million geo-spatial searches per day for "points near a zipcode" using functions like "cube_distance()" and "earth_box()". Below is an example of the kind of method I'm using. Is your work a viable replacement for this use case? If it appears to be, I'm happy to peer review the results vs my current method, and benchmark the two alternatives.

Here's pseudo-SQL for our current use:

 SELECT *
         , cube_distance( center.earth_coords , postal_codes.earth_coords)/$meters_per_mi AS distance
         , 'Miles' AS units
 FROM (
         SELECT 
             earth_coords, 
             earth_box( earth_coords , $radius*$meters_per_mi ) as center_box
         FROM zipcodes WHERE zipcode = '47374' 
      ) as center,
     addresses
     JOIN zipcodes USING (zipcode)
 ORDER BY distance ASC

comment:8 by robe, 12 years ago

Paul — I think this one is done — no? Close it out if it is.

Markstos, We already have this in the PostGIS 2.0 code base planned release around February 2011. You can try it out now to see if it fits your needs.

You may want to take a look at Vizzuality video to see how this comes into play with what they are doing. http://vimeo.com/30158794

Not sure it will work for you offhand unless your searches are a user input point. It doesn't look like your query is written that way but that may be more to do with compromises using what you had. Note the limitation is that one of the geometries has to be a constant.

As far as stuff needing peer review. The other GIST enhancement slated for 9.2 release is what we badly need review help/and or funding on. http://www.postgresonline.com/journal/archives/225-Improving-speed-of-GIST-indexes-in-PostgreSQL-9.2.html

comment:9 by markstos, 12 years ago

Thanks for the reply, robe. It's great news that this appears ready to try out with PostgreSQL 9.1.

In reviewing the related docs, I did find one small typo that can be fixed. Search for this in the docs to find and correct it:

"taht use spatial"

(should be "that").

Once I have a chance to test this feature for a zipcode searching application, I'll share my results.

comment:10 by pramsey, 12 years ago

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.