Opened 4 years ago

Closed 4 years ago

#4686 closed enhancement (fixed)

Faster bbox access for PG12+

Reported by: Algunenano Owned by: Algunenano
Priority: medium Milestone: PostGIS 3.1.0
Component: postgis Version: master
Keywords: Cc:

Description

We currently have to ways to access the bbox of a geometry from "disk":

  • gserialized_get_gbox_p: Tries to access the bbox, if it doesn't exist, deserializes the geometry and calculates it.
  • gserialized_fast_gbox_p: Tries to access the bbox, if it doesn't exist, returns NULL.

In both cases if the datum was toasted, it has been detoasted (which is expensive).

I think (not tested performance impact yet) that there are multiple cases were we could benefit from following a different approach (similar to #4676):

  • Starting with the Datum.
  • Read the header only.
  • Check the bbox from the header. If available, return it (huge win for toasted value).
  • If we really want the bbox (first case), then if there is more data available in the datum* detoast it fully, deserialize it and calculate it.

The possible benefits from this are the callers to those 2 functions, including the ANALYZE calls (compute_gserialized_stats_mode).

  • I'd say most of the time there isn't any more data to read. The header considers a 4d bbox, that is 8 floats (8*32b = 256bits), and the extended flags (64b, which are seldom used). This is enough space for a single 4D point or 2 2D points. In any case, we could artificially increase the size request for the header if we wanted to.

Change History (4)

comment:1 by Algunenano, 4 years ago

A function that could benefit from something like this would be ST_ClipByBox2D which has 2 different shortcuts (geom bbox inside box, geom bbox fully outsize the bbox).

One of the common use cases for ST_ClipByBox2D is to clip the webmercator world before transforming geometries into 3857, and you expect most of the source geoemtries to fall inside the box.

comment:2 by Algunenano, 4 years ago

I've been having a look and there are only a few places where I was able to use this:

  • ANALYZE calls.
  • _ST_PointOutside (I did this just to confirm it was useful more than anything).
  • postgis_getbbox
  • ST_CombineBbox(box2d, geometry)
  • ST_ClipByBox2d when the geom is either fully outside or fully inside the bbox.
  • ST_BoundingDiagonal

Obviously this only affects compressed geometries.

An example of the impact:

-- 13 big polygons (~260843 vertices on avg)
ANALYZE benchmark_4c7214d90a79aa6760367a084a4d4a2f61fbe1c6cc4f7f9e76020  (the_geom, the_geom_webmercator);
-- Master: 67.960
-- Faster bbox: 2.315

It's currently living in https://github.com/Algunenano/postgis/tree/faster_box. I'll create a PR next week with some more numbers.

comment:3 by Algunenano, 4 years ago

PR: https://github.com/postgis/postgis/pull/572

Numbers:

  • ANALYZE:
------------------ POINTS -------------------
-- ANALYZE benchmark_7773a711c8441d4b494a51fd9feebeac7a9b9c734619398620293 (the_geom, the_geom_webmercator);
-- Master: 116.324
-- Faster bbox: 117.262
-- Perf: 0.99x

---------------- BIG POLYGONS -------------------
-- ANALYZE benchmark_4c7214d90a79aa6760367a084a4d4a2f61fbe1c6cc4f7f9e76020  (the_geom, the_geom_webmercator);
-- Master: 70.258
-- Faster bbox: 2.339
-- Perf: 30.03x
  • ST_PointOutside:
--------------- BIG POLYGONS ----------------
---------------------------------------------
-- EXPLAIN ANALYZE Select _ST_PointOutside(the_geog) from test_geog;
-- Master: 16.925
-- Faster bbox: 0.158
-- Perf: 107.12x
  • ST_CombineBbox:
---------------------------------------------
-------------- SMALL POLYGONS ---------------
EXPLAIN ANALYZE Select ST_CombineBbox(NULL::box2d, the_geom) from benchmark_af85af967525e69373b7b8678f3bd409a8e78f447671344b7535d;
-- Master: 290.640
-- Faster bbox: 311.152
-- Perf: 0.93x

-------------- BIG POLYGONS ----------------
-- EXPLAIN ANALYZE Select ST_CombineBbox(NULL::box2d, the_geom) from benchmark_4c7214d90a79aa6760367a084a4d4a2f61fbe1c6cc4f7f9e76020;
-- Master: 17.288
-- Faster bbox: 0.165
-- Perf: 104.77x
  • ST_ClipByBox2D (fully inside):
-------------- SMALL POLYGONS ---------------
-- EXPLAIN ANALYZE Select count(*) FROM benchmark_af85af967525e69373b7b8678f3bd409a8e78f447671344b7535d WHERE ST_ClipByBox2D(the_geom_webmercator, ST_TileEnvelope(0,0,0)) IS NOT NULL;
-- Master: 155.528
-- Faster bbox: 129.040
-- Perf: 1.21x

-------------- BIG POLYGONS ----------------
-- EXPLAIN ANALYZE Select count(*) FROM benchmark_4c7214d90a79aa6760367a084a4d4a2f61fbe1c6cc4f7f9e76020 WHERE ST_ClipByBox2D(the_geom_webmercator, ST_TileEnvelope(0,0,0)) IS NOT NULL;
-- Master: 19.266
-- Faster bbox: 0.220
-- Perf: 87.39x
  • ST_ClipByBox2D (fully outside):
-------------- SMALL POLYGONS ---------------
-- EXPLAIN ANALYZE Select count(*) FROM benchmark_af85af967525e69373b7b8678f3bd409a8e78f447671344b7535d WHERE ST_ClipByBox2D(the_geom_webmercator, ST_TileEnvelope(20,0,0)) IS NOT NULL;
-- Master: 171.451
-- Faster bbox: 147.365
-- Perf: 1.16x

-------------- BIG POLYGONS ----------------
-- EXPLAIN ANALYZE Select count(*) FROM benchmark_4c7214d90a79aa6760367a084a4d4a2f61fbe1c6cc4f7f9e76020 WHERE ST_ClipByBox2D(the_geom_webmercator, ST_TileEnvelope(20,0,0)) IS NOT NULL;
-- Master: 19.265
-- Faster bbox: 0.226
-- Perf: 85.24x
  • ST_BoundingDiagonal (fully outside):
-------------- SMALL POLYGONS ---------------
-- EXPLAIN ANALYZE Select ST_BoundingDiagonal(the_geom) from benchmark_af85af967525e69373b7b8678f3bd409a8e78f447671344b7535d;
-- Master: 566.389
-- Faster bbox: 446.435
-- Perf: 1.26x

-------------- BIG POLYGONS ----------------
-- EXPLAIN ANALYZE Select ST_BoundingDiagonal(the_geom) from benchmark_4c7214d90a79aa6760367a084a4d4a2f61fbe1c6cc4f7f9e76020;
-- Master: 19.279
-- Faster bbox: 0.153
-- Perf: 126x

comment:4 by Raúl Marín <git@…>, 4 years ago

Resolution: fixed
Status: assignedclosed

In 90a15b1/git:

Adjust SQL function costs

Closes #4686
Closes https://github.com/postgis/postgis/pull/572

Note: See TracTickets for help on using tickets.