Opened 10 years ago

Last modified 7 years ago

#2898 new enhancement

Define a guideline for function cost

Reported by: strk Owned by: strk
Priority: medium Milestone: PostGIS Fund Me
Component: postgis Version: master
Keywords: Cc:

Description

A guideline could help with defining the cost of each function signature, to help with cases like #2895 and #2896.

Such guideline could written in the top-level .sql.in files of each component (core, raster, topology, …).

Change History (10)

comment:1 by strk, 10 years ago

Example guideline for rasters:

-- COST assignment guideline (TODO):                                            
--                                                                              
-- By default C functions get assigned a cost of 1 while (plpg)sql functions    
-- get assigned a cost of 100. Value expressed in "cpu_operator_cost" units.    
-- See http://www.postgresql.org/docs/9.1/static/runtime-config-query.html#GUC-CPU-OPERATOR-COST
--                                                                              
-- The following schema can be used as a guideline to set cost of each          
-- function. The multiplier would apply to the language based default           
-- (1 or 100) and to the number of raster arguments.                            
--                                                                              
-- For example a multiplier of 500 for a C function taking 2 rasters            
-- and performing operations on each of their pixel would have a cost           
-- of 1000.                                                                     
--                                                                              
--  o Only detoast raster header                                                
--    x 1                                                                       
--                                                                              
--  o Detoast whole raster                                                      
--    x 100                                                                     
--                                                                              
--  o Scan each raster pixel                                                    
--    x 300                                                                     
--                                                                              
--  o Perform operations on each raster pixel                                   
--    x 500                                  

comment:2 by strk, 10 years ago

The guideline example above is completely off, though. Here are some real numbers:

          noop : Total runtime: 19523.492 ms (full deserialize/serialize)       
   st_envelope : Total runtime: 18411.870 ms (header slice)                     
 st_convexhull : Total runtime: 18104.542 ms                                    
         plain : Total runtime: 51.341 ms (???) 

The times were taken running queries against a ~180k rows raster coverage. The "plain" is a "SELECT rast FROM …". The "noop" is a "SELECT postgis_noop(rast) FROM …".

The difference between "plain" and "noop" is impressive!!

comment:3 by strk, 10 years ago

If I change the postgis_noop implementation to simply return the input datum it takes 66ms (vs. ~50ms of the plain timing). Having it simply do the DETOAST_DATUM makes it jump again to 18149.952 ms

It turns out EXPLAIN ANALYZE does not run canonical output functions at all, so there's no detoasting performed in a "SELECT rast FROM …"

comment:4 by strk, 10 years ago

Alright so in the light of the new findings, sliced detoasting (header only) and full detoasting do not make such a big difference. It is detoasting itself that hurts. Guidelines should then not take that part much in consideration, other than a COST offset common to each and every function.

Besides, I suspect that the planner should be considering cost of detoasting on itself, as the same function could be invoked with the input already detoasted…

comment:5 by strk, 10 years ago

17:37 < RhodiumToad> the planner doesn't take it into account. 17:38 < RhodiumToad> currently, functions that may be highly expensive due to detoasting (to_tsvector being a notorious example) often need to be assigned large costs to avoid bad plans

comment:6 by dustymugs, 10 years ago

The numbers you're seeing generally match what I've seen with in-db rasters on production servers. For basic ops, the detoast is the costliest.

comment:7 by strk, 10 years ago

The question is should we consider detoasting costs in the function definition or shouldn't we ? I'd think the planner should consider them itself as we cannot know its cost at definition time for two reasons:

  1. It depends on input (toast) size
  2. It depends on whether the input gets to us already detoasted

comment:8 by pramsey, 10 years ago

Owner: changed from pramsey to strk

From the commentary, it sounds like this one is not as "straightforward" as one might hope, if input data size is the primary determinant of cost. Maybe close.

comment:9 by strk, 10 years ago

Milestone: PostGIS 2.2.0PostGIS Future

comment:10 by robe, 7 years ago

Milestone: PostGIS FuturePostGIS Fund Me

Milestone renamed

Note: See TracTickets for help on using tickets.