Opened 4 years ago

Closed 4 years ago

#4674 closed defect (fixed)

lwgeom_cache doesn't live through not inline functions

Reported by: Algunenano Owned by: pramsey
Priority: medium Milestone: PostGIS 3.1.0
Component: postgis Version: master
Keywords: Cc:

Description

While I was working on #4672, I noticed that the ST_AsGML cache wasn't working all the time (in fact it was working with geographies but not geometries).

After adding some logs, it seems that if the function is inlined, the cache (installed in the parent memory context) lives for the next row, but if it isn't then you loose the cache and everything needs to be recreated again.

Example that isn't inlined:

# explain (analyze, verbose) Select ST_AsGML(the_geom::geometry) from benchmark_7773a711c8441d4b494a51fd9feebeac7a9b9c734619398620293 limit 4;NOTICE:  GetGenericCacheCollection not found
NOTICE:  generic_cache not found
NOTICE:  Cache not found. Expected: 4326 1. Found: 0 0 null
NOTICE:  GetGenericCacheCollection not found
NOTICE:  generic_cache not found
NOTICE:  Cache not found. Expected: 4326 1. Found: 0 0 null
NOTICE:  GetGenericCacheCollection not found
NOTICE:  generic_cache not found
NOTICE:  Cache not found. Expected: 4326 1. Found: 0 0 null
NOTICE:  GetGenericCacheCollection not found
NOTICE:  generic_cache not found
NOTICE:  Cache not found. Expected: 4326 1. Found: 0 0 null
                                                                                   QUERY PLAN                                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1.25 rows=4 width=32) (actual time=0.780..2.497 rows=4 loops=1)
   Output: (st_asgml(the_geom, 15, 0))
   ->  Seq Scan on public.benchmark_7773a711c8441d4b494a51fd9feebeac7a9b9c734619398620293  (cost=0.00..213260.58 rows=683833 width=32) (actual time=0.778..2.492 rows=4 loops=1)
         Output: st_asgml(the_geom, 15, 0)
 Planning Time: 0.213 ms
 Execution Time: 2.534 ms
(6 rows)

Example that is inlined:

# explain (analyze, verbose) Select ST_AsGML(3, the_geom::geometry) from benchmark_7773a711c8441d4b494a51fd9feebeac7a9b9c734619398620293 limit 4;
NOTICE:  GetGenericCacheCollection not found
NOTICE:  generic_cache not found
NOTICE:  Cache not found. Expected: 4326 1. Found: 0 0 null
NOTICE:  Cache FOUND. Expected: 4326 1
NOTICE:  Cache FOUND. Expected: 4326 1
NOTICE:  Cache FOUND. Expected: 4326 1
                                                                                    QUERY PLAN                                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..10.25 rows=4 width=32) (actual time=0.418..0.535 rows=4 loops=1)
   Output: (_st_asgml(3, the_geom, 15, 0, NULL::text, NULL::text))
   ->  Seq Scan on public.benchmark_7773a711c8441d4b494a51fd9feebeac7a9b9c734619398620293  (cost=0.00..1751884.83 rows=683833 width=32) (actual time=0.417..0.533 rows=4 loops=1)
         Output: _st_asgml(3, the_geom, 15, 0, NULL::text, NULL::text)
 Planning Time: 0.152 ms
 Execution Time: 0.562 ms
(6 rows)

I'm not sure if this affects more functions (probably anything doing indirect calls within SQL is susceptible to this behaviour) but it's indeed a PITA.

Is there a query context that we can hook us to that lives for the whole duration of the query (instead of a transaction / function)?

Change History (10)

comment:1 by pramsey, 4 years ago

Even if we put the cache into TopTransactionContext or something like that, there's no way we could necessarily reach it again if fn_extra is getting wiped out. I don't think there's anything we can do.

comment:2 by Algunenano, 4 years ago

What if we don't store the pointer to GenericCacheCollection it in fn_extra? We could have a static pointer to the cache and register a callback to nullify the pointer when the context is deleted, right?

We do something similar with the projection cache (to call the PROJ destructor when the context is being removed).

comment:3 by Algunenano, 4 years ago

I've made it work using CurTransactionContext but that only free's the cache at the end of the full transaction.

For example, using a BEGIN-END group:

cartodb_dev_user_f80dfdef-ea5c-498d-b98a-82362650d944_db=# BEGIN;
BEGIN
cartodb_dev_user_f80dfdef-ea5c-498d-b98a-82362650d944_db=# 
cartodb_dev_user_f80dfdef-ea5c-498d-b98a-82362650d944_db=# explain (analyze, verbose) Select ST_AsGML(the_geom) from benchmark_7773a711c8441d4b494a51fd9feebeac7a9b9c734619398620293 limit 4;
NOTICE:  GetGenericCacheCollection NOT FOUND
NOTICE:  generic_cache not found
NOTICE:  Cache not found. Expected: 4326 1. Found: 0 0 null
NOTICE:  GetGenericCacheCollection FOUND!!!!!
NOTICE:  Cache FOUND. Expected: 4326 1
NOTICE:  GetGenericCacheCollection FOUND!!!!!
NOTICE:  Cache FOUND. Expected: 4326 1
NOTICE:  GetGenericCacheCollection FOUND!!!!!
NOTICE:  Cache FOUND. Expected: 4326 1
                                                                                   QUERY PLAN                                
                                                    
-----------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------
 Limit  (cost=0.00..1.25 rows=4 width=32) (actual time=1.112..1.417 rows=4 loops=1)
   Output: (st_asgml(the_geom, 15, 0))
   ->  Seq Scan on public.benchmark_7773a711c8441d4b494a51fd9feebeac7a9b9c734619398620293  (cost=0.00..213260.58 rows=683833 
width=32) (actual time=1.110..1.414 rows=4 loops=1)
         Output: st_asgml(the_geom, 15, 0)
 Planning Time: 0.459 ms
 Execution Time: 1.443 ms
(6 rows)

cartodb_dev_user_f80dfdef-ea5c-498d-b98a-82362650d944_db=# 
cartodb_dev_user_f80dfdef-ea5c-498d-b98a-82362650d944_db=# explain (analyze, verbose) Select ST_AsGML(3, the_geom::geometry) from benchmark_7773a711c8441d4b494a51fd9feebeac7a9b9c734619398620293 limit 4;
NOTICE:  GetGenericCacheCollection FOUND!!!!!
NOTICE:  Cache FOUND. Expected: 4326 1
NOTICE:  GetGenericCacheCollection FOUND!!!!!
NOTICE:  Cache FOUND. Expected: 4326 1
NOTICE:  GetGenericCacheCollection FOUND!!!!!
NOTICE:  Cache FOUND. Expected: 4326 1
NOTICE:  GetGenericCacheCollection FOUND!!!!!
NOTICE:  Cache FOUND. Expected: 4326 1
                                                                                    QUERY PLAN                               
                                                     
-----------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------
 Limit  (cost=0.00..10.25 rows=4 width=32) (actual time=0.096..0.273 rows=4 loops=1)
   Output: (_st_asgml(3, the_geom, 15, 0, NULL::text, NULL::text))
   ->  Seq Scan on public.benchmark_7773a711c8441d4b494a51fd9feebeac7a9b9c734619398620293  (cost=0.00..1751884.83 rows=683833
 width=32) (actual time=0.095..0.270 rows=4 loops=1)
         Output: _st_asgml(3, the_geom, 15, 0, NULL::text, NULL::text)
 Planning Time: 0.110 ms
 Execution Time: 0.293 ms
(6 rows)

cartodb_dev_user_f80dfdef-ea5c-498d-b98a-82362650d944_db=# 
cartodb_dev_user_f80dfdef-ea5c-498d-b98a-82362650d944_db=# explain (analyze, verbose) Select ST_Transform(the_geom, 3857), ST_Transform(the_geom, 3857) from benchmark_7773a711c8441d4b494a51fd9feebeac7a9b9c734619398620293 limit 4;
NOTICE:  GetGenericCacheCollection FOUND!!!!!
NOTICE:  GetGenericCacheCollection FOUND!!!!!
NOTICE:  GetGenericCacheCollection FOUND!!!!!
NOTICE:  GetGenericCacheCollection FOUND!!!!!
NOTICE:  GetGenericCacheCollection FOUND!!!!!
NOTICE:  GetGenericCacheCollection FOUND!!!!!
NOTICE:  GetGenericCacheCollection FOUND!!!!!
NOTICE:  GetGenericCacheCollection FOUND!!!!!
                                                                                    QUERY PLAN                               
                                                     
-----------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------
 Limit  (cost=0.00..20.25 rows=4 width=64) (actual time=7.136..7.202 rows=4 loops=1)
   Output: (st_transform(the_geom, 3857)), (st_transform(the_geom, 3857))
   ->  Seq Scan on public.benchmark_7773a711c8441d4b494a51fd9feebeac7a9b9c734619398620293  (cost=0.00..3461467.33 rows=683833
 width=64) (actual time=7.135..7.201 rows=4 loops=1)
         Output: st_transform(the_geom, 3857), st_transform(the_geom, 3857)
 Planning Time: 0.040 ms
 Execution Time: 7.233 ms
(6 rows)

cartodb_dev_user_f80dfdef-ea5c-498d-b98a-82362650d944_db=# 
cartodb_dev_user_f80dfdef-ea5c-498d-b98a-82362650d944_db=# END;
NOTICE:  PostgisResetInternalCache
COMMIT
cartodb_dev_user_f80dfdef-ea5c-498d-b98a-82362650d944_db=#

Notice how END resets the cache.

I think it would be best if the cache only lives until the end of a single query (and not the full transaction) but I haven't found out how to do it yet.

comment:4 by Algunenano, 4 years ago

I've been looking for it but I don't see any simple way to get the memory context of a single query.

So let's describe the 2 options:

  • Use flinfo->fn_mcxt
    • Cache is done per function call.
    • Calling Select ST_Transform(g, 3857), ST_Transform(g2, 3857) from table will do 2 cache instantiations.
    • Calling the previous query 2 times inside a transaction (BEGIN; query; query; END;) will do 2 x 2 cache instantiations.
    • Doesn't work across pure SQL functions (unless inlined).
  • Use CurTransactionContext.
    • Cache is done per transaction.
    • Calling Select ST_Transform(g, 3857), ST_Transform(g2, 3857) from table will do only 1 instantiation.
    • Calling the previous query 2 times inside a transaction (BEGIN; query; query; END;) will create the cache only once (no matter the amount of times you call the query).
    • Works across "everything", as is kept alive until the end of the transaction.

This second approach can be considered more dangerous in case of a leak in the cached objects (as things won't necessarily be deleted immediately) but considering that the boundaries of all the different caches are well established I don't expect it to be an issue.

There is a possible "issue" related to doing a transaction with a ST_Transform, then a modification to spatial_ref_sys, and then another ST_Transform and expect it to be different (but it isn't because it is cached). After giving it some thought I don't consider this a problem since the functions are all labelled as IMMUTABLE, so I don't see a problem with requesting different transactions if you really want this kind of behaviour.

What do you all think? If I don't hear big cries I'll clean up the patch (and break it apart from the changes for #4672) and push it. In any case I'll leave things to be easily reverted (simply change CurTransactionContext to flinfo->fn_mcxt).

comment:5 by Raúl Marín <git@…>, 4 years ago

Resolution: fixed
Status: newclosed

In 3ced96fc/git:

Internal cache now persists until the end of the transaction

Closes #4674

comment:6 by Algunenano, 4 years ago

Resolution: fixed
Status: closedreopened

Reopening after a discussion in Github. I intend to revert this change and find a better way to keep the ST_AsGML optimization (by removing the inline SQL).

comment:7 by Algunenano, 4 years ago

I tried reverting this change but it seems that fcinfo->flinfo might be NULL in some cases and it's crashing with the new geojson cache:

(gdb) bt
#0  0x00007f24776bda8a in GetGenericCacheCollection (fcinfo=0x7ffe6199df80) at lwgeom_cache.c:79
#1  SRIDCacheGet (fcinfo=0x7ffe6199df80) at lwgeom_cache.c:506
#2  GetSRIDCacheBySRS (fcinfo=0x7ffe6199df80, srs=0x558057f24260 "EPSG:3005") at lwgeom_cache.c:520
#3  0x00007f24776517fd in LWGEOM_in (fcinfo=0x7ffe6199df80) at lwgeom_inout.c:150
#4  0x0000558056faab75 in DirectFunctionCall1Coll (func=0x7f2477651670 <LWGEOM_in>, collation=1475494496, arg1=0) at fmgr.c:803
#5  0x00007f247765257e in parse_WKT_lwgeom (fcinfo=0x558057eef4f0) at lwgeom_inout.c:662
#6  0x0000558056cc350d in ExecInterpExpr (state=0x558057eef410, econtext=0x558057eef570, isnull=<optimized out>) at execExprInterp.c:649
#7  0x0000558056d92a8e in ExecEvalExprSwitchContext (state=<optimized out>, econtext=0x558057f24260, isNull=0x7ffe6199e11f) at ../../../../src/include/executor/executor.h:307
#8  evaluate_expr (expr=<optimized out>, result_type=10775331, result_typmod=-1, result_collation=0) at clauses.c:4812
#9  0x0000558056d93664 in evaluate_function (funcid=10775691, result_type=10775331, result_typmod=-1, result_collid=0, input_collid=100, args=0x55805832d138, funcvariadic=<optimized out>, context=0x7ffe6199e660, func_tuple=<optimized out>)
    at clauses.c:4354
#10 simplify_function (funcid=10775691, result_type=10775331, result_typmod=-1, result_collid=0, input_collid=100, args_p=<optimized out>, funcvariadic=<optimized out>, process_args=<optimized out>, allow_non_const=true, context=0x7ffe6199e660)
    at clauses.c:3984
#11 0x0000558056d91864 in eval_const_expressions_mutator (node=0x558057e0a1b0, context=0x7ffe6199e660) at clauses.c:2477
#12 0x0000558056d26b06 in expression_tree_mutator (node=<optimized out>, mutator=0x558056d90c30 <eval_const_expressions_mutator>, context=0x7ffe6199e660) at nodeFuncs.c:3012
#13 0x0000558056d933e0 in simplify_function (funcid=10775573, result_type=25, result_typmod=-1, result_collid=100, input_collid=0, args_p=0x7ffe6199e470, funcvariadic=<optimized out>, process_args=<optimized out>, allow_non_const=<optimized out>, 
    context=0x7ffe6199e660) at clauses.c:3975
#14 0x0000558056d91864 in eval_const_expressions_mutator (node=0x55805832c5b8, context=0x7ffe6199e660) at clauses.c:2477
#15 0x0000558056d26829 in expression_tree_mutator (node=0x55805832c608, mutator=0x558056d90c30 <eval_const_expressions_mutator>, context=0x7ffe6199e660) at nodeFuncs.c:2762
#16 0x0000558056d90d4e in eval_const_expressions_mutator (node=0x55805832c608, context=0x7ffe6199e660) at clauses.c:3539
#17 0x0000558056d26b06 in expression_tree_mutator (node=<optimized out>, mutator=0x558056d90c30 <eval_const_expressions_mutator>, context=0x7ffe6199e660) at nodeFuncs.c:3012
#18 0x0000558056d90d4e in eval_const_expressions_mutator (node=0x558057e099d0, context=0x7ffe6199e660) at clauses.c:3539
#19 0x0000558056d90c10 in eval_const_expressions (root=<optimized out>, node=0x0) at clauses.c:2269
#20 0x0000558056d794ab in preprocess_expression (root=<optimized out>, expr=0x558057f24260, kind=1) at planner.c:1087
#21 subquery_planner (glob=<optimized out>, parse=0x558057e09800, parent_root=<optimized out>, hasRecursion=<optimized out>, tuple_fraction=0) at planner.c:769
#22 0x0000558056d78aaf in standard_planner (parse=0x558057e09800, cursorOptions=<optimized out>, boundParams=0x0) at planner.c:406
#23 0x0000558056e51e60 in pg_plan_query (querytree=0x558057e09800, cursorOptions=256, boundParams=0x0) at postgres.c:878
#24 pg_plan_queries (querytrees=<optimized out>, cursorOptions=256, boundParams=0x0) at postgres.c:968
#25 0x0000558056e562b4 in exec_simple_query (query_string=0x558057e08500 "SELECT 'cast2', ST_AsEWKT(st_asgeojson('SRID=3005;MULTIPOINT(1 1, 1 1)'::geometry)::geometry);") at postgres.c:1143
#26 0x0000558056e53d34 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:4243
#27 0x0000558056dc2ef7 in BackendRun (port=0x558057e2f5b0) at postmaster.c:4437
#28 0x0000558056dc24df in BackendStartup (port=<optimized out>) at postmaster.c:4128
#29 ServerLoop () at postmaster.c:1704
#30 0x0000558056dbf036 in PostmasterMain (argc=3, argv=0x558057e02230) at postmaster.c:1377
#31 0x0000558056d247d5 in main (argc=3, argv=0x558057e02230) at main.c:228
(gdb) f 0
#0  0x00007f24776bda8a in GetGenericCacheCollection (fcinfo=0x7ffe6199df80) at lwgeom_cache.c:79
79              GenericCacheCollection* cache = fcinfo->flinfo->fn_extra;
(gdb) p *fcinfo
$6 = {flinfo = 0x0, context = 0x0, resultinfo = 0x0, fncollation = 0, isnull = false, nargs = 1, args = 0x7ffe6199dfa0}

It seems that DirectFunctionCall1 needs to be replaced by CallerFInfoFunctionCall1 to ensure flinfo is initialized.

comment:8 by Raúl Marín <git@…>, 4 years ago

In 10c405f/git:

Revert "Internal cache now persists until the end of the transaction"

This reverts commit 3ced96fc1a79831a26c33311a17d1f32e3c5c732.

References #4674
Closes https://github.com/postgis/postgis/pull/560

comment:9 by Algunenano, 4 years ago

Pushed the revert.

The change to ST_AsGML to avoid the issue with the inline blocking the cache usage is still pending, so I'm keeping the issue open.

I'm thinking about changing the signature to also use LWGEOM_asGML and in there detect if the first parameter is an int or a geometry and act accordingly. I remember seeing something similar on some other function (iterating over the parameters) but I have to find where.

comment:10 by Raúl Marín <git@…>, 4 years ago

Resolution: fixed
Status: reopenedclosed

In 575ae858/git:

Change ST_AsGML(geometry) to use a C function direcly

By using a inline SQL, the cache that lives in the upper context was
deleted for each processed row (thus being ineffective)

Closes #4674

Note: See TracTickets for help on using tickets.