Opened 4 years ago

Closed 3 years ago

Last modified 3 years ago

#4775 closed defect (fixed)

[3.1] Performance regression on gserialized_overlaps_2d

Reported by: Algunenano Owned by: Algunenano
Priority: medium Milestone: PostGIS 3.1.0
Component: postgis Version: master
Keywords: Cc:

Description

While doing a performance check of 3.1, I noticed a slowdown in some MVT queries (the only ones I'm running).

The underlying database is exactly the same, with the same query and the same plan (I only swapped the .so library). Comparing the perf output of 3.0 and 3.1, I get this:

  • 3.0 (gserialized_overlaps_2d part only):
-   11.76%     1.24%  postgres  postgis-3.so            [.] gserialized_overlaps_2d
   - 10.52% gserialized_overlaps_2d
      - 9.96% gserialized_datum_get_box2df_p
         - 3.19% gserialized2_get_gbox_p
            - 2.78% gserialized2_peek_gbox_p
                 1.25% gbox_float_round
                 0.85% nextafterf32
           1.78% __memmove_avx_unaligned_erms
         - 1.34% heap_tuple_untoast_attr
            - 0.77% palloc
                 AllocSetAlloc
           0.61% AllocSetFree
   - 0.80% __libc_start_main
        0x5569d1f94705
        PostmasterMain
        __restore_rt
        sigusr1_handler
        0x5569d202d837
        StartBackgroundWorker
        ParallelWorkerMain
        ParallelQueryMain
        standard_ExecutorRun
        ExecAgg
        fetch_input_tuple
      - ExecScan
         - 0.80% ExecInterpExpr
              0.70% gserialized_overlaps_2

  • 3.1 (gserialized_overlaps_2d part only):
   - 20.41% gserialized_overlaps_2d
      - 20.01% gserialized_datum_predicate_2d
         - 15.83% gserialized_datum_get_internals_p
            - 4.42% gserialized2_get_gbox_p
               - 2.50% gserialized2_peek_gbox_p
                    1.22% gbox_float_round
                    0.79% nextafterf32
                 1.32% gserialized2_read_gbox_p.llvm.13873022969809935523
            - 4.31% heap_tuple_untoast_attr_slice
               - 1.85% palloc
                    AllocSetAlloc
              1.43% __memmove_avx_unaligned_erms
              1.09% AllocSetFree
              0.68% heap_tuple_untoast_attr

So gserialized_overlaps_2d has gone from ~12% to ~20% of the CPU time spent in the query.

I modified these functions for #4676, so that change is the likely culprit.

Change History (4)

comment:1 by Algunenano, 4 years ago

I've discovered 2 issues:

  • I changed gserialized_datum_predicate_2d to use GBOX, so it now does a double conversion: it reads the float box, builds a GBOX using doubles, and then downcasts it back to float.
  • PG_DETOAST_DATUM_SLICE always returns a copy of the datum, even when the datum is not external, indirect, expanded, or compressed and needs no processing. So for small (untoasted) geometries we are now doing an extra copy and free that we didn't do before. I'm thinking of a more adequate function for this case: one that leaves the datum untouched when nothing needs to be done, and only fetches a slice when the way it was stored requires it.
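To illustrate the first issue, here is a minimal sketch in C. It assumes simplified, hypothetical box types (`box2df_sketch` with float bounds, `gbox_sketch` with double bounds) rather than PostGIS's actual BOX2DF and GBOX definitions, and it uses a plain cast for the narrowing step. It shows that widening a float box to doubles and narrowing it back yields the same floats, so the round trip adds work without changing the overlap result.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins: a float box like the one stored in a
 * serialized geometry, and a double-precision box. These are NOT PostGIS's
 * real BOX2DF/GBOX definitions. */
typedef struct { float xmin, xmax, ymin, ymax; } box2df_sketch;
typedef struct { double xmin, xmax, ymin, ymax; } gbox_sketch;

/* Direct path: compare the float boxes as read, with no conversion. */
static bool overlaps_float(const box2df_sketch *a, const box2df_sketch *b)
{
	assert(a && b);
	return a->xmin <= b->xmax && b->xmin <= a->xmax &&
	       a->ymin <= b->ymax && b->ymin <= a->ymax;
}

/* Widening float -> double is always exact. */
static gbox_sketch widen(const box2df_sketch *f)
{
	gbox_sketch g = { f->xmin, f->xmax, f->ymin, f->ymax };
	return g;
}

/* Narrowing back: values that originally came from floats are exactly
 * representable as floats, so a plain cast returns the original value. */
static box2df_sketch narrow(const gbox_sketch *g)
{
	box2df_sketch f = { (float)g->xmin, (float)g->xmax,
	                    (float)g->ymin, (float)g->ymax };
	return f;
}

/* Round-trip path: float -> double -> float before comparing.
 * Same answer as overlaps_float, but with extra work per call. */
static bool overlaps_round_trip(const box2df_sketch *a, const box2df_sketch *b)
{
	gbox_sketch ga = widen(a), gb = widen(b);
	box2df_sketch fa = narrow(&ga), fb = narrow(&gb);
	return overlaps_float(&fa, &fb);
}
```

Avoiding the conversion amounts to calling `overlaps_float` on the boxes as they are read, which is the shape the predicate had before the change.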

comment:2 by Algunenano, 4 years ago

I need to do more performance tests, but I think https://github.com/postgis/postgis/pull/586 should fix both those issues.

comment:3 by Raúl Marín <git@…>, 3 years ago

Resolution: fixed
Status: assigned → closed

In 4c7c065/git:

Be clever and only deserialize with copy when necessary

Closes #4775
Closes https://github.com/postgis/postgis/pull/586
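The idea behind this fix can be modelled in plain C, independent of the PostgreSQL API. This is a sketch under stated assumptions, not the code from the commit: `stored_value_sketch`, `get_header_slice` and `release_header_slice` are hypothetical names, and a boolean flag stands in for the check on how the datum is stored (external, indirect, expanded, or compressed). Plain in-line data is used in place, and a copy is made, and later freed, only when the value actually needs processing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a stored value: raw bytes plus a flag saying
 * whether they must be processed (e.g. decompressed or fetched) before use. */
typedef struct {
	const unsigned char *data;
	size_t size;
	bool needs_detoast;
} stored_value_sketch;

/* Return a pointer to the first slice_len bytes of the value.
 * Fast path: plain in-line data is returned as-is, with no copy.
 * Slow path: a copy of the slice is made (stand-in for real detoasting). */
static const unsigned char *
get_header_slice(const stored_value_sketch *v, size_t slice_len)
{
	assert(v && slice_len <= v->size);
	if (!v->needs_detoast)
		return v->data;                /* no allocation, no memcpy */

	unsigned char *copy = malloc(slice_len);
	if (!copy)
		return NULL;
	memcpy(copy, v->data, slice_len);  /* real code would decompress/fetch here */
	return copy;
}

/* Free the slice only if get_header_slice made a copy. */
static void
release_header_slice(const stored_value_sketch *v, const unsigned char *slice)
{
	if (slice && slice != v->data)
		free((void *)slice);
}
```

Freeing only when the returned pointer differs from the original mirrors PostgreSQL's `PG_FREE_IF_COPY` idiom, and it is what removes the extra palloc/memmove/free pair from the untoasted case.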

comment:4 by Algunenano, 3 years ago

Regression fixed. All tiles are now at least as fast as with 3.0, and some are 30-40% faster.
