| 124 | === Final Report === |
| 125 | **Abstract** |
| 126 | |
| 127 | GiST(Generalized Search Tree) is a generalization data structure of a variety of disk-based height-balanced search trees. Under the high-level API of GiST, structures like b-tree, r-tree can be implemented for data management. PostgreSQL defines a set of process function APIs for elements of the GiST index. Only with these function implementations can a data type be indexed and managed by a GiST structure. In large data scenarios, pre-sorting a batch of data fetched in memory may be a local approximation to the global sorting method. Recent PostgreSQL patch shows that it should speed up the build of a GiST index after some pre-sorting of the data which needs to be indexed. In one fork, the author replaces the GIST_OPTIONS_PROC with GIST_ORDER_PROC to try to define an order for data fetched in memory to sort in order to speed up the subsequent index building process. And I implemented pre-sorting methods in z-order pattern and Hilbert order pattern, Alos tested and compared pre-sorting methods on various data. |
| 128 | |
| 129 | **Links** |
| 130 | |
| 131 | * Code Base: https://github.com/postgis/postgis/pull/619 |
| 132 | * Wiki: https://trac.osgeo.org/postgis/wiki/ImplementSortingMethodsBeforeGistIndexBuilding |
| 133 | * Test Results and Performance Comparison: https://docs.google.com/document/d/1_mY_F2hPDk3vmXH5PPp2z9BuQWt-ZMORk6KxtdVQ3HY/edit?usp=sharing |
| 134 | |
| 135 | **Conclusion** |
| 136 | |
| 137 | * The Morton/Hilbert hash function for pre-sorting can accelerate the process of index building process and reduce the time-consuming to one-third to one-fifth of the original |
| 138 | * The query operation for different data in the same area demonstrates more stable performance |
| 139 | * The query performance of pre-sorting is about 30% slower than the original |
| 140 | |
| 141 | **Future Work** |
| 142 | |
| 143 | * Try to figure out the reason for the slowness of the pre-sorting index |
| 144 | * Implement a fast Morton/Hilbert hash function for n-dimension geometry objects |
| 145 | |