Changes between Version 16 and Version 17 of ImplementSortingMethodsBeforeGistIndexBuilding


Ignore:
Timestamp:
Aug 23, 2021, 6:08:56 AM (3 years ago)
Author:
HanwGeek
Comment:

Legend:

Unmodified
Added
Removed
Modified
  • ImplementSortingMethodsBeforeGistIndexBuilding

    v16 v17  
    122122* Finish the documents
    123123
     124=== Final Report === 
     125**Abstract**
     126
     127GiST(Generalized Search Tree) is a generalization data structure of a variety of disk-based height-balanced search trees. Under the high-level API of GiST, structures like b-tree, r-tree can be implemented for data management. PostgreSQL defines a set of process function APIs for elements of the GiST index. Only with these function implementations can a data type be indexed and managed by a GiST structure. In large data scenarios, pre-sorting a batch of data fetched in memory may be a local approximation to the global sorting method. Recent PostgreSQL patch shows that it should speed up the build of a GiST index after some pre-sorting of the data which needs to be indexed. In one fork, the author replaces the GIST_OPTIONS_PROC with GIST_ORDER_PROC to try to define an order for data fetched in memory to sort in order to speed up the subsequent index building process. And I implemented pre-sorting methods in z-order pattern and Hilbert order pattern, Alos tested and compared pre-sorting methods on various data.
     128
     129**Links**
     130
     131* Code Base: https://github.com/postgis/postgis/pull/619
     132* Wiki: https://trac.osgeo.org/postgis/wiki/ImplementSortingMethodsBeforeGistIndexBuilding
     133* Test Results and Performance Comparison: https://docs.google.com/document/d/1_mY_F2hPDk3vmXH5PPp2z9BuQWt-ZMORk6KxtdVQ3HY/edit?usp=sharing
     134
     135**Conclusion**
     136
     137* The Morton/Hilbert hash function for pre-sorting can accelerate the process of index building process and reduce the time-consuming to one-third to one-fifth of the original
     138* The query operation for different data in the same area demonstrates more stable performance
     139* The query performance of pre-sorting is about 30% slower than the original
     140
     141**Future Work**
     142
     143* Try to figure out the reason for the slowness of the pre-sorting index
     144* Implement a fast Morton/Hilbert hash function for n-dimension geometry objects
     145
    124146== Student's Biography ==
    125147My name is Han WANG. I am a first year graduate student majoring in GIS at Peking University, and will get my Master's degree in 2023. And this is my github(https://github.com/HanwGeek) and my linkedin(https://www.linkedin.com/in/hanwgeek/). I am interested in all cool things. And it is very exciting to join the open source community! My research interest includes massive spatial temporal data management and analysis. Currently, I am working on a machine learning project based on big trajectory data, which is stored in PostgreSQL database and managed by PostGIS.