Changes between Version 17 and Version 18 of ImplementSortingMethodsBeforeGistIndexBuilding

Aug 23, 2021, 6:49:35 AM (3 months ago)



  • ImplementSortingMethodsBeforeGistIndexBuilding

    v17 v18  
    127127GiST (Generalized Search Tree) is a generalized data structure that covers a variety of disk-based, height-balanced search trees. Under the high-level GiST API, structures such as B-trees and R-trees can be implemented for data management. PostgreSQL defines a set of support function APIs for the elements of a GiST index; a data type can be indexed and managed by a GiST structure only if these functions are implemented for it. In large data scenarios, pre-sorting each batch of data fetched into memory can serve as a local approximation of a global sort. A recent PostgreSQL patch shows that pre-sorting the data to be indexed speeds up the build of a GiST index. In one fork, the author replaces GIST_OPTIONS_PROC with GIST_ORDER_PROC to define an order in which the data fetched into memory is sorted, so that the subsequent index building process runs faster. I implemented pre-sorting methods based on the Z-order and Hilbert-order patterns, and also tested and compared them on various data.
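The Hilbert-order pre-sorting idea can be sketched in a few lines. This is only an illustration, not the code from the patch or the fork: the function name `hilbert_index`, the `order` parameter, and the assumption that points have already been quantized to non-negative integer grid coordinates are all hypothetical. It uses the classic iterative 2D mapping from grid cell to distance along the curve.

```python
def hilbert_index(order, x, y):
    """Distance of grid cell (x, y) along a 2D Hilbert curve.

    Assumes 0 <= x, y < 2**order (integer grid coordinates).
    """
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Each quadrant contributes s*s cells to the distance.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def presort_points(points, order):
    """Sort integer grid points by their Hilbert index (a hypothetical helper)."""
    return sorted(points, key=lambda p: hilbert_index(order, p[0], p[1]))


if __name__ == "__main__":
    batch = [(3, 0), (0, 0), (1, 1), (0, 2), (1, 0)]
    print(presort_points(batch, order=2))
```

Sorting a batch this way places spatially close points next to each other, which is the property the pre-sorting step relies on before tuples are handed to the index build.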
     129**The state of the art BEFORE your GSoC**
     131The index building process does not change the tuple order within a page and runs slowly.
     133**The added value**
     134With pre-sorting, the index build time is reduced to between one-third and one-fifth of the original.
    144151* Implement a fast Morton/Hilbert hash function for n-dimensional geometry objects
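As a rough illustration of what an n-dimensional Morton key involves, the sketch below interleaves the bits of each coordinate. It is a plain, unoptimized version and not the project's implementation: the name `morton_key`, the `bits` parameter, and the assumption of non-negative integer coordinates are all hypothetical. A fast variant would typically replace the loops with lookup tables or bit tricks.

```python
def morton_key(coords, bits):
    """Interleave the low `bits` bits of each coordinate into one integer.

    Assumes every coordinate is an integer with 0 <= c < 2**bits.
    Bit `b` of dimension `d` lands at position b * ndim + d of the key.
    """
    ndim = len(coords)
    key = 0
    for bit in range(bits):
        for dim, c in enumerate(coords):
            key |= ((c >> bit) & 1) << (bit * ndim + dim)
    return key


if __name__ == "__main__":
    # 3D points on an 8x8x8 grid, sorted along the Z-order curve.
    pts = [(7, 7, 7), (0, 0, 1), (1, 0, 0), (0, 1, 0)]
    print(sorted(pts, key=lambda p: morton_key(p, bits=3)))
```

Because the key is a single integer, it can be compared with ordinary integer comparison, which is what makes it usable as a sort key for objects of any dimension.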
    146155== Student's Biography ==
    147156My name is Han WANG. I am a first-year graduate student majoring in GIS at Peking University, and will get my Master's degree in 2023. And this is my github( and my linkedin( I am interested in all cool things, and I am very excited to join the open source community! My research interests include the management and analysis of massive spatio-temporal data. Currently, I am working on a machine learning project based on big trajectory data, which is stored in a PostgreSQL database and managed by PostGIS.