Changes between Version 17 and Version 18 of ImplementSortingMethodsBeforeGistIndexBuilding


Timestamp:
Aug 23, 2021, 6:49:35 AM
Author:
HanwGeek
Comment:

GiST (Generalized Search Tree) is a generalized data structure that covers a variety of disk-based, height-balanced search trees. Through the high-level GiST API, structures such as B-trees and R-trees can be implemented for data management. PostgreSQL defines a set of support-function APIs for the elements of a GiST index; a data type can be indexed and managed by a GiST structure only once these functions are implemented for it. In large-data scenarios, pre-sorting each batch of data fetched into memory can serve as a local approximation of a global sort. A recent PostgreSQL patch shows that pre-sorting the data to be indexed should speed up GiST index building. In one fork, the author replaces GIST_OPTIONS_PROC with GIST_ORDER_PROC, which defines an order for sorting the data fetched into memory so that the subsequent index build runs faster. I implemented pre-sorting methods based on Z-order and Hilbert order, and tested and compared the pre-sorting methods on various datasets.
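
A minimal sketch of pre-sorting one in-memory batch along a space-filling curve, written in Python for readability. It is not the project's code: it assumes each geometry is reduced to the centre of its 2-D bounding box and that centres are quantized onto a 2^16 x 2^16 grid, and the helper names (`quantize`, `morton_key`, `hilbert_key`, `presort`) are hypothetical. `morton_key` interleaves coordinate bits (Z-order); `hilbert_key` is the classic iterative `xy2d` conversion from a grid cell to its distance along the Hilbert curve.

```python
# Hypothetical sketch: order a batch of 2-D boxes along a Z-order or Hilbert
# curve before index insertion. Grid size and all names are assumptions.
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
BITS = 16  # assumed resolution: 2**16 grid cells per axis


def quantize(v: float, lo: float, hi: float, bits: int = BITS) -> int:
    """Map v in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    if hi == lo:
        return 0
    return int((v - lo) / (hi - lo) * ((1 << bits) - 1))


def morton_key(x: int, y: int, bits: int = BITS) -> int:
    """Z-order key: bit i of x goes to bit 2i, bit i of y to bit 2i + 1."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def hilbert_key(x: int, y: int, bits: int = BITS) -> int:
    """Distance of cell (x, y) along the Hilbert curve on a 2**bits grid."""
    n = 1 << bits
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/reflect so the sub-quadrant has the base orientation
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def presort(boxes: List[Box], key_fn: Callable[[int, int], int]) -> List[Box]:
    """Return the batch sorted by the curve key of each box centre."""
    if not boxes:
        return []
    cx = [(b[0] + b[2]) / 2 for b in boxes]
    cy = [(b[1] + b[3]) / 2 for b in boxes]
    x_lo, x_hi, y_lo, y_hi = min(cx), max(cx), min(cy), max(cy)
    order = sorted(
        range(len(boxes)),
        key=lambda i: key_fn(quantize(cx[i], x_lo, x_hi), quantize(cy[i], y_lo, y_hi)),
    )
    return [boxes[i] for i in order]
```

Calling `presort(batch, hilbert_key)` or `presort(batch, morton_key)` on one memory batch returns the boxes in curve order. The trade-off between the two keys is the usual one: consecutive Hilbert keys are always spatially adjacent cells, while Z-order is cheaper to compute but makes occasional long jumps between quadrants.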

**The state of the art BEFORE your GSoC**

The index build process does not change the order of tuples within a page, and it runs slowly.

**The added value**
With pre-sorting, the time to build the index is reduced to between one-third and one-fifth of the original.

**Links**

     
* Implement a fast Morton/Hilbert hash function for n-dimensional geometry objects
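
The bullet above calls for a fast Morton hash in n dimensions. The following hedged Python sketch shows two common ways to build one; the function names are hypothetical, and coordinates are assumed to be already quantized to non-negative integers. `morton_nd` is the general bit-by-bit interleave for any number of dimensions. `morton2d_fast` is the 2-D special case using the well-known magic-number bit-spreading trick, which in C replaces the per-bit loop with a handful of shifts and masks. An n-dimensional Hilbert encoder is more involved and is not sketched here.

```python
# Hypothetical sketch of Morton (Z-order) keys for quantized integer coordinates.
from typing import Sequence


def morton_nd(coords: Sequence[int], bits: int) -> int:
    """General n-D Morton key: bit b of dimension d goes to position b * n + d."""
    n = len(coords)
    key = 0
    for b in range(bits):
        for d, c in enumerate(coords):
            key |= ((c >> b) & 1) << (b * n + d)
    return key


def part1by1(v: int) -> int:
    """Spread the low 16 bits of v so that bit i lands on bit 2 * i."""
    v &= 0x0000FFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v


def morton2d_fast(x: int, y: int) -> int:
    """2-D Morton key for 16-bit x and y without a per-bit loop."""
    return part1by1(x) | (part1by1(y) << 1)
```

For instance, `morton_nd([x, y, z], 21)` packs three 21-bit coordinates into a 63-bit key, which still fits in a signed 64-bit integer.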
[[Image(https://user-images.githubusercontent.com/25524928/130458502-313360a1-01dd-46f0-8ca7-e9cf0147ee6c.png)]]

== Student's Biography ==
My name is Han WANG. I am a first-year graduate student majoring in GIS at Peking University, and I will receive my Master's degree in 2023. Here are my GitHub (https://github.com/HanwGeek) and my LinkedIn (https://www.linkedin.com/in/hanwgeek/). I am interested in all kinds of cool things, and it is very exciting to join the open-source community! My research interests include massive spatio-temporal data management and analysis. Currently, I am working on a machine learning project based on big trajectory data, which is stored in a PostgreSQL database and managed by PostGIS.