= GSoC 2021: Parallelization of existing modules for GRASS GIS =

|| Title: || '''Parallelization of existing modules for GRASS GIS''' ||
|| Student Name: || Aaron Saw Min Sern ||
|| Organization: || [http://www.osgeo.org OSGeo - Open Source Geospatial Foundation] ||
|| Mentor Names: || Huidae Cho, Vaclav Petras, Māris Nartišs ||
|| GSoC proposal: || [https://summerofcode.withgoogle.com/dashboard/project/6280792767987712/details/ View proposal] ||
|| !GitHub Repository: || [https://github.com/aaronsms View account] ||

== Abstract ==

OpenMP support for raster modules is currently limited, and many of these modules could benefit from parallelization. This project aims to parallelize modules chosen based on how frequently they are used and how complex the implementation would be.

== Timeline ==

{{{#!th style="background: #ddd" rowspan=2
'''Time Period'''
}}}
{{{#!th style="background: #ddd" colspan=2
'''Milestones'''
}}}
|-----------------------
{{{#!th style="background: #ddd"
Tasks
}}}
{{{#!th style="background: #ddd"
Status
}}}
|-----------------------
{{{#!td
May 17th - June 7th\\
''Community Bonding''
}}}
{{{#!td
 - Introduce myself in dev and SOC mailing lists
 - Get in contact with mentors and discuss project
 - Prepare the wiki page
 - Set up the !GitHub repository for project
 - Set up developer environment
}}}
{{{#!td
\\ Ok \\ Ok \\ Ok \\ Ok \\ Ok \\ \\
}}}
|-----------------------
{{{#!td
June 7th - June 11th \\
''Week 1''
}}}
{{{#!td
 - Parallelize r.proj
 - Parallelize r.neighbors
 - Parallelize r.univar
}}}
{{{#!td
\\ -- \\ Pending \\ Ok \\ \\
}}}
|-----------------------
{{{#!td
June 14th - June 18th \\
''Week 2''
}}}
{{{#!td
 - Parallelize r.cross
 - Parallelize r.blend
 - Parallelize r.composite
}}}
{{{#!td
}}}
|-----------------------
{{{#!td
June 21st - June 25th \\
''Week 3''
}}}
{{{#!td
 - Parallelize r.mfilter
 - Parallelize r.covar
 - Parallelize r.texture
}}}
{{{#!td
}}}
|-----------------------
{{{#!td
June 28th - July 2nd \\
''Week 4''
}}}
{{{#!td
 - Parallelize r.slope
 - Parallelize r.basins.fill
 - Parallelize r.sunhours
 - Parallelize r.flow
}}}
{{{#!td
}}}
|-----------------------
{{{#!td
July 5th - July 9th \\
''Week 5''
}}}
{{{#!td
 - Parallelize r.to.*
}}}
{{{#!td
}}}
|-----------------------
{{{#!td style="background: #ddd"
July 12th - July 16th \\
''Week 6: Evaluations''
}}}
{{{#!td style="background: #ddd"
 - Parallelize r.resamp.*
}}}
{{{#!td
}}}
|-----------------------
{{{#!td
July 19th - July 23rd \\
''Week 7''
}}}
{{{#!td
 - Parallelize r.resurf.*
}}}
{{{#!td
}}}
|-----------------------
{{{#!td
July 26th - July 30th \\
''Week 8''
}}}
{{{#!td
 - Parallelize r.random.cells
 - Parallelize r.random.surface
 - Implement support for users to specify number of threads
}}}
{{{#!td
}}}
|-----------------------
{{{#!td
August 2nd - August 6th \\
''Week 9''
}}}
{{{#!td
 - Parallelize r.li.*
 - Finish documentation and tutorials
}}}
{{{#!td
}}}
|-----------------------
{{{#!td
August 9th - August 13th \\
''Week 10''
}}}
{{{#!td
 - Finishing up, testing, documentation
}}}
{{{#!td
}}}
|-----------------------
{{{#!td style="background: #ddd"
August 16th - August 23rd \\
''Week 11: Final Evaluation and Code Submission''
}}}
{{{#!td style="background: #ddd"
 - Submit code and final evaluation
}}}
{{{#!td
}}}

== Bonding period report ==

'''1) What did I get done this period?'''\\
 - I have set up a wiki page detailing my project and its progress. (1)
 - I have set up my development environment; my repository is linked below. (2)
 - I have gotten in touch with my mentors, and we are arranging a meeting this week.

'''2) What do I plan on doing next week?'''\\
I will be working on parallelizing 3 modules: r.proj, r.neighbors, r.univar. Based on the results, I will adjust my plans for the following weeks.

'''3) Am I blocked on anything?'''\\
No, it has been good so far.
(1) [https://trac.osgeo.org/grass/wiki/GSoC/2021/RasterParallelization]\\
(2) [https://github.com/aaronsms/grass]

== Weekly reports ==

=== Week 1 ===

'''1) What did I get done this week?'''

r.univar\\
 - Updated the Makefile to include OpenMP dependencies
 - Wrote multi-threaded test cases to ensure the program gives consistent results
 - Wrote a benchmarking script to measure speedup
 - Implemented parallel support
 - Drafted a PR with the above-mentioned changes (1)

r.neighbors\\
 - Investigated the Segment library to support random access and write operations

'''2) What do I plan on doing next week?'''\\
The goal is to come up with a design for output-based modules. The next step is to finish the implementation for r.neighbors. Furthermore, I plan to investigate the thread safety of the Raster3D module for the pthread implementation of r.mapcalc, which has known issues. (2)

'''3) Am I blocked on anything?'''\\
No, it has been good so far, but I hope to improve my pace.

=== Week 2 ===

'''1) What did I get done this week?'''

r.univar\\
 - Addressed review comments on the PR (1), e.g. the standard option "nprocs" is now the parameter users set to indicate the number of threads

r.neighbors\\
 - Wrote test cases for parallel execution
 - Drafted a PR with the implementation (2)

r.proj\\
 - Wrote new test cases for the module (3)

'''2) What do I plan on doing next week?'''\\
I have managed to come up with a way to parallelize output-based modules like r.neighbors. The idea is to use a temporary segment file that lets threads perform random write operations, which is not possible directly on a compressed raster file without an intermediate cache. With this design in mind, I intend to continue parallelizing similar modules next week. There is also discussion about encapsulating a benchmarking framework, possibly under grass.benchmark, since it will be used repeatedly in the future to measure performance.

'''3) Am I blocked on anything?'''\\
No, it has been good so far.
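
The r.univar work reported for Weeks 1 and 2 (a statistics pass split across "nprocs" threads, with tests checking that the multi-threaded result matches the single-threaded one) can be pictured as a chunked reduction. The following is only a minimal Python sketch of that pattern, not the module's C/OpenMP code: the function names `partial_stats`, `merge` and `univar` are hypothetical, the raster is a plain list of rows with `None` standing in for null cells, and Python threads are used purely to show the structure, not to gain speed.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partial_stats(rows):
    """Reduce a chunk of rows to (count, total, minimum, maximum), skipping nulls."""
    count, total = 0, 0.0
    lo, hi = math.inf, -math.inf
    for row in rows:
        for cell in row:
            if cell is None:  # None stands in for a null cell (assumption)
                continue
            count += 1
            total += cell
            lo = min(lo, cell)
            hi = max(hi, cell)
    return count, total, lo, hi


def merge(a, b):
    """Combine two partial results; this is the reduction step."""
    return a[0] + b[0], a[1] + b[1], min(a[2], b[2]), max(a[3], b[3])


def univar(raster, nprocs=1):
    """Split the rows into nprocs contiguous chunks and reduce them in parallel."""
    chunk = max(1, math.ceil(len(raster) / nprocs))
    chunks = [raster[i:i + chunk] for i in range(0, len(raster), chunk)]
    with ThreadPoolExecutor(max_workers=nprocs) as pool:
        partials = list(pool.map(partial_stats, chunks))
    result = (0, 0.0, math.inf, -math.inf)  # identity element of the reduction
    for part in partials:  # merged in chunk order, so the result is deterministic
        result = merge(result, part)
    return result
```

A consistency test in this style compares `univar(raster, nprocs=1)` with `univar(raster, nprocs=n)` for several values of `n`. With floating-point data the sums can differ in the last bits, because the additions are grouped differently, so such a test may need a tolerance.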
(1) [https://github.com/OSGeo/grass/pull/1634]\\
(2) [https://github.com/OSGeo/grass/pull/1654]\\
(3) [https://github.com/OSGeo/grass/pull/1663]\\
(4) [https://github.com/OSGeo/grass/pull/1670]\\

=== Week 3 ===

'''1) What did I get done this week?'''

Upon discussion with the mentors, we decided to explore alternatives to using the Segment library as an intermediate output buffer. Specifically, there are two designs in mind. The first simply increases the size of the buffer: I/O stays sequential, both to fill the buffer and to write it out, and the computation in between runs in parallel. The second is more complicated and tries to avoid having the threads wait for I/O at all.

'''2) What do I plan on doing next week?'''\\
I plan to finalize the design in the coming week. (1)

'''3) Am I blocked on anything?'''\\
No, it has been good so far.

(1) https://github.com/aaronsms/grass

=== Week 4 ===

'''1) What did I get done this week?'''

r.mfilter (PR: https://github.com/OSGeo/grass/pull/1708)
 - Added test cases for different input options (Sequential/Parallel filters, repeated, null_mode)
 - Added parallel implementations for all options except Sequential filters (which inherently cannot be parallelized)

'''2) What do I plan on doing next week?'''

Upon discussion with the mentors, we decided to change the current implementation of r.neighbors, which uses the Segment library to provide a temporary file buffer that the threads work on before the raster file is produced. We realized that the Segment library does not fit the use case well enough to compensate for the overhead it might add. It was essentially used as an API to write to the file buffer, and we were not making good use of its caching capabilities. A native temporary file buffer, which the threads can write output to simultaneously, should fit our use case best (this is the current implementation for r.mfilter).
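
The temporary file buffer described above can be illustrated with a short Python sketch. It is a stand-in under stated assumptions, not the r.mfilter code: the helper names (`smooth_row`, `write_rows`, `filter_via_tempfile`) are hypothetical, the filter is a trivial vertical mean, rows are stored uncompressed as fixed-size records so each row has a known byte offset, and Python threads take the place of OpenMP threads. The point is that threads can write any row at any time, after which a single sequential pass reads the rows back in order, as would be needed to produce the final raster.

```python
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor

ROW_FMT = "<{}d"  # one little-endian double per cell, so every row has a fixed size


def smooth_row(raster, r):
    """Mean of row r and its vertical neighbors; a stand-in for a real filter."""
    rows = [raster[i] for i in (r - 1, r, r + 1) if 0 <= i < len(raster)]
    return [sum(col) / len(rows) for col in zip(*rows)]


def write_rows(path, raster, row_ids):
    """Worker: compute the assigned rows and write each at its own byte offset."""
    fmt = ROW_FMT.format(len(raster[0]))
    row_bytes = struct.calcsize(fmt)
    with open(path, "r+b") as f:  # each worker uses its own file handle
        for r in row_ids:
            f.seek(r * row_bytes)
            f.write(struct.pack(fmt, *smooth_row(raster, r)))


def filter_via_tempfile(raster, nprocs=2):
    nrows = len(raster)
    fmt = ROW_FMT.format(len(raster[0]))
    row_bytes = struct.calcsize(fmt)
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.truncate(nrows * row_bytes)  # preallocate the whole buffer
        # Interleaved row assignment: worker t handles rows t, t + nprocs, ...
        jobs = [range(t, nrows, nprocs) for t in range(nprocs)]
        with ThreadPoolExecutor(max_workers=nprocs) as pool:
            list(pool.map(lambda ids: write_rows(path, raster, ids), jobs))
        # Sequential pass: read the rows back in order.
        with open(path, "rb") as f:
            return [list(struct.unpack(fmt, f.read(row_bytes))) for _ in range(nrows)]
    finally:
        os.remove(path)
```

Because the workers write to disjoint byte ranges, no locking is needed in this sketch. The cost is an extra uncompressed copy of the output on disk.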
Next week, I aim to make the necessary changes to r.neighbors and to benchmark properly on large raster files to measure the performance gain from parallelization (r.mfilter).

'''3) Am I blocked on anything?'''

No, it has been good so far.

== Final report ==