= GSoC 2021: Parallelization of existing modules for GRASS GIS =

|| Title: || '''Parallelization of existing modules for GRASS GIS''' ||
|| Student Name: || Aaron Saw Min Sern ||
|| Organization: || [http://www.osgeo.org OSGeo - Open Source Geospatial Foundation] ||
|| Mentor Name: || Huidae Cho, Vaclav Petras, Māris Nartišs ||
|| GSoC proposal: || [https://summerofcode.withgoogle.com/dashboard/project/6280792767987712/details/ View proposal] ||
|| !GitHub Repository: || [https://github.com/aaronsms View account] ||

== Abstract ==

OpenMP support for raster modules is currently limited, and many such modules can benefit from parallelization. This project aims to parallelize modules chosen based on their frequency of use and their implementation complexity.

== Timeline ==

{{{#!th style="background: #ddd" rowspan=2
'''Time Period'''
}}}
{{{#!th style="background: #ddd" colspan=2
'''Milestones'''
}}}
|-----------------------
{{{#!th style="background: #ddd"
Tasks
}}}
{{{#!th style="background: #ddd"
Status
}}}
|-----------------------
{{{#!td
May 17th - June 7th\\
''Community Bonding''
}}}
{{{#!td
 - Introduce myself in dev and SOC mailing lists
 - Get in contact with mentors and discuss project
 - Prepare the wiki page
 - Set up the !GitHub repository for project
 - Set up developer environment
}}}
{{{#!td
\\
Ok \\
Ok \\
Ok \\
Ok \\
Ok \\
\\
}}}
|-----------------------
{{{#!td
June 7th - June 11th \\
''Week 1''
}}}
{{{#!td
 - Parallelize r.proj
 - Parallelize r.neighbors
 - Parallelize r.univar
}}}
{{{#!td
\\
Parallelized r.univar\\
Attempted r.proj \\
}}}
|-----------------------
{{{#!td
June 14th - June 18th \\
''Week 2''
}}}
{{{#!td
 - Parallelize r.cross
 - Parallelize r.blend
 - Parallelize r.composite
}}}
{{{#!td
\\
Parallelized r.neighbors V1\\
\\
}}}
|-----------------------
{{{#!td
June 21st - June 25th \\
''Week 3''
}}}
{{{#!td
 - Parallelize r.mfilter
 - Parallelize r.covar
 - Parallelize r.texture
}}}
{{{#!td
Parallelized r.neighbors V2\\
}}}
|-----------------------
{{{#!td
June 28th - July 2nd \\
''Week 4''
}}}
{{{#!td
 - Parallelize r.slope
 - Parallelize r.basins.fill
 - Parallelize r.sunhours
 - Parallelize r.flow
}}}
{{{#!td
Parallelized r.mfilter
}}}
|-----------------------
{{{#!td
July 5th - July 9th \\
''Week 5''
}}}
{{{#!td
 - Parallelize r.to.*
}}}
{{{#!td
Added benchmark support
}}}
|-----------------------
{{{#!td style="background: #ddd"
July 12th - July 16th \\
''Week 6: Evaluations''
}}}
{{{#!td style="background: #ddd"
 - Parallelize r.resamp.*
}}}
{{{#!td
Reworked r.neighbors V3
}}}
|-----------------------
{{{#!td
July 19th - July 23rd \\
''Week 7''
}}}
{{{#!td
 - Parallelize r.resurf.*
}}}
{{{#!td
Parallelized r.resamp.filter
}}}
|-----------------------
{{{#!td
July 26th - July 30th \\
''Week 8''
}}}
{{{#!td
 - Parallelize r.random.cells
 - Parallelize r.random.surface
 - Implement support for users to specify number of threads
}}}
{{{#!td
Parallelized r.resamp.interp\\
Parallelized r.slope.aspect
}}}
|-----------------------
{{{#!td
August 2nd - August 6th \\
''Week 9''
}}}
{{{#!td
 - Parallelize r.li.*
 - Finish documentation and tutorials
}}}
{{{#!td
Refactored r.univar\\
Parallelized r.series
}}}
|-----------------------
{{{#!td
August 9th - August 13th \\
''Week 10''
}}}
{{{#!td
 - Finishing up, testing, documentation
}}}
{{{#!td
Parallelized r.patch\\
Documentation
}}}
|-----------------------
{{{#!td style="background: #ddd"
August 16th - August 23rd \\
''Week 11: Final Evaluation and Code Submission''
}}}
{{{#!td style="background: #ddd"
 - Submit code and final evaluation
}}}
{{{#!td
Ok
}}}

== Final report ==

Of all the modules proposed for parallelization, I have introduced OpenMP support to 8 raster modules:
 - r.univar - https://github.com/OSGeo/grass/pull/1634
 - r.neighbors - https://github.com/OSGeo/grass/pull/1724
 - r.mfilter - https://github.com/OSGeo/grass/pull/1708
 - r.resamp.filter - https://github.com/OSGeo/grass/pull/1759
 - r.resamp.interp -
   https://github.com/OSGeo/grass/pull/1771
 - r.slope.aspect - https://github.com/OSGeo/grass/pull/1767
 - r.series - https://github.com/OSGeo/grass/pull/1776
 - r.patch - https://github.com/OSGeo/grass/pull/1782

I greatly underestimated the complexity of the work. Up to 20 modules were initially proposed, but after the second week it became clear that we had to cut down the number of target modules and focus more on designing the algorithms. The modules we targeted behave differently from modules that received OpenMP support in the past, such as r.sun. In particular, they need to keep their low memory footprint even after parallelization, unlike r.sun, which loads the entire raster map into memory.

During the first half of GSoC, through discussion with the mentors, we came up with three different approaches to introducing parallel support to r.neighbors. After benchmarking their performance and taking their memory and disk usage into account, we settled on the last approach, which adds an extra parameter, memory, that lets users adjust memory consumption. With this approach, the modules process the raster map in chunks. Once we had settled on the design, we applied the same approach to other similar modules with low memory footprints. For more information regarding the implementation, see [https://grasswiki.osgeo.org/wiki/Raster_Parallelization_with_OpenMP Raster Parallelization with OpenMP].

Furthermore, test scripts were included in the modules to ensure the consistency of the results. Benchmark scripts were added so that users can easily measure the speedup from parallelization on their own machines. User documentation was also modified to include sections detailing how to use the newly added features.

In the future, more raster modules can be parallelized using a similar approach.
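
The chunked, memory-bounded processing described in this report can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PRs: `rows_per_chunk` and `process_in_chunks` are hypothetical names, the budget is given in bytes rather than through the module's memory parameter, plain arrays stand in for raster reads and writes, and adding 1.0 stands in for the real per-cell computation. A neighborhood operation would also need overlapping rows between chunks, which is omitted here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: number of rows that fit in the budget when every row
 * needs one input and one output buffer row. Clamped to the range 1..rows. */
int rows_per_chunk(size_t budget_bytes, int rows, int cols)
{
    size_t per_row, n;

    assert(rows > 0 && cols > 0);
    per_row = 2 * (size_t)cols * sizeof(double);
    n = budget_bytes / per_row;
    if (n < 1)
        n = 1;
    if (n > (size_t)rows)
        n = (size_t)rows;
    return (int)n;
}

/* Hypothetical driver: processes src into dst chunk by chunk.
 * Returns the number of chunks processed, or -1 if allocation fails. */
int process_in_chunks(const double *src, double *dst, int rows, int cols,
                      size_t budget_bytes)
{
    int chunk = rows_per_chunk(budget_bytes, rows, cols);
    size_t chunk_cells = (size_t)chunk * cols;
    double *inbuf = malloc(chunk_cells * sizeof(double));
    double *outbuf = malloc(chunk_cells * sizeof(double));
    int chunks = 0;
    int start;

    if (!inbuf || !outbuf) {
        free(inbuf);
        free(outbuf);
        return -1;
    }

    for (start = 0; start < rows; start += chunk) {
        int n = rows - start < chunk ? rows - start : chunk;
        size_t cells = (size_t)n * cols;
        long i;

        /* 1. Sequential read of the chunk (memcpy stands in for raster reads). */
        memcpy(inbuf, src + (size_t)start * cols, cells * sizeof(double));

        /* 2. Parallel computation on the in-memory chunk. */
#pragma omp parallel for
        for (i = 0; i < (long)cells; i++)
            outbuf[i] = inbuf[i] + 1.0;

        /* 3. Sequential write of the finished chunk. */
        memcpy(dst + (size_t)start * cols, outbuf, cells * sizeof(double));
        chunks++;
    }

    free(inbuf);
    free(outbuf);
    return chunks;
}
```

A smaller budget means more, smaller chunks and therefore more synchronization points; a larger budget trades memory for fewer passes. Compiled without OpenMP, the pragma is ignored and the loop simply runs serially.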
Later, more complex modules such as r.watershed and r.mapcalc could be tackled, and 3D raster modules could be explored as well.

Furthermore, when we implemented parallelization for r.univar, we noticed that modules that compute statistics can show floating-point discrepancies when summing over many values. Because floating-point addition is not associative, runs with different numbers of threads add the values in a different order and can therefore produce slightly different results. One idea would be to introduce the Kahan summation algorithm to reduce the floating-point discrepancies. However, this still would not guarantee consistent results.

GSoC Submission - https://aaronsms.github.io/gsoc/2021.html \\
GSoC Project Dashboard - https://summerofcode.withgoogle.com/dashboard/project/6280792767987712/overview/

== Bonding period report ==

'''1) What did I get done this period?'''\\
 - I have set up a wiki page detailing my project and its progress. (1)
 - I have set up my development environment. Here's the link to my repository. (2)
 - I have gotten in touch with my mentors, and we are arranging a meeting this week.

'''2) What do I plan on doing next week?'''\\
I will be working on parallelizing 3 modules: r.proj, r.neighbors, r.univar. Based on the results, I will adjust my plans for the following weeks.

'''3) Am I blocked on anything?'''\\
No, it has been good so far.
(1) [https://trac.osgeo.org/grass/wiki/GSoC/2021/RasterParallelization]\\
(2) [https://github.com/aaronsms/grass]

== Weekly reports ==

=== Week 1 ===

'''1) What did I get done this week?'''

r.univar\\
 - Updated Makefile to include OpenMP dependencies
 - Wrote multi-threaded test cases to ensure consistency of the program
 - Wrote benchmarking script to measure speedup
 - Implemented parallel support
 - Drafted the PR with the above-mentioned changes (1)

r.neighbors\\
 - Investigated Segment library to support random access and write operations

'''2) What do I plan on doing next week?'''\\
The goal is to come up with a design for output-based modules. The next step is to finish the implementation for r.neighbors. Furthermore, I plan to investigate the thread safety of the Raster3D module for the pthread implementation of r.mapcalc, which has known issues. (2)

'''3) Am I blocked on anything?'''\\
No, it has been good so far, but I hope to improve my pace.

=== Week 2 ===

'''1) What did I get done this week?'''

r.univar\\
 - Addressed the requested changes for the PR (1), e.g. the standard option "nprocs" is now the parameter users set to indicate the number of threads

r.neighbors\\
 - Wrote test cases for parallel execution
 - Drafted a PR alongside its implementation (2)

r.proj\\
 - Wrote new test cases for the module (3)

'''2) What do I plan on doing next week?'''\\
I have managed to come up with a way to parallelize output-based modules like r.neighbors. The idea is to use a temporary segment file so that threads can perform random write operations, which is not possible directly on a compressed raster file without an intermediate cache. With this design in mind, I intend to continue parallelizing similar modules next week. Also, there are ideas under discussion to provide a benchmarking framework, possibly under grass.benchmark, as it will be used repeatedly in the future to measure performance.

'''3) Am I blocked on anything?'''\\
No, it has been good so far.
(1) [https://github.com/OSGeo/grass/pull/1634]\\
(2) [https://github.com/OSGeo/grass/pull/1654]\\
(3) [https://github.com/OSGeo/grass/pull/1663]\\
(4) [https://github.com/OSGeo/grass/pull/1670]\\

=== Week 3 ===

'''1) What did I get done this week?'''

After discussion with the mentors, we decided to explore alternatives to using the Segment library as an intermediate output buffer. Specifically, there are two designs in mind: one simply increases the size of the buffer and does sequential I/O to fill and flush it, with parallel computation in between; the other, more complicated one tries to avoid having the threads wait for I/O.

'''2) What do I plan on doing next week?'''\\
I plan to finalize the design by this week. (1)

'''3) Am I blocked on anything?'''\\
No, it has been good so far.

(1) https://github.com/aaronsms/grass

=== Week 4 ===

'''1) What did I get done this week?'''

r.mfilter (PR: https://github.com/OSGeo/grass/pull/1708)
 - Added test cases for different input options (Sequential/Parallel filters, repeated, null_mode)
 - Added parallel implementations for all options except Sequential filters (which inherently cannot be parallelized)

'''2) What do I plan on doing next week?'''

After discussion with the mentors, we decided to change the current r.neighbors implementation, which uses the Segment library to give the threads a temporary file buffer to work on before the final raster file is produced. We realized that the Segment library does not fit the use case well enough to compensate for the overhead it might add. It was essentially used as an API to write to the file buffer, and we are not making good use of its caching capabilities. A native temporary file buffer, to which the threads can write output simultaneously, should fit our use case best (this is the current implementation for r.mfilter).
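
The idea of several threads writing their output to one temporary file at the same time can be sketched with positional writes. This is a minimal illustration under assumptions, not the r.mfilter code: `write_rows_parallel` is a hypothetical helper, the per-cell value is a stand-in, and POSIX `pwrite` is used so that each thread writes at its own row offset without sharing a file position.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* expose pwrite/pread/mkstemp under strict -std= modes */
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a GRASS function): computes each row of a
 * rows x cols grid of doubles and writes it at that row's byte offset
 * in the already opened file fd. Returns 0 on success, -1 on failure. */
int write_rows_parallel(int fd, int rows, int cols)
{
    size_t row_bytes;
    int failed = 0;
    int row;

    assert(rows > 0 && cols > 0);
    row_bytes = (size_t)cols * sizeof(double);

#pragma omp parallel for
    for (row = 0; row < rows; row++) {
        double *buf = malloc(row_bytes); /* private row buffer per iteration */
        int ok = (buf != NULL);
        int col;

        if (ok) {
            for (col = 0; col < cols; col++)
                buf[col] = (double)row * cols + col; /* stand-in computation */

            /* Widen to off_t before multiplying so that the offset cannot
             * overflow int arithmetic on large files. */
            off_t offset = (off_t)row * (off_t)row_bytes;
            ok = (pwrite(fd, buf, row_bytes, offset) == (ssize_t)row_bytes);
        }
        free(buf);

        if (!ok) {
#pragma omp atomic write
            failed = 1;
        }
    }
    return failed ? -1 : 0;
}
```

Because every row has a fixed offset, the threads never touch overlapping byte ranges, so no lock is needed around the write itself. A second, sequential pass would then read the rows back in order to produce the final raster.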
Next week, I aim to make the necessary changes to r.neighbors and to benchmark properly on large raster files to monitor the performance gain from parallelization (r.mfilter).

'''3) Am I blocked on anything?'''

No, it has been good so far.

=== Week 5 ===

'''1) What did I get done this week?'''

To benchmark both the r.mfilter and r.neighbors implementations, I used the recently merged benchmark library on a raster randomly generated with r.surf.fractal. The preliminary results for both modules were plotted as time in seconds (y-axis) against nprocs (x-axis), benchmarked on my local workstation.

Furthermore, I compared the performance on the master branch against the new implementation with nprocs = 1, and the results are comparable.

Both implementations make extensive use of disk I/O to write to a temporary file buffer before transferring to the final raster file format. This behavior is the default in r.mfilter, but was explicitly introduced in r.neighbors to allow for parallelization. After discussion with the mentors, we decided that we should make better use of memory over disk. Ideally, the user will be able to specify how much memory to use for the buffer. However, r.mfilter will keep its original use of a temporary file buffer.

'''2) What do I plan on doing next week?'''
 - Complete rework of r.neighbors implementation
 - Compare benchmarks between the two implementations

'''3) Am I blocked on anything?'''

No major roadblock, but I need to catch up a bit to rework my r.neighbors implementation.

=== Week 6 ===

'''1) What did I get done this week?'''

r.neighbors

The main goal that I have accomplished is a complete rework of the r.neighbors implementation (PR: [https://github.com/OSGeo/grass/pull/1724]). A benchmark script is ready under the 'benchmark' directory for users to test the performance on their local machine.
The performance is comparable to the previous implementation, which used temporary files (on SSD) instead of memory as the buffer. The benchmark results from my local machine (12 cores) are under the PR.

r.mfilter

Issues were pointed out when working on raster files > 2GB (PR: [https://github.com/OSGeo/grass/pull/1708]). They were promptly addressed in commit 4caa96; the cause was an overflow from multiplication. This PR is ready, and a benchmark script is provided as well for local benchmarking.

'''2) What do I plan on doing next week?'''
 - Introduce an environment variable that overrides the default value of the nprocs parameter, which is currently 1, so that users do not need to pass nprocs explicitly.
 - Implement r.resamp.filter/r.resamp.interp parallelization

'''3) Am I blocked on anything?'''

No major issues.

=== Week 7 ===

'''1) What did I get done this week?'''
 - Introduced an environment variable that overrides the default value of the nprocs parameter, which is currently 1, so that users do not need to pass nprocs explicitly.

r.resamp.filter
 - Implemented parallelization
 - Added test cases

'''2) What do I plan on doing next week?'''
 - Implement parallelization for r.slope.aspect with testing and benchmarking

'''3) Am I blocked on anything?'''

No major issues.

=== Week 8 ===

'''1) What did I get done this week?'''

r.resamp.interp [https://github.com/OSGeo/grass/pull/1771]
 - Implemented parallelization

r.slope.aspect [https://github.com/OSGeo/grass/pull/1767]
 - Implemented parallelization

Both implementations above follow an approach similar to r.neighbors [https://github.com/OSGeo/grass/pull/1724]. r.slope.aspect keeps track of global statistics such as min/max, so a reduction over these variables is required in addition to the map computation. The benchmarking of the modules will be supplemented in the PR.
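
The extra reduction needed for global statistics such as min/max can be expressed with OpenMP reduction clauses. The following is a minimal sketch under assumptions, not the r.slope.aspect code: `compute_with_minmax` is a hypothetical function, doubling the input stands in for the real per-cell computation, and the `min`/`max` reduction operators require OpenMP 3.1 or later.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical helper: fills out[] with a per-cell result and reports the
 * global minimum and maximum of the results. */
void compute_with_minmax(const double *in, double *out, size_t n,
                         double *min_out, double *max_out)
{
    double lo = DBL_MAX;
    double hi = -DBL_MAX;
    long i;

    assert(n > 0);

    /* Each thread works on private copies of lo and hi; OpenMP combines
     * them into the shared variables when the loop finishes. */
#pragma omp parallel for reduction(min : lo) reduction(max : hi)
    for (i = 0; i < (long)n; i++) {
        double v = 2.0 * in[i]; /* stand-in for the real computation */

        out[i] = v;
        if (v < lo)
            lo = v;
        if (v > hi)
            hi = v;
    }

    *min_out = lo;
    *max_out = hi;
}
```

Unlike a floating-point sum, min and max do not depend on the order in which threads combine their partial results, so this reduction gives the same answer for any thread count.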
'''2) What do I plan on doing next week?'''
 - Refactor r.univar
 - Implement parallelization for r.series, r.patch
 - Revisit r.proj to decide on implementation

'''3) Am I blocked on anything?'''

No major issues.

=== Week 9 ===

'''1) What did I get done this week?'''

r.univar [https://github.com/OSGeo/grass/pull/1634]
 - Refactored previous implementation

r.series [https://github.com/OSGeo/grass/pull/1776]
 - Implemented parallelization

Implementation for r.patch is yet to be completed.

'''2) What do I plan on doing next week?'''
 - Finish implementing r.patch parallelization
 - Write documentation on manual pages for each of the modules that have been implemented
   - Specifically, a section titled "Performance" covering the user parameters for parallel processing, expected behavior, and known issues.
   - r.univar
   - r.mfilter
   - r.neighbors
   - r.slope.aspect
   - r.resamp.filter
   - r.resamp.interp
   - r.series
   - r.patch
 - Include a wiki page on the general OpenMP implementation and the benchmark results of each module

'''3) Am I blocked on anything?'''

No major issues.

=== Week 10 ===

'''1) What did I get done this week?'''
 - Finished implementing r.patch parallelization [https://github.com/OSGeo/grass/pull/1782]
 - Included a wiki page on the general OpenMP implementation [https://grasswiki.osgeo.org/wiki/Raster_Parallelization_with_OpenMP]
 - Wrote documentation on manual pages for some modules (Performance section)
 - Added benchmark scripts to all implemented modules

'''2) What do I plan on doing next week and beyond?'''

Most of the modules will not be merged into master by next week. There are still some checks to be done, such as benchmarking and tests on large raster maps. I intend to prepare the GSoC submission next week and to keep working on the PRs so that they can be merged smoothly for the upcoming release.
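
As a note on the Kahan summation idea raised in the final report, the algorithm itself is short. The sketch below is a plain serial version under assumptions: `naive_sum` and `kahan_sum` are hypothetical names, not functions from r.univar, and the compensation only works if the compiler is not allowed to reorder floating-point operations (for example, no `-ffast-math`).

```c
#include <assert.h>
#include <stddef.h>

/* Plain left-to-right summation, for comparison. */
double naive_sum(const double *x, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(x != NULL || n == 0);
    for (i = 0; i < n; i++)
        sum += x[i];
    return sum;
}

/* Kahan (compensated) summation: c carries the low-order bits that were
 * lost when the previous term was added to sum. */
double kahan_sum(const double *x, size_t n)
{
    double sum = 0.0;
    double c = 0.0;
    size_t i;

    assert(x != NULL || n == 0);
    for (i = 0; i < n; i++) {
        double y = x[i] - c; /* apply the correction from the last step */
        double t = sum + y;  /* low-order bits of y may be lost here */

        c = (t - sum) - y;   /* recover what was lost */
        sum = t;
    }
    return sum;
}
```

Terms that are too small to change a large running total are dropped by the naive loop but accumulate in the compensated one. In a parallel setting each thread would keep its own sum and compensation term, and the partial results would still be combined in a thread-count-dependent order, which is why this reduces the discrepancies without guaranteeing identical results.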