
Version 7 (modified by aaronsms, 3 years ago)


GSoC 2021: Parallelization of existing modules for GRASS GIS

Title: Parallelization of existing modules for GRASS GIS
Student Name: Aaron Saw Min Sern
Organization: OSGeo - Open Source Geospatial Foundation
Mentor Names: Huidae Cho, Vaclav Petras, Māris Nartišs
GSoC proposal: View proposal
GitHub Repository: View account

Abstract

OpenMP support in the raster modules is currently limited, although many of these modules could benefit from parallelization. This project aims to parallelize modules chosen based on how frequently they are used and how complex the implementation would be.

Timeline

Time Period | Milestones | Tasks | Status

May 17th - June 7th
Community Bonding

  • Introduce myself in dev and SOC mailing lists (Status: Ok)
  • Get in contact with mentors and discuss project (Status: Ok)
  • Prepare the wiki page (Status: Ok)
  • Set up the GitHub repository for project (Status: Ok)
  • Set up developer environment (Status: Ok)

June 7th - June 11th
Week 1

  • Parallelize r.proj (Status: --)
  • Parallelize r.neighbors (Status: Pending)
  • Parallelize r.univar (Status: Ok)

June 14th - June 18th
Week 2

  • Parallelize r.cross
  • Parallelize r.blend
  • Parallelize r.composite

June 21st - June 25th
Week 3

  • Parallelize r.mfilter
  • Parallelize r.covar
  • Parallelize r.texture

June 28th - July 2nd
Week 4

  • Parallelize r.slope.aspect
  • Parallelize r.basins.fill
  • Parallelize r.sunhours
  • Parallelize r.flow

July 5th - July 9th
Week 5

  • Parallelize r.to.*

July 12th - July 16th
Week 6: Evaluations

  • Parallelize r.resamp.*

July 19th - July 23rd
Week 7

  • Parallelize r.resurf.*

July 26th - July 30th
Week 8

  • Parallelize r.random.cells
  • Parallelize r.random.surface
  • Implement support for users to specify number of threads

August 2nd - August 6th
Week 9

  • Parallelize r.li.*
  • Finish documentation and tutorials

August 9th - August 13th
Week 10

  • Finishing up, testing, documentation

August 16th - August 23rd
Week 11: Final Evaluation and Code Submission

  • Submit code and final evaluation

Bonding period report

1) What did I get done this period?

  • I have set up a wiki page detailing my project and its progress. (1)
  • I have set up my development environment. Here's the link to my repository. (2)
  • I have gotten in touch with my mentors, and we are arranging a meeting this week.

2) What do I plan on doing next week?

I will be working on parallelizing three modules: r.proj, r.neighbors and r.univar. Based on the results, I will adjust my plans for the following weeks.

3) Am I blocked on anything?

No, it has been good so far.

(1) https://trac.osgeo.org/grass/wiki/GSoC/2021/RasterParallelization
(2) https://github.com/aaronsms/grass

Weekly reports

Week 1

1) What did I get done this week?

r.univar

  • Updated Makefile to include OpenMP dependencies
  • Wrote multi-threaded test cases to ensure consistency of the program
  • Wrote benchmarking script to measure speedup
  • Implemented parallel support
  • Drafted a PR with the above-mentioned changes (1)
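
The kind of change listed above can be illustrated with an OpenMP reduction. This is a minimal sketch and not the code from the PR: the `univar_stats` struct and the `compute_univar` function are hypothetical names, the raster is assumed to be already loaded into a flat array of doubles, and null-cell handling is left out. If the code is compiled without OpenMP, the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical container for a few univariate statistics. */
struct univar_stats {
    double sum;
    double sum_sq;
    double min;
    double max;
    size_t n;
};

/* Compute statistics over n cells. Each thread accumulates private
 * partial results, which OpenMP combines when the loop ends. */
struct univar_stats compute_univar(const double *cells, size_t n)
{
    double sum = 0.0, sum_sq = 0.0;
    double lo = DBL_MAX, hi = -DBL_MAX;
    long i;

    assert(cells != NULL || n == 0);

#pragma omp parallel for reduction(+ : sum, sum_sq) reduction(min : lo) reduction(max : hi)
    for (i = 0; i < (long)n; i++) {
        double v = cells[i];

        sum += v;
        sum_sq += v * v;
        if (v < lo)
            lo = v;
        if (v > hi)
            hi = v;
    }

    struct univar_stats s = {sum, sum_sq, lo, hi, n};
    return s;
}
```

Because floating-point addition is not associative, the sums may differ in the last digits between thread counts, which is one reason multi-threaded test cases usually compare results within a tolerance.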

r.neighbors

  • Investigated the Segment library as a way to support random-access read and write operations

2) What do I plan on doing next week?

The goal is to come up with a design for output-based modules. The next step is to finish the implementation for r.neighbors. Furthermore, I plan to investigate the thread safety of the Raster3D module for the pthread implementation of r.mapcalc, which has known issues. (2)

3) Am I blocked on anything?

No, it has been good so far, but I hope to improve my pace.

Week 2

1) What did I get done this week?

r.univar

  • Addressed review comments on the PR (1), e.g. the standard option "nprocs" is now the parameter with which users indicate the number of threads
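
As an illustration of how a value such as "nprocs" might be applied once it has been parsed, here is a minimal sketch. The helper name `resolve_nprocs` and its clamping policy are assumptions made for this example, not the behavior of the PR; `omp_set_num_threads` is the standard OpenMP runtime call, used only when the code is built with OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: turn a user-supplied thread count into the
 * number of threads that will actually be used. */
int resolve_nprocs(int requested)
{
    int threads = requested;

    if (threads < 1)
        threads = 1; /* treat zero or negative values as serial */

#ifdef _OPENMP
    omp_set_num_threads(threads); /* applies to later parallel regions */
#else
    threads = 1; /* built without OpenMP: only one thread is available */
#endif

    assert(threads >= 1);
    return threads;
}
```

Returning the effective count lets the caller report when a request for several threads falls back to serial execution.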

r.neighbors

  • Wrote test cases for parallel execution
  • Drafted a PR with the implementation (2)

r.proj

  • Wrote new test cases for the module (3)

2) What do I plan on doing next week?

I have come up with a way to parallelize output-based modules such as r.neighbors. The idea is to use a temporary segment file so that threads can perform random write operations, which is not possible directly on a compressed raster file without an intermediate cache. With this design in mind, I intend to continue parallelizing similar modules next week. There is also discussion about packaging a benchmarking framework, possibly under grass.benchmark, since it will be used repeatedly in the future to measure performance.

3) Am I blocked on anything?

No, it has been good so far.

(1) https://github.com/OSGeo/grass/pull/1634
(2) https://github.com/OSGeo/grass/pull/1654
(3) https://github.com/OSGeo/grass/pull/1663
(4) https://github.com/OSGeo/grass/pull/1670

Week 3

1) What did I get done this week?

After discussion with the mentors, we decided to explore alternatives to using the Segment library as an intermediate output buffer. Two designs are under consideration: a simpler one that enlarges the buffer and performs sequential I/O to fill it and to write it out, with the parallel computation in between, and a more complicated one that tries to avoid having the threads wait for I/O.
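
The first of the two designs can be sketched as follows. This is only an illustration under stated assumptions: the function name `process_in_chunks` is hypothetical, in-memory arrays stand in for the input and output raster maps, `memcpy` stands in for sequential row I/O, and doubling each cell is a placeholder for the real computation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Process a raster in chunks of chunk_rows rows. in_buf and out_buf
 * must each hold chunk_rows * ncols cells. */
void process_in_chunks(const double *src, double *dst, int nrows, int ncols,
                       int chunk_rows, double *in_buf, double *out_buf)
{
    size_t row_bytes = (size_t)ncols * sizeof(double);
    int start;

    assert(chunk_rows > 0 && ncols > 0);

    for (start = 0; start < nrows; start += chunk_rows) {
        int count = (nrows - start < chunk_rows) ? nrows - start : chunk_rows;
        int r;

        /* 1. Sequential input: fill the buffer row by row. */
        for (r = 0; r < count; r++)
            memcpy(in_buf + (size_t)r * ncols,
                   src + (size_t)(start + r) * ncols, row_bytes);

        /* 2. Parallel computation on the buffered rows. */
#pragma omp parallel for
        for (r = 0; r < count; r++) {
            int c;

            for (c = 0; c < ncols; c++)
                out_buf[(size_t)r * ncols + c] =
                    2.0 * in_buf[(size_t)r * ncols + c];
        }

        /* 3. Sequential output: write the results in row order. */
        for (r = 0; r < count; r++)
            memcpy(dst + (size_t)(start + r) * ncols,
                   out_buf + (size_t)r * ncols, row_bytes);
    }
}
```

In this layout the threads are idle during steps 1 and 3, which is the waiting that the more complicated design tries to remove.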

2) What do I plan on doing next week?

I plan to finalize the design by this week. (1)

3) Am I blocked on anything?

No, it has been good so far.

(1) https://github.com/aaronsms/grass

Week 4

1) What did I get done this week?

r.mfilter (PR: https://github.com/OSGeo/grass/pull/1708)

  • Add test cases for different input options (Sequential/Parallel filters, repeated, null_mode)
  • Add parallel implementations for all options except Sequential filters (which are inherently not parallelizable)
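
To illustrate why a filter that reads only from the original input can be parallelized by row, here is a minimal sketch of a 3x3 mean filter. It is not the r.mfilter implementation: the function name `mean_filter_3x3` is hypothetical, the kernel is fixed, edge cells are simply copied, and null handling is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Apply a 3x3 mean filter to a row-major raster.
 * Edge cells are copied unchanged. */
void mean_filter_3x3(const double *in, double *out, int nrows, int ncols)
{
    int row;

    assert(in != out); /* the output must not alias the input */

#pragma omp parallel for
    for (row = 0; row < nrows; row++) {
        int col;

        for (col = 0; col < ncols; col++) {
            size_t idx = (size_t)row * ncols + col;
            double sum = 0.0;
            int dr, dc;

            if (row == 0 || row == nrows - 1 || col == 0 || col == ncols - 1) {
                out[idx] = in[idx];
                continue;
            }
            /* Every read comes from the unmodified input, so no row
             * depends on the result of another row. */
            for (dr = -1; dr <= 1; dr++)
                for (dc = -1; dc <= 1; dc++)
                    sum += in[(size_t)(row + dr) * ncols + (col + dc)];
            out[idx] = sum / 9.0;
        }
    }
}
```

A filter that feeds already-filtered values into later neighborhoods would make each cell depend on the cells processed before it, so its rows could not be computed independently in this way.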

2) What do I plan on doing next week?

After discussion with the mentors, we decided to change the current implementation of r.neighbors, which uses the Segment library to provide a temporary file buffer that the threads work on before the output is written in the raster file format. We realized that the Segment library does not fit this use case well enough to compensate for the overhead it might add: it was essentially used as an API for writing to the file buffer, and we were not making good use of its caching capabilities. A native temporary file buffer, to which the threads can write output simultaneously, should fit our use case best (this is the current implementation for r.mfilter).
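
The idea of threads writing simultaneously to a plain temporary file can be sketched with the POSIX `pwrite` call, which writes at an explicit offset without touching a shared file position. This is an assumption-laden illustration, not the r.mfilter code: the function name `write_rows_parallel` is hypothetical, and the cell values are a placeholder for real computed output.

```c
#define _XOPEN_SOURCE 700 /* for pwrite() under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Compute each row in parallel and write it at its own offset in fd.
 * Returns 0 on success, -1 if any allocation or write fails. */
int write_rows_parallel(int fd, int nrows, int ncols)
{
    int failed = 0;
    int row;

    assert(nrows >= 0 && ncols > 0);

#pragma omp parallel for
    for (row = 0; row < nrows; row++) {
        size_t row_bytes = (size_t)ncols * sizeof(double);
        double *buf = malloc(row_bytes);
        int col;

        if (buf == NULL) {
#pragma omp atomic write
            failed = 1;
            continue;
        }
        for (col = 0; col < ncols; col++)
            buf[col] = (double)row * ncols + col; /* placeholder values */

        /* Rows occupy disjoint byte ranges, so writes do not overlap. */
        if (pwrite(fd, buf, row_bytes, (off_t)row * (off_t)row_bytes) !=
            (ssize_t)row_bytes) {
#pragma omp atomic write
            failed = 1;
        }
        free(buf);
    }
    return failed ? -1 : 0;
}
```

Once all rows are in the temporary file, they can be read back in order and written out sequentially in the raster file format.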

Next week, I aim to make the necessary changes to r.neighbors and to benchmark properly on large raster files to measure the performance gain from parallelization (r.mfilter).

3) Am I blocked on anything?

No, it has been good so far.

Final report
