= OpenCL Integration =

== Introduction ==

GDAL and GRASS serve as a foundation for many open source projects. For example, GDAL's site lists 67 software packages which use it. Not included, of course, are people such as me who use GDAL and GRASS as stand-alone packages or in nondistributed software. A fundamental improvement in GDAL or GRASS will therefore cascade into hundreds of other uses.

== Background ==

Imagery resampling is fundamental enough to be such an improvement. In my experience, resizing or warping imagery turns out to be a major bottleneck. In a later section I'll describe in more detail a project where I created a several-terabyte imagery dataset. Over half the CPU time was spent resampling pixels.

Algorithm improvements can be just as important as increased efficiency. The basic 'nearest neighbor' resampling technique is insufficient for much GIS work: its results range from jagged lines in the best case to what resembles image static in the worst case. To fix this, resampling algorithms such as 'bilinear', 'cubic convolution', 'spline', and 'Lanczos' have become popular. In fact, new resampling algorithms are listed at the bottom of the page containing ideas for improving GRASS's image processing (http://grass.osgeo.org/wiki/Image_processing).

OpenCL is a GPU programming framework whose kernels are written in a C-based language. It is used to write programs which can run on graphics cards (GPUs) and execute hundreds of threads simultaneously. An example of how this works: in image resampling, each output pixel must be recalculated as a function of the surrounding input pixels, and this must be done for millions (or billions) of pixels. In OpenCL, each thread calculates one pixel, many threads run at the same time, and speedups in the range of 50 to 100 times can be expected in ideal situations. The advantages of OpenCL are that it should run on many devices and that it has broad industry support. For more information, see the OpenCL standards website.
(http://www.khronos.org/opencl/)

== The idea ==

I propose to first rewrite GDAL's advanced warping routines to use OpenCL. This alone should result in a massive speedup for many applications. The output of the OpenCL code and the original code should be identical, so regression testing should be relatively easy.

After testing, I then propose to integrate the new OpenCL code into GRASS's r.proj. This would increase both the speed and the number of resampling algorithms available to GRASS. Resampling built into the GRASS core (via the get-row functionality) will also be upgraded as appropriate.

Optionally, other GRASS modules that I'm interested in optimizing are r.sun, r.resamp.interp, r.texture, and i.vi. Modules which are I/O bound will not be improved by the use of OpenCL, so I will test each module beforehand to make sure it is CPU bound and parallelizable.

== Project plan ==

My work over the summer will follow this plan. Times listed are best guesses, but should be a reasonable target. Note that the target timings allow one pass through the "optional" steps over the summer.

1. Find an appropriate point to integrate OpenCL into GDAL (2 days) and make appropriate changes to the code base (5 days).

2. Get OpenCL code functioning in GDAL which duplicates existing GDAL functionality, but much faster (20 days).

3. Test code in GDAL and fix any bugs (5 days). New and old output images should be identical.

4. Find an appropriate point to integrate the OpenCL code into the GRASS r.proj module (2 days). This will be complicated because as much of the raster as possible will need to be loaded into GPU memory while being processed, which conflicts with the current get-row functionality.

5. Copy the OpenCL warping code from GDAL to GRASS (5 days), bringing new resampling algorithms to GRASS. I expect the OpenCL code to be identical between GDAL and GRASS, making maintenance easier. The major differences will be in the support code within GRASS.

6. Test code in GRASS, fix bugs, and discover any necessary customizations (4 days).

7. Test core GRASS resampling code, which is normally used via the get-row functionality. If necessary (that is, if it isn't I/O bound), I'll adapt the projection code to work with resampling here (5 days). Ideally, the same code can be used because the get-row resampling functionality is a subset of the projection warping functionality.

8. Test OpenCL code against regular CPU output (3 days).

9. (optional) I'm unsure how much time the previous steps will take. At this point I should have plenty of OpenCL experience, though. I hope to also test other GRASS modules for potential speedup (2 days). I am particularly interested in r.sun, r.resamp.interp, r.resamp.stats, r.texture, r.slope.aspect, i.sunhours, and i.vi, but many of these may be found to be I/O bound. GDAL code will also be considered as appropriate (suggestions?).

10. (optional) Rewrite other modules as necessity and opportunity present themselves (15 days).

11. (optional) Test new module code (5 days).

12. GOTO 9

== Future ideas ==

In the future, OpenCL will be expandable throughout the codebase as needed. However, Knuth said: "Premature optimization is the root of all evil". Routines should only be rewritten if they have been profiled and found to be both non-I/O-bound and parallelizable.

The ideas list (http://grass.osgeo.org/wiki/GRASS_SoC_Ideas) has multiple notes which say "implement multithreading as much as possible". Imagery resampling is only one place for OpenCL's multithreading; I suspect that many other opportunities exist. A list of modules is developing on GRASS's GPU page (http://grass.osgeo.org/wiki/GPU).

This OSGeo work integrates well into my own thesis work, so there is a strong chance that I'll be working on improving the modules I mentioned even after the Summer of Code is over.
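
As an illustration of the per-pixel model described in the Background section, here is a minimal sketch of bilinear resampling in plain Python. It is not GDAL or GRASS code, and it is not OpenCL: the function names `bilinear_pixel` and `resample` are hypothetical, and the corner-aligned coordinate mapping is an assumption chosen to keep the example short. The point is only that each output pixel depends on a few neighboring input pixels and on nothing else in the output, so every call to `bilinear_pixel` could in principle be handed to its own OpenCL thread.

```python
def bilinear_pixel(src, x, y):
    """Interpolate one output value at fractional source position (x, y).

    src is a list of rows (a 2D grid of numbers). Only the four input
    pixels surrounding (x, y) are read, so calls are independent.
    """
    h, w = len(src), len(src[0])
    x0, y0 = int(x), int(y)
    # Clamp the right/bottom neighbors at the image edge.
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = src[y0][x0] * (1 - fx) + src[y0][x1] * fx
    bottom = src[y1][x0] * (1 - fx) + src[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def resample(src, out_w, out_h):
    """Resize src to out_w x out_h using bilinear interpolation.

    Assumption: corner-aligned mapping, i.e. the first and last output
    pixels land exactly on the first and last input pixels.
    """
    h, w = len(src), len(src[0])
    sx = (w - 1) / (out_w - 1) if out_w > 1 else 0.0
    sy = (h - 1) / (out_h - 1) if out_h > 1 else 0.0
    # On a GPU, the body of this double loop would be one thread's work.
    return [[bilinear_pixel(src, col * sx, row * sy)
             for col in range(out_w)]
            for row in range(out_h)]


if __name__ == "__main__":
    grid = [[0, 10],
            [20, 30]]
    for row in resample(grid, 3, 3):
        print(row)
```

Because no output pixel reads another output pixel, the order in which pixels are computed does not affect the result. That property is also what makes the "new and old output images should be identical" regression check in the project plan a reasonable expectation, at least up to floating-point differences between devices.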