Changes between Version 12 and Version 13 of GSoC/2014/TestingFrameworkForGRASS


Timestamp: May 24, 2014, 8:57:03 PM (10 years ago)
Author: wenzeslaus
Comment: dependencies, GRASS fatal errors, MS Windows, locations, reports (mostly already discussed with Soeren)

|| August 22 || Students can begin submitting required code samples to Google

== Design of testing API and implementation ==

{{{
...
}}}
Compared to the suggestion in ticket:2105#comment:4, it does not solve everything in the `test_module` (`run_module`) function, but it uses `self.assert*` similarly to `unittest.TestCase`, which (syntactically) allows checking more than one thing.
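The intended style could look roughly like this (a sketch only, using plain `unittest` and `grass.script` rather than the API from the design block above; the `elevation` map and the checked values are just examples):

{{{#!python
import unittest

from grass.script import parse_command


class TestRInfo(unittest.TestCase):
    """Sketch: more than one check per test method, unittest-style."""

    def test_elevation_metadata(self):
        # run the module and parse its shell-style (key=value) output
        info = parse_command('r.info', map='elevation', flags='g')
        # several assert* calls in a single test method
        self.assertIn('rows', info)
        self.assertGreater(float(info['north']), float(info['south']))
}}}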
=== Test scripts should be importable ===

Test scripts must have module/package character (as unittest requires for test discovery). This applies to both unittests and doctests, without exception. Doctests (inside normal module code or in separate documentation) will be wrapped as `unittest` test cases (in the `testsuite` directory). There is a [https://docs.python.org/2/library/doctest.html#unittest-api standard way] to do it. To make importing possible, the GRASS Python libraries shouldn't do anything fancy at import time. For example, doctests currently don't work with `grass.script` unless you call [source:grass/trunk/gui/wxpython/core/toolboxes.py?rev=60218#L630 a set of functions] to deal with the function `_`, because the translation function is installed as the builtin `_` while `_` is also used by `doctest`. (This is fixed for the GUI but still applies to the Python libraries.)
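The standard wrapping mentioned above could look like this in a `testsuite` script (a sketch; the module name is only an example of a library module with doctests):

{{{#!python
import doctest
import unittest

# the library module whose doctests should be run (example name)
import grass.pygrass.vector as module_under_test


def load_tests(loader, tests, ignore):
    # expose the module's doctests to unittest test discovery
    tests.addTests(doctest.DocTestSuite(module_under_test))
    return tests


if __name__ == '__main__':
    unittest.main()
}}}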

=== Dealing with process termination ===

There is no easy way to test that (GRASS) fatal errors are invoked when appropriate. Even if the test method (`test_*`) itself ran in a separate process (not only the whole script), the process would end without proper reporting of the test result (considering that we want detailed test results). However, since this also applies to fatal errors invoked by unintentional failures, it seems that it will be necessary to invoke the test methods (`test_*`) in separate processes anyway, to at least finish the other tests and not break the final report. This might be done by a function decorator, so that a new process is not started for each function but only for those that need it (the ones using things which use `ctypes`).
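One possible shape of such a decorator, sketched with `multiprocessing` (the decorator name and the way failures are reported are illustrative, not the project API):

{{{#!python
import functools
import multiprocessing


def in_separate_process(function):
    """Run the decorated test method in a child process.

    A segmentation fault or an exit caused by a GRASS fatal error then
    kills only the child, and the parent process can still record and
    report a failure for this test method.
    """
    @functools.wraps(function)
    def wrapper(self, *args, **kwargs):
        process = multiprocessing.Process(
            target=function, args=(self,) + args, kwargs=kwargs)
        process.start()
        process.join()
        if process.exitcode != 0:
            self.fail("test method failed or crashed in child process"
                      " (exit code %s)" % process.exitcode)
    return wrapper
}}}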

=== Analyzing module run using Valgrind or others ===

Modules (or any tests?) can run with `valgrind` (probably `--tool=memcheck`). This could be done at the level of testing classes, but the better option is to integrate this functionality (optional running with `valgrind`). An environment variable (`GRASS_PYGRASS_VALGRIND`) or an additional option `valgrind_=True` (similarly to `overwrite`) would invoke the module with `valgrind` (this works for both binaries and scripts). Additional options can be passed to `valgrind` using `valgrind`'s environment variable `$VALGRIND_OPTS`. The output would be saved to a file so that it does not interfere with the module output.
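From the test author's point of view, the two proposed switches could be used roughly like this (both the `valgrind_` option and the `GRASS_PYGRASS_VALGRIND` variable are proposals from the paragraph above, not existing PyGRASS features):

{{{#!python
import os

from grass.pygrass.modules import Module

# proposed per-call switch (not implemented yet)
Module('r.slope.aspect', elevation='elevation', slope='slope',
       aspect='aspect', valgrind_=True)

# or a proposed global switch plus valgrind's own environment variable
os.environ['GRASS_PYGRASS_VALGRIND'] = '1'
os.environ['VALGRIND_OPTS'] = '--leak-check=full --log-file=r.slope.aspect.valgrind.log'
Module('r.slope.aspect', elevation='elevation', slope='slope', aspect='aspect')
}}}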

PyGRASS, or a specialized PyGRASS module runner (function) in the testing framework, can have a function, global variable, or environment variable which would specify which tool should run a module (if any) and with what parameters (besides the possibility to set parameters through the environment variable defined by the tool itself). The tool's output should ideally be separated from the module output and go to a file in the test output directory (it could later be linked from, or included into, the main report).

Having output from many modules can be confusing (e.g., we run `r.info` before actually running our module). It would be ideal to be able to specify which of the modules called in a test should run with `valgrind` or another tool. The API for this may, however, interfere with the API for the global settings of running with these tools. It is not clear whether `valgrind` would be applied to library tests as well; this would require running the testing process itself with `valgrind`, but since it needs to run as a separate process anyway, this can be done.

=== Dependencies ===

==== Dependencies on other tests ====

The test runner needs to know whether the dependencies are fulfilled, i.e., whether the required modules and library tests were successful. So there must be a database that keeps track of the test process. For example, if the raster library test fails, then all raster tests will fail; such a case should be handled. The tests would need to specify their dependencies (there might be even more test dependencies than dependencies of the tested code).

Alternatively, we can ignore dependency issues. We can just let all the tests fail if a dependency failed (without us checking that dependency) and that would be it. Tracking dependencies just saves time and makes the result clearer. A failure of one test in a library, or of one test of a module, does not mean that a test using it was using the broken feature, so it can still be successful (e.g., a failed test of the vector library's 3D capabilities and a module accessing just 2D geometries). Also, not all tests of dependent code have to use that dependency (e.g., a particular interpolation method).

The simplest way to implement parallel dependency checking would be to have a file lock (e.g., [http://code.activestate.com/recipes/65203/ Cross-platform API for flock-style file locking]), so that only a single test runner has read and write access to the test status text file. Tests can run in parallel and have to wait until the file is unlocked. Consequently, the test runner must not crash, so that the file lock is always released.
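A minimal sketch of such a status file protected by a lock (`fcntl` is POSIX-only and stands in for the cross-platform recipe linked above; the file name and the one-line-per-test format are made up for illustration):

{{{#!python
import fcntl  # POSIX-only here; the linked recipe would make this portable

STATUS_FILE = 'test_status.txt'


def record_test_result(test_name, status):
    """Append one 'name status' line to the shared status file under a lock."""
    with open(STATUS_FILE, 'a') as status_file:
        fcntl.flock(status_file, fcntl.LOCK_EX)  # wait until the file is free
        status_file.write('%s %s\n' % (test_name, status))
        fcntl.flock(status_file, fcntl.LOCK_UN)


def dependencies_fulfilled(required_tests):
    """Return True if all required tests are recorded as succeeded."""
    try:
        with open(STATUS_FILE) as status_file:
            fcntl.flock(status_file, fcntl.LOCK_SH)
            succeeded = set(line.split()[0] for line in status_file
                            if line.strip().endswith('succeeded'))
            fcntl.flock(status_file, fcntl.LOCK_UN)
    except IOError:
        return False  # no status file yet, be pessimistic
    return all(test in succeeded for test in required_tests)
}}}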

Anyway, dependency checking may be challenging if we allow parallel testing. Not allowing parallel testing makes the test status database really simple: it is a text file that is parsed by the test runner for each test script execution and extended with a new entry at the end of the test run. Maybe at least the library tests shouldn't be executed in parallel (something for this might already be in the make system).

Logs about the test state can be used to generate a simple test success/fail overview.

==== Dependencies of tested code ====

Modules such as G7:r.in.lidar (depends on libLAS) or G7:v.buffer (depends on GEOS) are not built if their dependencies are not fulfilled. It might be good to have some special indication that a dependency is missing, but this might also be left as a task for the test author, who can implement a special test function which just checks the presence of the module. That the tests failed because of a missing dependency would then be visible in the test report.

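Such a presence check could be as simple as the following sketch (using `find_program` from `grass.script.core`; whether a missing module should fail or skip the test is still an open choice):

{{{#!python
import unittest

from grass.script.core import find_program


class TestVBuffer(unittest.TestCase):

    def test_module_is_present(self):
        # v.buffer is only built when GRASS was compiled with GEOS
        self.assertTrue(find_program('v.buffer', '--help'),
                        msg='v.buffer not available (GRASS built without GEOS?)')
}}}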

== Reports from testing ==

Everything should go (primarily) to files and directories with some defined structure. Something would then have to gather information from the files and build the main and summary pages. The advantage of having everything in files is that it might be more robust and that it can easily run in parallel. However, gathering the information afterwards can be challenging. Files are really the only way to integrate `valgrind` outputs.

There is `TextTestRunner` in `unittest`; the implementation will start from there. For now, the testing framework will focus on HTML output. However, the goal is something like a `GRASSTestRunner` which could (in the future) produce multiple outputs simultaneously, namely HTML, XML (there might be some reusable XML schemas for testing results) and TXT (which might be enriched by some reStructuredText or Markdown, or be really plain). Some (simple) text summary should go to standard output in parallel to the output to files.

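A starting point for such a runner might look like this (a sketch only: it subclasses `TextTestRunner`, writes the detailed output to a file, and echoes a one-line summary to standard output; the class name comes from the paragraph above, everything else is an assumption):

{{{#!python
import sys
import unittest


class GRASSTestRunner(unittest.TextTestRunner):
    """Write detailed test output to a file, a short summary to stdout."""

    def __init__(self, report_file, **kwargs):
        self._report = open(report_file, 'w')
        super(GRASSTestRunner, self).__init__(stream=self._report, **kwargs)

    def run(self, test):
        result = super(GRASSTestRunner, self).run(test)
        sys.stdout.write('tests=%d failures=%d errors=%d\n'
                         % (result.testsRun, len(result.failures),
                            len(result.errors)))
        self._report.close()
        return result
}}}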
It is not clear whether the results should be organized by test functions (`test_*`) or only by test scripts (modules, test cases).

Details for one test (not all of them have to be implemented):

 * standard output and standard error output of the tests
  * it might be hard to split this if more than one module is called (the same applies to functions)
 * Valgrind output, or output from another tool used for running a module in the test
  * might be from one or more modules
 * the tested code
  * the code itself, rendered with e.g. [http://pygments.org/ Pygments], or links to the Doxygen documentation
  * it might be unclear which code to actually include (you can see the names of modules and functions, and you know in which directory the test suite was)
 * the testing code, to see what exactly was tested and failed
 * pictures generated from maps for tests which were not successful (this might be applied also to other types, but it is really a bonus)

Generally, the additional data can be linked or included directly (e.g., with some folding in HTML). This needs to be investigated.

Each test (or whatever is generating output) will generate an output file which can be included directly in the final (HTML) report (by link or by including it into some bigger file). A test runner which is not influenced by fatal errors and segmentation faults has to take care of the (HTML) file generation. The summary pages will probably be produced by some reporter or finalizer script. The output of one standalone test script (which can be invoked by itself) will be (nicely) usable on its own (and this can, or even should, be reused in the main report).

Test scripts will work when directly executed from the command line and when executed by the make system. When tests are executed by the make system, they might be run by a dedicated "test_runner" Python script which sets up the environment. However, the environment can also be set up inside the test script, and not setting up the environment would be the default (or the other way around, since setting up a separate environment would be safer).

Having separate processes is necessary in any case, because only this makes the testing framework robust enough to handle (GRASS) fatal error calls and segmentation faults.

Tests should be executable by themselves (i.e., they should have a `main()` function) to encourage running them often. This can be used by the framework itself, rather than imports, because it simplifies parallelization; the outputs need to go to files anyway (because of their size) and everything will be collected from the files afterwards (so it does not matter whether we use process calls or imports).

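A test script could therefore end with the usual boilerplate (a sketch using plain `unittest`; the final framework may provide its own `main()` helper):

{{{#!python
import unittest


class TestSomething(unittest.TestCase):

    def test_something(self):
        self.assertTrue(True)  # placeholder for real checks


def main():
    # makes `python test_something.py` work on its own
    unittest.main()


if __name__ == '__main__':
    main()
}}}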
=== Example run ===

{{{
...
    python ${test} >> /tmp/tgis_lib.txt 2>&1
done
}}}

The test output will be written to stdout and stderr (piped to a file in this case).

{{{
...
}}}

or

{{{
make test
}}}

which might be the more standard solution.

== Testing on MS Windows ==

On Linux and other Unix-like systems, we expect that testing will be done only when you also compile GRASS yourself. This cannot be expected on MS Windows because of the complexity of compilation and the lack of MS Windows-based GRASS developers. Moreover, because of the experience with different failures on different MS Windows computers (depending not only on the system version but also on system settings), we need to enable the tests for as many users (computers) as possible.

Invoking the test scripts on MS Windows by hand and through the make system should both work. Tests will be executed in the source tree in the same way as on Linux.

 I hope that we can get to the state where users will be able to test GRASS. It is Python. We can use the make system but also test discovery (ours or unittest's). The only problem I currently see is the different layout of directories in the source tree and in the distribution, but it might not be an issue.

Libraries are tested through `ctypes`, modules as programs, and the rest is mostly Python, so this should work in any case. However, there are several library tests that are executable programs (usually GRASS modules), for example in gmath, gpde and raster3d. These modules will be executed by the testing framework inside testing functions (`test_*`). They are not compiled by default and are not part of the distribution, but they need to be compiled in order to run the tests. We could compile these additional modules and put them into a separate directory in the distribution, or we could have a debug distribution with the testing framework and these modules, or we could create a system similar to the one we have for addons (on MS Windows). The modules could be compiled and prepared on a server for download, and they would be downloaded by the testing framework or upon user request.


== Locations, mapsets and data ==

We should have dedicated test locations with different projections and identical map names. I wouldn't use the GRASS sample locations (NC, Spearfish) directly as test locations. We should have dedicated test locations with selected data. They can overlap with (let's say) NC, perhaps containing less imagery but, on the other hand, some additional unusual data. A complication are doctests, which are documentation, so as a consequence they should use the intersection of the NC sample location and the testing location. The only difference between the locations would be the projection, so it really makes a difference only for projected, lat-lon and perhaps XY locations.

All data should be in the PERMANENT mapset. The reason is that temporary mapsets generated on the fly will only have access to the PERMANENT mapset by default; access to other mapsets would have to be set explicitly. This might be the case when a user wants to use his or her own mapset. On the other hand, it might be advantageous to have maps in different mapsets and just allow access to all of them. A user would have to do the same and would have to keep the same mapset structure (which might not be so advantageous), which is just slightly more complex than keeping the same map names (which the user must do in any case).

If multiple locations are allowed and we expect certain maps to be in the location, such as an elevation raster, it is not clear how to actually test a result, such as aspect computed from elevation, since the result (e.g., its MD5 sum) will be different for each location/projection. This would mean that the checking/assert functions, or the tests themselves, would have to handle different locations, and moreover, this type of test would always fail in a user-provided location.

All reference files (and perhaps also additional data) will be located in the `testsuite` directory. There can also be one global directory with additional data (e.g., data to import) which will be shared between test suites and exposed by the testing framework.

Reference checking in the case of different locations (projections) can only be solved in the test itself. The test author has to implement a conditional reference check. Alternatively, a function (e.g., `def pick_the_right_reference(general_reference_name, location_name)`) could be implemented to help with getting the right reference file (or perhaps value), because some naming conventions for reference files will be introduced anyway.
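Such a helper could be as simple as the following sketch (the `<name>_<location>.ref` convention and the fallback file are assumptions; the actual naming conventions still have to be decided):

{{{#!python
import os


def pick_the_right_reference(general_reference_name, location_name):
    """Return the reference file for the current location, if one exists."""
    location_specific = '%s_%s.ref' % (general_reference_name, location_name)
    if os.path.exists(location_specific):
        return location_specific
    # fall back to a projection-independent reference file
    return general_reference_name + '.ref'
}}}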

The design of the testing framework should allow us to make different decisions about how to solve the data and location questions.

Testing data will be available on a server for download. The testing framework can download them when a test is requested by a user. The data can be saved in the user's home directory and reused next time. This may simplify things for users, and it will also be clear to the testing framework where to find the testing data.


== GSoC weekly reports ==

=== Week 01 ===