Changes between Version 12 and Version 13 of GSoC/2014/TestingFrameworkForGRASS


Timestamp: May 24, 2014, 8:57:03 PM (10 years ago)
Author: wenzeslaus
Comment: dependencies, GRASS fatal errors, MS Windows, locations, reports (mostly already discussed with Soeren)

|| August 22 || Students can begin submitting required code samples to Google

== Design of testing API and implementation ==

{{{
...
}}}
Compared to the suggestion in ticket:2105#comment:4, it does not solve everything in the `test_module` (`run_module`) function, but it uses `self.assert*` similarly to `unittest.TestCase`, which (syntactically) allows checking more than one thing.
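The intended style could look roughly like this (a sketch only, using plain `unittest` and `grass.script` rather than the API from the design block above; the `elevation` map and the checked values are just examples):

{{{#!python
import unittest

from grass.script import parse_command


class TestRInfo(unittest.TestCase):
    """Sketch: more than one check per test method, unittest-style."""

    def test_elevation_metadata(self):
        # run the module and parse its shell-style (key=value) output
        info = parse_command('r.info', map='elevation', flags='g')
        # several assert* calls in a single test method
        self.assertIn('rows', info)
        self.assertGreater(float(info['north']), float(info['south']))
}}}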
=== Test scripts should be importable ===

Test scripts must have module/package character (as unittest requires for test discovery). This applies to both unittests and doctests, without exception. Doctests (inside normal module code or in separate documentation) will be wrapped as `unittest` test cases (in the `testsuite` directory). There is a [https://docs.python.org/2/library/doctest.html#unittest-api standard way] to do it. To make importing possible, the GRASS Python libraries shouldn't do anything fancy at import time. For example, doctests currently don't work with `grass.script` unless you call [source:grass/trunk/gui/wxpython/core/toolboxes.py?rev=60218#L630 a set of functions] to deal with the function `_`, because the translation function is installed as the builtin `_` while `_` is also used by `doctest`. (This is fixed for the GUI but still applies to the Python libraries.)
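The standard wrapping mentioned above could look like this in a `testsuite` script (a sketch; the module name is only an example of a library module with doctests):

{{{#!python
import doctest
import unittest

# the library module whose doctests should be run (example name)
import grass.pygrass.vector as module_under_test


def load_tests(loader, tests, ignore):
    # expose the module's doctests to unittest test discovery
    tests.addTests(doctest.DocTestSuite(module_under_test))
    return tests


if __name__ == '__main__':
    unittest.main()
}}}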

=== Dealing with process termination ===

There is no easy way to test that (GRASS) fatal errors are invoked when appropriate. Even if the test method (`test_*`) itself ran in a separate process (not only the whole script), the process would end without proper reporting of the test result (considering that we want detailed test results). However, since this also applies to fatal errors invoked by unintentional failures, it seems that it will be necessary to invoke the test methods (`test_*`) in separate processes anyway, to at least finish the other tests and not break the final report. This might be done by a function decorator, so that a new process is not started for each function but only for those that need it (the ones using things which use `ctypes`).
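One possible shape of such a decorator, sketched with `multiprocessing` (the decorator name and the way failures are reported are illustrative, not the project API):

{{{#!python
import functools
import multiprocessing


def in_separate_process(function):
    """Run the decorated test method in a child process.

    A segmentation fault or an exit caused by a GRASS fatal error then
    kills only the child, and the parent process can still record and
    report a failure for this test method.
    """
    @functools.wraps(function)
    def wrapper(self, *args, **kwargs):
        process = multiprocessing.Process(
            target=function, args=(self,) + args, kwargs=kwargs)
        process.start()
        process.join()
        if process.exitcode != 0:
            self.fail("test method failed or crashed in child process"
                      " (exit code %s)" % process.exitcode)
    return wrapper
}}}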

=== Analyzing module run using Valgrind or others ===

Modules (or any tests?) can run with `valgrind` (probably `--tool=memcheck`). This could be done at the level of testing classes, but the better option is to integrate this functionality (optional running with `valgrind`). An environment variable (`GRASS_PYGRASS_VALGRIND`) or an additional option `valgrind_=True` (similarly to `overwrite`) would invoke the module with `valgrind` (this works for both binaries and scripts). Additional options can be passed to `valgrind` using `valgrind`'s environment variable `$VALGRIND_OPTS`. The output would be saved to a file so that it does not interfere with the module output.
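From the test author's point of view, the two proposed switches could be used roughly like this (both the `valgrind_` option and the `GRASS_PYGRASS_VALGRIND` variable are proposals from the paragraph above, not existing PyGRASS features):

{{{#!python
import os

from grass.pygrass.modules import Module

# proposed per-call switch (not implemented yet)
Module('r.slope.aspect', elevation='elevation', slope='slope',
       aspect='aspect', valgrind_=True)

# or a proposed global switch plus valgrind's own environment variable
os.environ['GRASS_PYGRASS_VALGRIND'] = '1'
os.environ['VALGRIND_OPTS'] = '--leak-check=full --log-file=r.slope.aspect.valgrind.log'
Module('r.slope.aspect', elevation='elevation', slope='slope', aspect='aspect')
}}}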

PyGRASS, or a specialized PyGRASS module runner (function) in the testing framework, can have a function, global variable, or environment variable which would specify which tool should run a module (if any) and with what parameters (besides the possibility to set parameters through the environment variable defined by the tool itself). The tool's output should ideally be separated from the module output and go to a file in the test output directory (it could later be linked from, or included into, the main report).

Having output from many modules can be confusing (e.g., we run `r.info` before actually running our module). It would be ideal to be able to specify which of the modules called in a test should run with `valgrind` or another tool. The API for this may, however, interfere with the API for the global settings of running with these tools. It is not clear whether `valgrind` would be applied to library tests as well; this would require running the testing process itself with `valgrind`, but since it needs to run as a separate process anyway, this can be done.

=== Dependencies ===

==== Dependencies on other tests ====

The test runner needs to know whether the dependencies are fulfilled, i.e., whether the required modules and library tests were successful. So there must be a database that keeps track of the test process. For example, if the raster library test fails, then all raster tests will fail; such a case should be handled. The tests would need to specify their dependencies (there might be even more test dependencies than dependencies of the tested code).

Alternatively, we can ignore dependency issues. We can just let all the tests fail if a dependency failed (without us checking that dependency) and that would be it. Tracking dependencies just saves time and makes the result clearer. A failure of one test in a library, or of one test of a module, does not mean that a test using it was using the broken feature, so it can still be successful (e.g., a failed test of the vector library's 3D capabilities and a module accessing just 2D geometries). Also, not all tests of dependent code have to use that dependency (e.g., a particular interpolation method).

The simplest way to implement parallel dependency checking would be to have a file lock (e.g., [http://code.activestate.com/recipes/65203/ Cross-platform API for flock-style file locking]), so that only a single test runner has read and write access to the test status text file. Tests can run in parallel and have to wait until the file is unlocked. Consequently, the test runner must not crash, so that the file lock is always released.
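A minimal sketch of such a status file protected by a lock (`fcntl` is POSIX-only and stands in for the cross-platform recipe linked above; the file name and the one-line-per-test format are made up for illustration):

{{{#!python
import fcntl  # POSIX-only here; the linked recipe would make this portable

STATUS_FILE = 'test_status.txt'


def record_test_result(test_name, status):
    """Append one 'name status' line to the shared status file under a lock."""
    with open(STATUS_FILE, 'a') as status_file:
        fcntl.flock(status_file, fcntl.LOCK_EX)  # wait until the file is free
        status_file.write('%s %s\n' % (test_name, status))
        fcntl.flock(status_file, fcntl.LOCK_UN)


def dependencies_fulfilled(required_tests):
    """Return True if all required tests are recorded as succeeded."""
    try:
        with open(STATUS_FILE) as status_file:
            fcntl.flock(status_file, fcntl.LOCK_SH)
            succeeded = set(line.split()[0] for line in status_file
                            if line.strip().endswith('succeeded'))
            fcntl.flock(status_file, fcntl.LOCK_UN)
    except IOError:
        return False  # no status file yet, be pessimistic
    return all(test in succeeded for test in required_tests)
}}}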

Anyway, dependency checking may be challenging if we allow parallel testing. Not allowing parallel testing makes the test status database really simple: it is a text file that is parsed by the test runner for each test script execution and extended with a new entry at the end of the test run. Maybe at least the library tests shouldn't be executed in parallel (something for this might already be in the make system).

Logs about the test state can be used to generate a simple test success/fail overview.

==== Dependencies of tested code ====

Modules such as G7:r.in.lidar (depends on libLAS) or G7:v.buffer (depends on GEOS) are not built if their dependencies are not fulfilled. It might be good to have some special indication that a dependency is missing, but this might also be left as a task for the test author, who can implement a special test function which just checks the presence of the module. That the tests failed because of a missing dependency would then be visible in the test report.

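Such a presence check could be as simple as the following sketch (using `find_program` from `grass.script.core`; whether a missing module should fail or skip the test is still an open choice):

{{{#!python
import unittest

from grass.script.core import find_program


class TestVBuffer(unittest.TestCase):

    def test_module_is_present(self):
        # v.buffer is only built when GRASS was compiled with GEOS
        self.assertTrue(find_program('v.buffer', '--help'),
                        msg='v.buffer not available (GRASS built without GEOS?)')
}}}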

== Reports from testing ==

Everything should go (primarily) to files and directories with some defined structure. Something would then have to gather information from the files and build the main and summary pages. The advantage of having everything in files is that it might be more robust and that it can easily run in parallel. However, gathering the information afterwards can be challenging. Files are really the only way to integrate `valgrind` outputs.

There is `TextTestRunner` in `unittest`; the implementation will start from there. For now, the testing framework will focus on HTML output. However, the goal is something like a `GRASSTestRunner` which could (in the future) produce multiple outputs simultaneously, namely HTML, XML (there might be some reusable XML schemas for testing results) and TXT (which might be enriched by some reStructuredText or Markdown, or be really plain). Some (simple) text summary should go to standard output in parallel to the output to files.

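A starting point for such a runner might look like this (a sketch only: it subclasses `TextTestRunner`, writes the detailed output to a file, and echoes a one-line summary to standard output; the class name comes from the paragraph above, everything else is an assumption):

{{{#!python
import sys
import unittest


class GRASSTestRunner(unittest.TextTestRunner):
    """Write detailed test output to a file, a short summary to stdout."""

    def __init__(self, report_file, **kwargs):
        self._report = open(report_file, 'w')
        super(GRASSTestRunner, self).__init__(stream=self._report, **kwargs)

    def run(self, test):
        result = super(GRASSTestRunner, self).run(test)
        sys.stdout.write('tests=%d failures=%d errors=%d\n'
                         % (result.testsRun, len(result.failures),
                            len(result.errors)))
        self._report.close()
        return result
}}}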
It is not clear whether the results should be organized by test functions (`test_*`) or only by test scripts (modules, test cases).

Details for one test (not all of them have to be implemented):

 * standard output and standard error output of the tests
  * it might be hard to split this if more than one module is called (the same applies to functions)
 * Valgrind output, or output from another tool used for running a module in the test
  * might be from one or more modules
 * the tested code
  * the code itself, rendered with e.g. [http://pygments.org/ Pygments], or links to the Doxygen documentation
  * it might be unclear which code to actually include (you can see the names of modules and functions, and you know in which directory the test suite was)
 * the testing code, to see what exactly was tested and failed
 * pictures generated from maps for tests which were not successful (this might be applied also to other types, but it is really a bonus)

Generally, the additional data can be linked or included directly (e.g., with some folding in HTML). This needs to be investigated.

Each test (or whatever is generating output) will generate an output file which can be included directly in the final (HTML) report (by link or by including it into some bigger file). A test runner which is not influenced by fatal errors and segmentation faults has to take care of the (HTML) file generation. The summary pages will probably be produced by some reporter or finalizer script. The output of one standalone test script (which can be invoked by itself) will be (nicely) usable on its own (and this can, or even should, be reused in the main report).

Test scripts will work when directly executed from the command line and when executed by the make system. When tests are executed by the make system, they might be run by a dedicated "test_runner" Python script which sets up the environment. However, the environment can also be set up inside the test script, and not setting up the environment would be the default (or the other way around, since setting up a separate environment would be safer).

Having separate processes is necessary in any case, because only this makes the testing framework robust enough to handle (GRASS) fatal error calls and segmentation faults.

Tests should be executable by themselves (i.e., they should have a `main()` function) to encourage running them often. This can be used by the framework itself, rather than imports, because it simplifies parallelization; the outputs need to go to files anyway (because of their size) and everything will be collected from the files afterwards (so it does not matter whether we use process calls or imports).

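A test script could therefore end with the usual boilerplate (a sketch using plain `unittest`; the final framework may provide its own `main()` helper):

{{{#!python
import unittest


class TestSomething(unittest.TestCase):

    def test_something(self):
        self.assertTrue(True)  # placeholder for real checks


def main():
    # makes `python test_something.py` work on its own
    unittest.main()


if __name__ == '__main__':
    main()
}}}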
=== Example run ===

{{{
...
    python ${test} >> /tmp/tgis_lib.txt 2>&1
done
}}}

The test output will be written to stdout and stderr (piped to a file in this case).

{{{
...
}}}

or

{{{
make test
}}}

which might be the more standard solution.

== Testing on MS Windows ==

On Linux and other Unix-like systems, we expect that testing will be done only when you also compile GRASS yourself. This cannot be expected on MS Windows because of the complexity of compilation and the lack of MS Windows-based GRASS developers. Moreover, because of the experience with different failures on different MS Windows computers (depending not only on the system version but also on system settings), we need to enable the tests for as many users (computers) as possible.

Invoking the test scripts on MS Windows by hand and through the make system should both work. Tests will be executed in the source tree in the same way as on Linux.

 I hope that we can get to the state where users will be able to test GRASS. It is Python. We can use the make system but also test discovery (ours or unittest's). The only problem I currently see is the different layout of directories in the source tree and in the distribution, but it might not be an issue.

Libraries are tested through `ctypes`, modules as programs, and the rest is mostly Python, so this should work in any case. However, there are several library tests that are executable programs (usually GRASS modules), for example in gmath, gpde and raster3d. These modules will be executed by the testing framework inside testing functions (`test_*`). They are not compiled by default and are not part of the distribution, but they need to be compiled in order to run the tests. We could compile these additional modules and put them into a separate directory in the distribution, or we could have a debug distribution with the testing framework and these modules, or we could create a system similar to the one we have for addons (on MS Windows). The modules could be compiled and prepared on a server for download, and they would be downloaded by the testing framework or upon user request.


== Locations, mapsets and data ==

We should have dedicated test locations with different projections and identical map names. I wouldn't use the GRASS sample locations (NC, Spearfish) directly as test locations. We should have dedicated test locations with selected data. They can overlap with (let's say) NC, perhaps containing less imagery but, on the other hand, some additional unusual data. A complication are doctests, which are documentation, so as a consequence they should use the intersection of the NC sample location and the testing location. The only difference between the locations would be the projection, so it really makes a difference only for projected, lat-lon and perhaps XY locations.

All data should be in the PERMANENT mapset. The reason is that temporary mapsets generated on the fly will only have access to the PERMANENT mapset by default; access to other mapsets would have to be set explicitly. This might be the case when a user wants to use his or her own mapset. On the other hand, it might be advantageous to have maps in different mapsets and just allow access to all of them. A user would have to do the same and would have to keep the same mapset structure (which might not be so advantageous), which is just slightly more complex than keeping the same map names (which the user must do in any case).

If multiple locations are allowed and we expect certain maps to be in the location, such as an elevation raster, it is not clear how to actually test a result, such as aspect computed from elevation, since the result (e.g., its MD5 sum) will be different for each location/projection. This would mean that the checking/assert functions, or the tests themselves, would have to handle different locations, and moreover, this type of test would always fail in a user-provided location.

All reference files (and perhaps also additional data) will be located in the `testsuite` directory. There can also be one global directory with additional data (e.g., data to import) which will be shared between test suites and exposed by the testing framework.

Reference checking in the case of different locations (projections) can only be solved in the test itself. The test author has to implement a conditional reference check. Alternatively, a function (e.g., `def pick_the_right_reference(general_reference_name, location_name)`) could be implemented to help with getting the right reference file (or perhaps value), because some naming conventions for reference files will be introduced anyway.
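Such a helper could be as simple as the following sketch (the `<name>_<location>.ref` convention and the fallback file are assumptions; the actual naming conventions still have to be decided):

{{{#!python
import os


def pick_the_right_reference(general_reference_name, location_name):
    """Return the reference file for the current location, if one exists."""
    location_specific = '%s_%s.ref' % (general_reference_name, location_name)
    if os.path.exists(location_specific):
        return location_specific
    # fall back to a projection-independent reference file
    return general_reference_name + '.ref'
}}}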

The design of the testing framework should allow us to make different decisions about how to solve the data and location questions.

Testing data will be available on a server for download. The testing framework can download them when a test is requested by a user. The data can be saved in the user's home directory and reused next time. This may simplify things for users, and it will also be clear to the testing framework where to find the testing data.


== GSoC weekly reports ==

=== Week 01 ===