Context Navigation

TestingFrameworkForGRASS

Timestamp:: Jul 1, 2014, 12:47:26 PM (10 years ago)
Author:: wenzeslaus
Comment:: updates of data ideas

Legend:

: Unmodified
: Added
: Removed
: Modified

GSoC/2014/TestingFrameworkForGRASS

-              v42
+              v43
 == Locations, mapsets and data ==
+The test scripts should not depend on specific mapsets for their run. In case of make system run, every test script will be executed in its own temporary mapset. Probably location will be copied for each test (testsuite) to keep the location clean and allow multiprocessing. In case of running by hand directly without make, tests will be executed in the current location and mapset which will allow users to test with their own data and projections.
+Tests (suites, cases, or scripts/modules) itself can define in which location or locations they should be executed as a global variable (both for unit and doctests, doctests in they their unittest wrapper). The global variable will be ignored in case the test script is executed by hand. When executed by make system, the test can be executed in all locations specified by the global variable, or one location specified by make system (or generally forced from the top). In other words, the global variable specifies what the test want to do, not what it is capable to do because we want it to run in any location.
+We should have dedicated test locations with different projections and identical map names. I wouldn't use the GRASS sample locations (NC, Spearfish) as test locations directly. We should have dedicated test locations with selected data. They can overlap with (let's say) NC but may contain less imagery but on the other hand some additional strange data. The complication are doctests which are documentation, so as a consequence they should use the intersection of NC sample and testing location. The only difference between the locations would be the projection, so it really makes difference only for projected, latlon and perhaps XY.
+All data should be in PERMANENT mapset. The reason is that on the fly generated temporary mapsets will have only access to the PERMANENT mapset by default. Access to other mapsets would have to be explicitly set. This might be the case when user wants to use his or her own mapset. On the other hand, it might be advantageous to have maps in different mapsets and just allow the access to all these mapsets. User would have to do the same and would have to keep the same mapset structure (which might not be so advantageous) which is just slightly more complex then keep the same map names (which user must do in any case).
+If multiple locations are allowed and we expect some maps being in the location such as elevation raster, it is not clear how to actually test the result such as aspect computed from elevation since the result (such as MD5 sum) will be different for each location/projection. This would mean that the checking/assert functions or tests themselves would have to handle different locations and moreover, this type of tests would always fail in the user provided location.
+The test scripts should not depend on specific mapsets for their run. In case of make system run, every test script will be executed in its own temporary mapset. Probably location will be copied for each test (testsuite) to keep the location clean and allow multiprocessing but this have very high cost so it should be only on some very high level. Cleanses of the location is hopefully not a big issue, i.e. it is expected to be clean. In case of running by hand directly without make, tests will be executed in the current location and mapset which will allow users to test with their own data and projections. It is not clear it it will be possible to parallelize in this case.
+Tests (suites, cases, or scripts/modules) itself can define in which location or locations they should be executed as a global variable (both for unit and doctests, doctests in they their unittest wrapper, but probably doctests should always be NC sample location). The global variable will be ignored in case the test script is executed by hand or this will be possible to specify. When executed by make system, the test can be executed in all locations specified by the global variable, or one location specified by make system (or generally forced from the top). In other words, the global variable specifies what the test want to do, not what it is capable to do because we want it to run in any location.
+We should have dedicated test locations with different projections and identical map names. It is not necessary or advantageous for tests to use the GRASS sample locations (NC, Spearfish) as test locations directly. We should have dedicated test locations with selected data. However, it seems that for maintenance reasons and doctest it is advantageous to use sample locations. For sure test locations can be derived from the sample locations or even can overlap with (let's say) NC but may contain less imagery but on the other hand, some additional strange data. The complication are doctests which are documentation, so as a consequence they should use the intersection of NC sample and testing location. The only difference between the locations would be the projection, so it really makes difference only for projected, latlon and perhaps XY.
+All data should be in PERMANENT mapset. The reason is that on the fly generated temporary mapsets will have only access to the PERMANENT mapset by default. Access to other mapsets would have to be explicitly set. This might be the case when user wants to use his or her own mapset. On the other hand, it might be advantageous to have maps in different mapsets and just allow the access to all these mapsets. User would have to do the same and would have to keep the same mapset structure (which might not be so advantageous) which is just slightly more complex then keep the same map names (which user must do in any case). (e.g. aspect is 0-360) or checking the values with lower precision expecting the input data to be the same. Alternatively, tests themselves would have to handle different locations and moreover, this type of tests would always fail in the user provided location.
+If multiple locations are allowed and we expect some maps being in the location such as elevation raster, it is not clear how to actually test the result such as aspect computed from elevation since the result (such as MD5 sum) will be different for each location/projection. This means that the checking/assert functions must be either location independent using general assumptions about data (e.g. aspect is 0-360) or checking the values with lower precision expecting the input data to be the same. Alternatively, tests themselves would have to handle different locations and moreover, this type of tests would always fail in the user provided location.
+For running tests in parallel it might be advantageous to have temporal framework working in the mode with separate mapsets.
 It is expected that any needed (geo-)data is located in the PERMANENT mapset when a specific location is requested. This applies for prepared locations. For running in the user location, it is not a requirement.
 The created data will be deleted at the end of the test. The newly created mapset will be deleted (if it was created). If we would copy the whole location (advantageous e.g. for temporal things), the whole location will be deleted. The tests will be probably not required to delete all the created maps but they might be required to delete other files (or e.g., tables in database) if they created some.
 User should be able to disable the removal of created data to be able to inspect them. This would be particularly advantageous for running tests by hand in some user specified location.
+The created data will be deleted at the end of the test. The newly created mapset will be deleted (if it was created). If we would copy the whole location (advantageous e.g. for temporal things), the whole location will be deleted. The tests will be probably not required to delete all the created maps but they might be required to delete other files (or e.g., tables in database, temporal datasets) if they created some. The each tests might run in its own directory (and files should be created just with a name, not path), so it is clear at the end what to clear (or what to put to a report).
+User should be able to disable the removal of created data to be able to inspect them. This would be particularly advantageous for running tests by hand in some user specified location. Some (or all) data can be also copied to (moved to, or created in) the report, so they would be accessible in a HTML report.
 In case the mapset information is needed, then `g.mapset -p` (prints the current mapset) must be parsed within the test. An special case, a test of `g.mapset -p` itself will create a new mapset and then switch to the new one and test there.
+Tests can provide also their own data and perhaps their own grassdata (GISDBASE) or location. This for sure does not save space but it would allow to have special data. For tests of mapset handling, re-projections and other management things, this is necessary. It is up to the test to correctly handle location switching (done only for subprocesses if possible) and clean up, although testing framework may provide some functions for this (e.g., to not do the clean up when not needed).
 Testing framework can have a function to check/test the current location (currently accessible mapsets) whether it contains all the required maps (according to their name).
 All reference files (and perhaps also additional data) will be located in the `testsuite` directory. There can be also one global directory with additional data (e.g. data to import) which will be shared between test suites and exposed by the testing framework.
 The reference checking in case of different locations (projections) can only be solved in the test itself. The test author has to implement a conditioned reference check. Alternatively, a function (e.g., `def pick_the_right_reference(general_reference_name, location_name)`) could be implemented to help with getting the right reference file (or perhaps value) because some naming conventions for reference files will be introduced anyway.
+All reference files (and perhaps also additional data) will be located in the `testsuite` directory. There can be also one global directory with additional data (e.g. data to import) which will be shared between test suites and exposed by the testing framework. This must be specified separately or somehow connected to the location (we anyway need mapping of predefined location names to actual directories). It is not clear how to solve the problem of different projections. Projection limits the usage of data to one location except if you test (or want to risk) import to another location and re-projection.
+The reference checking in case of different locations (projections) can only be solved in the test itself. The test author has to implement a conditioned reference check. Alternatively, a function (e.g., `def pick_the_right_reference(general_reference_name, location_name)`) could be implemented to help with getting the right reference file (or perhaps value) because some naming conventions for reference files will be introduced anyway. Easier alternative is to have different tests for different locations and separately the tests which are universal. The question is what is universal. Do we expect the universal tests not to be dependent on exact values, approximate values, or map names?
 Testing framework design should allow us to make different decisions about how to solve data and locations questions.
+Testing data will be available on server for download. The testing framework can download them if test is requested by user. The data can be saved in the user home directory and used next time. This may simplify things for users and also it will be clear for testing framework where to find testing data.
+Testing data will be available on server for download. The testing framework can download them if test is requested by user. The data can be saved in the user home directory and used next time. This may simplify things for users and also it will be clear for testing framework where to find testing data. The download can also download specialized data for individual tests (and perhaps even tests itself when they are not part of the distribution and C code on Unix and binaries on MS Windows).
 == GSoC weekly reports ==