Opened 8 months ago

Last modified 5 months ago

#3771 new defect

Run tests on Travis

Reported by: pmav99 Owned by: grass-dev@…
Priority: normal Milestone:
Component: Tests Version: svn-trunk
Keywords: Cc:
CPU: Unspecified Platform: Unspecified

Description

At the moment, Travis only builds GRASS without running any tests. The tests should be running on Travis for each commit and there should be notifications on the list whenever a commit breaks the tests.

Attachments (1)

testsuite.patch (3.0 KB) - added by pmav99 7 months ago.


Change History (25)

comment:1 Changed 8 months ago by neteler

Patches would be gratefully accepted!

comment:2 Changed 8 months ago by pmav99

Well for starters, #3769 needs to be merged.

comment:3 Changed 8 months ago by pmav99

Running tests on CI doesn't really make sense if you have failing tests: http://fatra.cnr.ncsu.edu/grassgistests/summary_report/nc/index.html

So the next step would be to make the tests pass.

  • Ideally, the failing tests should be fixed.
  • If that is not possible, the tests should be marked to be skipped.
  • If that is not possible either, then I would suggest removing them for now and opening an issue to re-add them when they are fixed (one issue per removed test).

https://docs.python.org/2/library/unittest.html#skipping-tests-and-expected-failures
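The skip machinery from that page can be summarized in a short sketch (the test names and the HAVE_PDAL flag below are purely illustrative, not real GRASS tests):

```python
import unittest

# Illustrative only: placeholder for a real capability check.
HAVE_PDAL = False


class ExampleTests(unittest.TestCase):
    @unittest.skip("known failure, tracked in a separate ticket")
    def test_always_skipped(self):
        self.fail("never executed")

    @unittest.skipUnless(HAVE_PDAL, "v.in.pdal support not compiled in")
    def test_requires_pdal(self):
        pass

    @unittest.expectedFailure
    def test_known_bug(self):
        self.assertEqual(1, 2)  # fails, but is reported as an expected failure
```

Running this suite reports two skips and one expected failure instead of red results, which is exactly what a CI run needs while the underlying bugs are being fixed.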

Last edited 8 months ago by pmav99 (previous) (diff)

comment:4 in reply to:  3 ; Changed 8 months ago by sbl

Replying to pmav99:

Running tests on CI doesn't really make sense if you have failing tests: http://fatra.cnr.ncsu.edu/grassgistests/summary_report/nc/index.html

To my knowledge, most of the tests that are currently failing started failing due to fundamental changes introduced with Python 3, e.g. everything related to bytes vs. strings...

So we would have to sort out those from tests that are failing for other reasons...

So the next step would be to make the tests pass.

In general +1, but again, let's try to distinguish Python 3 related test failures from all others... Currently, somehow, most failing tests tell us how far we are from Python 3 (and 2) support...

comment:5 in reply to:  4 Changed 8 months ago by pmav99

Replying to sbl:

To my knowledge, most of the tests that are currently failing, started failing due to fundamental changes from the introduction of Python 3, e.g. everything related to bytes vs. strings...

If you scroll down far enough here there is this graph:

http://fatra.cnr.ncsu.edu/grassgistests/summary_report/nc/files_successful_plot.png

Last September, indeed, there was a spike of failing tests but still ~10% of the tests were practically always failing.

That being said, supporting both Python 2 and 3 is a requirement encountered primarily in libraries, not applications. Applications usually only need to support a single Python version. GRASS does provide an API, but, at least from where I stand, it is primarily an application. Nevertheless, the decision that GRASS 7.8 will support both Python 2 and 3 has been made and unless it is revoked, the code needs to support both versions.

Nowadays, there is a great deal of experience in the Python community WRT both porting code from Python 2 to 3 and supporting both Python versions from a single codebase. The latter usually involves using projects like https://pypi.org/project/six/ and/or https://python-future.org/. Using them is of course not mandatory. Most projects only need a small part of their functionality, so they can often get away with just a {{{compat.py}}} module and appropriate imports. On projects of significant complexity, though, it is often a good idea to use them.
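For illustration, a minimal {{{compat.py}}} along these lines might look like the following sketch (the names PY2, string_types, and ensure_text are hypothetical, not part of GRASS):

```python
# compat.py -- hypothetical minimal Python 2/3 compatibility shim
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    string_types = (str, unicode)  # noqa: F821 (name exists only on Py2)

    def ensure_text(value, encoding="utf-8"):
        """Return a unicode string, decoding byte strings if necessary."""
        if isinstance(value, str):
            return value.decode(encoding)
        return value
else:
    string_types = (str,)

    def ensure_text(value, encoding="utf-8"):
        """Return a str, decoding bytes if necessary."""
        if isinstance(value, bytes):
            return value.decode(encoding)
        return value
```

Callers then write `from compat import ensure_text` and never touch `sys.version_info` themselves, which keeps the version branching in one place.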

Long story short, it is indeed more difficult to maintain a cross-version compatible codebase but with some concessions it is an achievable goal.

In my humble view, the fact that the tests are failing doesn't have much to do with Python 2 or Python 3, or even with cross-version compatibility. There are two rules that are pretty much unanimously accepted in the software development industry:

  1. Don't break the build.
  2. You break the build, you fix it.

(obviously, running the tests is considered part of the build)

The aforementioned graph shows that GRASS devs don't run the tests as part of their development routine. I argue that this needs to change. Admittedly, no one wants to wait for a long-running testsuite to finish, and that's absolutely natural. Honestly speaking, no one should have to wait. That's what CI is for: to build the code and run the tests.

The gist of the issue here is supporting GRASS' growth. Automated builds and improved testing procedures are a prerequisite of this goal. An important checkpoint in this effort is achieving a green build.

comment:6 Changed 8 months ago by sbl

In principle, fully agreed on all points regarding tests.

Just to add some history: Python 3 support was worked on to a significant extent in a GSoC project in 2018 (https://lists.osgeo.org/pipermail/soc/2018-August/004186.html); although that was a big step forward, open issues remained. After that project finished, a decision had to be made to either merge the unfinished code (leading to failing tests) or wait and risk that merging the code later would become harder and harder. The decision was made for the latter (see r73229).

Comparing test results before and after this commit (2018-09-02 vs. 2018-09-03) can give an idea of which tests fail because of the introduction of Python 3 vs. other reasons:

So, making tests pass is a matter of resources (time) I guess, and I am sure contributions are more than welcome. The question is what the best way forward is. Suggestions from my side would be to:

  • Make the test suite and test coverage a topic for a community sprint. If those who know the test suite well could help others (like me) to improve their procedures that would be really cool and appreciated.
  • Prioritize move to git if that helps people like you, Panos, to contribute and help with Python 3 support (which is supposed to be a focus for the 7.8 release) Really good to see your recent activity here, BTW!
  • Crack down on the other 10% of failing tests (fix, or deactivate temporarily if necessary, as you suggest)

comment:7 Changed 8 months ago by sbl

Another relevant issue are test datasets, it seems.

Some maps used in tests are not available in all sample datasets used for testing (or are available under different names). If I remember correctly, some effort has been put into this relatively recently; e.g., maps are copied to the name used in the testsuite before tests are run.

For example, in db/db.copy/testsuite/test_dbcopy.py the map "zipcodes" is used, which is available in nc_basic, while a suitable map in nc_spm_08 is named zipcodes_wake, the demolocation does not have a comparable map. The documentation suggests to use nc_spm dataset for testing (https://grass.osgeo.org/grass76/manuals/libpython/gunittest_testing.html).

Should tests (where necessary) be updated to work with nc_spm? (Or should input map names be chosen based on the name of the location a test is run in?)

I also had a look at how to skip tests (e.g. if maps are missing in the location used to run tests), but could not figure out how to do that. Hints are welcome!

comment:8 Changed 7 months ago by pmav99

@sbl Skipping tests can be done with this. I haven't really tested how it works with gunittest, but my guess is that it should work out of the box.

comment:9 Changed 7 months ago by sbl

Some updates after a first attempt at fixing failing tests. The tests test_v_in_pdal_basic and test_v_in_pdal_filter should probably be moved from ./vector/v.in.lidar to ./vector/v.in.pdal. However, tests should only be executed if the module is compiled and available (which does not seem to be the case for v.in.pdal). The latter is implemented in r74287.

Tests with issues with input data, where values changed (possibly also because of changed input data):

  • ./vector/v.in.lidar – mask_test
  • ./vector/v.univar – v_univar_test
  • ./raster/r.basins.fill/testsuite/testrbf.py
  • ./vector/v.surf.rst – test_vsurfrst (elev_points is 3D)
  • ./lib/python/gunittest – test_assertions_vect (column GLEVEL in map schools is not numeric)
  • ./imagery/i.vi/testsuite/test_vi.py

Bugs (suffix gets messed up):

  • ./raster/r.horizon/testsuite/test_r_horizon.py

Failing tests due to changes in functions:

  • ./lib/python/temporal – unittests_temporal_raster_conditionals_complement_else

Tests with various other failures (many of the temporal tests should be fixed by making functions accept unicode in r74285 and r74286):

  • ./scripts/g.extension – test_addons_modules
  • ./lib/python/pygrass/raster/testsuite/test_history.py
  • ./lib/python/pygrass/modules/testsuite/test_import_isolation.py
  • ./temporal/t.vect.algebra/testsuite/test_vector_algebra.py
  • ./temporal/t.rast3d.algebra/testsuite/test_raster3d_algebra.py
  • ./temporal/t.rast.accumulate/testsuite/test_accumulation.py
  • ./temporal/t.rast.algebra/testsuite/test_raster_algebra_granularity.py
  • ./temporal/t.rast.algebra/testsuite/test_raster_algebra.py
  • ./temporal/t.rast.aggregate/testsuite/test_aggregation_absolute.py
  • ./temporal/t.rast.accdetect/testsuite/test.t.rast.accdetect.sh
  • ./temporal/t.rast.accdetect/testsuite/test_simple.py
  • ./temporal/t.rast.accdetect/testsuite/test.t.rast.accdetect.reverse.sh
  • ./lib/python/temporal/testsuite/unittests_temporal_raster_algebra_equal_ts.py

Tests where encoding is a problem:

  • ./temporal/t.info
  • ./vector/v.what – test_vwhat_layers does not fail on my system; maybe related to the locale on the server that runs the tests...

Tests where expected output changed from string to unicode

  • ./lib/python/temporal/testsuite/test_temporal_doctests.py
  • ./lib/python/pygrass/vector/testsuite/test_pygrass_vector_doctests.py
  • ./lib/python/pygrass/raster/testsuite/test_pygrass_raster_doctests.py

comment:10 in reply to:  9 Changed 7 months ago by pmav99

@sbl Great work!

Replying to sbl:

The tests test_v_in_pdal_basic and test_v_in_pdal_filter should probably be moved from ./vector/v.in.lidar to ./vector/v.in.pdal. However, tests should only be executed if the module is compiled and available (which does not seem to be the case for v.in.pdal). The latter is implemented in r74287.

You are right that the test_v_in_pdal_* tests should be moved to ./vector/v.in.pdal/. The cause for this was my mistake in #3792 (last two lines of the suggested fix).

WRT skipping the tests, I believe that you can also apply the skipIf decorator directly to the test case class. That being said, if possible, the Travis build should enable and test optional configurations too.
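A sketch of the class-level variant follows; the availability check here (looking for the module binary on PATH via shutil.which) is only an assumption about how one might detect a missing v.in.pdal, and the test class is illustrative:

```python
import shutil
import unittest

# Hypothetical availability check: is the optional module on PATH?
HAVE_V_IN_PDAL = shutil.which("v.in.pdal") is not None


@unittest.skipUnless(HAVE_V_IN_PDAL, "v.in.pdal is not compiled/installed")
class TestVInPdalBasic(unittest.TestCase):
    """Every test in this class is skipped when the module is missing."""

    def test_module_runs(self):
        # A real test would invoke the module here.
        self.assertTrue(HAVE_V_IN_PDAL)
```

Decorating the class rather than each method keeps the skip condition in one place, so adding new tests to the class picks up the guard automatically.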

comment:11 Changed 7 months ago by sbl

I think I can fix the g.extension issue tonight.

However, for the tests failing because of issues with the input data, it would be really useful to get access to the

nc_spm_full_v2alpha

location.

Unfortunately, I have not found it for download...

comment:12 in reply to:  11 Changed 7 months ago by neteler

Replying to sbl:

I think I can fix the g.extension issue tonight.

However, for the tests failing because of issues with the input data, it would be really useful to get access to the

nc_spm_full_v2alpha

location.

Unfortunately, I have not found it for download...

Dataset now packaged and published by wenzeslaus:

http://fatra.cnr.ncsu.edu/data/

Changed 7 months ago by pmav99

Attachment: testsuite.patch added

comment:13 Changed 7 months ago by pmav99

Thank you @wenzeslaus

@everyone I used the attached patch to get the testsuite runner to use the new dataset.

comment:14 in reply to:  13 Changed 7 months ago by neteler

I have updated the gunittest docs in r74315 to also mention Travis-CI and the new sample dataset.

Replying to pmav99:

I used the attached patch to get the testsuite runner to use the new dataset.

Patch committed in r74316 (after shallow local testing), thanks.

Last edited 7 months ago by neteler (previous) (diff)

comment:15 Changed 7 months ago by pmav99

Note 1: Some tests require certain compilation options to be enabled (e.g. pdal, liblas, etc.). If support is missing, the tests fail. Ideally these tests should be skipped instead.

Note 2: I ran the tests on an Ubuntu VM where all the options were configured (except opencl and dwg). I got several more failures than the ones reported here. More specifically, on my VM:

(SVN revision 74319)
Executed 233 test files in 0:29:16.353563.
From them 194 files (83%) were successful and 39 files (17%) failed.

For comparison, the latest results from the test server are:

(SVN revision 74304)
Executed 233 test files in 0:36:57.428963. 
From them 216 files (93%) were successful and 17 files (7%) failed.

I guess that this difference implies that the server is configured in a specific way which needs to be documented before we can replicate it on Travis.

comment:16 Changed 7 months ago by neteler

At this time the "fatra" server is no longer running; it is frozen at date

2019-03-26 03:01:54	74304	nc_spm_full_v2alpha	93%	98%

@vaclav: could you please check? thanks

http://fatra.cnr.ncsu.edu/grassgistests/summary_report/nc/index.html

comment:17 Changed 7 months ago by sbl

In 74327:

fix multirunner.py with Py3; fix #3798; more Py3 fixes see #3771

comment:18 Changed 7 months ago by sbl

Now multirunner also works with Python 3 and tests for P2 and P3 are more or less even...

However, a new issue pops up:

NameError: global name '_' is not defined

Also, in Py3 I had to deactivate the following tests, as they cause multirunner to freeze...

lib/python/pygrass/modules/interface/testsuite/test_pygrass_modules_interface_doctests.py
lib/python/temporal/testsuite/test_temporal_doctests.py
temporal/t.vect.algebra/testsuite/test_vector_algebra.py
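The NameError above usually means that gettext's translation function _ has not been installed into builtins before a module uses it. A minimal illustration of what installing it looks like (the domain name "example-domain" is a placeholder, not the real GRASS text domain):

```python
import gettext

# Install _() into builtins; with no translation catalog found for the
# given domain, _ falls back to the identity function.
gettext.install("example-domain")

print(_("raster map"))  # resolves now that _ exists in builtins
```

Modules that may be imported before this setup runs can instead do their own `from gettext import gettext as _` style import so they never rely on the builtin being present.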

comment:19 in reply to:  18 Changed 7 months ago by pmav99

Replying to sbl:

Now multirunner also works with Python 3 and tests for P2 and P3 are more or less even...

However, a new issue pops up:

NameError: global name '_' is not defined

That's related to #3790. Do you remember which tests are the problematic ones?

Last edited 7 months ago by pmav99 (previous) (diff)

comment:20 in reply to:  16 Changed 7 months ago by wenzeslaus

Replying to neteler:

At this time the "fatra" server is no longer running; it is frozen at date

2019-03-26 03:01:54	74304	nc_spm_full_v2alpha	93%	98%

@vaclav: could you please check? thanks

http://fatra.cnr.ncsu.edu/grassgistests/summary_report/nc/index.html

Seems like a minor issue so far. Tests should still run locally (if you try). I should know more tomorrow (debug message added).

comment:21 in reply to:  6 Changed 7 months ago by wenzeslaus

Replying to sbl:

  • Crack down on the other 10% of failing tests (fix or deactivate temporarily if necessary, as you suggest)

You don't need 100% of tests passing to run them on CI reasonably. That's the ideal state, but we can start by setting a threshold for what you consider OK. The idea is already there: notice the colors of the percentages. What you need to add is a mechanism for the main script to fail or not fail given a success-percentage value. This would only catch serious problems, but it would still be quite a bit of progress. So far it has mostly been me checking the fatra server website and reporting to the mailing list (or, I think, directly to the committer in one recent case).
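A sketch of such a gate, assuming the runner can report the percentage of successful test files as a number (the function name gate and the 90% threshold are illustrative choices, not anything the test framework provides):

```python
THRESHOLD = 90  # minimal percentage of successful test files to accept


def gate(successful_percent, threshold=THRESHOLD):
    """Return a CI exit code: 0 when at or above the threshold, 1 otherwise."""
    return 0 if successful_percent >= threshold else 1
```

A Travis script step would call this with the percentage extracted from the summary report and exit with the returned code, so the build goes red only when the success rate drops below the agreed threshold.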

comment:22 in reply to:  7 Changed 7 months ago by wenzeslaus

Replying to sbl:

Another relevant issue are test datasets, it seems.

As for the (older) full and (newer) basic versions of NC SPM, I think nc_spm_full_v2alpha solves it for now (not ideal, but it works). Making nc_spm_full_v2alpha work is a good goal for the Travis runs now.

However, please also note the idea of different sample datasets, i.e. other than NC SPM (old/full or new/basic or full v2). Please see: GRASS-dev: Testsuite: script for easier use of test framework https://lists.osgeo.org/pipermail/grass-dev/2019-January/091011.html

There are different approaches to handling the different datasets in tests and by the runner, but the idea is 1) to make it easy to write very specific tests, 2) to test in different conditions, like a latlon GRASS Location, and 3) to have tests which run in any GRASS Location and/or without any data (so you can, e.g., run make test without downloading 100 MB of sample data).

comment:23 Changed 5 months ago by sbl

For the record: after Anna's sweeping fixes for Python 3 tests, the testsuite on my local system runs 97% of the test files and 98% of the tests successfully with Python 3.

The remaining failing tests are (for Python3):

  • ./gui/wxpython/core/testsuite/toolboxes.sh
  • ./raster/r.contour/testsuite/testrc.py
  • ./lib/python/temporal/testsuite/unittests_temporal_raster_conditionals_complement_else.py
  • ./lib/python/temporal/testsuite/unittests_temporal_raster_algebra_equal_ts.py
  • ./lib/python/pygrass/raster/testsuite/test_history.py

However, unfortunately, multirunner no longer finishes with Python 2... So, currently, the testsuite supports either Python 2 or Python 3, but not both...

comment:24 Changed 5 months ago by sbl

For replication:

python lib/python/gunittest/multirunner.py --location nc_spm_full_v2alpha \
--location-type nc --grassbin bin.x86_64-pc-linux-gnu/grass77 --grasssrc ./ \
--grassdata $HOME/Documents/grassdata/

completes with Python 3, but not with Python 2 (Unicode error)
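For reference, a Travis configuration running that same command might look like this sketch (the build paths and binary name are assumptions carried over from the command above; this is not a tested configuration):

```yaml
# .travis.yml fragment (sketch only)
language: python
script:
  - ./configure && make
  - python lib/python/gunittest/multirunner.py
      --location nc_spm_full_v2alpha
      --location-type nc
      --grassbin "$PWD/bin.x86_64-pc-linux-gnu/grass77"
      --grasssrc "$PWD"
      --grassdata "$HOME/grassdata"
```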
