Opened 10 years ago

Closed 10 years ago

#4233 closed enhancement (fixed)

netcdf: add CF compliance checker to the autotests

Reported by: etiennesky Owned by: etourigny
Priority: normal Milestone:
Component: Autotest Version: unspecified
Severity: normal Keywords: netcdf, autotest
Cc: etourigny, pds

Description

In order to test present and future NetCDF driver code for CF compliance, I propose to add support for the CF compliance checker in the autotests.

The checker is implemented in python and available here: http://pypi.python.org/pypi/cfchecker/2.0.2

It requires that Python, udunits and CDMS (from CDAT) be installed, which might not be possible on every machine.

There are also online checkers using the same tool: http://titania.badc.rl.ac.uk/cgi-bin/cf-checker.pl http://puma.nerc.ac.uk/cgi-bin/cf-checker.pl

These could be used in the absence of a local CDMS install, or better yet we could host a similar app on the gdal server (so as not to put load on these servers), with a limit on file size. Could this be done easily on the gdal or osgeo server?

The checker checks mostly for valid CF tags, but does not report if the grid structure is supported by other software (discussed in ticket #2893). The bug reported in ticket #3324 was found with this checker.

I have adapted the python script to auto-detect the CF Convention attribute in the netcdf file and check against that CF version. I have added support for it and included a simple test in autotest/gdrivers/netcdf.py. I have also written a first stab at a perl script for handling the server-side component (but it doesn't work yet).
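As an illustration of the version auto-detection described above, here is a minimal sketch. It is not the code from the attached cfchecks-2.0.4.py; the function name parse_cf_version is hypothetical, and it assumes the file's global attribute is named Conventions and holds a value such as "CF-1.5". Reading the attribute from the file is left to the caller.

```python
import re


def parse_cf_version(conventions):
    """Return the CF version as a (major, minor) tuple, or None if absent.

    `conventions` is the value of the global "Conventions" attribute,
    e.g. "CF-1.5" or "COARDS, CF-1.4". (Hypothetical helper.)
    """
    if not conventions:
        return None
    match = re.search(r"CF-(\d+)\.(\d+)", conventions)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))
```

A caller could then pick the matching set of CF rules, or fall back to a default version when the function returns None.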

The cfchecker script requires a few xml files (from the CF conventions and from udunits) which I will upload in one zip file. These could reside in the autotest/gdrivers/data directory, but one is rather large (cf-standard-name-table.xml at 1.3M) and should be distributed in compressed form in svn.

Attachments (5)

cfchecks-2.0.4.py (93.7 KB) - added by etiennesky 10 years ago.
modified cfchecks.py script
cfchecks-orig.py (92.5 KB) - added by etiennesky 10 years ago.
original script
netcdf.py (16.5 KB) - added by etiennesky 10 years ago.
modified netcdf.py autotest
cf-checker.pl (3.4 KB) - added by etiennesky 10 years ago.
incomplete perl script for server-side script
cfcheck-xml.zip (120.5 KB) - added by etiennesky 10 years ago.
xml files required by cfchecks


Change History (21)

Changed 10 years ago by etiennesky

Attachment: cfchecks-2.0.4.py added

modified cfchecks.py script

Changed 10 years ago by etiennesky

Attachment: cfchecks-orig.py added

original script

Changed 10 years ago by etiennesky

Attachment: netcdf.py added

modified netcdf.py autotest

Changed 10 years ago by etiennesky

Attachment: cf-checker.pl added

incomplete perl script for server-side script

Changed 10 years ago by etiennesky

Attachment: cfcheck-xml.zip added

xml files required by cfchecks

comment:1 Changed 10 years ago by warmerdam

Generally I try to keep the dependencies in the autotest scripts to a minimum. However, you can pursue this mechanism if you want. I just ask that the script gracefully skip the CF tests depending on extra packages (returning "skip" from the tests) if they won't run due to missing dependencies. Given the uniqueness of the approach you might want to handle this in a gdrivers/netcdfcf.py script instead of the main netcdf.py script.
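A rough sketch of the "skip" behaviour requested above, assuming CDMS is importable under the module name cdms2 and that tests report their outcome as a string. The function names have_module and netcdf_cf_check are hypothetical and are not taken from the autotest suite.

```python
import importlib


def have_module(name):
    """Return True if the optional module `name` can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def netcdf_cf_check(module_name="cdms2"):
    """Hypothetical test step: skip unless the optional dependency exists."""
    if not have_module(module_name):
        return "skip"
    # ... the actual CF compliance check would run here ...
    return "success"
```

The dependency test is kept in its own helper so that several CF tests can share one check instead of each repeating the import attempt.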

It is unusual for us to run a server side component to support testing, but we could put this on the OSGeo "AdHoc VM" if you wanted. If you wish to pursue this I can provide login access. Additional information is available on the AdHoc VM at:

http://wiki.osgeo.org/wiki/AdhocVM

comment:2 Changed 10 years ago by etourigny

Cc: etourigny added; etiennesky removed
Owner: changed from warmerdam to etourigny
Status: new → assigned

Thanks Frank.

I will implement this in a separate gdrivers/netcdfcf.py script and skip the tests if CDMS is not found (already did that). Perhaps someday I will add an online checker to the AdHoc VM and request login access.

In the meantime I have asked permission to use the online servers and am awaiting an answer. I now understand that the autotests are run by devs occasionally, so it would not be a big burden if we use small files.

So the policy for now will be: if you want to check for CF compliance, you are responsible for installing CDMS on your end.

comment:3 Changed 10 years ago by Even Rouault

The autotests could also be used by Linux distro or other software packagers to ensure they have not messed up their builds before pushing them to the repository. Not sure they do it however ;-)

We used to have a buildbot running on OSGeo servers but unfortunately some crash occurred one or two years ago and no one has brought it back to life. You can still monitor http://www.gisinternals.com/sdk/ where Tamas Szekeres maintains a hand-made buildbot for most MSVC versions and 32/64 bit builds. It is refreshed every day from the latest svn head version and runs the autotests.

I concur with Frank. There are a number of test files in the autotest directories that will skip if not all conditions to run the test are met (the first one being "is the driver available?" for drivers, like netcdf, that depend on third-party optional libraries). Even if the driver is available, there is also the case of database-based OGR drivers where the user needs to do some preliminary setup, like creating an empty database with appropriate access rights for the user running the autotest. Some documentation at the beginning of the file describing the setup steps is nice for people interested in enabling them.

comment:4 Changed 10 years ago by etourigny

Hmm, it was my impression that there was a buildbot running linux (telascience), but I never checked...

I suggest that the wiki:Buildbot wiki entry should be updated (or deleted or removed from the main wiki page) to reflect the current state. None of the buildbot instances mentioned there are alive today.

comment:5 Changed 10 years ago by etourigny

Component: default → Autotest

Just a quick question: I have to include a large file (cf-standard-name-table.xml at 1.3M). Is this too large for the gdal-autotest svn?

There are 3 alternatives: 1) distributing everything via svn (1.4M); 2) distributing it in compressed form (200k), which the user has to decompress; 3) having these files downloaded to autotest/gdrivers/tmp/cache.

comment:6 Changed 10 years ago by Even Rouault

Etienne,

Feel free to edit the wiki to reflect the current state of the buildbot.

Yes, 1.3 MB is unusually large, but 200k would be OK I guess. But the value of putting it into SVN is somewhat limited if the file is only used by netcdfcf.py, if I've understood correctly. I'd suggest we put that file on download.osgeo.org/gdal/data/netcdf and use the gdaltest.download_file() infrastructure to download it from there, when netcdfcf.py detects that everything else needed to run the tests is available.

comment:7 Changed 10 years ago by etourigny

The file itself (and the others) are part of the CF test; they are xml files containing CF and udunits definitions. I would prefer to have them in svn in case they are changed. Plus, download_file() needs GDAL_DOWNLOAD_TEST_DATA defined, which will not always be the case.

If it's ok I will add one zip file (with all the xml files) to data/ in svn, and decompress it automatically with python's zipfile when needed.
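A minimal sketch of the "decompress when needed" idea using Python's zipfile module. The helper name ensure_extracted is hypothetical, and the choice of archive and destination directory is left to the caller.

```python
import os
import zipfile


def ensure_extracted(zip_path, dest_dir):
    """Extract `zip_path` into `dest_dir` unless all members already exist.

    Returns the list of paths of the (non-directory) archive members.
    """
    with zipfile.ZipFile(zip_path) as archive:
        names = [n for n in archive.namelist() if not n.endswith("/")]
        paths = [os.path.join(dest_dir, n) for n in names]
        if not all(os.path.exists(p) for p in paths):
            archive.extractall(dest_dir)
    return paths
```

Checking for the extracted files first means the archive is only unpacked on the first run, so repeated test runs do not pay the cost again.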

comment:8 Changed 10 years ago by warmerdam

This data is unsuitably large for SVN. We deliberately keep file sizes in svn quite modest to avoid bloating the autotest package and to avoid making svn grabs painfully large. The file(s) can live on download.osgeo.org/gdal/data/netcdf if the license is suitable.

comment:9 Changed 10 years ago by etourigny

If the files are compressed in a zip, they take a total of 121K which is quite reasonable, and python has a zip module built-in. Plus if they have to be updated svn is the best tool. But I don't want to be stubborn with this issue, whatever you think is best is fine with me.

BTW you might want to enforce or advertise this policy: there is a 5MB file here: autotest/gdrivers/data/vlstr_metadata.h5

The following svn pre-commit hook might work. http://beaversource.oregonstate.edu/projects/admin/browser/management/master-svn/trunk/hooks/pre-commit/max-commit-size.py

thanks

comment:10 Changed 10 years ago by Even Rouault

5 MB, ouch! Thanks for reporting. That appears to be a very recent addition indeed: http://trac.osgeo.org/gdal/changeset/23044/trunk/autotest/gdrivers/data . I'll move that to download.osgeo.org and notify Antonio it is not appropriate to push such big files.

comment:11 Changed 10 years ago by etourigny

It seems that the large cf-standard-name-table.xml is updated quite regularly, so I will use the download_file() mechanism for that large file and get it at the source (http://cf-pcmdi.llnl.gov). I will add the other files to svn as they use a total of 88k.

thanks
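For illustration, the download-on-demand behaviour discussed in the comments above could be sketched as follows. This is not the gdaltest.download_file() implementation; cached_download and its fetch parameter are hypothetical, and the only detail taken from this ticket is that downloads depend on GDAL_DOWNLOAD_TEST_DATA being defined.

```python
import os


def cached_download(url, cache_dir, fetch, env=os.environ):
    """Return the local path of `url`, downloading it once into `cache_dir`.

    Returns None when downloads are not enabled, so the caller can skip.
    `fetch(url, path)` is a caller-supplied function that writes the file.
    """
    path = os.path.join(cache_dir, os.path.basename(url))
    if os.path.exists(path):
        return path  # already cached, no download needed
    if "GDAL_DOWNLOAD_TEST_DATA" not in env:
        return None  # downloads disabled
    os.makedirs(cache_dir, exist_ok=True)
    fetch(url, path)
    return path if os.path.exists(path) else None
```

In a real run, fetch could be urllib.request.urlretrieve, which takes a URL and a destination filename; a test would return "skip" whenever the function returns None.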

comment:12 Changed 10 years ago by etourigny

Added netcdf_cf.py, netcdf_cfchecks.py and data to trunk (r23079). For now there are 2 tests using data/cf-bug636.nc, which is a valid netcdf CF file.

Will close this issue when I have more test files added to the test.

comment:13 Changed 10 years ago by etourigny

Changed in trunk (r23091):

1) Add tests for geographic file copy and (optional) test for CF compliance

2) Merge netcdf_cf.py into netcdf.py and code clean-up

Tests 18 and 19 fail because of an incorrect geotransform when copying a file with a Geographic grid without datum, and test 20 fails (when the CF check is enabled) because the export of a file with WGS84 datum is invalid: ERROR (5.6): Invalid grid_mapping_name: Geographics Coordinate System. These errors will disappear with fixes to bug #2129.

comment:14 Changed 10 years ago by etourigny

Cc: Kyle Shannon Even Rouault removed

Relaxed the projection check in the copy test in trunk (r23092): it is now less stringent, only printing a warning if the WKT strings are not identical and testing the PROJ.4 string instead. Modified for test 20.

Fixes for bug #2129 (in r23093): when creating a netcdf file from a WGS84 tiff file, create a netcdf file with valid CF projection attributes but slightly different WKT (although an equal PROJ.4 string).
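The relaxed comparison can be sketched roughly as below. This is an illustration rather than the code committed in r23092; the function name check_copy_projection is hypothetical, and it takes the WKT and PROJ.4 strings as plain arguments (with the GDAL Python bindings they could be obtained from osr.SpatialReference, e.g. via ExportToProj4()).

```python
def check_copy_projection(wkt_src, wkt_dst, proj4_src, proj4_dst):
    """Return 'success' or 'fail' for the projection of a copied dataset.

    Differing WKT only produces a warning; differing PROJ.4 strings fail.
    """
    if wkt_src != wkt_dst:
        print("WARNING: WKT of source and copy are not identical")
    if proj4_src.strip() != proj4_dst.strip():
        print("PROJ.4 strings differ: %r vs %r" % (proj4_src, proj4_dst))
        return "fail"
    return "success"
```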

comment:15 Changed 10 years ago by pds

Cc: pds added

comment:16 Changed 10 years ago by etourigny

Resolution: fixed
Status: assigned → closed

Closing this bug as there are sufficient tests for projected CRS. Now CF-related tests are in netcdf_cf.py. Fixed in trunk (r23198 and r23202).
