Version 22 (modified by wenzeslaus, 9 years ago)

link addon, fix syntax: rst -> trac

GRASS GIS Sample Datasets

Description and discussion of the (future) standardized GRASS GIS 7 dataset: an updated, simplified and extended nc_spm_08, to be used in manual pages, tutorials, courses, code testing, development and OSGeo Live.

A user-oriented page with examples and a list of existing datasets is on the GRASS Wiki: GRASS GIS Standardized Sample Datasets.

Contributors: Helena Mitasova, Markus Neteler, Vaclav Petras, Anna Petrasova, Hamish Bowman, [contribute and add yourself]

Proposal

The complete sample data set will include:

  • ncarolina_spm_complete location with a PERMANENT mapset holding baseline data and several additional mapsets with specialized data, in NC State Plane [m] (see North Carolina - EPSG codes)
  • world_ll location in geographic coordinate system [deg]
  • external data in various coordinate systems

Basic NC location with PERMANENT mapset

This is based on the original nc_spm_08 location, with simplified, standardized names for map layers and some updates. To keep the data set simple, more specialized files were moved into separate mapsets:

  • ncarolina_spm_base
    • PERMANENT
      • raster
      • vector
    • practice1 (empty mapset)

Common rules

  • Names of maps/layers must be the same for all standardized datasets. No additions to names such as _10m or _wake_county are allowed. This also implies that names must be in English; national languages are not allowed, even in national standardized datasets (however, if desired, we can work on a script which would automatically rename multiple maps in a dataset and would also find and replace names in documentation and tutorials).
  • The descriptions in the tables here should be usable as map titles. Separate any details that should not be part of the title using commas or parentheses. Titles can differ between standardized datasets and can use a national language (unlike names).
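The find-and-replace half of the renaming script mentioned above could be sketched as follows. This is only an illustration in plain Python: the function name rename_in_text and the example names are hypothetical, and renaming the maps themselves inside GRASS is not covered.

```python
import re


def rename_in_text(text, renames):
    """Replace whole-name occurrences of old map names with new ones.

    `renames` maps old name -> new name. All replacements happen in a
    single pass, so a new name is never renamed again by another rule.
    """
    if not renames:
        return text
    # Map names consist of letters, digits and underscores. The lookarounds
    # make sure a name is not matched inside a longer identifier.
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])("
        + "|".join(re.escape(old) for old in renames)
        + r")(?![A-Za-z0-9_])"
    )
    return pattern.sub(lambda match: renames[match.group(1)], text)


# Hypothetical example: a tutorial line using a non-standard name.
line = "r.slope.aspect elevation=elevation_10m slope=slope"
print(rename_in_text(line, {"elevation_10m": "elevation"}))
```

Matching whole names only is a deliberate choice here: a map called my_elevation_10m_copy is left alone even though it contains an old name.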

Rasters

  • resolution and extent (in the sense of number of rows and columns) should be the same or very similar for all standardized datasets (obviously, the actual geographical extent can be different)
  • standard resolution: 10m (marked as std)
  • standard rows x columns (cells): xxx (x) (marked as std)
name | area for NC | description for NC | resolution | rows x columns (cells) | note
basins | South-West Wake county | Watersheds derived from NED 30m | std | std |
elevation | South-West Wake county | Elevation NED | std | std |
elevation_shade | South-West Wake county | Shaded relief | std | std |
geology | South-West Wake county | Geology derived from a vector map | std | std |
lakes | South-West Wake county | Wake county lakes | std | std |
landuse | South-West Wake county | Landuse in 1996 | std | std |
orthophoto | for CC or rural area | orthophoto (R, G, B, NIR) | 1m res or better | |
soils | South-West Wake county | soil type | | | should be vector data?
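The "same or very similar number of rows and columns" rule can be checked with a few lines of arithmetic. The sketch below is an assumption-laden illustration: the function names, the 10% tolerance and the region bounds are all made up for the example, since the standard rows x columns value is not fixed yet.

```python
def grid_shape(north, south, east, west, resolution):
    """Return (rows, columns) for a region with square cells."""
    rows = round((north - south) / resolution)
    cols = round((east - west) / resolution)
    return rows, cols


def similar_shape(shape_a, shape_b, tolerance=0.1):
    """True if rows and columns each differ by at most `tolerance` (relative)."""
    return all(
        abs(a - b) <= tolerance * max(a, b)
        for a, b in zip(shape_a, shape_b)
    )


# Illustrative bounds in meters at the standard 10 m resolution.
nc_shape = grid_shape(228500, 215000, 645000, 630000, 10)
# Hypothetical second dataset with a slightly larger extent.
other_shape = grid_shape(14000, 0, 15500, 0, 10)
print(nc_shape, other_shape, similar_shape(nc_shape, other_shape))
```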

Vectors

name | area for NC | description for NC | feature type | number of features | note
boundary_region | South-West Wake | region boundary | polygon map | |
boundary_state | State of NC | NC State map | polygon map | |
census | Wake County | Wake County census blocks with attributes, clipped | polygon | |
firestations | Wake County | fire stations | points map | |
geology | Wake County | North Carolina geology map | polygon map | |
geonames | Wake County | geonames | points map | |
hospitals | Wake County | North Carolina hospitals | points map | |
history_markers | | | | | move to archeology mapset?
parcels | for CC or rural area | | | |
points_bare_surface | CC or rural | bare ground lidar points for interpolation | | |
points_of_interest | Wake County | points of interest (examples?) | points map | |
railroads | North Carolina | railroads | lines map | |
roadsmajor | Wake County | major highways and roads | lines map | |
schools | Wake County | schools | points map | |
streams | South-West Wake | streams | lines map | |
streets | Wake County | roads and streets | lines map | |
zipcodes | Wake County | zip codes | polygon map | |
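A candidate dataset could be compared against the names in the raster and vector tables above with a small check like the one below. It is a sketch only: the name lists are copied from this draft proposal and may still change, and check_names is a hypothetical helper that takes plain lists of names (how the lists are obtained from a location is left open).

```python
# Names copied from the draft tables on this page; subject to change.
STANDARD_RASTERS = {
    "basins", "elevation", "elevation_shade", "geology",
    "lakes", "landuse", "orthophoto", "soils",
}
STANDARD_VECTORS = {
    "boundary_region", "boundary_state", "census", "firestations",
    "geology", "geonames", "hospitals", "history_markers", "parcels",
    "points_bare_surface", "points_of_interest", "railroads",
    "roadsmajor", "schools", "streams", "streets", "zipcodes",
}


def check_names(found, standard):
    """Compare the map names found in a dataset with the standard names."""
    found = set(found)
    unexpected = found - standard
    # Flag names that look like a standard name plus a suffix (e.g. _10m).
    suffixed = {}
    for name in unexpected:
        bases = [base for base in standard if name.startswith(base + "_")]
        if bases:
            suffixed[name] = max(bases, key=len)
    return {
        "missing": sorted(standard - found),
        "unexpected": sorted(unexpected),
        "suffixed": suffixed,
    }


report = check_names(["elevation", "elevation_10m", "lakes"], STANDARD_RASTERS)
print(report["unexpected"], report["suffixed"])
```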

Specialized Mapsets

To be distributed with the ncarolina_spm_base location so that they include PROJ information and are readable by GRASS. Mapsets cannot be distributed without a location because they lack PROJ_INFO.

(Note that providing just mapsets did not work; it was very confusing for users. But perhaps the only problem was that the GRASS welcome screen cannot unpack (unzip, untar) a mapset and copy it into an existing location. MN: we need a "Download sample data" button in the welcome screen!)

  • elevation: several elevation models at different scales, lidar
  • landsat: set of Landsat scenes with different timestamps
  • networks: vector networking data + LRS
  • orthoimg: set of aerial image scenes for image classification, including images from UAV
  • modis: MODIS time series with temporal GRASS DB (can be easily generated with http://pymodis.fem-environment.eu/)
  • climate: climatic time series with temporal GRASS DB
  • archeology: sites? historical maps/topography
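The reason mapsets have to travel with a location can be seen from the directory layout: projection information lives in the PROJ_INFO file of the PERMANENT mapset, while every mapset carries its own WIND (region) file. A minimal sketch for checking an unpacked package before distribution, using only the Python standard library (the function names and the example path are hypothetical):

```python
from pathlib import Path


def has_projection(location):
    """True if the directory has PERMANENT/PROJ_INFO, as a projected location does."""
    return (Path(location) / "PERMANENT" / "PROJ_INFO").is_file()


def list_mapsets(location):
    """Return the names of subdirectories that contain a WIND file."""
    return sorted(
        path.name
        for path in Path(location).iterdir()
        if path.is_dir() and (path / "WIND").is_file()
    )


# Hypothetical usage on an unpacked package:
# if has_projection("grassdata/ncarolina_spm_base"):
#     print(list_mapsets("grassdata/ncarolina_spm_base"))
```

A bare mapset directory fails the has_projection check, which matches the problem described above.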

Baseline world location

Already there: demolocation/ in the source code

  • world_ll_base
    • PERMANENT
  • specialized mapsets to be distributed with world_ll_base
    • climate
    • landcover

Notes after HVA discussion in 2015 with Hamish

  • barebone dataset for OSGeo Live which will include a script to generate the derived data needed for tutorials (?)
  • metadata
  • maps (layers)
    • exclude layers which can be generated
    • secref elevation? + orthophoto
    • add NC WMS service to GUI
    • soils, geology, lakes just vector
    • elevation, landuse, orthophoto
    • fields - parcel plots - with anonymized names - for secref - planimetry
    • SPOT image
    • elevation and precipitation points into baseline
    • add zipcodes
    • powerlines (extended)
    • separate layers for state and counties boundaries
    • natural earth for latlon dataset (data are on github)

Comments

The data sets can be distributed separately, or we can have packages with several mapsets or all mapsets, depending on the size. I found that packaging and distributing mapsets without a location is not practical, so I ended up distributing the specialized mapsets with nc_spm_baseline or world_baseline. Is this OK? We also need to figure out how to include the original metadata that come with the original data; a link to the source in the history file may be enough.

The baseline location and mapset should be simple, with easy-to-understand names for map layers. My only issue is the loc_ncspm_baseline name: I am not able to come up with a simple name that says this is a location with North Carolina data in the state plane meters coordinate system. Maybe loc_ncarolina would be better, assuming that state plane in meters is the official coordinate system for NC? But I also keep ncspf for feet and ncutm.

Notes about data sources

There is a lot of data; the main challenge now is to select a consistent, meaningful, but not too large data set. Many data sets are regularly updated and new ones are posted, but tutorials and man pages require stable data to work. The history file should include a link to the original data source with a note that an updated version of the data may be available from there.

Significant natural heritage areas and natural heritage element occurrences are posted on NC OneMap.

TODO

List of actionable items.

  • replace historical markers with historical places (done?)
  • Keep in mind what tasks we want to do with the data when selecting them (e.g., table join, selections, buffering etc). This influences the choice.

Integration with GRASS GIS unit test suite

All tests in the "gunittest" environment (see also overview) need to be written so that the map names they use correspond to the names in the standardized dataset. See also:

http://grass.osgeo.org/grass71/manuals/libpython/gunittest_testing.html#data

Download of draft location package
