
Version 10 (modified by Norm Olsen, 10 years ago)

Updated Status Table

CS-Map RFC 7 - NSRS2007 / NSRS2011 Implementation

This page contains a change request (RFC) for the CS-Map Open Source project. More CS-Map RFCs can be found on the RFCs page. The change described in this request is to add support for the relatively new National Spatial Reference Systems of 2007 and 2011. These are, essentially, horizontal and vertical geodetic reference systems for the United States and its territories, produced by the National Geodetic Survey of the United States. The acronyms NSRS2007 and NSRS2011 are commonly used to refer to these systems. They are also known as NAD83(2007) and NAD83(2011).

Status

RFC Template Version: (1.0)
Submission Date: 18 February 2014
Last Modified: 18 February 2014
Author: Norm Olsen
RFC Status: Approved
Implementation Status: Implemented and submitted
Proposed Milestone: 14.30(???)
Assigned PSC guide(s): Norm Olsen
Voting History
+1: Norm
+1: Hugues
+1: Frank
+0:
-0:
-1:
no vote: All others

This RFC has been implemented. There is crucially important information in the Implementation Notes section at the bottom of this RFC document.

Overview

While the National Spatial Reference System of 2007 (NSRS2007) has been around for several years, the shift defined by the new system, relative to the previous system (NAD83/96, aka HARN, HPGN, NAD83/91; we'll use HARN in this RFC document), was considered too small to deserve a defined shift definition. That is, the shifts were on the order of a few centimeters, and at that time this was considered to be as small as the level of error. Fast forward to 2013, and precise and definitive geodetic shift models have been developed for NSRS2007. This was done at the time the US National Geodetic Survey was defining the National Spatial Reference System of 2011. Thus, at the current time, definitive models and algorithms exist for the migration of geodetic coordinates from HARN to NSRS2007 and subsequently to NSRS2011.

Motivation

Please note that this is a sequential conversion process. That is, geodetic coordinates referenced to NAD83 must first be converted to HARN, and then converted to NSRS2007, before they can finally be converted to NSRS2011. Each of these three conversion processes has its own level of error, and each produces shifts of centimeters in magnitude. The following table indicates the expected magnitude of shifts for the three distinct datum shift calculations:

Datum Shift Calculation    Expected Shift in centimeters
NAD83 --> HARN             40
HARN --> NSRS2007          3
NSRS2007 --> NSRS2011      3
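
The sequential nature of the conversion can be illustrated with a minimal sketch. The chain order comes from this RFC; the function name and the list representation are assumptions made for illustration and are not part of the CS-MAP API.

```python
# Ordered chain of datums as described in this RFC. Each adjacent pair
# requires its own datum shift calculation.
DATUM_CHAIN = ["NAD83", "HARN", "NSRS2007", "NSRS2011"]


def conversion_steps(source, target):
    """Return the (from, to) steps needed to get from source to target.

    Hypothetical helper: it walks the chain in either direction and never
    skips an intermediate datum.
    """
    i = DATUM_CHAIN.index(source)
    j = DATUM_CHAIN.index(target)
    step = 1 if j >= i else -1
    return [(DATUM_CHAIN[k], DATUM_CHAIN[k + step]) for k in range(i, j, step)]
```

For example, `conversion_steps("NAD83", "NSRS2011")` yields three steps, one for each row of the table above.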

Unlike the NAD83 <--> HARN shift, which has now been available for almost two decades, the new reference systems do include vertical components. The vertical shifts are of the same magnitude as the horizontal shifts indicated above. Perhaps this vertical shift is important to some users.

Finally, the new reference system definitions include Alaska which was never included in the HARN series of datum shift files. Quite frankly, I'm unaware of exactly how to convert Alaskan geography from NAD83 to NSRS2007. NSRS models do exist for Puerto Rico and the Virgin Islands, but not for Hawaii or American Samoa for which HARN datum shift files have been defined.

So, the actual significance of this change is questionable. A small portion of the user base is likely to want these features. The implementation of these new geodetic systems has no negative effect on the operation of the library other than, perhaps, disk space. The data files used to model the NSRS2007 and NSRS2011 shifts are relatively huge, and each of the three geographic areas covered (48 states, Alaska, and Puerto Rico) requires three distinct data files (longitude shift, latitude shift, and vertical shift). Each of the data files for the 48 states is about 28 megabytes in size. Thus, the size of distribution files could grow to be unreasonably large if full support of the NSRS2007 and NSRS2011 geodetic reference systems is to be included.

Please remember, you can't get to NSRS2011 without first converting to NSRS2007. So, including coverage for 49 states in three dimensions for both of the new reference systems in a distribution will require an additional 456 megabytes of data. It is indeed possible, and perhaps likely, that the data files used in these new reference systems could be de-densified (a grid cell in all of these files is one minute by one minute), but that will have to be the subject of a new and different RFC.

Proposed Solution

From a technical point of view, implementing NSRS2007 and NSRS2011 is rather straightforward. The models for these datum shifts as produced by the NGS are based on a new file format/interpolation combination which the NGS refers to as GEOCON. This format is very similar (almost identical) to previous file formats used, is used for both horizontal and vertical NSRS datum shifts, and is also now used for the latest geoid height models produced by NGS. Adding a new grid interpolation file format named GEOCON to the repertoire, one which understands this new file format, is relatively straightforward; several standard modules are coded and the appropriate entries made in the grid file interpolation table. These code changes can be examined by looking at CS_geocn.c and CSdataDT.c.

Having made the GEOCON grid file format option available, the NSRS2007 and NSRS2011 geodetic transformations can easily be defined in the Geodetic Transformation Dictionary by referencing the appropriate files. As Datum definitions are now simply name placeholders, the NSRS2007 and NSRS2011 datums are easy to define.

Having the above in place leaves the more difficult and error prone, bureaucratic portion of the implementation. This phase includes several tasks which are intended to be done, to the degree possible, via automated means. The process here typically implies:

  1. writing a program which produces tabular information in 'C++' code syntax,
  2. manually editing the resulting tabular data to add/remove exemplary cases, and
  3. writing a program to manipulate data (such as a dictionary source file) using the modified tabular data.

The result is a modification to, for example, a dictionary file which can be examined and tested. Should test results indicate, the program/table is adjusted and the modification process is repeated until the modified data file (typically a dictionary source file) satisfies the necessary requirements. In this manner, the modifications to the necessary data files are accomplished with the lowest possible probability of error, typographical or otherwise. These various programs and resulting tables are all version controlled in the folder named ConsoleUtilities.

Using the above described technique, the following chores must be accomplished:

  • An NSRS2007 version of nearly all HARN CRS systems must be generated.
  • An NSRS2011 version of nearly all HARN CRS systems must be generated.
  • Catalog entries for all new CRS systems must be generated.
  • NameMapper entries for all new systems must be generated.
  • Entries for the EPSG names and EPSG codes must be added to the NameMapper and properly associated with the new CS-MAP NSRS CRS definitions.
  • Entries for the ESRI names and ESRI codes must be added to the NameMapper and properly associated with the new CS-MAP NSRS CRS definitions and the EPSG names and codes.
  • To accomplish the item immediately above, programs which manipulate ESRI projection files (i.e. the WKT files with the .prj extension) need to be written to properly extract ESRI names and codes.
  • Programs which will properly match EPSG, ESRI, and CS-MAP definition names and codes need to be written.

Having accomplished all of the above, NSRS2007 and NSRS2011 will have been implemented, but with one major issue remaining.

The HPGN Problem

As indicated above, conversion from NAD83 to NSRS2007 first requires a conversion to HARN (aka NAD83/96, HPGN, NAD83/91, etc.). Conversion from NAD83 to HARN is accomplished through the use of grid interpolation files with names in the form of ??hpgn.l?s where the first two characters are generally replaced with a two character state code. For example, two data files named cohpgn.las and cohpgn.los define the shift required to convert from NAD83 to HARN in the state of Colorado. The .las file carries the latitude shift values, while the .los file carries the longitude shift values. The complicating factor here is that these data file sets naturally overlap and the shifts carried in the overlapping files are not the same in the regions of overlap. Thus, for geography covered by the regions of overlap there are indeed two (or more) officially designated shift values.
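
As a small illustration of the ??hpgn.l?s naming pattern described above, the following sketch expands a two character region code into its latitude and longitude shift file names. The function name is hypothetical; only the naming pattern itself is taken from this RFC.

```python
def hpgn_file_pair(region_code):
    """Return the (.las, .los) file names for a two character HPGN code.

    Illustrative helper, not a CS-MAP function. The .las file carries
    latitude shifts and the .los file carries longitude shifts.
    """
    code = region_code.lower()
    if len(code) != 2:
        raise ValueError("HPGN region codes are two characters long")
    stem = code + "hpgn"
    return stem + ".las", stem + ".los"
```

Calling `hpgn_file_pair("CO")` gives the Colorado pair named in the text, cohpgn.las and cohpgn.los.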

This is why, in release 13.0 of CS-MAP, the original HARN datum was replaced by some 40+ distinct datums named HARN/??. Each of these new datums essentially provided the end user with the means to address a very specific set of HPGN data files. This capability was greeted with much appreciation by those working with geography in the regions of HPGN file overlap.

With the introduction of NSRS2007, however, we now have yet another problem. At the datum shift calculation level, we only have geographic coordinates to work with. Whether such geographic coordinates came from the inverse projection conversion of a state plane CRS is unknown at this level. Thus, the conversion of, say, NAD83 to NSRS2007 presents the problem of which specific set of HPGN data files should be used for any given coordinate. There is no clear, unambiguous solution to this problem.

The 48hpgn.l?s File

It is proposed that this problem be addressed by the generation of a new, totally unofficial set of HPGN files which model the NAD83 <--> HARN datum shift for all 48 conterminous states. This file set is to be produced using the following concepts:

  • All HPGN datum shift files have a standard grid cell size of 15 minutes of latitude/longitude.
  • Fairly definitive state boundaries are available from the EPSG database; certainly definitive enough to predict in which state any 15 minute lat/long coordinate resides.
  • We then establish an HPGN grid file set named 48hpgn.las and 48hpgn.los which covers the entirety of the 48 conterminous states and has a 15 minute grid cell size.
  • For each data point in the 48HPGN data file set:
    • Determine the state in which the point lies
    • Determine the specific HPGN data file set from which values are to be extracted,
    • Extract the appropriate data values and insert into the 48HPGN data file set being created.

This process will succeed in generating valid coverage for most all of the 48 state geography. Problems occur where there are substantial waterways between states which are not included in the state boundary polygons; Chesapeake Bay, and the Great Lakes for example. These situations need to be detected and then specifically dealt with.
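
The aggregation procedure listed above can be sketched roughly as follows. This is only an outline under stated assumptions: `state_of`, `file_set_for`, and `lookup_shift` are hypothetical stand-ins for the EPSG boundary test, the state to file set mapping, and the extraction of values from an existing HPGN file set. Nodes for which no state is found (open water, for example) are collected separately so that they can be dealt with specifically.

```python
CELL_DEG = 0.25  # 15 minutes of arc, the standard HPGN grid cell size


def grid_nodes(south, west, rows, cols):
    """Yield (lat, lon) for every node of a 15 minute grid."""
    for r in range(rows):
        for c in range(cols):
            yield south + r * CELL_DEG, west + c * CELL_DEG


def build_aggregate(south, west, rows, cols, state_of, file_set_for, lookup_shift):
    """Fill an aggregate grid from per-state HPGN file sets.

    state_of(lat, lon)            -> state code, or None if outside every polygon
    file_set_for[state]           -> name of the HPGN file set to read from
    lookup_shift(name, lat, lon)  -> shift values taken from that file set
    All three are assumptions supplied by the caller in this sketch.
    """
    aggregate = {}
    unresolved = []  # nodes needing special handling, e.g. bays and lakes
    for lat, lon in grid_nodes(south, west, rows, cols):
        state = state_of(lat, lon)
        if state is None:
            unresolved.append((lat, lon))
            continue
        aggregate[(lat, lon)] = lookup_shift(file_set_for[state], lat, lon)
    return aggregate, unresolved
```

A state that is split across two file sets would need a finer key than the state code alone; the sketch leaves that refinement to the caller supplied functions.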

Additional Complications

The following information will be of value to anyone reviewing the proposed changes:

  1. There are three cases where an official HPGN data file set covers more than one state:
    • the wohpgn.l?s data file set covers both Washington and Oregon
    • the nehpgn.l?s data file set covers VT, NH, MA, and CT; i.e. New England (sans Maine)
    • the mdhpgn.l?s data file set covers both Maryland and Delaware.

  2. There are four situations for which there are two HPGN data file sets for a given political entity:
    • the cshpgn.l?s and cnhpgn.l?s files cover southern and northern California respectively.
    • the emhpgn.l?s and wmhpgn.l?s files cover eastern and western Montana respectively.
    • the ethpgn.l?s and wthpgn.l?s files cover eastern and western Texas respectively.
    • the eshpgn.l?s and wshpgn.l?s files cover eastern and western American Samoa respectively. These files only overlap with themselves, and the region of overlap is all Pacific Ocean, so these files do not present a specific problem.

Of course, in the case of multiple HPGN data file sets covering a single state, the files overlap, and (of course) the grid shift values in the regions of overlap are not the same. It is now known, for example, that in the region of overlap in Montana the difference between the two files is as much as 17 centimeters at a certain point.

Result Evaluation

In regions internal to a specific state, the 48HPGN file solution produces results basically identical to those of previous releases. The exact same interpolation code is used on the same data values. However, when one approaches the boundary of any specific HPGN region (i.e. the geography where HPGN data file sets overlap), the results can differ by as much as 10 centimeters.

The best example of this condition is the Four Corners area of the southwestern United States, a point of geography where four state borders intersect and there are four different sets of HPGN data files which cover the intersection point. Thus there are four separate values of the shift at that point. The Colorado and Utah numbers, while not identical, are very close. The Arizona and New Mexico numbers are very close. However, there is a 10 centimeter difference between the two sets of numbers. Who is to say which is correct?

Testing

Testing of the horizontal shifting at the geodetic coordinate level (i.e. lat/longs) will be accomplished using data points generated by the NGS supplied executable geocon.exe. This program accepts input data, and produces output data, in what is known as Blue Book format. In order to facilitate the generation of test data in the form used by the CS-MAP ConsoleTestCpp program, a TcsBlueBook object has been developed. Programs which produce test point data in this format, and which eventually convert the Blue Book data produced by geocon.exe into the TEST.DAT form, have also been written.

Generally, the results match at the +/- 2 millimeter level. Precision is difficult to maintain in this testing environment, as the Blue Book format only supports five digits of precision at the arc-second level. Thus, verifying results at that precision involves making sure that ASCII to real and real to ASCII conversions, along with conversions to and from degrees, minutes, and seconds, are consistent across the various platforms. Note that the geocon.exe source code is available, but the program is written in FORTRAN, so the code is of little value other than for analysis.
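
To put the precision limit in perspective, here is a rough estimate, assuming that "five digits of precision at the arc-second level" means five decimal places of arc-seconds and using a spherical earth with a mean radius of 6,371 km (both are assumptions, not statements from the Blue Book specification). The sketch also shows a degrees/minutes/seconds round trip done in integer units, which avoids the seconds value rounding up to 60. All function names are illustrative.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean radius; good enough for an order of magnitude


def arcsec_to_metres(arcsec):
    """Approximate ground distance spanned by an angle given in arc-seconds."""
    return EARTH_RADIUS_M * math.radians(arcsec / 3600.0)


def to_dms(deg, decimals=5):
    """Split decimal degrees into (sign, degrees, minutes, seconds).

    Seconds are rounded to `decimals` places. Working in integer units of
    10**-decimals arc-seconds keeps the carry into minutes and degrees exact.
    """
    scale = 10 ** decimals
    units = round(abs(deg) * 3600 * scale)
    d, rem = divmod(units, 3600 * scale)
    m, sec_units = divmod(rem, 60 * scale)
    return (-1 if deg < 0 else 1), d, m, sec_units / scale


def from_dms(sign, d, m, s):
    """Inverse of to_dms: rebuild decimal degrees."""
    return sign * (d + m / 60.0 + s / 3600.0)
```

Under these assumptions, `arcsec_to_metres(1e-5)` is roughly 0.3 millimeters, so the format itself accounts for a few tenths of a millimeter of the observed discrepancy.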

Tests generated for inclusion in the standard TEST.DAT file used by ConsoleTestCpp do not test vertical components (a weakness which deserves attention). Thus, testing the vertical component of the NSRS2007/NSRS2011 implementation needs to be done separately.

Implications

The implementation, as currently being tested, is maintained in the CS-MAP Subversion repository as a sandbox named NSRS2007.

This implementation leaves all of the previous HARN/?? datum shifts in place. It introduces a new datum named NAD83/HARN. The transformations referenced to this new datum are the only transformations which reference the new 48HPGN data set. End users who need a specific set of HPGN grid shift files can achieve this by simply placing an entry for the superseding HPGN data file set ahead of the 48HPGN data file set reference in the Geodetic Transformation Dictionary. Thus, a user working in the Four Corners area of the southwest who wants, for example, the Arizona datum shift values to prevail can enter a reference to azhpgn.l?s ahead of the reference to 48hpgn.l?s. Editing the Geodetic Transformation Dictionary is recognized as not being an ideal solution, but given the nature of the beast, it is the best available short of adding user interaction when certain coordinate conversions are selected.
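
The effect of placing one file set ahead of another can be pictured with a simplified "first entry that covers the point wins" rule. This is a sketch of the idea only; the actual selection logic inside CS-MAP may consider more than list order, and the coverage rectangles in the usage note are invented for illustration rather than taken from the real files.

```python
def select_file_set(entries, lat, lon):
    """Return the name of the first entry whose rectangle covers the point.

    entries is an ordered list of (name, south, west, north, east) tuples,
    mirroring the order of grid file references in a transformation
    definition. Returns None when no entry covers the point.
    """
    for name, south, west, north, east in entries:
        if south <= lat <= north and west <= lon <= east:
            return name
    return None
```

With a hypothetical list such as `[("azhpgn.l?s", 31.0, -115.0, 38.0, -108.0), ("48hpgn.l?s", 24.0, -125.0, 50.0, -66.0)]`, a point near the Four Corners resolves to the Arizona set, while a point in Kansas falls through to the aggregate set.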

Funding and Resources

The implementation is complete and resides in a CS-MAP Subversion sandbox named NSRS2007. Additional test cases will need to be added, and the current ConsoleTestCpp TEST.DAT file does not support vertical testing; thus vertical testing will need to be done separately. Additionally, x64 and Linux testing is yet to be performed.

Funding was provided by Autodesk. Upon approval of this RFC, the proposed changes will be submitted to the trunk.

Data and geodetic information is (or was) available at beta.ngs.noaa.gov.

Implementation Notes

Implementation of this RFC produced the following items for which additional documentation needs to be provided.

NSRS Datum Shift File Naming Convention

As described below, NSRS datum shifts rely on datum shift data files in the GEOCON format. Definitions of Geodetic Transformations need to specify where such files are located and which specific files apply to individual transformations. Additionally, a geodetic transformation needs to be able to access three such data files. The official data files provided by the National Geodetic Survey (NGS) have the names indicated in the following table:

Region              Datum       Longitude Shift   Latitude Shift   Vertical Shift
US 48 States        NSRS 2007   dslo.b            dsla.b           dsv.b
Alaska              NSRS 2007   dsloa.b           dslaa.b          dsva.b
Puerto Rico & VI    NSRS 2007   dslop.b           dslap.b          dsvp.b
US 48 States        NSRS 2011   dslo11.b          dsla11.b         dsv11.b
Alaska              NSRS 2011   dsloa11.b         dslaa11.b        dsva11.b
Puerto Rico & VI    NSRS 2011   dslop11.b         dslap11.b        dsvp11.b

Within the file name, clearly it is the 'o', 'a', and 'v' letters that identify the function of the specific file with respect to the transformation. Thus, a naming convention has been adopted where the file name portion of a grid data reference shall contain an "l?" character sequence. In order to get access to the appropriate data files, the GEOCON transformation constructor will replace the '?' character in the specified file name sequentially with 'o', then 'a', and then replace the complete "l?" sequence with a single 'v' character in order to locate and open the appropriate files. Thus, a file name of "dsl?.b" would be used to reference the three file set: "dslo.b", "dsla.b", and "dsv.b". Otherwise, the files can be named any way developers and/or users want, as long as the Geodetic Transformation Dictionary entries adhere to the simple "l?" convention. (If the 'l' character is upper case, the inserted character will also be upper case.)
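
The substitution rule just described can be sketched as below. This mirrors the prose only; it is not the code in CS_geocn.c, and the function name is hypothetical. The sketch assumes the "l?" sequence of interest is the last one in the reference.

```python
def geocon_file_names(reference):
    """Expand an "l?" file reference into (longitude, latitude, vertical) names.

    '?' becomes 'o' and then 'a'; for the vertical file the whole "l?"
    sequence becomes 'v'. The case of the inserted letter follows the case
    of the 'l' in the reference.
    """
    pos = reference.lower().rfind("l?")
    if pos < 0:
        raise ValueError('file reference must contain an "l?" sequence')
    ell = reference[pos]
    lon, lat, vert = ("O", "A", "V") if ell.isupper() else ("o", "a", "v")
    head, tail = reference[:pos], reference[pos + 2:]
    return head + ell + lon + tail, head + ell + lat + tail, head + vert + tail
```

For instance, `geocon_file_names("dsl?a11.b")` produces the three Alaska NSRS 2011 names listed in the table above.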

Typically, these files are maintained in the "Dictionaries/USA/NSRS2007" and "Dictionaries/USA/NSRS2011" directories, but this is not required. They can be located anywhere on the system (or network) where a standard "fopen" can access them.

GEOCON Data File Size

All of these files have a grid density of one minute. Thus, the files are huge, on the order of 20 megabytes. As indicated above, the GEOCON paradigm requires at least two data files (for horizontal transformations), and usually a third (for the vertical component). There are three geographic areas covered (with more to come?). Further, there are two separate datums to support. Thus, all told, there are approximately 300 megabytes of grid shift data here, and it is all binary data that does not compress very well. Including 300 megabytes of data for a seldom used (as yet, anyway) feature in distributions and checkout downloads is unacceptable. I suggest two alternatives to this problem; a combination of the two could also work. In either case, an RFC depicting the exact solution methodology would need to be presented and approved.

First Alternative

Do not include the data files in the distribution, or even in the Subversion repository. Release documentation should indicate that the distribution does not include these files and provide information on how they can be obtained. The distribution should install the directories which would normally hold these files and place there an extensive README.TXT file with similar information. Upon first access to an NSRS datum, an error condition would lead the user to this information.

Second Alternative

Each of the GEOCON data files is based on a one minute by one minute grid; an incredible density level. Even for the entirety of Alaska, the density is one minute by one minute. A program could be written to convert these data files to a three minute by three minute grid, which would keep roughly one node in nine and so reduce the size of each file by almost 90%. The degree of error would be rather small. Users could always replace the lower density files with the real files should they feel the need. Conversions using the less dense files might be a tad faster.
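
A minimal sketch of the thinning idea, assuming the grid has already been read into a list of rows (reading and writing the GEOCON binary format is not shown). It uses plain decimation, keeping every third node in each direction; a real conversion program might prefer to filter or otherwise validate the result, and the function name is illustrative only.

```python
def thin_grid(grid, factor=3):
    """Keep every `factor`-th row and column of a rectangular grid.

    grid is a list of equal-length rows of shift values. The first row and
    first column are always kept, so the grid origin is unchanged.
    """
    return [row[::factor] for row in grid[::factor]]


def size_reduction(rows, cols, factor=3):
    """Fraction of nodes removed when thinning a rows x cols grid."""
    kept_rows = len(range(0, rows, factor))
    kept_cols = len(range(0, cols, factor))
    return 1.0 - (kept_rows * kept_cols) / (rows * cols)
```

For a one degree square at one minute spacing (61 by 61 nodes), thinning by three leaves 21 by 21 nodes, a reduction of about 88%, which is in line with the "almost 90%" estimate.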

NAD83 to HARN Conversions

Geodetic transformations from NAD83(1986) to HARN require the use of grid data shift files of the "??hpgn.l?s" variety. There are about 45 different sets of these files, usually one set per state (there are variations). When asked to convert from NAD83 to HARN, CS-MAP typically knows which set of files to use as there are some 45 different HARN datums (obviously one datum defined for each set of grid data files). If CS-MAP is asked to convert from NAD83 to NSRS2007, it must first convert the NAD83 coordinates to HARN. But which specific HARN is it to use to do the conversion? Generally, at the geodetic transformation level, there is no knowledge of the geography involved, or the Cartesian system from which the geographic coordinates were generated.

This is complicated by two considerations. First, CS-MAP has no facility to query the end user for information as to which set of HPGN files is to be used. Second, CS-MAP has always been able to convert from any defined coordinate system to any other defined coordinate system without any user interaction.

The solution implemented to address this problem is the development of an aggregate HPGN file set named "48hpgn.l?s". This aggregation was accomplished using the state boundaries contained in the EPSG Parameter Dataset to build a grid with points from the appropriate "??hpgn.l?s" data set. This generally produces the correct results. There are problems due to the fact that the 45 sets of HPGN grid data files all overlap their neighbors to some extent, and the information in the areas of overlap is not the same. Thus, in regions close to the coverage boundaries of the various 45 data sets, the results can be different from what an end user might otherwise expect. This possibility must be documented.

There are three possible workarounds to this problem. First, end users can use Geodetic Paths to specify the specific HARN datum they want a specific transformation to go through. This implies that Geodetic Paths may need to be changed frequently, and there is the possibility that a necessary change is omitted and erroneous results go undetected.

Second, the Geodetic Transformation definition for the NAD83/HARN datum can be edited so that an original "??hpgn.l?s" data set is given priority over the aggregate HPGN file. Thus, a consulting shop which works primarily in Arizona could modify the geodetic transformation file, putting the "azhpgn.l?s" data file ahead of the aggregate data set. The Arizona data set would then always supersede the aggregate data set and the expected results would be obtained.

Third, it could be that we should enable client applications to register a callback function with CS-MAP so that, in the event of a situation where CS-MAP needs help deciding what to do, user interaction is possible. In the case of converting from NAD83 to NSRS2007, the callback function would be used to get a selection from the user as to which set of HPGN data files is to be used for the NAD83 to HARN conversion. (This is the approach Brand X is considering.)
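
Purely as an illustration of the callback idea, and not a description of any existing CS-MAP interface, a registration mechanism might look something like the following. Every name here is hypothetical. When no callback is registered, or the callback returns something unusable, the sketch falls back to the aggregate file set so that conversions still work without user interaction.

```python
_hpgn_chooser = None  # hypothetical module-level slot for the client callback


def register_hpgn_chooser(callback):
    """Register a function that picks one file set from a list of candidates."""
    global _hpgn_chooser
    _hpgn_chooser = callback


def choose_hpgn_file_set(candidates, default="48hpgn.l?s"):
    """Ask the registered callback which HPGN file set to use.

    Falls back to `default` when nothing is registered or when the callback
    returns a value that is not one of the candidates.
    """
    if _hpgn_chooser is None:
        return default
    choice = _hpgn_chooser(candidates)
    return choice if choice in candidates else default
```

A client application could register a function that opens a dialog listing the candidates; a batch application would simply register nothing and get the default.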
