Opened 7 years ago

Closed 5 years ago

#6960 closed defect (wontfix)

PostgreSQL's numeric type has different definition than Shapefile/DBF

Reported by: Mike Taves Owned by: warmerdam
Priority: normal Milestone: closed_because_of_github_migration
Component: default Version: unspecified
Severity: normal Keywords:
Cc:

Description

I have found that PostgreSQL's NUMERIC(precision, scale) is inconsistent with the shapefile.

For instance, SELECT '122.7'::numeric(4,1); is valid in PostgreSQL, but when copied to a shapefile via CopyLayer, the shapefile has a width 4, precision 1, but raises a warning:

Warning 1: Value 122.7 of field value of feature 0 not successfully written. Possibly due to too larger number with respect to field width

and ultimately the truncated value stored in the .dbf is 122. and in ArcCatalog, the shapefile properties show the field has a precision of 3 and scale of 1. Sure enough, this field can no longer hold 122.7, as it originally did.

To correctly translate the datatype, I suppose the width needs to be increased by one, to include the width of the decimal point (which does not need to be considered by PostgreSQL). Other similar database drivers may also be affected similarly.

This behavior is found with GDAL 2.2.1 on Windows (via Anaconda2 x64), and GDAL 1.11.1 on CentOS 6, Python 2.6, compiled from source. Here's an attempt to reproduce the bug, although it will require editing to connect to an active PostgreSQL database.

# -*- coding: utf-8 -*-
"""
Created on Mon Jul 10 17:06:40 2017

@author: mtoews
"""

import psycopg2
from osgeo import gdal, ogr

connstr = 'dbname=mydb'  # edit this!

print('GDAL/OGR version: {0}'.format(gdal.__version__))

conn = psycopg2.connect(connstr)
curs = conn.cursor()

value = 122.7

curs.execute('''\
drop table if exists foo;
create unlogged table foo(value numeric(4, 1), geom geometry);
insert into foo(value, geom) values(%s, 'POINT(1 2)');
''', (value,))
conn.commit()

curs.execute('select value from foo;')
print('value from PG via psycopg2: {0}'.format(curs.fetchone()[0]))
conn.close()

pg_ds = ogr.Open('PG:' + connstr)
assert pg_ds

q_ly = pg_ds.ExecuteSQL('select value from foo;')
print('value from PG via GDAL/OGR: {0}'.format(q_ly.GetFeature(0)['value']))

ly_name = 'foo'
shp_path = ly_name + '.shp'
ogr_ds = ogr.Open(shp_path, 1)
if ogr_ds is not None:
    ogr_ds.DeleteLayer(ly_name)
else:
    drv = ogr.GetDriverByName('ESRI Shapefile')
    ogr_ds = drv.CreateDataSource(shp_path)

pg_ly = pg_ds.ExecuteSQL('select * from foo;')
ogr_ly = ogr_ds.CopyLayer(pg_ly, ly_name)
ogr_ly.SyncToDisk()
print('value from SHP:{0}'.format(ogr_ly.GetFeature(0)['value']))

ld = ogr_ly.GetLayerDefn()
fd = ld.GetFieldDefn(0)
print('width: {0}, precision: {1}'.format(fd.GetWidth(), fd.GetPrecision()))

ogr_ly = pg_ly = ogr_ds = pg_ds = None

with output:

GDAL/OGR version: 2.2.1
value from PG via psycopg2: 122.7
value from PG via GDAL/OGR: 122.7
Warning 1: Value 122.7 of field value of feature 0 not successfully written. Possibly due to too larger number with respect to field width
value from SHP:122.0
width: 4, precision: 1

Change History (5)

comment:1 by Even Rouault, 7 years ago

See https://issues.qgis.org/issues/15188#note-8 for an analysis of the issue

in reply to:  1 comment:2 by Mike Taves, 7 years ago

Replying to Even Rouault:

See https://issues.qgis.org/issues/15188#note-8 for an analysis of the issue

Yup, that's the same behavior. Also yes, the optional minus sign is another character that contributes to DBF's potential width. (i.e. '-22.7'::numeric(4,1) or value = -22.7 truncates and stores 4 characters in the DBF file, -22.).

Regardless of the software, the DBF field length (see 1. 2 Field Descriptor Array) is different than other definitions of precision/scale used by, e.g. PostgreSQL, Esri and SQL Standard, which disregard the non-digits in the length/width. Shapelib's Attribute (.DBF) API docs seem to get the definitions correct. OGR's Field Definitions of precision and width are directly mapped to DBF's field length and decimal count, which are defined differently.

I suppose one fix would be to translate OFTReal with OGR width > 0 to DBF N type, where:

  • OGR precision == 0: DBF field width = OGR width + 1, for optional sign
  • OGR precision > 0: DBF field width = OGR width + 2, for optional sign and decimal point

A workaround is to inflate the precision in PostgreSQL's numeric datatype before passing to OGR.

comment:3 by Mike Taves, 7 years ago

Yet another consideration is fractional values between -1.0 and 1.0 (exclusive), which have a leading zero. For example, value = -0.012 can be accurately stored with OGR/PostgreSQL using width=3 and precision=3, but is stored in .dbf as -0. and would need to be inflated +3 OGR width to accurately store all six characters -0.012.

I'd update my comment for the second item:

  • OGR precision > 0: DBF field width = OGR width + 3, for optional sign, leading zero, and decimal point

There are no further considerations with exponential values (e.g. -1.2e-12 or -1.2e12), since these values are not stored using exponential notation.

comment:4 by Even Rouault, 7 years ago

We could probably avoid the +3 by just outputing -.012. Should be a valid decimal number

As I raised in the QGIS ticket, I think we should stick with the semantics OGR field width == DBF/MITAB field width == total number of characters to represent the number and OGR field precision == DBF/MITAB field precision == number of decimal figures after decimal point.

So any fix would be on the PostgreSQL driver side (or other SQL providers). My main concern is the lost of round-trippability between OGR and Postgres, since when converting from Postgres to OGR you need to do +2 when converting SQL precision to OGR width, but from OGR to Postgres, you need to do -1 only, since OGR 5.2 can store "12.34", but must be NUMERIC(4,2).

Perhaps we could add a hint at the OGRFeatureDefn level, a WidthPrecisionOriginalModel = DBF or SQL flag, that could be used in SQL->SQL direct conversions to do -2 instead of -1.

comment:5 by Even Rouault, 5 years ago

Milestone: closed_because_of_github_migration
Resolution: wontfix
Status: newclosed

This ticket has been automatically closed because Trac is no longer used for GDAL bug tracking, since the project has migrated to GitHub. If you believe this ticket is still valid, you may file it to https://github.com/OSGeo/gdal/issues if it is not already reported there.

Note: See TracTickets for help on using tickets.