Opened 21 years ago

Closed 9 years ago

#372 closed defect (fixed)

OGR/OGDI: All (?) string column field lengths shortened by 1 character

Reported by: Markus Neteler Owned by: warmerdam
Priority: high Milestone:
Component: OGR_SF Version: unspecified
Severity: normal Keywords:
Cc: krbaker@…, Mateusz Łoskot

Description (last modified by Mateusz Łoskot)

Frank,

after digging around with the extraction of political boundaries from VMAP0, I discovered, that the column length are not correct.

id: Integer (10.0)
f_code: String (5.0)
nam: String (64.0)
na2: String (2.0)
na3: String (1.0)
tile_id: Integer (5.0)
fac_id: Integer (10.0)

At least the column 'na2' is one character too short.

This works however (extraction of Botswana):

ogrinfo -ro -where "na2 = ' BC'"
gltp:/vrf/grass0/warmerdam/v0soa/vmaplv0/soamafr 'polbnda@bnd(*)_area'
INFO: Open of `gltp:/vrf/grass0/warmerdam/v0soa/vmaplv0/soamafr'
using driver `OGDI' successful.

Layer name: polbnda@bnd(*)_area
Geometry: Polygon
Feature Count: 10
Layer SRS WKT:
GEOGCS["NAD83",
[...]

while this fails:

ogrinfo -ro -where "na2 = ' B'" gltp:/vrf/grass0/warmerdam/v0soa/vmaplv0/soamafr
'polbnda@bnd(*)_area'
INFO: Open of `gltp:/vrf/grass0/warmerdam/v0soa/vmaplv0/soamafr'
using driver `OGDI' successful.

Layer name: polbnda@bnd(*)_area
Geometry: Polygon
Feature Count: 0
Layer SRS WKT:
[...]

It should be a 3 chars string just to be able to store UNK = No entry present

When exporting to SHAPE, and then using ARCExplorer, one character in the 'na2' column is lost.

To fetch the USA from NOAMER, I have to run

ogrinfo -ro -where "na2 = ' US'" 
gltp:/vrf/grass0/warmerdam/v0noa/vmaplv0/noamer 'polbnda@bnd(*)_area'

The bug may also be related to the fact, that the columns f_code, nam, na2, na3 start with a white space. Maybe the reading of the string is shifted by one character.

Hope it is easy to fix.

Thanks,

Markus

Change History (11)

comment:1 by neteler@…, 21 years ago

Frank,

a qick hack around the problem of strings is this:

RCS file: /cvsroot/osrs/gdal/ogr/ogrsf_frmts/ogdi/ogrogdilayer.cpp,v
retrieving revision 1.6
diff -u -r1.6 ogrogdilayer.cpp
--- ogrogdilayer.cpp    21 May 2003 03:58:49 -0000      1.6
+++ ogrogdilayer.cpp    8 Aug 2003 13:19:58 -0000
@@ -508,7 +508,7 @@
           default:
             oField.SetType( OFTString );
             if( oaf->oa.oa_val[i].lenght > 0 )
-                oField.SetWidth( oaf->oa.oa_val[i].lenght );
+                oField.SetWidth( oaf->oa.oa_val[i].lenght + 1);
             else
                 oField.SetWidth( 64 );
             break;

Then I can see the complete country identifier. However, the real
problem will be the first white space character in the strings which
shifts the string to the right by one character and get's it cut.

Best regards

 Markus
(still no C++ programmer... :-( )

comment:2 by krbaker@…, 21 years ago

I was asked to append to this problem with my issues.  First my comments on what
is below.

The fields look the right length...  I think "unk" is the representation of some
sort of null...  The space at the beginning of the string seems like the real
problem.  Though I don't know what UNK should be in the translation...  (I could
be out to lunch on this)  IIRC the VPF file defines the field sizes in some
header...  its not like it goes parsing through...  if we really want to keep
the unk you might have to check if its less than three and set it to three... 
just a thought....


So I first tried this:
> > ogr2ogr -f "PostgreSQL" PG:dbname=geoDB \
> > gltp:/vrf/mapdata/vmaplv0/v0eur/vmaplv0/eurnasia \
> > -nln political_bondries 'polbnda@bnd(*)_area'

and it failed on 
> > ERROR 1: INSERT command for new feature failed.
> > ERROR:  value too long for type character(5)
> > 
> > ERROR 1: Terminating translation prematurely after failed
> > translation of layer polbnda@bnd(*)_area

1st try with debugging on yeilded:

OGR_PG: PQexec(SELECT
AddGeometryColumn('gazgeoDB','first_order','wkb_geometry',104306,'POLYGON',3))
OGR_PG: PQexec(INSERT INTO "first_order" (wkb_geometry , "id", "f_code",
"nam", "na2", "na3", "tile_id", "fac_id") VALUES
(GeometryFromText('POLYGON ((-179.12876892 71.60120392,-179.19979858
71.60615540,-179.32174683 71.60820770,-179.32791138
71.60778046,-179.34704590 71.60645294,-179.35140991
71.60566711,-179.36265564 71.60363007,-179.39033508
71.59860992,-179.40625000 71.59832764,-179.38592529
71.60366821,-179.35977173 71.60800934,-179.29605103
71.61209106,-179.13288879 71.60758209,-179.12300110
71.60483551,-179.12876892 71.60120392))'::TEXT,104306) , 1, ' FA001', '
OSTROV NAKHODKA#CHUKOTSKIY AVTONOMNYY OKRUG', ' RS', ' A', 1, 32))

ERROR 1: INSERT command for new feature failed.
ERROR:  value too long for type character(5)

this was the table that created:
    Column    |     Type      |                         
Modifiers                          
--------------+---------------+-------------------------------------------------------------
 ogc_fid      | integer       | not null default
nextval('"first_order_ogc_fid_seq"'::text)
 wkb_geometry | geometry      | 
 id           | numeric(10,0) | 
 f_code       | character(5)  | 
 nam          | character(64) | 
 na2          | character(2)  | 
 na3          | character(1)  | 
 tile_id      | numeric(5,0)  | 
 fac_id       | numeric(10,0) | 


then I defined my own table and changed the length of f_code to 6 and it
worked failed on the na2 field which has the same problem (the leading
whitespace) I then noticed that all of the text fields have this
problem...  This is the only solution I have come up with...  Load into a custom
built table (with fields that are 1 bigger than they should be) and then munge
out all of the leading spaces.  but thats a pain.

Even more pain...  I almost got it working.  I dumped the vpf to
shapefile with ogr2ogr and then loaded that into a postgres DB.  This
worked, but I just noticed that it hacked off one char in the na2 and
na3 columns in the process of going to shapefile.  I think this points
to the OGDI interface or OGDI's VPF driver.  I'll try and poke at it but
I am not up on the code at all.

Frank asked that I update my code to the latest and greatest as he thought he
had fixed the problem.

I got both CVS trees for OGDI and GDAL and I still have the same
problem.

My gdal tree is at least 7/16/03 (last changelog)
and my OGDI tree is at least 1/29/03 (last changelog)

the OGDI is from the source forge cvs tree and GDAL is from remote
sensing.

comment:3 by neteler@…, 21 years ago

Keith,

the UNK will be simply "unknown". In fact the leading space character is
the problem. The UNK was just an example. The main problem is
that the strings are cut, so that e.g. the countries in the map
polbnda@bnd(*)_area such as 
'US' are cut to ' U'
                 ^-- shifting space char inserted by OGR/OGDI driver

which makes queries impossible. All strings should be shifted
one position to left, then the problem will be fixed.
Probably in a loop the index is 1 and not 0 or so (guessing).

Comparison 'ogrinfo'/'ogdi_info':

ogdi_info -u gltp:/vrf/grass0/warmerdam/v0noa/vmaplv0/noamer  -l
'polbnda@bnd(*)_area' -f area -id 123
[...]
Object ID:123
Object Attributes:124 { FA001 } { NORTHWEST TERRITORIES } { CA } { N } 2 12
0: (0.000000 , 0.000000)
-> CA = Canada

Now we do the same with ogr2ogr:
ogrinfo -ro gltp:/vrf/grass0/warmerdam/v0noa/vmaplv0/noamer
'polbnda@bnd(*)_area' | less
 -> now search for 'NORTHWEST TERRITORIES'

Do you get 'CA' for na2?

I am using yesterday's CVS of OGR and OGDI from End of May.

Best regards

 Markus

comment:4 by warmerdam, 21 years ago

Guys, 

I just want to confirm that I have reproduced the problem of the extra space
with the NOAMER dataset this morning, and will dig into it today.

Sorry for all the delays.


comment:5 by warmerdam, 21 years ago

I have patched gdal/ogr/ogrsf_frmts/ogdi/ogrogdilayer.cpp to trim all leading
white space.  VRF objects where being returned from OGDI with attribute
lists like this:

Object Attributes:1 { FA001 } { OSTROV NAKHODKA#CHUKOTSKIY AVTONOMNYY OKRUG } { 
RS } { A } 1 32

The ecs_FindElement() function used to parse the attribute lists was preserving
the spaces around the attribute values even though, as far as I know, they are
not part of the proper attribute valuein VRF. 

While I suspect the VRF driver should not have been returning the extra
spaces in the first place, I am hesitant to fiddle with that code without
being more sure. 

In any event, the implemented change resolves the problems seen.

The change is in CVS.

comment:6 by krbaker@…, 21 years ago

Fixed the north american vpf data set... but Europe still has issues.  Here is
what I am seeing.  Not 100% sure why this is happening...  maybe some of the
char's in the text string aren't really counted as a true char in vpf?  


krbaker@mosor:~$ CPL_DEBUG=ON /usr/bin/ogr2ogr -f PostgreSQL -nln vpfeur
PG:dbname=gazgeoDB
gltp:/vrf/afs/metacarta.com/map/public/vmaplv0/vpf/v0eur/vmaplv0/eurnasia
'polbnda@bnd(*)_area'
OGR: ~OGRDDataSource(0x804e060)
OGR: ~OGRDDataSource(0x804e060)
OGR: ~OGRDDataSource(0x804e060)
OGR: ~OGRDDataSource(0x804e060)
OGR: ~OGRDDataSource(0x804e060)
OGR: ~OGRDDataSource(0x804e060)
OGR: ~OGRDDataSource(0x804e060)
OGR: ~OGRDDataSource(0x804e060)
OGR: ~OGRDDataSource(0x804e060)
OGR: ~OGRDDataSource(0x804e060)
OGR: OGROpen(gltp:/vrf/afs/metacarta.com/map/public/vmaplv0/vpf/v0eur/vmaplv0/eu
rnasia) succeeded (0x804e060).
OGR_PG: DBName="gazgeoDB"
OGR_PG: PQexec(CREATE TABLE "vpfeur" ( OGC_FID SERIAL ))
OGR_PG_NOTICE: NOTICE:  CREATE TABLE will create implicit sequence 'vpfeur_ogc_f
id_seq' for SERIAL column 'vpfeur.ogc_fid'

OGR_PG_NOTICE: NOTICE:  CREATE TABLE / UNIQUE will create implicit index 'vpfeur
_ogc_fid_key' for table 'vpfeur'

OGR_PG: PQexec(SELECT AddGeometryColumn('gazgeoDB','vpfeur','wkb_geometry',10430
6,'POLYGON',3))
OGR_PG: PQexec(INSERT INTO "vpfeur" (wkb_geometry , "id", "f_code", "nam", "na2"
, "na3", "tile_id", "fac_id") VALUES (GeometryFromText('POLYGON ((83.61547089 74
.05305481,83.61058044 74.05364227,83.60916901 74.05919647,83.60236359 74.0624694
8,83.58677673 74.06372070,83.57983398 74.06427765,83.56016541 74.06913757,83.562
11090 74.07141876,83.57908630 74.07250214,83.59864044 74.06858063,83.60969543 74
.07038879,83.61058044 74.07875061,83.59558105 74.08491516,83.57338715 74.0905532
8,83.55777740 74.08980560,83.54805756 74.08666992,83.55371857 74.07788849,83.549
61395 74.07519531,83.53080750 74.07252502,83.49994659 74.07653046,83.47408295 74
.07736206,83.42522430 74.07375336,83.40255737 74.06983185,83.38527679 74.0649185
2,83.38138580 74.06114197,83.36988831 74.04994202,83.37863922 74.04900360,83.418
55621 74.05449677,83.43777466 74.06005859,83.45211029 74.05452728,83.44916534 74
.04907990,83.44255829 74.04714203,83.39152527 74.04241943,83.36538696 74.0427246
1,83.32577515 74.04766846,83.30449677 74.04738617,83.28072357 74.04955292,83.188
46893 74.06461334,83.17028046 74.06758118,83.14008331 74.07491302,83.09447479 74
.08122253,83.05180359 74.08396912,83.02561188 74.08258057,82.99508667 74.0780029
3,82.94633484 74.06383514,82.89636230 74.06047058,82.88488770 74.06314087,82.904
33502 74.06994629,82.93386078 74.06900024,82.96069336 74.07711029,82.98872375 74
.08555603,83.06880188 74.10530853,83.19869232 74.12564087,83.24594116 74.1300811
8,83.34505463 74.13055420,83.35436249 74.13008118,83.43444824 74.12616730,83.493
63708 74.12083435,83.56138611 74.10717010,83.63122559 74.07963562,83.63752747 74
.07350159,83.63697052 74.06397247,83.62777710 74.05449677,83.61547089 74.0530548
1))'::TEXT,104306) , 2160, 'FA001', 'OSTROV VOSTOCHNYY KAMENNYY#TAYMYRSKIY(DOLGA
NO-NENETSKIY) AVT.OKRUG', 'RS', 'A', 4, 128))

ERROR 1: INSERT command for new feature failed.
ERROR:  value too long for type character(64)

OGR_PG: SoftRollback() with no transaction active.
ERROR 1: Terminating translation prematurely after failed
translation of layer polbnda@bnd(*)_area


comment:7 by warmerdam, 21 years ago

Hmm.

Indeed "OSTROV VOSTOCHNYY KAMENNYY#TAYMYRSKIY(DOLGANO-NENETSKIY) AVT.OKRUG"
is more than 64 characters, but I don't know why.  I think the best fix for
this might be to have the PostGIS driver truncate and warn while constructing the
insert command rather than leaving it to Postgres to fail.  

I *suspect* this is some sort of issue with multi-byte characters, but none
of the above string looks like a multi-byte escape sequence or anything like that.

Does this approach sound OK to you?



comment:9 by Mateusz Łoskot, 17 years ago

Description: modified (diff)

comment:10 by Mateusz Łoskot, 16 years ago

Cc: Mateusz Łoskot added

comment:11 by Markus Neteler, 16 years ago

Reporter: changed from neteler@… to Markus Neteler

comment:12 by Jukka Rahkonen, 9 years ago

Resolution: fixed
Status: reopenedclosed

With a pretty old GDAL 1.7.0b2, FWTools 2.4.7, released 2010/01/19 I get correct result if I query na2 column as 'BC'. No need to write it as ' BC' (which finds zero results). It looks like the core issue of this ticket is fixed long time ago.

ogrinfo  -where "na2 = 'BC'" gltp:/vrf/C:/temp/v0soa_5/vmaplv0/soamafr "polbnda@bnd(*)_area"
Layer name: polbnda@bnd(*)_area
Geometry: Polygon
Feature Count: 10
Extent: (19.999535, -26.907246) - (29.360781, -17.780813)
Layer SRS WKT:
GEOGCS["NAD83",
    DATUM["North_American_Datum_1983",
        SPHEROID["GRS 1980",6378137,298.257222101,
            AUTHORITY["EPSG","7019"]],
        TOWGS84[0,0,0,0,0,0,0],
        AUTHORITY["EPSG","6269"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.0174532925199433,
        AUTHORITY["EPSG","9108"]],
    AUTHORITY["EPSG","4269"]]
id: Integer (10.0)
f_code: String (5.0)
nam: String (64.0)
na2: String (2.0)
na3: String (1.0)
tile_id: Integer (5.0)
fac_id: Integer (10.0)
OGRFeature(polbnda@bnd(*)_area):2159
  id (Integer) = 2160
  f_code (String) = FA001
  nam (String) = CHOBE
  na2 (String) = BC
  na3 (String) = F
  tile_id (Integer) = 31
  fac_id (Integer) = 16
Note: See TracTickets for help on using tickets.