Opened 13 years ago

Closed 13 years ago

#889 closed defect (invalid)

shp2pgsql fails to load localized characters

Reported by: aperi2007
Owned by:
Priority: high
Milestone: PostGIS 2.0.0
Component: management
Version: master
Keywords:
Cc:

Description

Using the experimental build for Windows:

POSTGIS="2.0.0SVN" GEOS="3.3.0-CAPI-1.7.0" PROJ="Rel. 4.6.1, 21 August 2008" LIBXML="2.7.6" USE_STATS

shp2pgsql.exe fails to load this shapefile, which contains these localized characters: "àèìòù".

I tried these settings: SRID 3003, encoding "UTF-8".

I verified that after removing the localized characters, the shapefile loads correctly.

The shapefile with localized characters also fails to load on Debian Linux.

I have attached the shapefile to this ticket.

Attachments (2)

line.zip (1.6 KB) - added by aperi2007 13 years ago.
sample shapefile that shp2pgsql fails to load
line_utf8.rar (1.4 KB) - added by aperi2007 13 years ago.
UTF-8 version of the line shapefile


Change History (5)

by aperi2007, 13 years ago

Attachment: line.zip added

sample shapefile that shp2pgsql fails to load

comment:1 by strk, 13 years ago

Did you try passing "-W Latin1"?

by aperi2007, 13 years ago

Attachment: line_utf8.rar added

UTF-8 version of the line shapefile

comment:2 by aperi2007, 13 years ago

Hi strk, using Latin1 works!

But I suspect there is another problem.

With "Latin1" the original shapefile does load, but I noticed another strange behavior.

I converted line.shp to UTF-8 using QGIS and named the result "line_utf8.shp" (attached to this ticket).

With encoding "UTF8", shp2pgsql loads only "line_utf8.shp"; it does not load "line.shp" and reports no error. I would expect an error to be reported. The SQL file generated by shp2pgsql simply stops at the first line of the COPY.

My second question: is it correct for shp2pgsql to skip loading a shapefile when the encoding is set to UTF8 but the shapefile uses a different encoding? When the encoding is set to Latin1, by contrast, the shapefile always seems to load. Latin1 seems to be a special encoding setting.

comment:3 by strk, 13 years ago

Resolution: invalid
Status: new → closed

While UTF8 is a possibly multi-byte encoding, Latin1 (like some others) is single-byte. For single-byte encodings there's no way to tell whether the declared encoding is correct: every byte corresponds to a character, so at worst you get the wrong characters. For multi-byte encodings, there can be _wrong_ (invalid) byte sequences, and those make the conversion fail.
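
The difference can be reproduced outside shp2pgsql. The following is a minimal Python sketch that uses only the built-in codecs; it illustrates the general encoding behavior and is not how shp2pgsql itself performs the conversion. The helper name `can_decode` is made up for this example.

```python
# The localized characters from the ticket.
text = "àèìòù"

# How the attribute bytes look in a Latin1 file versus a UTF-8 file.
latin1_bytes = text.encode("latin-1")  # one byte per character
utf8_bytes = text.encode("utf-8")      # two bytes per character here


def can_decode(data: bytes, encoding: str) -> bool:
    """Return True if data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Latin1 bytes read as UTF-8: 0xE0 starts a multi-byte sequence, but the
# next byte (0xE8) is not a continuation byte, so decoding fails.
print(can_decode(latin1_bytes, "utf-8"))

# UTF-8 bytes read as Latin1: every byte maps to some character, so the
# decode always succeeds, but the resulting text is wrong.
print(can_decode(utf8_bytes, "latin-1"))
print(utf8_bytes.decode("latin-1") == text)
```

This matches what was observed in comment:2: declaring UTF8 for the Latin1 file hits an invalid sequence, while declaring Latin1 never fails but can silently produce wrong characters when the file is actually in another encoding.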
