Ticket #808 (closed defect: fixed)
shp2pgsql and encoding, something must be wrong
| Reported by: | nicklas | Owned by: | pramsey |
|---|---|---|---|
| Priority: | medium | Milestone: | PostGIS 2.0.0 |
| Component: | postgis | Version: | trunk |
| Keywords: | | Cc: | |
Description
I think encoding is very frustrating and hard to understand. This report may well be invalid, but I have twisted it around so many times that I now think something is wrong.
I have attached a simple DBF file with one field called "address" and one row containing the text: "Tårneby in Våler i Solør kommune"
If I first run shp2pgsql without any encoding option, just ignoring the special letters, like this:
nicklas@ubuntu64:~/Documents$ /usr/lib/postgresql/8.4/bin/shp2pgsql test.dbf>test.sql
I get this error message:
Unable to convert data value to UTF-8 (iconv reports "Invalid or incomplete multibyte or wide character"). Current encoding is "UTF-8". Try "LATIN1" (Western European), or one of the values described at http://www.postgresql.org/docs/current/static/multibyte.html.
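This error is what you would expect if the DBF stores its text in a single-byte encoding such as LATIN1 while shp2pgsql assumes UTF-8 (the message says the current encoding is "UTF-8"). The Python sketch below assumes, as the error hint suggests, that the attached file is LATIN1; `is_valid_utf8` is a hypothetical helper. It reproduces the failure on the attached string: the LATIN1 bytes are not valid UTF-8, but they decode cleanly once the right source encoding is named.

```python
# Sketch: reproduce the conversion failure on the string from test.dbf.
# Assumption: the DBF stores "address" in LATIN1 (ISO-8859-1), as the error hint suggests.

ADDRESS = "Tårneby in Våler i Solør kommune"


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as UTF-8, which is what shp2pgsql assumes by default."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Bytes as they would sit in a LATIN1-encoded DBF field: one byte per character.
dbf_bytes = ADDRESS.encode("latin-1")

print(len(dbf_bytes))                          # 32, matching varchar(32) in the generated SQL
print(is_valid_utf8(dbf_bytes))                # False: 0xE5 ('å') is not followed by a continuation byte
print(dbf_bytes.decode("latin-1") == ADDRESS)  # True: decoding as LATIN1 recovers the text
```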
If I instead specify the encoding, like this:
nicklas@ubuntu64:~/Documents$ /usr/lib/postgresql/9.0/bin/shp2pgsql -W LATIN1 test.dbf>test.sql
the following SQL file is produced:
SET CLIENT_ENCODING TO UTF8;
SET STANDARD_CONFORMING_STRINGS TO ON;
BEGIN;
CREATE TABLE "test" (gid serial PRIMARY KEY,
"address" varchar(32));
INSERT INTO "test" ("address") VALUES ('Tårneby in Våler i Solør kommune');
COMMIT;
The problem is that psql won't load this SQL file into the database; it complains like this:
psql:test.sql:6: ERROR: invalid byte sequence for encoding "UTF8": 0xe5726e
So what I have to do is change the first line of the SQL file to SET CLIENT_ENCODING TO LATIN1 instead. Then everything works.
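The manual workaround amounts to rewriting the header line of the generated file so that the declared client encoding matches the bytes actually written. A minimal Python sketch of that edit follows; the helper names `declare_client_encoding` and `rewrite_file` are hypothetical, and the sketch only touches the `SET CLIENT_ENCODING` statement that shp2pgsql emits, leaving the data bytes alone.

```python
# Sketch of the manual workaround: make the SQL file declare LATIN1 instead of UTF8.
# declare_client_encoding and rewrite_file are hypothetical helpers, not part of shp2pgsql.
import re

# Matches the header line shp2pgsql writes, e.g. "SET CLIENT_ENCODING TO UTF8;".
HEADER = re.compile(rb"^SET CLIENT_ENCODING TO \w+;", re.MULTILINE)


def declare_client_encoding(sql: bytes, encoding: str) -> bytes:
    """Replace the first SET CLIENT_ENCODING statement; the data bytes are left untouched."""
    replacement = b"SET CLIENT_ENCODING TO " + encoding.encode("ascii") + b";"
    # A function replacement avoids any backslash-escape processing in the new text.
    return HEADER.sub(lambda _match: replacement, sql, count=1)


def rewrite_file(src_path: str, dst_path: str, encoding: str) -> None:
    """Work on raw bytes: the file mixes an ASCII header with LATIN1 data."""
    with open(src_path, "rb") as src:
        fixed = declare_client_encoding(src.read(), encoding)
    with open(dst_path, "wb") as dst:
        dst.write(fixed)
```

For this ticket, `rewrite_file("test.sql", "test_latin1.sql", "LATIN1")` would produce a file that psql can load. It only patches the symptom, though; shp2pgsql-gui offers no step at which to run it.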
According to the PostGIS documentation, shp2pgsql is supposed to convert the data to UTF8 in the SQL file so that psql can load it as UTF8. I don't think it works that way: shp2pgsql does nothing to the actual encoding of the data, so it should instead tell PostgreSQL what the original encoding is.
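The behaviour the documentation describes can be modelled in two steps: decode each DBF value using the source encoding given with -W, then write UTF-8 bytes into the INSERT so that the `SET CLIENT_ENCODING TO UTF8;` header is truthful. The Python sketch below assumes that model; it is not shp2pgsql's actual code, and `to_utf8_literal` is a hypothetical name. It also checks the bytes psql rejected: 0xe5 0x72 0x6e is LATIN1 for "årn", which suggests the raw DBF bytes of "Tårneby" reached the file unconverted.

```python
# Sketch of the conversion the documentation describes (not shp2pgsql's real code).
# to_utf8_literal is a hypothetical helper name.


def to_utf8_literal(raw: bytes, source_encoding: str) -> bytes:
    """Decode a DBF value from its source encoding and return it as a UTF-8 SQL literal."""
    text = raw.decode(source_encoding)
    # Double single quotes so the value is a valid standard-conforming string literal.
    escaped = text.replace("'", "''")
    return ("'" + escaped + "'").encode("utf-8")


# The bytes psql reported in: invalid byte sequence for encoding "UTF8": 0xe5726e
rejected = bytes([0xE5, 0x72, 0x6E])
print(rejected.decode("latin-1"))  # årn: part of "Tårneby", still in LATIN1

# After the conversion, 'å' becomes the two-byte UTF-8 sequence C3 A5.
literal = to_utf8_literal("Tårneby".encode("latin-1"), "latin-1")
print(literal.hex(" "))  # 27 54 c3 a5 72 6e 65 62 79 27
```

Here the single byte 0xE5 turns into two bytes, which is why a file declared as UTF8 must contain converted data. Otherwise the header has to name the source encoding, as the workaround above does by hand.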
The current behavior makes it impossible to use shp2pgsql-gui, since there is no way to edit the SQL file there. You get one error or the other no matter which encoding you declare.
I have only tried this on the trunk version.
What I don't know is whether some local settings on my system make things behave differently. But I think DEPESZ's explanation here makes sense, and my experience agrees with it.
Thanks Nicklas

