#885 closed enhancement (fixed)
pgsql2shp fields conversion from predefined list
Reported by: | rodo | Owned by: | loic |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 2.0.0 |
Component: | utils/loader-dumper | Version: | master |
Keywords: | history | Cc: |
Description
When pgsql2shp exports a table whose field names are longer than 10 characters, it truncates them and generates 10-character field names. It would be very useful to add an option to pgsql2shp that takes the name of a file with two columns, associating each 10-character field name with the longer field name. This file could be as simple as:
    ADDR:STREE addr:street
    BOUNDARY_A boundary_admin_level3_name
    CONTACT_PE contact_person
    DAMAGE_STA damage_status
    HFAC_CAPAC hfac_capacity_description
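To make the truncation concrete, here is a minimal sketch in C of how a long column name collapses to a 10-character DBF field name. The helper `dbf_short_name` is hypothetical and is not a function from pgsql2shp; it only imitates the uppercase-and-truncate behaviour suggested by the sample pairs above.

```c
#include <ctype.h>
#include <stddef.h>

#define DBF_NAME_MAX 10  /* DBF field names hold at most 10 characters */

/*
 * Hypothetical helper (not part of pgsql2shp): copy at most 10 characters
 * of `name` into `out`, uppercasing each one. `out` must have room for
 * DBF_NAME_MAX + 1 bytes.
 */
void dbf_short_name(const char *name, char out[DBF_NAME_MAX + 1])
{
    size_t i;

    for (i = 0; i < DBF_NAME_MAX && name[i] != '\0'; i++)
        out[i] = (char)toupper((unsigned char)name[i]);
    out[i] = '\0';
}
```

Under this sketch, any two columns that share their first ten characters end up with the same short name, which is the situation a predefined conversion list is meant to avoid.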
Attachments (4)
Change History (18)
by , 14 years ago
Attachment: | pgsql2shp-conv.txt added |
---|
comment:1 by , 14 years ago
Owner: | changed from | to
---|
I think the remapping occurs at postgis-1.5.1/loader/pgsql2shp.c:1605
    /*
     * make sure the fields all have unique names,
     */
    tmpint=1;
    for (j=0; j<dbf_nfields; j++)
    {
        if (!strncasecmp(field_name, dbf_flds[j], 10))
        {
            sprintf(field_name,"%.7s_%.2d", ptr, tmpint++);
            j=-1;
            continue;
        }
    }
comment:2 by , 14 years ago
The same section of code is also found in http://svn.osgeo.org/postgis/trunk/loader/pgsql2shp-core.c
    /*
     * make sure the fields all have unique names,
     */
    tmpint = 1;
    for (j = 0; j < state->fieldcount; j++)
    {
        if (!strncasecmp(dbffieldname, state->dbffieldnames[j], 10))
        {
            sprintf(dbffieldname, "%.7s_%.2d", ptr, tmpint++);
            continue;
        }
    }
comment:3 by , 14 years ago
Here is a tentative patch against 1.5, which does not even compile yet. And I'm still unsure where the unit tests should be written.
    --- pgsql2shp.c.~1~ 2011-03-26 17:41:44.000000000 +0100
    +++ pgsql2shp.c 2011-03-27 00:19:35.700567533 +0100
    @@ -65,6 +65,8 @@
     int rowbuflen;
     char temptablename[256];
     char *geo_col_name, *table, *shp_file, *schema, *usrquery;
    +char **geo_map = 0;
    +int geo_map_size = 0;
     int type_ary[256];
     char *main_scan_query;
     DBFHandle dbf;
    @@ -1258,6 +1260,52 @@
         return 0;
     }
     
    +#include <sys/stat.h>
    +
    +void
    +read_geo_map(char* file)
    +{
    +#if 0
    +  struct stat stat_buf;
    +  static char* content = 0;
    +  {
    +    FILE* fp = 0;
    +    if(stat(file, &stat_buf) < 0) {
    +      perror(file);
    +      return;
    +    }
    +    content = malloc(stat_buf.st_size);
    +    fp = fopen(file, 'r');
    +    if(stat_buf.st_size != fread(content, stat_buf.st_size, 1, fp)) {
    +      free(content);
    +      fprintf(stderr, "fread did not return the expected amount of chars");
    +    }
    +    fclose(fp);
    +  }
    +  {
    +    int i;
    +    geo_map_size = 0;
    +
    +    for(i = 0; i < stat_buf.st_size; i++) {
    +      if(content[i] == '\n') {
    +        geo_map_size++;
    +      }
    +    }
    +    geo_map = (char**)malloc(sizeof(char*)*geo_map_size);
    +    {
    +      char** map = geo_map;
    +      *map++ = content;
    +      for(i = 0; i < stat_buf.st_size; i++) {
    +        if(content[i] == '\n' && i + 1 < stat_buf.st_size)
    +          *map++ = content + 1;
    +        if(content[i] == '\n' || content[i] == ' ')
    +          content[i] = '\0';
    +      }
    +    }
    +  }
    +#endif
    +}
    +
     void
     usage(char* me, int status, FILE* out)
     {
    @@ -1299,7 +1347,7 @@
         memset(buf, 0, 2048); /* just in case... */
     
         /* Parse command line */
    -    while ((c = pgis_getopt(ARGC, ARGV, "bf:h:du:p:P:g:rk")) != EOF)
    +    while ((c = pgis_getopt(ARGC, ARGV, "bf:h:du:p:P:g:rkm:")) != EOF)
         {
             switch (c)
             {
    @@ -1332,6 +1380,9 @@
             case 'g':
                 geo_col_name = pgis_optarg;
                 break;
    +        case 'm':
    +            read_geo_map(pgis_optarg);
    +            break;
             case 'k':
                 keep_fieldname_case = 1;
                 break;
    @@ -1594,9 +1645,19 @@
          * becomes __xmin when escaped
          */
     
    -    /* Limit dbf field name to 10-digits */
    -    strncpy(field_name, ptr, 10);
    -    field_name[10] = 0;
    +    if(geo_map) {
    +        int i;
    +        for(i=0; i<geo_map_size; i++) {
    +            if(!strcasecmp(geo_map, ptr)) {
    +                /* the replacement follows the terminating null */
    +                strcpy(field_name, geo_map + strlen(geo_map) + 1);
    +            }
    +        }
    +    } else {
    +        /* Limit dbf field name to 10-digits */
    +        strncpy(field_name, ptr, 10);
    +        field_name[10] = 0;
    +    }
     
         /*
          * make sure the fields all have unique names,
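The idea this tentative patch is reaching for, storing each pair as two adjacent NUL-terminated strings inside one buffer, can be sketched in working form as follows. This is an illustration only and not the code that was later committed; `split_map` and `map_lookup` are hypothetical names, and the buffer is assumed to be a NUL-terminated string whose lines each contain one key, one space, and one value.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/*
 * Split the NUL-terminated text in `buf` in place. Every space and newline
 * is overwritten with '\0', so each line "key value\n" ends up stored as
 * "key\0value\0". A pointer to the start of each key is written to `keys`
 * (capacity `max`). Returns the number of keys recorded.
 */
size_t split_map(char *buf, char **keys, size_t max)
{
    size_t n = 0;
    int at_line_start = 1;
    char *p;

    for (p = buf; *p != '\0'; p++) {
        int is_newline = (*p == '\n');

        if (at_line_start && !is_newline && n < max)
            keys[n++] = p;
        if (is_newline || *p == ' ')
            *p = '\0';
        at_line_start = is_newline;
    }
    return n;
}

/*
 * Return the value paired with `name` (case-insensitive match), or NULL.
 * The value starts right after the key's terminating NUL.
 */
const char *map_lookup(char *const *keys, size_t n, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcasecmp(keys[i], name) == 0)
            return keys[i] + strlen(keys[i]) + 1;
    }
    return NULL;
}
```

The layout avoids a second allocation per entry, at the cost of requiring well-formed lines: a line with a key but no value would make the lookup return the next key instead.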
comment:4 by , 14 years ago
We need this feature to make shapefiles generated from OSM data more usable. For now we generate a conversion list after the shapefile has been generated; it is a file like the one at http://bit.ly/g08BKw. The OSM keys will change in the future and we are sure to have more keys soon, so we want to stabilize the conversion list. That way it will be easier to work with the shapefiles without always having to look up the meaning of each field name.
comment:5 by , 14 years ago
Milestone: | PostGIS 1.5.3 → PostGIS 2.0.0 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
Version: | 1.5.X → trunk |
The proposed patch (attachment:885-trunk.patch) implements the feature against changeset:6970. The corresponding tests can be run with:
    ./configure --with-gui
    make check
    make -C loader/cunit check
Note that --with-gui and the need to explicitly ask for the tests in loader/cunit to be run are not behaviours added by the patch.
comment:6 by , 14 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
by , 14 years ago
Attachment: | 885-1.5.patch added |
---|
pgsql2shp fields conversion patch backported to 1.5
comment:7 by , 14 years ago
Isn't this patch missing the "ensure unique name" spot of the code? That is, the name actually used might be changed after the call to the mapping routine. Or am I missing something?
comment:8 by , 14 years ago
By the "ensure unique name" spot of the code I assume you are referring to these lines of loader/pgsql2shp-core.c:
    /*
     * make sure the fields all have unique names,
     */
    tmpint = 1;
    for (j = 0; j < state->fieldcount; j++)
    {
        if (!strncasecmp(dbffieldname, state->dbffieldnames[j], 10))
        {
            sprintf(dbffieldname, "%.7s_%.2d", ptr, tmpint++);
            continue;
        }
    }
which are left untouched by the patch and occur after the mapping function is called:
    -    dbffieldname = malloc(11);
    -    strncpy(dbffieldname, ptr, 10);
    -    dbffieldname[10] = '\0';
    +    dbffieldname = ShpDumperFieldnameLimit(ptr, state);
My reasoning for keeping this part is that if, presumably by mistake, the provided symbol map contains duplicates, enforcing uniqueness will help ensure the consistency of the generated file. If the map does not produce duplicate symbols, the "make sure the fields all have unique names" lines will do nothing.
Am I making sense ?
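The effect described in this comment can be illustrated with a standalone sketch. `ensure_unique` is a hypothetical function written for this example, modelled on the quoted loop rather than copied from PostGIS. Unlike the trunk snippet, it restarts the scan after each rename, as the 1.5 snippet in comment:1 does with `j=-1`.

```c
#include <stddef.h>
#include <stdio.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/*
 * Make `name` (a buffer of at least 11 bytes) differ from every entry of
 * `used` within its first 10 characters. On a clash, `name` is rebuilt
 * from the first 7 characters of `base` plus "_NN", and the scan starts
 * over because the new name may clash with an earlier entry.
 */
void ensure_unique(char *name, const char *base,
                   char *const *used, size_t nused)
{
    int suffix = 1;
    size_t j = 0;

    while (j < nused) {
        if (strncasecmp(name, used[j], 10) == 0) {
            snprintf(name, 11, "%.7s_%.2d", base, suffix++);
            j = 0;
        } else {
            j++;
        }
    }
}
```

If the names coming out of the map are already distinct, the loop walks the list once and leaves `name` untouched, which matches the "will do nothing" case above.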
by , 14 years ago
Attachment: | 885-styled-trunk.patch added |
---|
astyle and documented final patch candidate
comment:9 by , 14 years ago
I have reworked attachment:885-styled-trunk.patch in accordance with the http://trac.osgeo.org/postgis/browser/trunk/STYLE guidelines. I hope you will find it worthy of inclusion in trunk.
comment:10 by , 14 years ago
Keywords: | history added |
---|
Good work. Committed in r6996. doc/man/pgsql2shp.1 would need a patch too, could you provide that as well ?
comment:11 by , 14 years ago
Thank you for taking time to review this patch. Sorry for overlooking the manual page, here is the patch for doc/man/pgsql2shp.1 :
pgsql2shp.1

     \fB\-k\fR
     Keep idendifiers case (don't uppercase field names).
     .TP
    +\fB\-m\fR <\fIfilename\fR>
    +Remap identifiers to ten digit names. The content of the file is lines
    +of two symbols separated by a single white space and no trailing
    +or leading space:
    +
    +VERYLONGSYMBOL SHORTONE\\n
    +.br
    +ANOTHERVERYLONGSYMBOL SHORTER\\n
    +
    +etc.
    +.TP
     \fB\-?\fR
     Display version and usage information.
     
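The line format stated in this man page text is strict enough to be checked mechanically. The following is a minimal sketch of such a check; `map_line_is_valid` is a hypothetical helper, the separator is assumed to be a single ASCII space, and whether pgsql2shp itself rejects malformed lines is not stated in this ticket.

```c
#include <stddef.h>
#include <string.h>

/*
 * Check one line (without its trailing newline) against the documented
 * format: two symbols, exactly one space between them, no leading or
 * trailing space. If `short_max` is non-zero, the second symbol must also
 * be at most that many characters long.
 * Returns 1 if the line is acceptable, 0 otherwise.
 */
int map_line_is_valid(const char *line, size_t short_max)
{
    const char *sep = strchr(line, ' ');

    if (sep == NULL || sep == line)      /* no separator, or leading space */
        return 0;
    if (sep[1] == '\0')                  /* nothing after the separator */
        return 0;
    if (strchr(sep + 1, ' ') != NULL)    /* more than one space on the line */
        return 0;
    if (short_max != 0 && strlen(sep + 1) > short_max)
        return 0;
    return 1;
}
```

Calling it with `short_max` set to 10 also catches replacement names that would not fit in a DBF field name.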
comment:12 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
comment:13 by , 14 years ago
Component: | postgis → loader/dumper |
---|
comment:14 by , 13 years ago
Revised version has been committed to trunk. Please can you verify that it still works as expected?
A sample for what the conv file format should be