Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#885 closed enhancement (fixed)

pgsql2shp fields conversion from predefined list

Reported by: rodo Owned by: loic
Priority: medium Milestone: PostGIS 2.0.0
Component: loader/dumper Version: trunk
Keywords: history Cc:

Description

When using pgsql2shp with table fields name longer than 10 digits pgsql2shp truncate and generate a 10 digits fields name. It will be very useful to add an option on pgsql2shp that define a filename witch contains a 2 fields that associate a 10 digits fields name to the longer fields name. This file could be very simple as :

ADDR:STREE addr:street BOUNDARY_A boundary_admin_level3_name CONTACT_PE contact_person DAMAGE_STA damage_status HFAC_CAPAC hfac_capacity_description

Attachments (4)

pgsql2shp-conv.txt (486 bytes) - added by rodo 8 years ago.
A sample for what the conv file format should be
885-trunk.patch (11.1 KB) - added by loic 8 years ago.
implement field name map against trunk
885-1.5.patch (3.2 KB) - added by loic 8 years ago.
pgsql2shp fields conversion patch backported to 1.5
885-styled-trunk.patch (12.8 KB) - added by loic 8 years ago.
astyle and documented final patch candidate

Download all attachments as: .zip

Change History (18)

Changed 8 years ago by rodo

Attachment: pgsql2shp-conv.txt added

A sample for what the conv file format should be

comment:1 Changed 8 years ago by loic

Owner: changed from pramsey to loic

I think the remapping occurs at postgis-1.5.1/loader/pgsql2shp.c:1605

               /*
                 * make sure the fields all have unique names,
                 */
                tmpint=1;
                for (j=0; j<dbf_nfields; j++)
                {
                        if (!strncasecmp(field_name, dbf_flds[j], 10))
                        {
                                sprintf(field_name,"%.7s_%.2d",
                                        ptr,
                                        tmpint++);
                                j=-1;
                                continue;
                        }
                }

comment:2 Changed 8 years ago by loic

The same section of code is also found in http://svn.osgeo.org/postgis/trunk/loader/pgsql2shp-core.c

               /*
                 * make sure the fields all have unique names,
                 */
                tmpint = 1;
                for (j = 0; j < state->fieldcount; j++)
                {
                        if (!strncasecmp(dbffieldname, state->dbffieldnames[j], 10))
                        {
                                sprintf(dbffieldname, "%.7s_%.2d", ptr, tmpint++);
                                continue;
                        }
                }

comment:3 Changed 8 years ago by loic

Here is a tentative patch against 1.5, which does not even compile yet. And I'm still unsure where the unit tests should be written.

--- pgsql2shp.c.~1~	2011-03-26 17:41:44.000000000 +0100
+++ pgsql2shp.c	2011-03-27 00:19:35.700567533 +0100
@@ -65,6 +65,8 @@
 int rowbuflen;
 char temptablename[256];
 char *geo_col_name, *table, *shp_file, *schema, *usrquery;
+char **geo_map = 0;
+int geo_map_size = 0;
 int type_ary[256];
 char *main_scan_query;
 DBFHandle dbf;
@@ -1258,6 +1260,52 @@
 	return 0;
 }
 
+#include <sys/stat.h>
+
+void
+read_geo_map(char* file)
+{
+#if 0
+  struct stat stat_buf;
+  static char* content = 0;
+  {
+    FILE* fp = 0;
+    if(stat(file, &stat_buf) < 0) {
+      perror(file);
+      return;
+    }
+    content = malloc(stat_buf.st_size);
+    fp = fopen(file, 'r');
+    if(stat_buf.st_size != fread(content, stat_buf.st_size, 1, fp)) {
+      free(content);
+      fprintf(stderr, "fread did not return the expected amount of chars");
+    }
+    fclose(fp);
+  }
+  {
+    int i;
+    geo_map_size = 0;
+
+    for(i = 0; i < stat_buf.st_size; i++) {
+      if(content[i] == '\n') {
+	geo_map_size++;
+      }
+    }
+    geo_map = (char**)malloc(sizeof(char*)*geo_map_size);
+    {
+      char** map = geo_map;
+      *map++ = content;
+      for(i = 0; i < stat_buf.st_size; i++) {
+	if(content[i] == '\n' && i + 1 < stat_buf.st_size)
+	  *map++ = content + 1;
+	if(content[i] == '\n' || content[i] == ' ')
+	  content[i] = '\0';
+      }
+    }
+  }
+#endif  
+}
+
 void
 usage(char* me, int status, FILE* out)
 {
@@ -1299,7 +1347,7 @@
 	memset(buf, 0, 2048); /* just in case... */
 
 	/* Parse command line */
-	while ((c = pgis_getopt(ARGC, ARGV, "bf:h:du:p:P:g:rk")) != EOF)
+	while ((c = pgis_getopt(ARGC, ARGV, "bf:h:du:p:P:g:rkm:")) != EOF)
 	{
 		switch (c)
 		{
@@ -1332,6 +1380,9 @@
 		case 'g':
 			geo_col_name = pgis_optarg;
 			break;
+		case 'm':
+		    read_geo_map(pgis_optarg);
+			break;
 		case 'k':
 			keep_fieldname_case = 1;
 			break;
@@ -1594,9 +1645,19 @@
 		 * becomes __xmin when escaped
 		 */
 
-		/* Limit dbf field name to 10-digits */
-		strncpy(field_name, ptr, 10);
-		field_name[10] = 0;
+		if(geo_map) {
+		  int i;
+		  for(i=0; i<geo_map_size; i++) {
+		    if(!strcasecmp(geo_map, ptr)) {
+		      /* the replacement follows the terminating null */
+		      strcpy(field_name, geo_map + strlen(geo_map) + 1);
+		    }
+		  }
+		} else {
+		  /* Limit dbf field name to 10-digits */
+		  strncpy(field_name, ptr, 10);
+		  field_name[10] = 0;
+		}
 
 		/*
 		 * make sure the fields all have unique names,

comment:4 Changed 8 years ago by rodo

We need this feature to have a higher usuability of shapefiles generated from OSM data. By now we generate a conversion list after shape generation, it's a file like that the one you can reach at http://bit.ly/g08BKw. The OSM key will change in future and we are sure to have more keys soon, we want to stabilize the conversion list, like that it'll be easier to work with shapefiles without always search for the signification field name.

Changed 8 years ago by loic

Attachment: 885-trunk.patch added

implement field name map against trunk

comment:5 Changed 8 years ago by loic

Milestone: PostGIS 1.5.3PostGIS 2.0.0
Resolution: fixed
Status: newclosed
Version: 1.5.Xtrunk

The proposed patch ( attachment:885-trunk.patch ) implements the feature against changeset:6970 . The corresponding tests can be run with:

./configure --with-gui
make check
make -C loader/cunit check

Note that --with-gui and the need to explicitly ask for running the tests in the loader/cunit are not a behaviour added by the patch.

comment:6 Changed 8 years ago by loic

Resolution: fixed
Status: closedreopened

Changed 8 years ago by loic

Attachment: 885-1.5.patch added

pgsql2shp fields conversion patch backported to 1.5

comment:7 Changed 8 years ago by strk

Isn't this patch missing the "ensure unique name" spot of the code ? Ie: the actually used name might be changed after the call to the mapping routine, or am I missing sometihng ?

comment:8 Changed 8 years ago by loic

By "ensure unique name" spot of the code I assume you are refering to loader/pgsql2shp-core.c lines

                /*
                 * make sure the fields all have unique names,
                 */
                tmpint = 1;
                for (j = 0; j < state->fieldcount; j++)
                {
                        if (!strncasecmp(dbffieldname, state->dbffieldnames[j], 10))
                        {
                                sprintf(dbffieldname, "%.7s_%.2d", ptr, tmpint++);
                                continue;
                        }
                }

which are left untouched by the patch and occur after the mapping function is called:

-               dbffieldname = malloc(11);
-               strncpy(dbffieldname, ptr, 10);
-               dbffieldname[10] = '\0';
+               dbffieldname = ShpDumperFieldnameLimit(ptr, state);

My reasoning for keeping this part is that if, presumably by mistake, the provided symbol map contains duplicates, enforcing uniqueness will help ensure the consistency of the generated file. If the map does not produce duplicate symbols, the make sure the fields all have unique names lines will do nothing. Am I making sense ?

Changed 8 years ago by loic

Attachment: 885-styled-trunk.patch added

astyle and documented final patch candidate

comment:9 Changed 8 years ago by loic

I have reworked the attachment:885-styled-trunk.patch in accordance to the http://trac.osgeo.org/postgis/browser/trunk/STYLE guidelines. I hope you will find it worth of insertion in the trunk :-)

comment:10 Changed 8 years ago by strk

Keywords: history added

Good work. Committed in r6996. doc/man/pgsql2shp.1 would need a patch too, could you provide that as well ?

comment:11 Changed 8 years ago by loic

Thank you for taking time to review this patch. Sorry for overlooking the manual page, here is the patch for doc/man/pgsql2shp.1 :

  • pgsql2shp.1

     
    6767\fB\-k\fR
    6868Keep idendifiers case (don't uppercase field names).
    6969.TP
     70\fB\-m\fR <\fIfilename\fR>
     71Remap identifiers to ten digit names. The content of the file is lines
     72of two symbols separated by a single white space and no trailing
     73or leading space:
     74
     75VERYLONGSYMBOL SHORTONE\\n
     76.br
     77ANOTHERVERYLONGSYMBOL SHORTER\\n
     78
     79etc.
     80.TP
    7081\fB\-?\fR
    7182Display version and usage information.
    7283

comment:12 Changed 8 years ago by strk

Resolution: fixed
Status: reopenedclosed

All done with r7000 and r7001.

Robe wins a beer for the thousand rounding.

comment:13 Changed 8 years ago by strk

Component: postgisloader/dumper

comment:14 Changed 8 years ago by mcayland

Revised version has been committed to trunk. Please can you verify that it still works as expected?

Note: See TracTickets for help on using tickets.