Ticket #885 (closed enhancement: fixed)

Opened 2 years ago

Last modified 2 years ago

pgsql2shp fields conversion from predefined list

Reported by: rodo Owned by: loic
Priority: medium Milestone: PostGIS 2.0.0
Component: loader/dumper Version: trunk
Keywords: history Cc:

Description

When using pgsql2shp with table fields name longer than 10 digits pgsql2shp truncate and generate a 10 digits fields name. It will be very useful to add an option on pgsql2shp that define a filename witch contains a 2 fields that associate a 10 digits fields name to the longer fields name. This file could be very simple as :

ADDR:STREE addr:street BOUNDARY_A boundary_admin_level3_name CONTACT_PE contact_person DAMAGE_STA damage_status HFAC_CAPAC hfac_capacity_description

Attachments

pgsql2shp-conv.txt Download (486 bytes) - added by rodo 2 years ago.
A sample for what the conv file format should be
885-trunk.patch Download (11.1 KB) - added by loic 2 years ago.
implement field name map against trunk
885-1.5.patch Download (3.2 KB) - added by loic 2 years ago.
pgsql2shp fields conversion patch backported to 1.5
885-styled-trunk.patch Download (12.8 KB) - added by loic 2 years ago.
astyle and documented final patch candidate

Change History

Changed 2 years ago by rodo

A sample for what the conv file format should be

Changed 2 years ago by loic

  • owner changed from pramsey to loic

I think the remapping occurs at postgis-1.5.1/loader/pgsql2shp.c:1605

               /*
                 * make sure the fields all have unique names,
                 */
                tmpint=1;
                for (j=0; j<dbf_nfields; j++)
                {
                        if (!strncasecmp(field_name, dbf_flds[j], 10))
                        {
                                sprintf(field_name,"%.7s_%.2d",
                                        ptr,
                                        tmpint++);
                                j=-1;
                                continue;
                        }
                }

Changed 2 years ago by loic

The same section of code is also found in http://svn.osgeo.org/postgis/trunk/loader/pgsql2shp-core.c

               /*
                 * make sure the fields all have unique names,
                 */
                tmpint = 1;
                for (j = 0; j < state->fieldcount; j++)
                {
                        if (!strncasecmp(dbffieldname, state->dbffieldnames[j], 10))
                        {
                                sprintf(dbffieldname, "%.7s_%.2d", ptr, tmpint++);
                                continue;
                        }
                }

Changed 2 years ago by loic

Here is a tentative patch against 1.5, which does not even compile yet. And I'm still unsure where the unit tests should be written.

--- pgsql2shp.c.~1~	2011-03-26 17:41:44.000000000 +0100
+++ pgsql2shp.c	2011-03-27 00:19:35.700567533 +0100
@@ -65,6 +65,8 @@
 int rowbuflen;
 char temptablename[256];
 char *geo_col_name, *table, *shp_file, *schema, *usrquery;
+char **geo_map = 0;
+int geo_map_size = 0;
 int type_ary[256];
 char *main_scan_query;
 DBFHandle dbf;
@@ -1258,6 +1260,52 @@
 	return 0;
 }
 
+#include <sys/stat.h>
+
+void
+read_geo_map(char* file)
+{
+#if 0
+  struct stat stat_buf;
+  static char* content = 0;
+  {
+    FILE* fp = 0;
+    if(stat(file, &stat_buf) < 0) {
+      perror(file);
+      return;
+    }
+    content = malloc(stat_buf.st_size);
+    fp = fopen(file, 'r');
+    if(stat_buf.st_size != fread(content, stat_buf.st_size, 1, fp)) {
+      free(content);
+      fprintf(stderr, "fread did not return the expected amount of chars");
+    }
+    fclose(fp);
+  }
+  {
+    int i;
+    geo_map_size = 0;
+
+    for(i = 0; i < stat_buf.st_size; i++) {
+      if(content[i] == '\n') {
+	geo_map_size++;
+      }
+    }
+    geo_map = (char**)malloc(sizeof(char*)*geo_map_size);
+    {
+      char** map = geo_map;
+      *map++ = content;
+      for(i = 0; i < stat_buf.st_size; i++) {
+	if(content[i] == '\n' && i + 1 < stat_buf.st_size)
+	  *map++ = content + 1;
+	if(content[i] == '\n' || content[i] == ' ')
+	  content[i] = '\0';
+      }
+    }
+  }
+#endif  
+}
+
 void
 usage(char* me, int status, FILE* out)
 {
@@ -1299,7 +1347,7 @@
 	memset(buf, 0, 2048); /* just in case... */
 
 	/* Parse command line */
-	while ((c = pgis_getopt(ARGC, ARGV, "bf:h:du:p:P:g:rk")) != EOF)
+	while ((c = pgis_getopt(ARGC, ARGV, "bf:h:du:p:P:g:rkm:")) != EOF)
 	{
 		switch (c)
 		{
@@ -1332,6 +1380,9 @@
 		case 'g':
 			geo_col_name = pgis_optarg;
 			break;
+		case 'm':
+		    read_geo_map(pgis_optarg);
+			break;
 		case 'k':
 			keep_fieldname_case = 1;
 			break;
@@ -1594,9 +1645,19 @@
 		 * becomes __xmin when escaped
 		 */
 
-		/* Limit dbf field name to 10-digits */
-		strncpy(field_name, ptr, 10);
-		field_name[10] = 0;
+		if(geo_map) {
+		  int i;
+		  for(i=0; i<geo_map_size; i++) {
+		    if(!strcasecmp(geo_map, ptr)) {
+		      /* the replacement follows the terminating null */
+		      strcpy(field_name, geo_map + strlen(geo_map) + 1);
+		    }
+		  }
+		} else {
+		  /* Limit dbf field name to 10-digits */
+		  strncpy(field_name, ptr, 10);
+		  field_name[10] = 0;
+		}
 
 		/*
 		 * make sure the fields all have unique names,

Changed 2 years ago by rodo

We need this feature to have a higher usuability of shapefiles generated from OSM data. By now we generate a conversion list after shape generation, it's a file like that the one you can reach at  http://bit.ly/g08BKw. The OSM key will change in future and we are sure to have more keys soon, we want to stabilize the conversion list, like that it'll be easier to work with shapefiles without always search for the signification field name.

Changed 2 years ago by loic

implement field name map against trunk

Changed 2 years ago by loic

  • status changed from new to closed
  • version changed from 1.5.X to trunk
  • resolution set to fixed
  • milestone changed from PostGIS 1.5.3 to PostGIS 2.0.0

The proposed patch ( attachment:885-trunk.patch Download ) implements the feature against changeset:6970 . The corresponding tests can be run with:

./configure --with-gui
make check
make -C loader/cunit check

Note that --with-gui and the need to explicitly ask for running the tests in the loader/cunit are not a behaviour added by the patch.

Changed 2 years ago by loic

  • status changed from closed to reopened
  • resolution fixed deleted

Changed 2 years ago by loic

pgsql2shp fields conversion patch backported to 1.5

Changed 2 years ago by strk

Isn't this patch missing the "ensure unique name" spot of the code ? Ie: the actually used name might be changed after the call to the mapping routine, or am I missing sometihng ?

Changed 2 years ago by loic

By "ensure unique name" spot of the code I assume you are refering to loader/pgsql2shp-core.c lines

                /*
                 * make sure the fields all have unique names,
                 */
                tmpint = 1;
                for (j = 0; j < state->fieldcount; j++)
                {
                        if (!strncasecmp(dbffieldname, state->dbffieldnames[j], 10))
                        {
                                sprintf(dbffieldname, "%.7s_%.2d", ptr, tmpint++);
                                continue;
                        }
                }

which are left untouched by the patch and occur after the mapping function is called:

-               dbffieldname = malloc(11);
-               strncpy(dbffieldname, ptr, 10);
-               dbffieldname[10] = '\0';
+               dbffieldname = ShpDumperFieldnameLimit(ptr, state);

My reasoning for keeping this part is that if, presumably by mistake, the provided symbol map contains duplicates, enforcing uniqueness will help ensure the consistency of the generated file. If the map does not produce duplicate symbols, the make sure the fields all have unique names lines will do nothing. Am I making sense ?

Changed 2 years ago by loic

astyle and documented final patch candidate

Changed 2 years ago by loic

I have reworked the attachment:885-styled-trunk.patch Download in accordance to the http://trac.osgeo.org/postgis/browser/trunk/STYLE guidelines. I hope you will find it worth of insertion in the trunk :-)

Changed 2 years ago by strk

  • keywords history added

Good work. Committed in r6996. doc/man/pgsql2shp.1 would need a patch too, could you provide that as well ?

Changed 2 years ago by loic

Thank you for taking time to review this patch. Sorry for overlooking the manual page, here is the patch for doc/man/pgsql2shp.1 :

  • pgsql2shp.1

     
    6767\fB\-k\fR 
    6868Keep idendifiers case (don't uppercase field names).  
    6969.TP  
     70\fB\-m\fR <\fIfilename\fR> 
     71Remap identifiers to ten digit names. The content of the file is lines  
     72of two symbols separated by a single white space and no trailing  
     73or leading space: 
     74 
     75VERYLONGSYMBOL SHORTONE\\n 
     76.br 
     77ANOTHERVERYLONGSYMBOL SHORTER\\n 
     78 
     79etc. 
     80.TP  
    7081\fB\-?\fR 
    7182Display version and usage information. 
    7283 

Changed 2 years ago by strk

  • status changed from reopened to closed
  • resolution set to fixed

All done with r7000 and r7001.

Robe wins a beer for the thousand rounding.

Changed 2 years ago by strk

  • component changed from postgis to loader/dumper

Changed 2 years ago by mcayland

Revised version has been committed to trunk. Please can you verify that it still works as expected?

Note: See TracTickets for help on using tickets.