Opened 8 years ago

Closed 8 years ago

#3616 closed defect (fixed)

Pull changes to address standardizer from pagc repository

Reported by: woodbri
Owned by: robe
Priority: medium
Milestone: PostGIS 2.3.0
Component: pagc_address_parser
Version: 2.2.x
Keywords:
Cc:

Description

I'm not sure how changes to pagc address standardizer are synced, but I just checked in a bug fix in:

http://svn.code.sf.net/p/pagc/code/branches/sew-refactor/postgresql/

  • test_main.c - Added code to pass commandline options
  • tokenize.c - Change to allow a MIXED token to follow an AMPERS token

There was a problem standardizing Canadian data that contains a lot of roads like:

123 rang 15e & 16e

The old code failed to standardize the "16e" as a MIXED token because a MIXED token is only allowed to follow a predefined list of tokens that did not include AMPERS. My change adds AMPERS to that list.
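The membership test behind that rule can be sketched roughly as follows. This is only an illustration: the `SYMB` typedef, the enum values, and the helper `symb_in_list` are assumptions made up for this example, and the actual lookup code in tokenize.c is not shown in this ticket. The only thing taken from the patch is the shape of the FAIL-terminated list and the addition of AMPERS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbol values for illustration only; the real SYMB type
   and token constants come from the PAGC headers and will differ. */
typedef int SYMB;
enum { FAIL = -1, BOXT = 1, ROAD, AMPERS, UNITH, PRETYP, BUILDH, RR };

/* FAIL-terminated lists with the same shape as precedes_identifier_list,
   before and after the change. */
static const SYMB old_list[] = { BOXT, ROAD, UNITH, PRETYP, BUILDH, RR, FAIL };
static const SYMB new_list[] = { BOXT, ROAD, AMPERS, UNITH, PRETYP, BUILDH, RR, FAIL };

/* Hypothetical helper: return 1 if sym appears in the FAIL-terminated
   list, 0 otherwise. */
static int symb_in_list(SYMB sym, const SYMB *list)
{
    assert(list != NULL);
    for (; *list != FAIL; list++) {
        if (*list == sym)
            return 1;
    }
    return 0;
}
```

Under these assumptions, `symb_in_list(AMPERS, old_list)` is 0 and `symb_in_list(AMPERS, new_list)` is 1, which is the difference that lets "16e" be accepted after the "&".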

I don't think I have commit ability for postgis, but these changes should get pulled into postgis.

Index: tokenize.c
===================================================================
--- tokenize.c  (revision 362)
+++ tokenize.c  (working copy)
@@ -27,7 +27,7 @@
 #include <stddef.h>
 #include "pagc_api.h"

-static SYMB precedes_identifier_list[] = { BOXT , ROAD , UNITH , PRETYP , BUILDH , RR , FAIL } ;
+static SYMB precedes_identifier_list[] = { BOXT , ROAD , AMPERS, UNITH , PRETYP , BUILDH , RR , FAIL } ;
 static SYMB precedes_route_list[] = { TYPE , QUALIF , PROV , FAIL } ;
 #ifdef COMBINE_FRACTS_WITH_NUMBS
 static SYMB FractL[] = { FRACT , FAIL } ;

Index: test_main.c
===================================================================
--- test_main.c (revision 362)
+++ test_main.c (working copy)
@@ -103,6 +103,7 @@
         if (p == q) break;
         p = q;
         nr++;
+        assert(nr < RULESIZE);
         r++;
     }

@@ -111,7 +112,7 @@

 void Usage()
 {
-        printf("Usage: test_main [-o n] \n");
+        printf("Usage: test_main [-o n] lex.txt gaz.txt rules.txt \n");
         printf("       -o n = options bit flag\n");
         printf("          1 = print lexicon\n");
         printf("          2 = print gazeteer\n");
@@ -139,15 +140,18 @@
     int err;
     int cnt;
     int option = 0;
+    char *flex;
+    char *fgaz;
+    char *frules;

     FILE *in;

-    if (argc == 3 && !strcmp(argv[1], "-o")) {
+    if (argc > 3 && !strcmp(argv[1], "-o")) {
         option = strtol(argv[2], NULL, 10);
         argc -= 2;
         argv += 2;
     }
-    else if (argc != 1)
+    else if (argc != 4)
         Usage();

     std = std_init();
@@ -156,7 +160,8 @@
     lex = lex_init(std->err_p);
     assert(lex);

-    in = fopen(LEXIN, "rb");
+    flex = argv[1];
+    in = fopen(flex, "rb");
     assert(in);

     cnt = 0;
@@ -184,7 +189,8 @@
     gaz = lex_init(std->err_p);
     assert(gaz);

-    in = fopen(GAZIN, "rb");
+    fgaz = argv[2];
+    in = fopen(fgaz, "rb");
     assert(in);

     cnt = 0;
@@ -215,7 +221,8 @@

     /* ************ RULES **************** */

-    in = fopen(RULESIN, "rb");
+    frules = argv[3];
+    in = fopen(frules, "rb");
     assert(in);

     cnt = 0;
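
A note on the argument handling in the test_main.c patch: the `argc != 4` check sits in the `else` branch, so it is not applied when `-o n` is given. A call such as `test_main -o 1 lex.txt gaz.txt` would then leave `argv[3]` as NULL after the shift. A possible stricter variant is sketched below. It is not part of the committed change; the `test_args` struct and `parse_test_args` function are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the parsed command line; not part of test_main.c. */
typedef struct {
    long option;         /* value of -o n, 0 if absent */
    const char *flex;    /* lexicon file */
    const char *fgaz;    /* gazetteer file */
    const char *frules;  /* rules file */
} test_args;

/* Parse "[-o n] lex.txt gaz.txt rules.txt".
   Returns 0 on success, -1 if the usage message should be printed. */
static int parse_test_args(int argc, char **argv, test_args *out)
{
    assert(out != NULL);
    out->option = 0;

    if (argc > 3 && strcmp(argv[1], "-o") == 0) {
        out->option = strtol(argv[2], NULL, 10);
        argc -= 2;
        argv += 2;
    }

    /* Checked on both paths, so exactly three file names must remain. */
    if (argc != 4)
        return -1;

    out->flex = argv[1];
    out->fgaz = argv[2];
    out->frules = argv[3];
    return 0;
}
```

With this shape, `main` could call `Usage()` whenever the function returns -1 and pass the three file names to `fopen` otherwise.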

Change History (4)

comment:1 by robe, 8 years ago

Steve,

You should have commit ability. Let me know if you don't. I recall a year ago when you tried and I thought we were all set.

comment:2 by robe, 8 years ago

Component: postgis → pagc_address_parser
Owner: changed from pramsey to robe

comment:4 by woodbri, 8 years ago

Resolution: fixed
Status: new → closed

In 15049:

Adding some commandline options to test_main.c for debugging.
Fixed a bug in tokenize.c to allow MIXED token to follow an AMPERS token in the rules.
This should close #3616
