Opened 17 months ago

Last modified 2 months ago

#5408 new defect

regression failure on tiger geocoder pagc_normalize_address

Reported by: robe Owned by: robe
Priority: medium Milestone: PostGIS 3.6.0
Component: pagc_address_parser Version: master
Keywords: windows Cc:

Description

Checking to see what triggered this.

In fixing the postgis_tiger_geocoder regression runnings, I discovered this test is failing

SELECT '#1051d' As ticket, * FROM pagc_normalize_address('212 3rd Ave N Minneapolis, MN 55401');

It returns:

 ticket | address | predirabbrev | streetname | streettypeabbrev | postdirabbrev | internal |  location   | stateabbrev |   zip   | parsed | zip4 | address_alphanumeric
--------+---------+--------------+------------+------------------+---------------+----------+-------------+-------------+---------+--------+------+----------------------
 #1051d |     212 |              | 3RD        | AVE              |               |          | MINNEAPOLIS | MN          | N 55401 | t      |      | 212
(1 row)

So putting the suffix dir in the wrong location. It must have worked at some point in time since I have it as a regression test.

Note the regular tiger normalizer function works fine.

SELECT '#1051d' As ticket, * FROM normalize_address('212 3rd Ave N Minneapolis, MN 55401');
ticket | address | predirabbrev | streetname | streettypeabbrev | postdirabbrev | internal |  location   | stateabbrev |  zip  | parsed | zip4 | address_alphanumeric
--------+---------+--------------+------------+------------------+---------------+----------+-------------+-------------+-------+--------+------+----------------------
 #1051d |     212 |              | 3rd        | Ave              | N             |          | Minneapolis | MN          | 55401 | t      |      | 212
(1 row)

and if I use the address_standardizer_data_us dataset I also get a screwed up answer.

SELECT *
FROM standardize_address('us_lex',  'us_gaz', 'us_rules','212 3rd Ave N Minneapolis, MN 55401')

So whatever it is is wrong in this dataset too:

building | house_num | predir | qual | pretype | name | suftype | sufdir | ruralroute | extra |    city     |   state   | country | postcode | box | unit
----------+-----------+--------+------+---------+------+---------+--------+------------+-------+-------------+-----------+---------+----------+-----+------
          | 212       |        |      |         | 3    | AVENUE  |        |            |       | MINNEAPOLIS | MINNESOTA | USA     | N 55401  |     |
(1 row)

Testing my PostGIS 3.3.2 and that also gives wrong answer

Change History (5)

comment:1 by robe, 17 months ago

Okay maybe this is just a case of too ambiguous.

The 2 arg standardizer works fine:

SELECT *
FROM standardize_address('us_lex',  'us_gaz', 'us_rules','212 3rd Ave N', 'Minneapolis, MN 55401');
building | house_num | predir | qual | pretype | name | suftype | sufdir | ruralroute | extra |    city     |   state   | country | postcode | box | unit
----------+-----------+--------+------+---------+------+---------+--------+------------+-------+-------------+-----------+---------+----------+-----+------
          | 212       |        |      |         | 3    | AVENUE  | NORTH  |            |       | MINNEAPOLIS | MINNESOTA |         | 55401    |     |
(1 row)

and if I put the comma in the right spot, it works fine too:

SELECT *
FROM standardize_address('us_lex',  'us_gaz', 'us_rules','212 3rd Ave N, Minneapolis, MN 55401');
 building | house_num | predir | qual | pretype | name | suftype | sufdir | ruralroute | extra |    city     |   state   | country | postcode | box | unit
----------+-----------+--------+------+---------+------+---------+--------+------------+-------+-------------+-----------+---------+----------+-----+------
          | 212       |        |      |         | 3    | AVENUE  | NORTH  |            |       | MINNEAPOLIS | MINNESOTA | USA     | 55401    |     |
(1 row)

The debug_normalize_address shows the primary issue is at the parse_address level as how it separates the single line into a micro, macro and not with the rules.

SELECT d->>'micro' AS micro, d->>'macro' AS macro
FROM jsonb(debug_standardize_address('us_lex',  'us_gaz', 'us_rules','212 3rd Ave N Minneapolis, MN 55401')) AS d;
   micro    |           macro
-------------+----------------------------
 212 3rd Ave | N Minneapolis,MN,55401,US,
(1 row)

Which actually is not bad, I mean it should resolve to North Minneapolis, but I guess since there is no such thing in the gaz table, it doesn't know what to do with that N. Why it shoves that in the zip field is a little concerning.

But testing with the parse address though, the zip is picked up correctly, so something is a bit haywire here

SELECT *
FROM parse_address('212 3rd Ave N Minneapolis, MN 55401');
num | street  | street2 |  address1   |     city      | state |  zip  | zipplus | country
-----+---------+---------+-------------+---------------+-------+-------+---------+---------
 212 | 3rd Ave |         | 212 3rd Ave | N Minneapolis | MN    | 55401 |         | US
(1 row)

comment:2 by robe, 17 months ago

I was puzzled why winnie isn't failing this test and now I realize the bad answer is in the test suite. Not sure why I saw it different on mine, or maybe I accidentally committed my version.

At anyrate once I fix this issue, need to correct the test.

comment:3 by strk, 17 months ago

Given 3.3.2 is also affected this is not really a regression, right ? Can we push to 3.4.0 ?

comment:4 by robe, 17 months ago

Milestone: PostGIS 3.3.4PostGIS 3.5.0

comment:5 by robe, 2 months ago

Keywords: windows added
Milestone: PostGIS 3.5.0PostGIS 3.6.0

This might have to do with regex function being used.

I see the issue on my windows dev, but not running on demo.postgis.net.

Main difference aside from OS is my dev is still using pcre-1 and demo.postgis.net is using pcre-2. Also not seeing the issue on my PostGIS 3.4.2 PG16 Ubuntu 22.04

Note: See TracTickets for help on using tickets.