Opened 17 months ago
Last modified 2 months ago
#5408 new defect
regression failure on tiger geocoder pagc_normalize_address
Reported by: | robe | Owned by: | robe |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 3.6.0 |
Component: | pagc_address_parser | Version: | master |
Keywords: | windows | Cc: |
Description
Checking to see what triggered this.
In fixing the postgis_tiger_geocoder regression runnings, I discovered this test is failing
SELECT '#1051d' As ticket, * FROM pagc_normalize_address('212 3rd Ave N Minneapolis, MN 55401');
It returns:
ticket | address | predirabbrev | streetname | streettypeabbrev | postdirabbrev | internal | location | stateabbrev | zip | parsed | zip4 | address_alphanumeric --------+---------+--------------+------------+------------------+---------------+----------+-------------+-------------+---------+--------+------+---------------------- #1051d | 212 | | 3RD | AVE | | | MINNEAPOLIS | MN | N 55401 | t | | 212 (1 row)
So putting the suffix dir in the wrong location. It must have worked at some point in time since I have it as a regression test.
Note the regular tiger normalizer function works fine.
SELECT '#1051d' As ticket, * FROM normalize_address('212 3rd Ave N Minneapolis, MN 55401');
ticket | address | predirabbrev | streetname | streettypeabbrev | postdirabbrev | internal | location | stateabbrev | zip | parsed | zip4 | address_alphanumeric --------+---------+--------------+------------+------------------+---------------+----------+-------------+-------------+-------+--------+------+---------------------- #1051d | 212 | | 3rd | Ave | N | | Minneapolis | MN | 55401 | t | | 212 (1 row)
and if I use the address_standardizer_data_us dataset I also get a screwed up answer.
SELECT * FROM standardize_address('us_lex', 'us_gaz', 'us_rules','212 3rd Ave N Minneapolis, MN 55401')
So whatever it is is wrong in this dataset too:
building | house_num | predir | qual | pretype | name | suftype | sufdir | ruralroute | extra | city | state | country | postcode | box | unit ----------+-----------+--------+------+---------+------+---------+--------+------------+-------+-------------+-----------+---------+----------+-----+------ | 212 | | | | 3 | AVENUE | | | | MINNEAPOLIS | MINNESOTA | USA | N 55401 | | (1 row)
Testing my PostGIS 3.3.2 and that also gives wrong answer
Change History (5)
comment:1 by , 17 months ago
comment:2 by , 17 months ago
I was puzzled why winnie isn't failing this test and now I realize the bad answer is in the test suite. Not sure why I saw it different on mine, or maybe I accidentally committed my version.
At anyrate once I fix this issue, need to correct the test.
comment:3 by , 17 months ago
Given 3.3.2 is also affected this is not really a regression, right ? Can we push to 3.4.0 ?
comment:4 by , 17 months ago
Milestone: | PostGIS 3.3.4 → PostGIS 3.5.0 |
---|
comment:5 by , 2 months ago
Keywords: | windows added |
---|---|
Milestone: | PostGIS 3.5.0 → PostGIS 3.6.0 |
This might have to do with regex function being used.
I see the issue on my windows dev, but not running on demo.postgis.net.
Main difference aside from OS is my dev is still using pcre-1 and demo.postgis.net is using pcre-2. Also not seeing the issue on my PostGIS 3.4.2 PG16 Ubuntu 22.04
Okay maybe this is just a case of too ambiguous.
The 2 arg standardizer works fine:
and if I put the comma in the right spot, it works fine too:
The debug_normalize_address shows the primary issue is at the parse_address level as how it separates the single line into a micro, macro and not with the rules.
Which actually is not bad, I mean it should resolve to North Minneapolis, but I guess since there is no such thing in the gaz table, it doesn't know what to do with that N. Why it shoves that in the zip field is a little concerning.
But testing with the parse address though, the zip is picked up correctly, so something is a bit haywire here