Opened 12 years ago
Closed 6 years ago
Last modified 6 years ago
#886 closed defect (fixed)
Fix normalizing of multi street number range addresses and fractional street numbers
|Reported by:||robe||Owned by:||robe|
SELECT normalize_address('123-125 Washington Avenue, Boston MA 02130')
converts the street number 123-125 to 123125.
The best would be to just take the first part 123 or alternatively create a new field in norm_addy to hold the second 125.The second option while more ideal is requires more rework of the logic to be aware of this additional field.
Change History (6)
comment:1 by , 12 years ago
|Summary:||Fix normalizing of multi street number range addresses → Fix normalizing of multi street number range addresses and fractional street numbers|
comment:2 by , 12 years ago
comment:3 by , 12 years ago
A number of the errors noted in #1077 of Brian's list are street number normalization which is not quite yet feasible to fix without augmenting norm_addy structure.
My inclination is to just add an extra field called address_extra so that the current address field (which is badly named since it's really street number), would contain the first numeric segment of a street and address_extra would be a text field that would contain the rest of the street number. This would solve street ranges, alpha numeric street issue, and most fractional street cases. The pprint_addy would then be revised to take this new address_extra into consideration when formulating the display.
So for example: 19126-20 ⇒ address: 19126 address_extra: -20 4D ⇒ address: 4, address_extra: D 123 ½ ⇒ adress: 123, address_extra: ½
For geocoding we'll just ignore the address_extra part since in most cases, they don't add any benefit to geocoding.
comment:4 by , 10 years ago
This strategy is not bad, but for linear interpolation you need to have a number that is varying over the length of the segment. In your reference set you have the start and end range "numbers" that might actually be complex patterns but by comparing the the ranges you can sort out what is constant and what is changing over the range and extract that. When you only have a single address you can not do this analyses.
So a better global strategy might be to find the segments that match on street and location and them compare trigrams, or levensthein edit distance to score which range is a close fit then do an analysis similar to the above to extract the range over which the segment needs to be interpolated.
comment:5 by , 6 years ago
|Milestone:||PostGIS Future → PostGIS Fund Me|
comment:6 by , 6 years ago
|Milestone:||PostGIS Fund Me → PostGIS 2.4.0|
|Status:||new → closed|
This actually got fixed in 2.4.0 well close enough because Leo kept on nagging me about his street numbers being mangled so. I also had to add a zip4 to preserve his precious zip4 values.
So new output in normalize_address, and pagc_address outputs a new column address_alphanumeric which preserves the original street number. I forget when I change the streetnumber to just take the first part. I think that was before 2.4
SELECT j.key, j.value FROM json_each_text(to_json(normalize_address('123-125 Washington Avenue, Boston MA 02130'))) AS j; key | value ----------------------+------------ address | 123 predirabbrev | NULL streetname | Washington streettypeabbrev | Ave postdirabbrev | NULL internal | NULL location | Boston stateabbrev | MA zip | 02130 parsed | true zip4 | NULL address_alphanumeric | 123-125 (12 rows) SELECT j.key, j.value FROM json_each_text(to_json(pagc_normalize_address('123-125 Washington Avenue, Boston MA 02130'))) AS j; key | value ----------------------+------------ address | 123 predirabbrev | NULL streetname | WASHINGTON streettypeabbrev | AVE postdirabbrev | NULL internal | NULL location | BOSTON stateabbrev | MA zip | 02130 parsed | true zip4 | NULL address_alphanumeric | 123 125 (12 rows)
refer back to #1068 for more examples