Opened 13 years ago

Closed 7 years ago

Last modified 7 years ago

#886 closed defect (fixed)

Fix normalizing of multi street number range addresses and fractional street numbers

Reported by: robe Owned by: robe
Priority: medium Milestone: PostGIS 2.4.0
Component: tiger geocoder Version: master
Keywords: Cc: woodbri

Description

SELECT normalize_address('123-125 Washington Avenue, Boston MA 02130')

converts the street number 123-125 to 123125.

The best would be to just take the first part 123 or alternatively create a new field in norm_addy to hold the second 125.The second option while more ideal is requires more rework of the logic to be aware of this additional field.

Change History (6)

comment:1 by robe, 13 years ago

Summary: Fix normalizing of multi street number range addressesFix normalizing of multi street number range addresses and fractional street numbers

comment:2 by robe, 13 years ago

refer back to #1068 for more examples

comment:3 by robe, 13 years ago

A number of the errors noted in #1077 of Brian's list are street number normalization which is not quite yet feasible to fix without augmenting norm_addy structure.

My inclination is to just add an extra field called address_extra so that the current address field (which is badly named since it's really street number), would contain the first numeric segment of a street and address_extra would be a text field that would contain the rest of the street number. This would solve street ranges, alpha numeric street issue, and most fractional street cases. The pprint_addy would then be revised to take this new address_extra into consideration when formulating the display.

So for example: 19126-20 ⇒ address: 19126 address_extra: -20 4D ⇒ address: 4, address_extra: D 123 ½ ⇒ adress: 123, address_extra: ½

For geocoding we'll just ignore the address_extra part since in most cases, they don't add any benefit to geocoding.

comment:4 by woodbri, 11 years ago

Cc: woodbri added

This strategy is not bad, but for linear interpolation you need to have a number that is varying over the length of the segment. In your reference set you have the start and end range "numbers" that might actually be complex patterns but by comparing the the ranges you can sort out what is constant and what is changing over the range and extract that. When you only have a single address you can not do this analyses.

So a better global strategy might be to find the segments that match on street and location and them compare trigrams, or levensthein edit distance to score which range is a close fit then do an analysis similar to the above to extract the range over which the segment needs to be interpolated.

comment:5 by robe, 7 years ago

Milestone: PostGIS FuturePostGIS Fund Me

Milestone renamed

comment:6 by robe, 7 years ago

Milestone: PostGIS Fund MePostGIS 2.4.0
Resolution: fixed
Status: newclosed

This actually got fixed in 2.4.0 well close enough because Leo kept on nagging me about his street numbers being mangled so. I also had to add a zip4 to preserve his precious zip4 values.

So new output in normalize_address, and pagc_address outputs a new column address_alphanumeric which preserves the original street number. I forget when I change the streetnumber to just take the first part. I think that was before 2.4

SELECT j.key, j.value FROM json_each_text(to_json(normalize_address('123-125 Washington Avenue, Boston MA 02130'))) AS j;

         key          |   value
----------------------+------------
 address              | 123
 predirabbrev         | NULL
 streetname           | Washington
 streettypeabbrev     | Ave
 postdirabbrev        | NULL
 internal             | NULL
 location             | Boston
 stateabbrev          | MA
 zip                  | 02130
 parsed               | true
 zip4                 | NULL
 address_alphanumeric | 123-125
(12 rows)

SELECT j.key, j.value FROM json_each_text(to_json(pagc_normalize_address('123-125 Washington Avenue, Boston MA 02130'))) AS j;

         key          |   value
----------------------+------------
 address              | 123
 predirabbrev         | NULL
 streetname           | WASHINGTON
 streettypeabbrev     | AVE
 postdirabbrev        | NULL
 internal             | NULL
 location             | BOSTON
 stateabbrev          | MA
 zip                  | 02130
 parsed               | true
 zip4                 | NULL
 address_alphanumeric | 123 125
(12 rows)
Last edited 7 years ago by robe (previous) (diff)
Note: See TracTickets for help on using tickets.