Opened 13 years ago

Closed 7 years ago

#1110 closed defect (wontfix)

Saint Paul vs. St. Paul

Reported by: mikepease Owned by: robe
Priority: high Milestone: PostGIS Fund Me
Component: tiger geocoder Version: 1.5.X
Keywords: Cc: woodbri, mattybe

Description

Is there a way that the normalizer could look for alternate spellings of cities? For instance, several major cities in Minnesota are "Saint" something. St. Paul, St. Cloud, etc. The official spelling in the database is St. Paul, but addresses often spell out the full word "Saint Paul".

—works great[[BR]]

select * from geocode('101 7th St, St. Paul, MN')

—takes 2 minutes and fails to find the location
select * from geocode('101 7th St, Saint Paul, MN')

—works fine if the zip code is included
select * from geocode('101 7th St, Saint Paul, MN 55102')

I'm not sure if there are other words like "Saint" that have alternate legitimate spellings for a city name. Can you think of any?

Change History (14)

comment:1 by darkblueb, 13 years ago

I have 150,000 USPS CASS test adressess in a database, and have slowly been looking at what works and what doesnt

(as pointed to by a previous email)

http://www.usps.com/ncsc/addressservices/certprograms/cass.htm

I hope to have a thorough response to what is working and what is not working, in the next couple of days

Anyone else who wants this to try out, also, is welcome to it…

comment:2 by robe, 13 years ago

Cool Brian. Even with the few tiger regression tests we have so far, it's amazing how many things you can see break with slightly invasive surgery. The more tests the better.

comment:3 by robe, 13 years ago

I think the best way to handle this is probably to have an alternate city name or somthing like that table. Similar to what I think featnames is designed for.

comment:4 by robe, 13 years ago

or we can start incorporating fulltext search engine that is built into postgresql and define a custom dictionary

comment:5 by mcayland, 13 years ago

+1. Now the minimum version of PostgreSQL is high enough, you really should start looking in the fulltext with custom dictionary direction.

comment:6 by robe, 13 years ago

Milestone: PostGIS 2.0.0PostGIS Future

comment:7 by woodbri, 11 years ago

Cc: woodbri added

comment:8 by woodbri, 11 years ago

Another way of doing this is to create a gazetteer that maps the USPS city names to the preferred city name then standardize the city name into the preferred city name. There are about 40K unique city names associated with zipcodes but about 7K city names have multiple standardizations based on state. For example AARON(GA) → PORTAL and AARON(KY) → ALBANY. This would not be a problem is the standardization were all done in the context of the state.

comment:9 by robe, 10 years ago

Refer to #2694 for more examples. I know people are really bugged by this but haven't found an easy all encompassing solution for this without resorting to Steve's solution of normalize both tiger addresses and input addresses. Probably won't have cycles for such a change for a while.

comment:10 by mattybe, 9 years ago

Cc: mattybe added

Just to pile on some more, there are some pretty egregious city "names". If you normalize an address in Indianapolis, the city you get back is "Indianapolis city (balance)". Honolulu becomes "Urban Honolulu" or "East Honolulu", but CASS only gives "Honolulu"

# select distinct stusps, place from zip_state_loc where place like '%balance%';

stusps | place


GA | Augusta-Richmond County consolidated government (balance) CT | Milford city (balance) KS | Greeley County unified government (balance) GA | Athens-Clarke County unified government (balance) IN | Indianapolis city (balance) MT | Butte-Silver Bow (balance) TN | Nashville-Davidson metropolitan government (balance) KY | Louisville/Jefferson County metro government (balance)

comment:11 by woodbri, 9 years ago

This data is just Tiger data, the Census uses legal names of places NOT postal names of places. You can use the gazetteer, remap these names into postal names or vica versa but the real solution to this involves major changes (possible rewrite) of the geocoder so that the incoming Tiger data get standardized using the same rules as the incoming geocode request. The problem is that we do not standardize the Tiger data but we do standardize the incoming request. If you want to match you need to standardize both.

I already do this in my geocoder @ imaptools.com and Regina and I plan to do something similar for Tiger Geocoder 2.0 if we every get funding or time to work on it.

comment:12 by mattybe, 9 years ago

Thanks, Steve. I also found a workaround following the advice you gave here: http://lists.osgeo.org/pipermail/postgis-users/2014-April/039096.html

spatial=# select (addy).*, g.rating from geocode('200 East Washington Street, Indianapolis, IN 46204') as g;
 address | predirabbrev | streetname | streettypeabbrev | postdirabbrev | internal |          location           | stateabbrev |  zip  | parsed | rating
---------+--------------+------------+------------------+---------------+----------+-----------------------------+-------------+-------+--------+--------
     200 | E            | Washington | St               |               |          | Indianapolis city (balance) | IN          | 46204 | t      |     15
     300 | E            | Washington | St               |               |          | Indianapolis city (balance) | IN          | 46202 | t      |     24

spatial=# CREATE TABLE tiger_data.IN_place_add(CONSTRAINT pk_IN_place_add PRIMARY KEY (plcidfp) , CONSTRAINT chk_statefp CHECK (statefp::text = '18'::text)) INHERITS(place);
spatial=# insert into tiger_data.IN_place_add select * from place where name like 'Indianapolis%';
spatial=# update tiger_data.IN_place_add set name='Indianapolis', namelsad='Indianapolis';
spatial=#  SELECT install_missing_indexes();
spatial=# select (addy).*, g.rating from geocode('200 East Washington Street, Indianapolis, IN 46204') as g;
 address | predirabbrev | streetname | streettypeabbrev | postdirabbrev | internal |   location   | stateabbrev |  zip  | parsed | rating
---------+--------------+------------+------------------+---------------+----------+--------------+-------------+-------+--------+--------
     200 | E            | Washington | St               |               |          | Indianapolis | IN          | 46204 | t      |      0
(1 row)

comment:13 by robe, 7 years ago

Milestone: PostGIS FuturePostGIS Fund Me

Milestone renamed

comment:14 by robe, 7 years ago

Resolution: wontfix
Status: newclosed
Note: See TracTickets for help on using tickets.