Opened 14 years ago
Closed 7 years ago
#1110 closed defect (wontfix)
Saint Paul vs. St. Paul
Reported by: | mikepease | Owned by: | robe |
---|---|---|---|
Priority: | high | Milestone: | PostGIS Fund Me |
Component: | tiger geocoder | Version: | 1.5.X |
Keywords: | Cc: | woodbri, mattybe |
Description
Is there a way that the normalizer could look for alternate spellings of cities? For instance, several major cities in Minnesota are "Saint" something. St. Paul, St. Cloud, etc. The official spelling in the database is St. Paul, but addresses often spell out the full word "Saint Paul".
—works great[[BR]]
select * from geocode('101 7th St, St. Paul, MN')
—takes 2 minutes and fails to find the location
select * from geocode('101 7th St, Saint Paul, MN')
—works fine if the zip code is included
select * from geocode('101 7th St, Saint Paul, MN 55102')
I'm not sure if there are other words like "Saint" that have alternate legitimate spellings for a city name. Can you think of any?
Change History (14)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
Cool Brian. Even with the few tiger regression tests we have so far, it's amazing how many things you can see break with slightly invasive surgery. The more tests the better.
comment:3 by , 14 years ago
I think the best way to handle this is probably to have an alternate city name or somthing like that table. Similar to what I think featnames is designed for.
comment:4 by , 14 years ago
or we can start incorporating fulltext search engine that is built into postgresql and define a custom dictionary
comment:5 by , 14 years ago
+1. Now the minimum version of PostgreSQL is high enough, you really should start looking in the fulltext with custom dictionary direction.
comment:6 by , 13 years ago
Milestone: | PostGIS 2.0.0 → PostGIS Future |
---|
comment:7 by , 12 years ago
Cc: | added |
---|
comment:8 by , 12 years ago
Another way of doing this is to create a gazetteer that maps the USPS city names to the preferred city name then standardize the city name into the preferred city name. There are about 40K unique city names associated with zipcodes but about 7K city names have multiple standardizations based on state. For example AARON(GA) → PORTAL and AARON(KY) → ALBANY. This would not be a problem is the standardization were all done in the context of the state.
comment:9 by , 11 years ago
Refer to #2694 for more examples. I know people are really bugged by this but haven't found an easy all encompassing solution for this without resorting to Steve's solution of normalize both tiger addresses and input addresses. Probably won't have cycles for such a change for a while.
comment:10 by , 10 years ago
Cc: | added |
---|
Just to pile on some more, there are some pretty egregious city "names". If you normalize an address in Indianapolis, the city you get back is "Indianapolis city (balance)". Honolulu becomes "Urban Honolulu" or "East Honolulu", but CASS only gives "Honolulu"
# select distinct stusps, place from zip_state_loc where place like '%balance%';
stusps | place
GA | Augusta-Richmond County consolidated government (balance) CT | Milford city (balance) KS | Greeley County unified government (balance) GA | Athens-Clarke County unified government (balance) IN | Indianapolis city (balance) MT | Butte-Silver Bow (balance) TN | Nashville-Davidson metropolitan government (balance) KY | Louisville/Jefferson County metro government (balance)
comment:11 by , 10 years ago
This data is just Tiger data, the Census uses legal names of places NOT postal names of places. You can use the gazetteer, remap these names into postal names or vica versa but the real solution to this involves major changes (possible rewrite) of the geocoder so that the incoming Tiger data get standardized using the same rules as the incoming geocode request. The problem is that we do not standardize the Tiger data but we do standardize the incoming request. If you want to match you need to standardize both.
I already do this in my geocoder @ imaptools.com and Regina and I plan to do something similar for Tiger Geocoder 2.0 if we every get funding or time to work on it.
comment:12 by , 10 years ago
Thanks, Steve. I also found a workaround following the advice you gave here: http://lists.osgeo.org/pipermail/postgis-users/2014-April/039096.html
spatial=# select (addy).*, g.rating from geocode('200 East Washington Street, Indianapolis, IN 46204') as g; address | predirabbrev | streetname | streettypeabbrev | postdirabbrev | internal | location | stateabbrev | zip | parsed | rating ---------+--------------+------------+------------------+---------------+----------+-----------------------------+-------------+-------+--------+-------- 200 | E | Washington | St | | | Indianapolis city (balance) | IN | 46204 | t | 15 300 | E | Washington | St | | | Indianapolis city (balance) | IN | 46202 | t | 24 spatial=# CREATE TABLE tiger_data.IN_place_add(CONSTRAINT pk_IN_place_add PRIMARY KEY (plcidfp) , CONSTRAINT chk_statefp CHECK (statefp::text = '18'::text)) INHERITS(place); spatial=# insert into tiger_data.IN_place_add select * from place where name like 'Indianapolis%'; spatial=# update tiger_data.IN_place_add set name='Indianapolis', namelsad='Indianapolis'; spatial=# SELECT install_missing_indexes(); spatial=# select (addy).*, g.rating from geocode('200 East Washington Street, Indianapolis, IN 46204') as g; address | predirabbrev | streetname | streettypeabbrev | postdirabbrev | internal | location | stateabbrev | zip | parsed | rating ---------+--------------+------------+------------------+---------------+----------+--------------+-------------+-------+--------+-------- 200 | E | Washington | St | | | Indianapolis | IN | 46204 | t | 0 (1 row)
comment:14 by , 7 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
(as pointed to by a previous email)
Anyone else who wants this to try out, also, is welcome to it…