Opened 13 years ago

Closed 9 years ago

Last modified 9 years ago

#1074 closed defect (worksforme)

Matching on different street spelling?

Reported by: mikepease Owned by: robe
Priority: medium Milestone: PostGIS 2.2.0
Component: tiger geocoder Version: master
Keywords: Cc: woodbri

Description

Do you think there is anything you can do to find matches on slightly different spellings of street names?

One case of this is when a compound name is written with a space in the middle, for example COTTAGE WOOD instead of COTTAGEWOOD.

-- this works great
select (addy).*,* from geocode('8525 COTTAGEWOOD TERR, Blaine, MN 55434');

-- but an alternate spelling, COTTAGE WOOD, throws it WAY off
select (addy).*,* from geocode('8525 COTTAGE WOOD TERR, Blaine, MN 55434');

Change History (14)

comment:1 by robe, 13 years ago

Hmm, well, it should be doing some matching on misspellings since it uses soundex. However, trigrams seem to match a bit better in some of these kinds of cases. If it's a common issue with spaces, perhaps as a hot fix we can strip out the spaces and use that as a secondary soundex key.
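
To make the "secondary soundex without spaces" idea concrete, here is a minimal sketch that uses the soundex() and difference() functions from PostgreSQL's fuzzystrmatch extension. It assumes that extension is available. It only compares the two raw spellings and does not reproduce how geocode() applies soundex internally. The column aliases are illustrative.

```sql
-- fuzzystrmatch provides soundex() and difference()
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

SELECT soundex('COTTAGEWOOD')                    AS joined_code,
       soundex('COTTAGE WOOD')                   AS spaced_code,
       -- hypothetical secondary key: strip spaces before encoding
       soundex(replace('COTTAGE WOOD', ' ', '')) AS stripped_code,
       -- number of matching soundex positions, 0 (none) to 4 (identical codes)
       difference('COTTAGEWOOD', 'COTTAGE WOOD') AS soundex_difference;
```

Because stripping the spaces makes both inputs the identical string, the secondary key is guaranteed to agree no matter how soundex() treats embedded spaces.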

We were hoping to integrate trigram functionality to augment some of that fuzzy matching and to improve speed, since a GiST index on trigrams seems to be a bit faster than GiST on soundex (and much improved with KNN GiST in 9.1). I'm not sure when we'll have time to get to that, though.
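
Here is a rough sketch of what trigram matching with a KNN GiST index looks like. It uses the pg_trgm extension (PostgreSQL 9.1+ for the `<->` ordering). The demo_street_names table and its rows are hypothetical stand-ins and are not the tiger geocoder's real tables.

```sql
-- pg_trgm provides similarity(), the <-> distance operator and gist_trgm_ops
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- hypothetical demo table; the tiger geocoder's real tables differ
CREATE TEMP TABLE demo_street_names (name text NOT NULL);
INSERT INTO demo_street_names (name)
VALUES ('COTTAGEWOOD'), ('COTTONWOOD'), ('MAIN'), ('RHODE ISLAND');

-- a GiST trigram index lets ORDER BY name <-> '...' run as a KNN search
CREATE INDEX demo_street_names_trgm_idx
    ON demo_street_names USING gist (name gist_trgm_ops);

-- closest spellings first; sim is between 0 and 1
SELECT name,
       similarity(name, 'COTTAGE WOOD') AS sim
FROM demo_street_names
ORDER BY name <-> 'COTTAGE WOOD'
LIMIT 3;
```

The `<->` operator returns one minus similarity(). Ordering by it ascending therefore puts the best-matching spelling first, and the index can answer that ordering directly instead of scoring every row.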

comment:2 by mikepease, 13 years ago

Another, slightly different example: RHODE ISLAND AVE vs. RHODEISLAND AVE.

-- correct spelling doesn't work
select * from geocode('3420 RHODE ISLAND AVE S, ST. LOUIS PARK, MN 55426');

-- incorrect spelling does work and normalizes to the correct spelling
select * from geocode('3420 RHODEISLAND AVE S, ST. LOUIS PARK, MN 55426');

comment:3 by robe, 13 years ago

I stand corrected: this is another variant of the Lake-like case.

comment:4 by robe, 13 years ago

The Rhode Island case should be taken care of in r7616. That was another regression that snuck in when dealing with highways. I haven't tried the COTTAGE WOOD case yet. If soundex is not working for that, then as I said, the next stop is full-text search with custom dictionaries.

comment:5 by mikepease, 13 years ago

Hopefully this is more helpful than annoying. Here are three more examples of this issue:

1701 Main Street, Hopkins, MN 55343 → (works with Mainstreet)

1421 ENERGY PARK DR, ST. PAUL, MN 55108 → (works with ENERGYPARK DR)

4343 MEADOW BROOK BLVD, ST. LOUIS PARK, MN 55416 → (works with MEADOWBROOK BLVD)

-- works
select abs(ST_X(geomout)::numeric(8,5)) || 'W, ' || ST_Y(geomout)::numeric(8,5) || 'N' as lat_lon, *
from geocode('1701 MainStreet, Hopkins, MN 55343');

-- doesn't work
select abs(ST_X(geomout)::numeric(8,5)) || 'W, ' || ST_Y(geomout)::numeric(8,5) || 'N' as lat_lon, *
from geocode('1701 Main Street, Hopkins, MN 55343');

comment:6 by mikepease, 13 years ago

Regina, would it be helpful to you if I sent you a big list of addresses that appear to have failed the geocoder?

Out of my list of 70,000+ addresses, I have 4,700 that I scored as "failed" based on the rating and on how many of the address components matched. If so, is there a way I could send you that list privately instead of posting it on this website?

comment:7 by darkblueb, 13 years ago

OSGeo admins have created a folder on download.osgeo DOT org called download.osgeo DOT org/postgis/ (I just put the sample USPS CASS rows there). I suggest bzip2, because you can pipe the files straight into psql with bzcat.

Please use ticket http://trac.osgeo.org/osgeo/ticket/737 to request upload access

comment:8 by robe, 13 years ago

Mike,

I think the COTTAGE WOOD example was an issue with the street type being mixed up. It seems to work fine with my latest tests. Can you test that one again to verify it's fixed?

comment:9 by mikepease, 13 years ago

The latest version is working better, but it can still get stumped. Here are my results:

Works well now:
4373 LAKE DR, ROBBINSDALE, MN 55422
8525 COTTAGE WOOD TERR, Blaine, MN 55434
3420 RHODE ISLAND AVE S, ST. LOUIS PARK, MN 55426
1421 ENERGY PARK DR, ST. PAUL, MN 55108
4343 MEADOW BROOK BLVD, ST. LOUIS PARK, MN 55416

Still not getting the correct location:
1701 Main Street, Hopkins, MN 55343

Mainstreet works, and that's the actual street name. But "Street" is getting pulled into the street type, and then no match is found for just "Main" in Hopkins. I think this is a different case.

This case is when a street name has a keyword for a street type in it. When that keyword is pulled out of the street name, no match is found.

Your full-text search idea might fix this. Or could a partial match work, like:

street_name ilike 'Main%'
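
Besides a partial match, another way to handle a street-type keyword that is really part of the name is to retry the lookup with the type word glued back onto the name. Below is a minimal sketch of that fallback. The demo_streets table, its rows, and the parsed values (name Main, type word Street, city Hopkins) are all hypothetical. This is not how the tiger geocoder is actually implemented.

```sql
-- hypothetical table standing in for the TIGER street data
CREATE TEMP TABLE demo_streets (name text NOT NULL, city text NOT NULL);
INSERT INTO demo_streets (name, city)
VALUES ('Mainstreet', 'Hopkins'), ('Oak', 'Hopkins');

WITH parsed AS (
    -- hypothetical parser output: name and street-type word split apart
    SELECT 'Main'::text AS street_name, 'Street'::text AS type_word, 'Hopkins'::text AS city
),
candidates AS (
    -- preference 1: the parsed name as-is
    SELECT street_name AS candidate, 1 AS preference FROM parsed
    UNION ALL
    -- preference 2: name + type word with no space ('MainStreet')
    SELECT street_name || type_word, 2 FROM parsed
)
SELECT s.name, c.preference
FROM candidates c
JOIN demo_streets s ON upper(s.name) = upper(c.candidate)  -- case-insensitive match
JOIN parsed p ON s.city = p.city
ORDER BY c.preference   -- prefer the plain parse when both match
LIMIT 1;
```

The plain parse is tried first so that ordinary addresses are unaffected. The concatenated form only wins when nothing matches the bare name.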

comment:10 by robe, 13 years ago

For that to work I'd have to add a var_ops btree index. Right now it's doing a soundex or exact match. I'm going to experiment to see which option is best in terms of both speed and match quality: a var_ops btree, full-text search, or trigram. Trigram index support for ILIKE only arrives in 9.1, which somewhat rules it out, though I could use native trigram similarity checks. Full-text search with custom dictionaries would probably solve the St. .. and Cam / Camino cases, so I'm still leaning toward that solution.
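
Here is a rough sketch of the two index options for a prefix lookup like street_name ilike 'Main%'. It assumes the "var_ops btree" above means one of PostgreSQL's pattern operator classes (varchar_pattern_ops / text_pattern_ops). The demo_prefix_streets table is hypothetical.

```sql
-- hypothetical demo table
CREATE TEMP TABLE demo_prefix_streets (name text NOT NULL);
INSERT INTO demo_prefix_streets (name)
VALUES ('Mainstreet'), ('Main'), ('Maple'), ('Oak');

-- Option 1: btree with a pattern operator class on upper(name).
-- It supports left-anchored LIKE regardless of the database collation,
-- and the upper() expression makes the search case-insensitive.
CREATE INDEX demo_prefix_streets_upper_pattern_idx
    ON demo_prefix_streets (upper(name) text_pattern_ops);

SELECT name FROM demo_prefix_streets WHERE upper(name) LIKE 'MAIN%';

-- Option 2 (PostgreSQL 9.1+): a trigram index that can serve ILIKE directly
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX demo_prefix_streets_trgm_idx
    ON demo_prefix_streets USING gin (name gin_trgm_ops);

SELECT name FROM demo_prefix_streets WHERE name ILIKE 'Main%';
```

The btree option is narrower, since it only helps left-anchored patterns. The trigram index also helps unanchored patterns and similarity() checks, but it is larger.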

comment:11 by robe, 13 years ago

Milestone: PostGIS 2.0.0 → PostGIS Future

comment:12 by woodbri, 12 years ago

Cc: woodbri added

comment:13 by robe, 9 years ago

Resolution: worksforme
Status: new → closed

Okay, this seems to work as far as I can tell with TIGER 2015 data.

I do this:

SELECT (addy).*, * FROM geocode('1701 Main Street, Hopkins, MN 55343');

and get this with TIGER 2015:

 address | predirabbrev | streetname | streettypeabbrev | postdirabbrev | internal | location | stateabbrev |  zip  | parsed |                 addy                 |                      geomout                       | rating
---------+--------------+------------+------------------+---------------+----------+----------+-------------+-------+--------+--------------------------------------+----------------------------------------------------+--------
    1701 |              | Main       | St               |               |          | Hopkins  | MN          | 55343 | t      | (1701,,Main,St,,,Hopkins,MN,55343,t) | 0101000020AD1000002B00C6A7FB5A57C0C47324D253764640 |      0

comment:14 by robe, 9 years ago

Milestone: PostGIS Future → PostGIS 2.2.0
Version: 1.5.X → trunk