Opened 12 years ago

Closed 9 years ago

#1384 closed defect (fixed)

Issue with geocoding addresses where place name doesn't match

Reported by: robe
Owned by: robe
Priority: medium
Milestone: PostGIS 2.2.0
Component: tiger geocoder
Version: master
Keywords:
Cc: woodbri

Description

Example: the boroughs of NY. TIGER just has these as New York, NY.

This geocodes fast:

select pprint_addy(addy), ST_AsText(geomout), rating FROM geocode('2601 24TH AVE,  NY 111022337',2);

This geocodes slowly and gives the wrong answer:

select pprint_addy(addy), ST_AsText(geomout), rating FROM geocode('2601 24TH AVE, ASTORIA, NY 111022337',2);

More examples in #1382.

Attachments (1)

astoriaF.png (542.2 KB ) - added by darkblueb 12 years ago.


Change History (6)

comment:1 by darkblueb, 12 years ago

Attached is a visual result of an experiment with Astoria, NY data. The red dots are authoritative addresses and locations. The blue dots are the locations returned by geocode() and carry two labels: the blue label is the address as supplied to geocode(), and the green label is the pprint_addy result from geocode().

by darkblueb, 12 years ago

Attachment: astoriaF.png added

comment:2 by robe, 12 years ago

Milestone: PostGIS 2.0.0 → PostGIS 2.1.0

comment:3 by woodbri, 12 years ago

Cc: woodbri added

comment:4 by robe, 11 years ago

Milestone: PostGIS 2.1.0 → PostGIS Future

comment:5 by robe, 9 years ago

Milestone: PostGIS Future → PostGIS 2.2.0
Resolution: fixed
Status: new → closed

Okay, this actually geocodes fine even though the place name doesn't match and the zip code is longer than 5 digits (presumably it should have a hyphen). I checked Google and it thinks the Astoria house number should be 26-01. That's a separate issue: we don't deal with hyphenated street numbers.

So with TIGER 2015 data (I have MA, MN, NY, PA, KS, RI loaded on my Windows 7 64-bit 9.4 desktop) I get:

test_tiger=# select pprint_addy(addy), ST_AsText(geomout), rating FROM geocode('2601 24TH AVE,  NY 111022337',2);
          pprint_addy           |                 st_astext                 | rating
--------------------------------+-------------------------------------------+--------
 0 24th Ave, New York, NY 11102 | POINT(-73.9211006425579 40.7761925056417) |     18
(1 row)


Time: 22.624 ms

test_tiger=# select pprint_addy(addy), ST_AsText(geomout), rating FROM geocode('2601 24TH AVE, ASTORIA, NY 111022337',2);
         pprint_addy          |                 st_astext                 | rating
------------------------------+-------------------------------------------+--------
 24th Ave, New York, NY 11214 | POINT(-73.9883851176818 40.6000773220411) |     18
 24th Ave, New York, NY 11204 | POINT(-73.9743513111828 40.6135516510796) |     21
(2 rows)


Time: 8311.949 ms

test_tiger=# select pprint_addy(addy), ST_AsText(geomout), rating 
FROM geocode('26-01 24TH AVE, ASTORIA, NY 11102',2);
          pprint_addy           |                 st_astext                 | rating
--------------------------------+-------------------------------------------+--------
 0 24th Ave, New York, NY 11102 | POINT(-73.9211006425579 40.7761925056417) |     17
(1 row)


Time: 17.237 ms

So the second takes much longer to process but still gives more or less the right answer (given we can't handle hyphenated street numbers).

For comparison:

Google can't handle the first version of the address and returns nothing.

The second it corrects, giving

-73.918206, 40.7744398

and says that the address is: 26-01 24th Ave Queens, NY 11102

So the tiger geocoder is in the right ballpark, if only we could do the right thing with the street numbers. The last answer (where we feed in the correct representation of the address) is pretty close to what Google returns.
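
One way to probe the effect of the 9-digit zip seen above is to shorten it before the string reaches geocode(). The following is a minimal sketch using plain PostgreSQL regexp_replace. The pattern and the idea of pre-normalizing the input are assumptions for illustration, not something the tiger geocoder does itself, and the effect on results or timing for any particular TIGER load is untested here.

```sql
-- Sketch only: shorten a trailing ZIP+4 (with or without a hyphen) to the
-- 5-digit ZIP before geocoding. The regex is an illustrative assumption,
-- not part of the tiger geocoder.
-- \m anchors at the start of a word so longer digit runs are not split.
-- The inner regexp_replace turns the input into
-- '2601 24TH AVE, ASTORIA, NY 11102'.
SELECT pprint_addy(addy), ST_AsText(geomout), rating
FROM geocode(
    regexp_replace('2601 24TH AVE, ASTORIA, NY 111022337',
                   '\m(\d{5})-?\d{4}\s*$', '\1'),
    2);
```

This leaves the house number untouched, so the separate problem of hyphenated street numbers such as 26-01 is not addressed by it.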
