Opened 11 years ago
Closed 9 years ago
#2734 closed defect (fixed)
TIGER geocoder geocode_intersection gives no results with omitted zipcode
Reported by: | futuraprime | Owned by: | robe |
---|---|---|---|
Priority: | high | Milestone: | PostGIS 2.2.0 |
Component: | tiger geocoder | Version: | 2.1.x |
Keywords: | Cc: |
Description
I'm using the TIGER geocoder (2013 data, PostGIS 2.1, Postgres 9.3) to geocode intersections in New York City, but I'm finding that if I omit the zip code, I almost never get a match. So, for example…
ny_tiger=# SELECT pprint_addy(addy), st_astext(geomout), rating FROM geocode_intersection('Broadway', 'Houston St', 'NY', 'New York', '10012'); pprint_addy | st_astext | rating ----------------------------------+-----------------------------+-------- 607 Broadway, New York, NY 10012 | POINT(-73.996772 40.725432) | 0 604 Broadway, New York, NY 10012 | POINT(-73.996772 40.725432) | 0 609 Broadway, New York, NY 10012 | POINT(-73.996772 40.725432) | 0 606 Broadway, New York, NY 10012 | POINT(-73.996772 40.725432) | 0 (4 rows)
but…
ny_tiger=# SELECT pprint_addy(addy), st_astext(geomout), rating FROM geocode_intersection('Broadway', 'Houston St', 'NY', 'New York'); pprint_addy | st_astext | rating -------------+-----------+-------- (0 rows)
Regina on GIS StackExchange (http://gis.stackexchange.com/questions/95511/tiger-geocoder-not-matching-intersections-without-zip) suggests that this is a bug, and requested I report it here.
The debug trace she requested is:
ny_tiger=# SELECT pprint_addy(addy), st_astext(geomout), rating FROM geocode_intersection('Broadway', 'Houston St', 'NY', 'New York'); NOTICE: var_zip: '{10001,10002,10003,10004,10005,10006,10007,10009,10010,10011,10012,10013,10014,10016,10017,10018,10019,10020,10021,10022,10023,10024,10025,10026,10027,10028,10029,10030,10031,10032,10033,10034,10035,10036,10037,10038,10039,10040,10044,10065,10069,10075,10107,10110,10112,10118,10119,10120,10123,10128,10166,10169,10175,10270,10278,10279,10280,10281,10282,10285,10301,10301,10302,10303,10304,10305,10306,10307,10308,10309,10310,10312,10314,10451,10452,10453,10454,10455,10456,10457,10458,10459,10460,10461,10462,10463,10463,10464,10464,10465,10466,10466,10467,10468,10469,10470,10470,10471,10472,10473,10474,10475,10550,10550,10704,10704,10705,10705,11001,11001,11003,11004,11040,11101,11102,11103,11104,11105,11106,11109,11201,11203,11204,11205,11206,11207,11207,11208,11208,11209,11210,11211,11212,11213,11214,11215,11216,11217,11218,11219,11219,11220,11221,11222,11223,11223,11224,11225,11226,11228,11229,11230,11231,11232,11233,11234,11235,11236,11237,11237,11238,11239,11249,11252,11354,11355,11356,11357,11358,11359,11360,11361,11362,11363,11364,11365,11366,11367,11368,11369,11370,11372,11373,11374,11375,11377,11378,11379,11385,11385,11411,11412,11413,11414,11414,11415,11416,11416,11417,11417,11418,11419,11420,11421,11421,11422,11422,11423,11426,11426,11427,11428,11428,11429,11429,11430,11432,11433,11434,11435,11436,11580,11580,11691,11691,11692,11693,11694,11695,11697}', city: 'New York' NOTICE: sql: WITH a1 AS (SELECT f.*, addr.fromhn, addr.tohn, addr.side , addr.zip FROM (SELECT * FROM featnames WHERE statefp = '36' AND ( lower(name) = 'broadway' or lower(fullname) LIKE 'broadway ' || '%' )) AS f LEFT JOIN (SELECT * FROM addr WHERE addr.statefp = '36') As addr ON (addr.tlid = f.tlid AND addr.statefp = f.statefp) WHERE '{10001,10002,10003,10004,10005,10006,10007,10009,10010,10011,10012,10013,10014,10016,10017,10018,10019,10020,10021,10022,10023,10024,10025,10026,10027,10028,10029,10030,10031,10032,10033,10034,10035,10036,10037,10038,10039,10040,10044,10065,10069,10075,10107,10110,10112,10118,10119,10120,10123,10128,10166,10169,10175,10270,10278,10279,10280,10281,10282,10285,10301,10301,10302,10303,10304,10305,10306,10307,10308,10309,10310,10312,10314,10451,10452,10453,10454,10455,10456,10457,10458,10459,10460,10461,10462,10463,10463,10464,10464,10465,10466,10466,10467,10468,10469,10470,10470,10471,10472,10473,10474,10475,10550,10550,10704,10704,10705,10705,11001,11001,11003,11004,11040,11101,11102,11103,11104,11105,11106,11109,11201,11203,11204,11205,11206,11207,11207,11208,11208,11209,11210,11211,11212,11213,11214,11215,11216,11217,11218,11219,11219,11220,11221,11222,11223,11223,11224,11225,11226,11228,11229,11230,11231,11232,11233,11234,11235,11236,11237,11237,11238,11239,11249,11252,11354,11355,11356,11357,11358,11359,11360,11361,11362,11363,11364,11365,11366,11367,11368,11369,11370,11372,11373,11374,11375,11377,11378,11379,11385,11385,11411,11412,11413,11414,11414,11415,11416,11416,11417,11417,11418,11419,11420,11421,11421,11422,11422,11423,11426,11426,11427,11428,11428,11429,11429,11430,11432,11433,11434,11435,11436,11580,11580,11691,11691,11692,11693,11694,11695,11697}'::text[] IS NULL OR addr.zip = ANY('{10001,10002,10003,10004,10005,10006,10007,10009,10010,10011,10012,10013,10014,10016,10017,10018,10019,10020,10021,10022,10023,10024,10025,10026,10027,10028,10029,10030,10031,10032,10033,10034,10035,10036,10037,10038,10039,10040,10044,10065,10069,10075,10107,10110,10112,10118,10119,10120,10123,10128,10166,10169,10175,10270,10278,10279,10280,10281,10282,10285,10301,10301,10302,10303,10304,10305,10306,10307,10308,10309,10310,10312,10314,10451,10452,10453,10454,10455,10456,10457,10458,10459,10460,10461,10462,10463,10463,10464,10464,10465,10466,10466,10467,10468,10469,10470,10470,10471,10472,10473,10474,10475,10550,10550,10704,10704,10705,10705,11001,11001,11003,11004,11040,11101,11102,11103,11104,11105,11106,11109,11201,11203,11204,11205,11206,11207,11207,11208,11208,11209,11210,11211,11212,11213,11214,11215,11216,11217,11218,11219,11219,11220,11221,11222,11223,11223,11224,11225,11226,11228,11229,11230,11231,11232,11233,11234,11235,11236,11237,11237,11238,11239,11249,11252,11354,11355,11356,11357,11358,11359,11360,11361,11362,11363,11364,11365,11366,11367,11368,11369,11370,11372,11373,11374,11375,11377,11378,11379,11385,11385,11411,11412,11413,11414,11414,11415,11416,11416,11417,11417,11418,11419,11420,11421,11421,11422,11422,11423,11426,11426,11427,11428,11428,11429,11429,11430,11432,11433,11434,11435,11436,11580,11580,11691,11691,11692,11693,11694,11695,11697}'::text[]) OR addr.zip IS NULL ORDER BY CASE WHEN lower(f.fullname) = 'broadway ' THEN 0 ELSE 1 END LIMIT 5000 ), a2 AS (SELECT f.*, addr.fromhn, addr.tohn, addr.side , addr.zip FROM (SELECT * FROM featnames WHERE statefp = '36' AND ( lower(name) = 'houston' or lower(fullname) LIKE 'houston st' || '%' ) ) AS f LEFT JOIN (SELECT * FROM addr WHERE addr.statefp = '36') AS addr ON (addr.tlid = f.tlid AND addr.statefp = f.statefp) WHERE '{10001,10002,10003,10004,10005,10006,10007,10009,10010,10011,10012,10013,10014,10016,10017,10018,10019,10020,10021,10022,10023,10024,10025,10026,10027,10028,10029,10030,10031,10032,10033,10034,10035,10036,10037,10038,10039,10040,10044,10065,10069,10075,10107,10110,10112,10118,10119,10120,10123,10128,10166,10169,10175,10270,10278,10279,10280,10281,10282,10285,10301,10301,10302,10303,10304,10305,10306,10307,10308,10309,10310,10312,10314,10451,10452,10453,10454,10455,10456,10457,10458,10459,10460,10461,10462,10463,10463,10464,10464,10465,10466,10466,10467,10468,10469,10470,10470,10471,10472,10473,10474,10475,10550,10550,10704,10704,10705,10705,11001,11001,11003,11004,11040,11101,11102,11103,11104,11105,11106,11109,11201,11203,11204,11205,11206,11207,11207,11208,11208,11209,11210,11211,11212,11213,11214,11215,11216,11217,11218,11219,11219,11220,11221,11222,11223,11223,11224,11225,11226,11228,11229,11230,11231,11232,11233,11234,11235,11236,11237,11237,11238,11239,11249,11252,11354,11355,11356,11357,11358,11359,11360,11361,11362,11363,11364,11365,11366,11367,11368,11369,11370,11372,11373,11374,11375,11377,11378,11379,11385,11385,11411,11412,11413,11414,11414,11415,11416,11416,11417,11417,11418,11419,11420,11421,11421,11422,11422,11423,11426,11426,11427,11428,11428,11429,11429,11430,11432,11433,11434,11435,11436,11580,11580,11691,11691,11692,11693,11694,11695,11697}'::text[] IS NULL OR addr.zip = ANY('{10001,10002,10003,10004,10005,10006,10007,10009,10010,10011,10012,10013,10014,10016,10017,10018,10019,10020,10021,10022,10023,10024,10025,10026,10027,10028,10029,10030,10031,10032,10033,10034,10035,10036,10037,10038,10039,10040,10044,10065,10069,10075,10107,10110,10112,10118,10119,10120,10123,10128,10166,10169,10175,10270,10278,10279,10280,10281,10282,10285,10301,10301,10302,10303,10304,10305,10306,10307,10308,10309,10310,10312,10314,10451,10452,10453,10454,10455,10456,10457,10458,10459,10460,10461,10462,10463,10463,10464,10464,10465,10466,10466,10467,10468,10469,10470,10470,10471,10472,10473,10474,10475,10550,10550,10704,10704,10705,10705,11001,11001,11003,11004,11040,11101,11102,11103,11104,11105,11106,11109,11201,11203,11204,11205,11206,11207,11207,11208,11208,11209,11210,11211,11212,11213,11214,11215,11216,11217,11218,11219,11219,11220,11221,11222,11223,11223,11224,11225,11226,11228,11229,11230,11231,11232,11233,11234,11235,11236,11237,11237,11238,11239,11249,11252,11354,11355,11356,11357,11358,11359,11360,11361,11362,11363,11364,11365,11366,11367,11368,11369,11370,11372,11373,11374,11375,11377,11378,11379,11385,11385,11411,11412,11413,11414,11414,11415,11416,11416,11417,11417,11418,11419,11420,11421,11421,11422,11422,11423,11426,11426,11427,11428,11428,11429,11429,11430,11432,11433,11434,11435,11436,11580,11580,11691,11691,11692,11693,11694,11695,11697}'::text[]) or addr.zip IS NULL ORDER BY CASE WHEN lower(f.fullname) = 'houston st' THEN 0 ELSE 1 END LIMIT 5000 ), e1 AS (SELECT e.the_geom, e.tnidf, e.tnidt, a.*, CASE WHEN a.side = 'L' THEN e.tfidl ELSE e.tfidr END AS tfid FROM a1 As a INNER JOIN edges AS e ON (e.statefp = a.statefp AND a.tlid = e.tlid) WHERE e.statefp = '36' ORDER BY CASE WHEN lower(a.name) = 'houston' THEN 0 ELSE 1 END + CASE WHEN lower(e.fullname) = 'houston st' THEN 0 ELSE 1 END LIMIT 1000) , e2 AS (SELECT e.the_geom, e.tnidf, e.tnidt, a.*, CASE WHEN a.side = 'L' THEN e.tfidl ELSE e.tfidr END AS tfid FROM (SELECT * FROM edges WHERE statefp = '36') AS e INNER JOIN a2 AS a ON (e.statefp = a.statefp AND a.tlid = e.tlid) INNER JOIN e1 ON (e.statefp = e1.statefp AND ST_Intersects(e.the_geom, e1.the_geom) AND ARRAY[e.tnidf, e.tnidt] && ARRAY[e1.tnidf, e1.tnidt] ) WHERE (lower(e.fullname) = 'houston st' or lower(a.name) LIKE 'houston' || '%') ORDER BY CASE WHEN lower(a.name) = 'houston' THEN 0 ELSE 1 END + CASE WHEN lower(e.fullname) = 'houston st' THEN 0 ELSE 1 END LIMIT 100 ), segs AS (SELECT DISTINCT ON(e1.tlid, e1.side) CASE WHEN e1.tnidf = e2.tnidf OR e1.tnidf = e2.tnidt THEN e1.fromhn ELSE e1.tohn END As address, e1.predirabrv As fedirp, COALESCE(e1.prequalabr || ' ','' ) || e1.name As fename, COALESCE(e1.suftypabrv,e1.pretypabrv) As fetype, e1.sufdirabrv AS fedirs, p.name As place, e1.zip, CASE WHEN e1.tnidf = e2.tnidf OR e1.tnidf = e2.tnidt THEN ST_StartPoint(ST_GeometryN(ST_Multi(e1.the_geom),1)) ELSE ST_EndPoint(ST_GeometryN(ST_Multi(e1.the_geom),1)) END AS geom , CASE WHEN lower(p.name) = 'new york' THEN 0 ELSE 1 END + levenshtein_ignore_case(p.name, 'new york') + levenshtein_ignore_case(e1.name || COALESCE(' ' || e1.sufqualabr, ''),'broadway') + CASE WHEN e1.fullname = 'broadway ' THEN 0 ELSE levenshtein_ignore_case(e1.fullname, 'broadway ') END + + levenshtein_ignore_case(e2.name || COALESCE(' ' || e2.sufqualabr, ''),'houston') AS a_rating FROM e1 INNER JOIN e2 ON ( ST_Intersects(e1.the_geom, e2.the_geom) ) INNER JOIN (SELECT * FROM faces WHERE statefp = '36') As fa1 ON (e1.tfid = fa1.tfid ) LEFT JOIN place AS p ON (fa1.placefp = p.placefp AND p.statefp = '36' ) ORDER BY e1.tlid, e1.side, a_rating LIMIT 10*4 ) SELECT address, fedirp , fename, fetype,fedirs,place, zip , geom, a_rating FROM segs ORDER BY a_rating LIMIT 10 pprint_addy | st_astext | rating -------------+-----------+-------- (0 rows)
Thanks,
Evan
Change History (11)
comment:1 by , 11 years ago
Milestone: | → PostGIS 2.2.0 |
---|
comment:2 by , 11 years ago
comment:3 by , 11 years ago
hmm actually I see another problem with the first that it's got a space in it for broadway. I think I have to rethink those two first sets of queries, nevermind what I said for now.
comment:4 by , 11 years ago
I think the space is fine—it's trying to check both "Broadway" and "Broadway Ave", but Broadway doesn't have a type to abbreviate.
Increasing the limits as you suggested does return a result. I'm guessing this is an issue to do with the sheer number of intersections available to the geocoder in New York City?
comment:5 by , 11 years ago
Yes particularly if its a common street name like Broadway. I think there is still room for improvement here so I'll still tackle it in 2.1.4 or 2.2
comment:6 by , 10 years ago
Priority: | medium → high |
---|
comment:7 by , 9 years ago
Summary: | TIGER geocoder gives no results with omitted zipcode → TIGER geocoder geocode_intersection gives no results with omitted zipcode |
---|
comment:8 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Okay should be much improved at r13705 . I increased limit of records returned at substeps and even with that this particular address returns in about the same time with/without zip (a little of 1 second with a couple of states loaded).
I'm not going to bother backporting though since it may have some performance consequences for other common streets.
comment:9 by , 9 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
comment:10 by , 9 years ago
oops reran this and now getting
invalid intput syntax for integer: "612-100". Oy. This might be just a difference in data for 2015 or my recent change.
comment:11 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
this is a different problem - will log as separate ticket. Has to do with some address numbers not being numbers so need to be cast to number so can fit in norm_addy. I thnk my change just resulted in a couple of more matches (which are correct) but are not numbers.
Evan,
So what I was saying — might as well say it here too for completeness.
From the debugged output, it looks like the 10012 zip is in the check so I'm feeling more and more that its the limits being set.
Try running the debugged sql output: Except change the LIMIT 5000 to something like LIMIT 10000 and LIMIT 1000 to like LIMIT 5000 and LIMIT 100 to something like LIMIT 1000
So basically try running this query: