Ticket #1599 (assigned defect)

Opened 15 months ago

Last modified 7 weeks ago

normalize_address() confused by coutry

Reported by: mikepease Owned by: robe
Priority: medium Milestone: PostGIS 2.1.0
Component: pagc_address_parser Version: trunk
Keywords: Cc:

Description

If you include a country at the end of an address, normalize_address() incorrectly parses the state and zip.

212 n 3rd ave, Minneapolis, mn 55401, USA (This is the format of the "formatted address" returned by Google's XML webservice)

select * from normalize_address('212 n 3rd ave, Minneapolis, mn 55401, USA')

Result:

stateabbrev

212;"N";"3rd";"Ave";"";"";"mn 55401";"";"";t

Change History

Changed 15 months ago by robe

  • status changed from new to assigned
  • milestone changed from PostGIS 2.0.0 to PostGIS 2.1.0

Changed 5 months ago by woodbri

  • cc woodbri added

Changed 5 months ago by woodbri

  • cc woodbri removed

PAGC tools handle this, but the single line parsers has a trivial bug to fix related to this.

Changed 7 weeks ago by robe

  • component changed from tiger geocoder to pagc_address_parser

Changed 7 weeks ago by robe

  • version changed from 1.5.X to trunk

Steve,

This one crashes my backend, (mingw32 installed on windows 7 32 VC++), though all the others I have tried work fine:

select * from pagc_normalize_address('212 n 3rd ave, Minneapolis, mn 55401, USA') 

Gets to start load rules and then hits the bucket. Works fine it I take off the , USA.

I tried the original set to rule out my changes to the tables.

-- crashes --
SELECT (SELECT standardize_address( 'select seq, word::text, stdword::text, token from tiger.pagc_gaz union all select seq, word::text, stdword::text, token from tiger.pagc_lex '
       , 'select seq, word::text, stdword::text, token from tiger.pagc_gaz order by id'
       , 'select * from tiger.pagc_rules order by id'
, 'select 0::int4 as id, ' || quote_literal(address1) || '::text As micro, 
   ' || quote_literal(city || ', ' || state || ' ' || zip) || '::text As macro') As pagc_addr
 FROM  (SELECT * FROM parse_address('212 n 3rd ave, Minneapolis, mn 55401, USA') ) As a );
-- also crashes --
SELECT (SELECT standardize_address( 'select seq, word::text, stdword::text, token from gaz union all select seq, word::text, stdword::text, token from lex '
       , 'select seq, word::text, stdword::text, token from gaz order by id'
       , 'select * from rules order by id'
, 'select 0::int4 as id, ' || quote_literal(address1) || '::text As micro, 
   ' || quote_literal(city || ', ' || state || ' ' || zip) || '::text As macro') As pagc_addr
 FROM  (SELECT * FROM parse_address('212 n 3rd ave, Minneapolis, mn 55401, USA') ) As a );

The log:

NOTICE:  Start standardize_address
NOTICE:  start load_lex
NOTICE:  Time to read 3764 lexicon records: 10.0 ms.
NOTICE:  start load_lex
NOTICE:  Time to read 834 lexicon records: 0.0 ms.
NOTICE:  start load_rules

Changed 7 weeks ago by robe

Crash went away after r11247 , but not parsing right first parse seems wrong. Is this the issue you were talking about with country?

-- using hstore here for easier output --
SELECT (each(hstore(a))).* FROM parse_address('212 n 3rd ave, Minneapolis, mn 55401, USA') As a

yields:
   key    |                value
----------+--------------------------------------
 num      | 212
 zip      |
 city     | USA
 state    |
 street   | n 3rd ave, Minneapolis, mn 55401
 country  | US
 street2  |
 zipplus  |
 address1 | 212 n 3rd ave, Minneapolis, mn 55401

This may have to do with how I'm concatenating.

Changed 7 weeks ago by woodbri

I'll have to look into this more, but I'm out of town the rest of this week, with limited access.

Note: See TracTickets for help on using tickets.