Benchmarking speed between built-in tiger normalizer and pagc_address_parser
I've started to benchmark the speed and quality differences between the built-in normalizer and the pagc one. At first glance the built-in normalizer appears to be faster. This may have to do with how I'm calling it; with the fact that I currently have pagc compiled with debug flags, so it spits out a lot of notices; with the fact that the built-in normalizer takes advantage of indexes and doesn't need to load the lookup tables (and is thus less sensitive to shared memory); with a memory leak somewhere; or with some combination of these and other things.
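A rough way to separate some of these effects is to time the first pass apart from later passes, since one-time costs such as loading lookup tables would show up mostly in the first pass. The sketch below is only an illustration: `time_normalizer`, `builtin_normalize`, and `pagc_normalize` are hypothetical names, and the two normalizers are plain Python stand-ins rather than the real database functions. In practice each callable would wrap a query against the database.

```python
import time

def time_normalizer(normalize, addresses, repeats=3):
    """Return (first_pass, best_pass) wall-clock seconds for normalizing all addresses."""
    passes = []
    for _ in range(repeats):
        start = time.perf_counter()
        for addr in addresses:
            normalize(addr)
        passes.append(time.perf_counter() - start)
    # The first pass includes any one-time setup cost; the best pass approximates steady state.
    return passes[0], min(passes)

# Hypothetical stand-ins (assumptions, not the real normalizers).
def builtin_normalize(addr):
    return addr.upper().split()

def pagc_normalize(addr):
    return [tok.strip(",.") for tok in addr.upper().split()]

sample = ["1 Devonshire Place, Boston, MA 02109"] * 1000
timings = {
    "builtin": time_normalizer(builtin_normalize, sample),
    "pagc": time_normalizer(pagc_normalize, sample),
}
for name, (first, best) in timings.items():
    print(f"{name}: first pass {first:.4f}s, best pass {best:.4f}s")
```

A large gap between the first and best pass would point at setup cost rather than per-address work, though it would not by itself distinguish debug notices from a memory leak.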
Interestingly, since pagc normalizes better, the geocoding step itself has sped up a bit, so it ends up being a win anyway.
I was also able to run it through addresses I couldn't geocode before, and they now geocode.
This suggests two ways of using pagc:
1) As a pure drop-in replacement for the existing normalizer.
2) As a complement, used to pre-normalize difficult addresses.
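The second approach can be sketched as a simple fallback: try the faster normalizer first and only hand the address to the more thorough one when geocoding fails. This is a minimal sketch under stated assumptions; `geocode_with_fallback`, the two normalizer stand-ins, and the tiny lookup table are all hypothetical and exist only to make the control flow concrete.

```python
def geocode_with_fallback(address, normalize_fast, normalize_thorough, geocode):
    """Try the fast normalizer first; fall back to the thorough one if geocoding fails."""
    for normalize in (normalize_fast, normalize_thorough):
        result = geocode(normalize(address))
        if result is not None:
            return result
    return None

# Hypothetical stand-ins (assumptions, not the real normalizers or geocoder).
KNOWN_LOCATIONS = {"1 MAIN ST": (42.0, -71.0)}

def fast_normalize(address):
    return address.upper()

def thorough_normalize(address):
    # Handles punctuation and a spelled-out street type that the fast stand-in misses.
    return address.upper().replace(".", "").replace("STREET", "ST")

def lookup(normalized):
    return KNOWN_LOCATIONS.get(normalized)

easy = geocode_with_fallback("1 main st", fast_normalize, thorough_normalize, lookup)
hard = geocode_with_fallback("1 Main Street.", fast_normalize, thorough_normalize, lookup)
missing = geocode_with_fallback("nowhere", fast_normalize, thorough_normalize, lookup)
```

The drop-in approach corresponds to passing the same normalizer for both slots, so the choice between the two approaches stays a matter of configuration rather than code.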