Opened 4 years ago

Closed 4 years ago

#3699 closed defect (fixed)

Nothing downloads edge, faces, featnames, or addr files

Reported by: EvanCarroll Owned by: robe
Priority: blocker Milestone: PostGIS 2.3.2
Component: documentation Version: 2.3.x
Keywords: Cc:

Description

Running the state script, I get a message that it cannot find faces, featnames, edges, or addr.

unzip: cannot find or open tl_*_48*_faces*.zip, tl_*_48*_faces*.zip.zip or tl_*_48*_faces*.zip.ZIP.

No zipfiles found.

unzip: cannot find or open tl_*_48*_featnames*.zip, tl_*_48*_featnames*.zip.zip or tl_*_48*_featnames*.zip.ZIP.

unzip: cannot find or open tl_*_48*_edges*.zip, tl_*_48*_edges*.zip.zip or tl_*_48*_edges*.zip.ZIP.

unzip: cannot find or open tl_*_48*_addr*.zip, tl_*_48*_addr*.zip.zip or tl_*_48*_addr*.zip.ZIP.

Not sure what's going on here. It seems as if it's not generating the wget call for these at all.

Change History (21)

comment:1 Changed 4 years ago by EvanCarroll

Summary: changed from "Nothing downloads EDGE files" to "Nothing downloads edge, faces, featnames, or addr files"

comment:2 Changed 4 years ago by EvanCarroll

This would be so much easier with JSONB rather than the array/nested positional method currently employed.

Or, even better, generate all of this outside of the database entirely, rather than having the extension do it all inside the database.

comment:3 Changed 4 years ago by robe

Which version are you using?

Are you sure the wget is missing, or are you just assuming? I got those kinds of errors when Akamai started banning the downloads.

comment:4 Changed 4 years ago by robe

On your comments about a better approach:

1) JSONB is only supported on PostgreSQL 9.4+ and JSON only on PostgreSQL 9.3+, and everything in PostGIS 2.3 has to support 9.2+, so that's out of the question

2) Having a separate extension was discussed, and that would probably be the approach we take with a future geocoder, but that itself may add more dependencies and we need something that can work on all platforms.

Before, there was a Linux shell script that people could use to do this, and I think others have created similar projects to load the data. I didn't care for any of them because they didn't work on Windows without a lot of effort, and the mindset has always been "Screw Windows users, they should be using Linux", which has always annoyed me about open source folks in general. Now I'll stop before I rant too much.

comment:5 Changed 4 years ago by EvanCarroll

No, not assuming; however, I *may* be wrong. I'm open to that idea. Here is the generated script, and I'm an idiot for not including it earlier.

https://gist.github.com/EvanCarroll/dee98b4ffec2f8297db2d90a543c9d83

You'll see this for PLACE: wget, cd, for, in sequence.

wget http://www2.census.gov/geo/tiger/TIGER2016/PLACE/tl_2016_48_place.zip --mirror --reject=html
cd /gisdata/www2.census.gov/geo/tiger/TIGER2016/PLACE
# stuff
for z in tl_2016_48*_place.zip ; do $UNZIPTOOL -o -d $TMPDIR $z; done

But not for everything. Sometimes it just does this:

# No WGET
cd /gisdata/www2.census.gov/geo/tiger/TIGER2016/EDGES/
for z in tl_*_48*_edges*.zip ; do $UNZIPTOOL -o -d $TMPDIR $z; done

But there is nothing there.
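The unzip errors in the description follow from a loop like this one: when no archive matches the glob, a POSIX shell leaves the pattern unexpanded and hands the literal string to unzip. A minimal sketch of a guard, assuming a POSIX shell; `unzip_matching` is a hypothetical helper for illustration and is not part of the generated loader script:

```sh
#!/bin/sh
# Sketch: unzip only archives that actually exist, and report when none do.
# unzip_matching DIR PATTERN DEST   (hypothetical helper, not part of PostGIS)
unzip_matching() {
    dir=$1; pattern=$2; dest=$3
    found=0
    # $pattern is deliberately unquoted so the shell expands the glob.
    for z in "$dir"/$pattern; do
        # An unmatched glob stays as a literal string; skip it.
        [ -e "$z" ] || continue
        found=$((found + 1))
        "${UNZIPTOOL:-unzip}" -o -d "$dest" "$z"
    done
    if [ "$found" -eq 0 ]; then
        echo "no archives matching $pattern in $dir; was a wget generated for them?" >&2
        return 1
    fi
}
```

Called as `unzip_matching /gisdata/www2.census.gov/geo/tiger/TIGER2016/EDGES 'tl_*_48*_edges*.zip' "$TMPDIR"`, it would fail with one clear message instead of the unzip error shown in the description.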

I also tried a fresh install of version 2.3.1; you can see that here:

https://gist.github.com/EvanCarroll/d993b4f0278a2cadb5f52cf246997f12

That's before I disabled TABBLOCK (as per the last bug report) and set my configuration variables again. There is simply nothing that downloads this stuff in 2.3.1 that I can see.

Last edited 4 years ago by EvanCarroll

comment:6 Changed 4 years ago by robe

Seems you are missing the county wgets. I'll check to make sure I didn't screw anything up in my last commit.

Last change I made was to change wget to do explicit call for each file using www instead of ftp. The only reason for that change was that it seemed to minimize on census blocks.

This https://gist.github.com/robe2/d2c71bf438badcbe6b5ecf2c8980ecde

is what my:

SELECT loader_generate_script(ARRAY['TX'], 'sh');

looks like, and it has wget for those. But that is using the development version I use before I commit changes. So it's possible I forgot to commit my last change.

comment:7 Changed 4 years ago by robe

Milestone: changed from PostGIS 2.4.0 to PostGIS 2.3.2
Priority: changed from medium to blocker

comment:8 Changed 4 years ago by robe

Resolution: set to invalid
Status: changed from new to closed

Okay, I was able to replicate your problem. I think what you missed is running the nationscript first, as noted in the instructions here:

http://postgis.net/docs/manual-2.3/postgis_installation.html#install_tiger_geocoder_extension

For county files the wget figures out which files to load for a state based on the county table, which is loaded by the nationscript load.

People always seem to screw that up, so I should do something to clarify it, like putting in big bold red: DO THIS FIRST before you do anything else.

I should also clarify that you only need to do the nationscript load first.

Documentation patch highly welcome if you can explain this better.

Last edited 4 years ago by robe

comment:9 Changed 4 years ago by EvanCarroll

Yeah, perhaps you missed a commit. Not sure on that. However, you just gave me an interesting idea, and I feel you need to know about this: I checked the FTP! I'm not blacklisted on the FTP!!! Do you know if Census.gov blacklists FTP too? Perhaps I should check that out; maybe I can symlink ftp2.census.gov to www2.census.gov and continue to use wget? The IP address behind the FTP is also running the .gov web server with the data.

Access to 2016 Tiger Data that works: http://148.129.75.35/geo/tiger/TIGER2016/ ftp://ftp2.census.gov/geo/tiger/TIGER2016/

Despite my IP ban to: http://www2.census.gov/geo/tiger/TIGER2016/

comment:10 Changed 4 years ago by EvanCarroll

I certainly ran the nationscript. I'm 100% sure of that. I can provide to you my nationscript. Maybe it didn't work successfully? nationscript.sh

https://gist.github.com/EvanCarroll/cc6baff4f79d6f6d739bf40c9062f67d

Last edited 4 years ago by EvanCarroll

comment:11 Changed 4 years ago by robe

1) Yes, the FTP blacklists too. But what it does on the FTP is more annoying. It just keeps you hanging there for days authenticating.

At least the www bows out with an error.

It would be really nice if there were TIGER mirrors set up for this kind of thing.

2) Yeah, I suspect the nationscript failed for some reason. I had the same output until I ran the nationscript.

If nationscript runs successfully,

SELECT * FROM tiger.county_all_lookup;

should have data.
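That check can be scripted so a state script is never generated against an empty lookup table. A minimal sketch, assuming a POSIX shell; `county_lookup_ready` is a hypothetical helper, and the database name `geocoder` in the usage comment is a placeholder:

```sh
#!/bin/sh
# county_lookup_ready COUNT: succeed only if COUNT is a positive integer.
# (Hypothetical helper for illustration; not part of PostGIS.)
county_lookup_ready() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: the query itself failed
    esac
    [ "$1" -gt 0 ]
}

# Assumed usage, with -t (tuples only) and -A (unaligned) so psql prints a bare number:
#   n=$(psql -d geocoder -tAc "SELECT count(*) FROM tiger.county_all_lookup")
#   county_lookup_ready "$n" || { echo "run the nation script first" >&2; exit 1; }
```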

comment:12 Changed 4 years ago by EvanCarroll

I take it back. I dropped everything and reran the nationscript (the same one I pasted earlier), and it's working.

We should assert that SELECT * FROM tiger.county_all_lookup has rows. Anyway, thanks for the help and the info on the FTP blacklist. Being that the alt-www site runs on the same server as the ftp site I assume they're both firewalled, and will blacklist.

comment:13 Changed 4 years ago by znmeb

I don't understand why this ticket was closed as "Invalid". I've encountered both issues: lack of 'wget' for FACES, FEATNAMES, EDGES and ADDR in the "state" script (for Oregon, if it matters) and blacklisting. I've been wrestling with these for a couple of days; I ended up coding a pre-fetch script to get all the shapefiles.

Specifically, I've encountered this with the EnterpriseDB Linux 64-bit build of PostgreSQL 9.6 and PostGIS 2.3.2 *and* the Postgres official Docker image with PostGIS 2.3 installed from the "pgdg" repositories.

The blacklisting is an interesting case - it works for a while and then suddenly gets 403 forbidden. But it's only blacklisting me when I'm doing the wgets inside a Docker container or a virtual machine - they still work on my host Linux workstation. There must be something in the Linux networking stack / NAT finagling that the Census Bureau's servers don't like.

I've run out of troubleshooting time on this; I need to deploy a geocoder Docker image and I'm building a geocoder database by hand, generating a pg_dump and restoring it into the Docker image at run time. But this does look like a real issue, not operator error. The wgets aren't there in the generated script.

comment:14 Changed 4 years ago by robe

The reason it was marked as invalid was that there is nothing wrong with the script. It failed because Evan's nationscript load did not complete.

It could be the blacklisting that caused Evan's nationscript to fail; either way, the blacklisting is out of PostGIS's control.

Your guess is as good as mine as to why they blacklist. I've been blacklisted on both a Linux VM and a Windows physical and virtual box, on various networks. So why it doesn't eventually blacklist you on one might be something dumb, like the IP range you have being on their whitelist.

comment:15 Changed 4 years ago by znmeb

So wait ... if I run the nation script before *generating* the state script, the state script will work? I generated the nation script, then the state script, then ran the nation script to completion, then ran the state script and FACES, FEATNAMES, EDGES and ADDR weren't downloaded.

comment:16 Changed 4 years ago by znmeb

I guess I'd make a feature request to generate a prefetch script and a load script - refactor the code into the pieces that hit the servers and the pieces that load the shapefiles into the database.
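A prefetch step along these lines could be sketched as a function that only prints download URLs, leaving the database load to a separate script. This is an illustration, not the loader's behavior: `tiger_county_urls` is a hypothetical helper, and the `tl_YEAR_COUNTYFIPS_layer.zip` naming for county-level files is an assumption inferred from the globs and URLs quoted in this ticket.

```sh
#!/bin/sh
# Base URL taken from the paths quoted in this ticket.
TIGER_BASE=http://www2.census.gov/geo/tiger

# tiger_county_urls YEAR COUNTY_FIPS...
# Print one URL per county and per county-level layer.
# (Hypothetical helper; the file naming scheme is an assumption.)
tiger_county_urls() {
    year=$1; shift
    for fips in "$@"; do
        for layer in FACES FEATNAMES EDGES ADDR; do
            lower=$(printf '%s' "$layer" | tr '[:upper:]' '[:lower:]')
            printf '%s/TIGER%s/%s/tl_%s_%s_%s.zip\n' \
                "$TIGER_BASE" "$year" "$layer" "$year" "$fips" "$lower"
        done
    done
}

# Assumed prefetch usage (network only, no database access):
#   tiger_county_urls 2016 48001 48003 | wget --mirror --reject=html -i -
```

Keeping the URL list separate from the load makes it possible to fetch the shapefiles from a host that is not blacklisted and copy them into place before the load runs.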

comment:17 in reply to:  15 Changed 4 years ago by znmeb

Replying to znmeb:

So wait ... if I run the nation script before *generating* the state script, the state script will work? I generated the nation script, then the state script, then ran the nation script to completion, then ran the state script and FACES, FEATNAMES, EDGES and ADDR weren't downloaded.

Yes ... in fact, that's exactly what happens! I ran the state script generator *after* the database was populated and now it has the wgets for the missing shapefiles. So the documentation should read:

  1. Generate the nation script.
  2. Run the nation script.
  3. Generate the state script.
  4. Run the state script.
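The four steps above can be sketched as a single shell function. This is a rough illustration under stated assumptions: the database name `geocoder` and the output file names are placeholders, `gen` and `load_state` are hypothetical helpers, and a real generated script usually needs its connection and path variables edited before it is run.

```sh
#!/bin/sh
DB=${DB:-geocoder}   # placeholder database name

# gen SQL OUTFILE: write the text returned by a loader function to OUTFILE.
gen() {
    psql -d "$DB" -tA -c "$1" > "$2"
}

# load_state STATE_ABBREV: generate and run the scripts in the required order.
load_state() {
    # 1. Generate the nation script.
    gen "SELECT loader_generate_nation_script('sh')" nation.sh || return 1
    # 2. Run it, so that the county lookup table is populated.
    sh nation.sh || return 1
    # 3. Only now generate the state script; it reads the county table.
    gen "SELECT loader_generate_script(ARRAY['$1'], 'sh')" state.sh || return 1
    # 4. Run the state script.
    sh state.sh
}
```

Generating the state script only after the nation script has run is the point of the ordering: a state script generated earlier has no county rows to build its wget lines from.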

comment:18 Changed 4 years ago by robe

Ah, never mind, I see what you are saying. I do have the instructions wrong. Okay, I'll fix them.

Last edited 4 years ago by robe

comment:19 Changed 4 years ago by robe

Component: changed from tiger geocoder to documentation
Resolution: invalid deleted
Status: changed from closed to reopened

comment:20 Changed 4 years ago by robe

In 15314:

Clarify that the nation script must be run first, before any states are loaded
references #3699

comment:21 Changed 4 years ago by robe

Resolution: set to fixed
Status: changed from reopened to closed