Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#5923 closed defect (fixed)

vsicurl should append additional filenames before querystring

Reported by: pksorensen Owned by: warmerdam
Priority: normal Milestone:
Component: default Version: unspecified
Severity: normal Keywords:
Cc:

Description

When using /vsicurl/ URLs, one might run into the following issue:

    * Connection #0 to host ascendxyzdatastagingweu.blob.core.windows.net left intact
    * Couldn't find host ascendxyzdatastagingweu.blob.core.windows.net in the _netrc file; using defaults
    * Found bundle for host ascendxyzdatastagingweu.blob.core.windows.net: 0x428ffebea0
    * Re-using existing connection! (#0) with host ascendxyzdatastagingweu.blob.core.windows.net
    * Connected to ascendxyzdatastagingweu.blob.core.windows.net (168.61.57.78) port 80 (#0)
    HEAD /0f0f36bcf00244ce8f7096c29076d4cb-tiffs/MacLachlan%20Property.tif?sv=2013-08-15&ss=2015-04-14T13%3A23%3A55Z&se=2015-04-21T13%3A23%3A51Z&sp=r&sr=b&sig=1C37C5w%3D.aux.xml HTTP/1.1
    Host: ascendxyzdatastagingweu.blob.core.windows.net
    Accept: */*

Here it appends .aux.xml to the whole URL instead of inserting it before the query string.
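The behavior the title asks for can be sketched in Python with the standard `urllib.parse` module: split the URL, append the suffix to the path component only, and reattach the query string unchanged. This is an illustration, not GDAL's implementation (GDAL is C++); `naive_sidecar_url` and `sidecar_url` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit


def naive_sidecar_url(url: str, suffix: str) -> str:
    # What the log above shows: the suffix lands at the end of the query string.
    return url + suffix


def sidecar_url(url: str, suffix: str) -> str:
    # Append the suffix to the path only; scheme, host, query and fragment are kept.
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=parts.path + suffix))


if __name__ == "__main__":
    url = "http://example.com/foo.tif?arg=value"
    print(naive_sidecar_url(url, ".aux.xml"))  # http://example.com/foo.tif?arg=value.aux.xml
    print(sidecar_url(url, ".aux.xml"))        # http://example.com/foo.tif.aux.xml?arg=value
```

Percent-encoded paths such as `MacLachlan%20Property.tif` pass through untouched, because `urlsplit` does not decode them.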

Change History (6)

comment:1 by Jukka Rahkonen, 9 years ago

How can I reproduce this?

comment:2 by pksorensen, 9 years ago

By running a command like the following:

gdalinfo --config CPL_CURL_VERBOSE YES "/vsicurl/http://ascendworkerweu.blob.core.windows.net/shares/torkild/DSC_0034.JPG?testquery=dummy"

Note that this one will spin into an infinite loop, because it gets the same file back on the subsequent requests.

comment:3 by pksorensen, 9 years ago

Is it possible to delete or change the above comment? (There is no reason to download 9 MB of data each time.)

gdalinfo --config CPL_CURL_VERBOSE YES "/vsicurl/http://ascendworkerweu.blob.core.windows.net/shares/blank.png?testquery=dummy"

comment:4 by Jukka Rahkonen, 9 years ago

All right, I understand. In case someone else reads this later: after reading the "blank.png" file, gdalinfo also tries to get georeferencing information by checking whether any of the following files can be found at the same base URL:

blank.png	
blank.png.aux.xml
blank.aux	
blank.AUX	
blank.png.aux	
blank.png.AUX	
blank.png.ovr	
blank.png.msk	
blank.pgw	
blank.PGW	
blank.pngw	
blank.PNGW	
blank.wld	
blank.WLD	
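The list above can be reproduced from the file name alone: the world-file entries follow the usual convention of the first and last letter of the extension plus "w" (.pgw), or the full extension plus "w" (.pngw). This Python sketch only mirrors the list as written in this comment; it is not GDAL's lookup code, and `candidate_names` is a hypothetical helper.

```python
import posixpath


def candidate_names(filename: str) -> list[str]:
    # Rebuild the probe list above for a file such as "blank.png".
    stem, ext = posixpath.splitext(filename)  # ("blank", ".png")
    e = ext[1:]                               # "png"
    if not e:
        raise ValueError("expected a file name with an extension, e.g. blank.png")
    world = e[0] + e[-1] + "w"                # "pgw": first + last letter + "w"
    return [
        filename,
        filename + ".aux.xml",
        stem + ".aux", stem + ".AUX",
        filename + ".aux", filename + ".AUX",
        filename + ".ovr",
        filename + ".msk",
        stem + "." + world, stem + "." + world.upper(),
        stem + "." + e + "w", stem + "." + (e + "w").upper(),
        stem + ".wld", stem + ".WLD",
    ]


if __name__ == "__main__":
    for name in candidate_names("blank.png"):
        print(name)
```

For example, `candidate_names("blank.png")` returns the fourteen names listed above, in the same order.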

In this case the base URL is not a normal directory but rather some sort of service. GDAL takes the whole URL and appends ".aux.xml" at the end:

http://ascendworkerweu.blob.core.windows.net/shares/blank.png?testquery=dummy.aux.xml

As a result, the service sends back the same PNG as before, and it does so for every tested file extension. When .ovr is tested, GDAL goes wild and starts to test .ovr, .ovr.ovr, .ovr.ovr.ovr and so on.
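To see why the probing never stops, here is a small Python simulation. `fake_server_exists` is a hypothetical stand-in for a service that serves the file based on its path alone, so anything appended after `?testquery=dummy` is ignored; `probe_overviews` and its `max_depth` cap are also illustrative and are not how GDAL is written.

```python
from urllib.parse import urlsplit, urlunsplit

MAIN = "http://ascendworkerweu.blob.core.windows.net/shares/blank.png?testquery=dummy"


def fake_server_exists(url: str) -> bool:
    # Hypothetical service: only the path matters, and only blank.png exists.
    return urlsplit(url).path == "/shares/blank.png"


def probe_overviews(url: str, max_depth: int, query_aware: bool) -> list[str]:
    # Keep looking for an overview of the last "found" file (.ovr, .ovr.ovr, ...).
    # max_depth is an artificial cap; the naive variant would otherwise never stop.
    found: list[str] = []
    current = url
    for _ in range(max_depth):
        if query_aware:
            parts = urlsplit(current)
            current = urlunsplit(parts._replace(path=parts.path + ".ovr"))
        else:
            current = current + ".ovr"
        if not fake_server_exists(current):
            break
        found.append(current)
    return found


if __name__ == "__main__":
    print(len(probe_overviews(MAIN, 5, query_aware=False)))  # 5
    print(probe_overviews(MAIN, 5, query_aware=True))        # []
```

With naive appending the path seen by the server never changes, so every probe "succeeds"; once the suffix goes into the path, the first .ovr lookup fails and the chain ends.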

What you want is for the next query to be:

http://ascendworkerweu.blob.core.windows.net/shares/blank.png.aux.xml?testquery=dummy

With a plain file directory this does not happen, and the sidecar requests made by the following command do not return anything:

gdalinfo --config CPL_CURL_VERBOSE YES "/vsicurl/http://latuviitta.org/documents/Buffer_along_route.png?query=foo"

I fear that a universal solution that works correctly with all custom-built services may be difficult to reach, but an infinite loop is not good at all.

comment:5 by Even Rouault, 9 years ago

Resolution: fixed
Status: new → closed

trunk r28907 "Avoid fetching remote non-existing resources for sidecar files, when using /vsicurl/ with a URL that takes arguments (#5923)"

In other words, if the main file is "/vsicurl/http://example.com/foo.tif?arg=value", then do not try to fetch auxiliary files. We could potentially try fetching "/vsicurl/http://example.com/foo.alternate_extension?arg=value", but in most cases that wouldn't work, so above commit should be OK, and not break anything hopefully.
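The rule described for r28907 amounts to a simple check on the file name. A Python sketch, assuming the check is just "does the /vsicurl/ URL contain a '?'"; `should_probe_sidecars` is a hypothetical helper, and the real change lives in GDAL's C++ code.

```python
VSICURL_PREFIX = "/vsicurl/"


def should_probe_sidecars(filename: str) -> bool:
    # Local files and /vsicurl/ URLs without arguments: probing .aux.xml, .ovr, etc. is fine.
    if not filename.startswith(VSICURL_PREFIX):
        return True
    url = filename[len(VSICURL_PREFIX):]
    # A URL with arguments is treated as opaque: skip sidecar lookups entirely.
    return "?" not in url


if __name__ == "__main__":
    print(should_probe_sidecars("/vsicurl/http://example.com/foo.tif"))            # True
    print(should_probe_sidecars("/vsicurl/http://example.com/foo.tif?arg=value"))  # False
```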

comment:6 by pksorensen, 9 years ago

I think the following:


We could potentially try fetching "/vsicurl/http://example.com/foo.alternate_extension?arg=value", but in most cases that wouldn't work, so above commit should be OK, and not break anything hopefully.


should be the default behavior.

The reason is that when files are located on a cloud storage system like S3 or Azure Blob Storage, authorization can be done with signed tokens. These tokens are put in the URL. A token can be issued for a whole bucket/container, meaning that the same query parameters used for /foo.tif?arg=value also work for /foo.aux.xml?arg=value.

It's a common pattern for cloud storage systems to carry auth tokens in the query string.
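When the token is scoped to the whole container or bucket, as described above, a sidecar URL can be derived by swapping only the last path segment and keeping the query string. A Python sketch; `sibling_url` is a hypothetical helper, and this only helps when the token is not signed for one specific object.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def sibling_url(url: str, new_name: str) -> str:
    # Replace the last path segment and keep the query string (the token) as-is.
    parts = urlsplit(url)
    directory = posixpath.dirname(parts.path)
    return urlunsplit(parts._replace(path=posixpath.join(directory, new_name)))


if __name__ == "__main__":
    print(sibling_url("http://example.com/foo.tif?arg=value", "foo.aux.xml"))
    # http://example.com/foo.aux.xml?arg=value
```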
