Opened 3 years ago

Closed 21 months ago

#5526 closed enhancement (fixed)

[PATCH] CPLEscapeString (CPLES_URL) encoding some characters unnecessarily

Reported by: tastle Owned by: warmerdam
Priority: normal Milestone: 2.1.0
Component: default Version: svn-trunk
Severity: normal Keywords:
Cc:

Description

CPLEscapeString is encoding characters for URLs which do not need to be encoded.

This can be problematic for web servers that (for reasons unknown) don't or can't perform URL decoding.

From RFC1738: <quote>

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and

reserved characters used for their reserved purposes may be used unencoded within a URL.

<quote>

I propose that the above list of characters are not encoded in CPLEscapeString for CPLES_URL.

Attachments (1)

cpl_string.cpp.patch (1.5 KB) - added by tastle 3 years ago.
Prevent encoding characters that don't need to be encoded.

Download all attachments as: .zip

Change History (12)

comment:1 Changed 3 years ago by tastle

Here is a web service that I've observed this problem. The problem clearly is on the web server as far as I'm concerned, however, RFC1738 states that the hyphen does not need to be encoded.

The first URL is hand-written based on what I've observed how the WMS driver encodes the BATHY-DALLE_PYR-PNG_WGS84G_WMS layer name in the WMS. I've hand-written the URL because there are additional server configuration issues I need to bypass for the purpose of this ticket. (Rest assured the system administrators have been notified.)

http://services.data.shom.fr/INSPIRE/wms/r?SERVICE=WMS&request=GetMap&version=1.3.0&layers=BATHY-DALLE_PYR%2DPNG_WGS84G_WMS&styles=&crs=EPSG:4326&transparent=TRUE&format=image/png&width=744&height=1024&bbox=36.00000000,-12.00000000,52.00000000,10.00000000

Now if GDAL were to have the provided patch applied, we should get the following layer name, and the server wouldn't balk on the request.

http://services.data.shom.fr/INSPIRE/wms/r?SERVICE=WMS&request=GetMap&version=1.3.0&layers=BATHY-DALLE_PYR-PNG_WGS84G_WMS&styles=&crs=EPSG:4326&transparent=TRUE&format=image/png&width=744&height=1024&bbox=36.00000000,-12.00000000,52.00000000,10.00000000)

comment:3 Changed 3 years ago by Even Rouault

The patch is buggy. It has a mix of
and && operators. I doubt it worked with your example.
cpl_string.cpp: In function ‘char* CPLEscapeString(const char*, int, int)’:
cpl_string.cpp:1572: warning: suggest parentheses around ‘&&’ within ‘||’
cpl_string.cpp:1574: warning: suggest parentheses around ‘&&’ within ‘||’
cpl_string.cpp:1575: warning: suggest parentheses around ‘&&’ within ‘||’
cpl_string.cpp:1576: warning: suggest parentheses around ‘&&’ within ‘||’
cpl_string.cpp:1577: warning: suggest parentheses around ‘&&’ within ‘||’

And in your example, why would there be a dash in BATHY-DALLE_PUR%2DPNG_WGS84G_WMS before the patch ? I suppose you meant BATHY%2DDALLE_PUR%2DPNG_WGS84G_WMS.

comment:4 Changed 3 years ago by tastle

Sorry about the mistake in the patch. I'll fix that up.

As for the dash in the description, I had a few URLs floating around for a few issues so I just inline adjusted the URL during writing. I just have missed the first hyphen character.

Changed 3 years ago by tastle

Attachment: cpl_string.cpp.patch added

Prevent encoding characters that don't need to be encoded.

comment:5 Changed 3 years ago by tastle

Fixed. I did something stupid. I added another line after testing it to make it closer to the RFC1738, but copied the AND logic instead of the OR. Rebuilt with this new patch and I don't see the above errors.

comment:6 Changed 3 years ago by tastle

comment:7 Changed 3 years ago by Even Rouault

I'm not sure about the '+'. In some contexts, + is use to encode the space character. So it would probably be safer to still encode the +

comment:8 Changed 3 years ago by tastle

Only if you wanted to keep the plus sign as-is, would it need to be encoded. Else the expectation would be that a server would treat them as a space. So encoding a plus sign would negate it from being a space, which would be a bug.

Typically, you only see + signs as part of the query. It's essentially a short-form of %20.

http://lmgtfy.com/?q=space+please

You'll only see %20 as part of the resource path.

Here are a few links (spec links are boring, but they have links to them too): http://stackoverflow.com/questions/1634271/url-encoding-the-space-character-or-20 http://stackoverflow.com/questions/497908/is-a-url-allowed-to-contain-a-space

But that's really not here nor there. Basically not encoding the + and letting the server deal with it would be the expected behaviour for query strings.

I've grepped through GDAL:

D:\Work\gdal_1.11_tastle\gdal>findstr /S /c:"CPLES_URL" *.cpp
frmts\wcs\wcsdataset.cpp:    pszEncoded = CPLEscapeString( osCoverage, -1, CPLES_URL );
frmts\wcs\wcsdataset.cpp:    pszEncoded = CPLEscapeString( osFormat, -1, CPLES_URL );
frmts\wms\wmsmetadataset.cpp:    char* pszEscapedLayerName = CPLEscapeString(pszLayerName, -1, CPLES_URL);
ogr\ogrsf_frmts\couchdb\ogrcouchdbdatasource.cpp:        char* pszEscapedName = CPLEscapeString(pszLayerName, -1, CPLES_URL);
ogr\ogrsf_frmts\couchdb\ogrcouchdbdatasource.cpp:    char* pszEscapedName = CPLEscapeString(pszName, -1, CPLES_URL);
ogr\ogrsf_frmts\couchdb\ogrcouchdbdatasource.cpp:    char* pszEscapedName = CPLEscapeString(osLayerName, -1, CPLES_URL);
ogr\ogrsf_frmts\couchdb\ogrcouchdbtablelayer.cpp:    char* pszEscapedName = CPLEscapeString(pszName, -1, CPLES_URL);
ogr\ogrsf_frmts\gme\ogrgmelayer.cpp:            char * pszEscaped = CPLEscapeString(pszWhere, -1, CPLES_URL);
ogr\ogrsf_frmts\gme\ogrgmelayer.cpp:            char * pszEscaped = CPLEscapeString(pszWkt, -1, CPLES_URL);
ogr\ogr_geocoding.cpp:                                                -1, CPLES_URL);
ogr\ogr_geocoding.cpp:                                                -1, CPLES_URL);
ogr\ogr_geocoding.cpp:                                                -1, CPLES_URL);
ogr\ogr_geocoding.cpp:    char* pszEscapedQuery = CPLEscapeString(pszQuery, -1, CPLES_URL);
ogr\wcts\ogrwcts.cpp:                                               NULL, CPLES_URL );
ogr\wcts\wctsclient.cpp:                                               NULL, CPLES_URL );
port\cpl_google_oauth2.cpp:    osScope.Seize(CPLEscapeString(pszScope, -1, CPLES_URL));
port\cpl_vsil_curl.cpp:    char* pszEscaped = CPLEscapeString(pszTarget, -1, CPLES_URL);

and trimmed out unescape calls, and the only place where I see it being used for the resource, is for cpl_google_oauth2 and maybe-ish for cpl_vsil_curl (not sure what that one really does atm). In both of those cases, they're resources and should be %20 encoded if spaces are encountered.

There really should be a distinction between the query encoding and resource encoding, and right now it looks like CPLEscapeString(CPLES_URL) is used predominantly for query encoding.

comment:9 Changed 3 years ago by Even Rouault

Summary: CPLEscapeString (CPLES_URL) encoding some characters unnecessarily[PATCH] CPLEscapeString (CPLES_URL) encoding some characters unnecessarily

comment:10 Changed 21 months ago by tastle

Even, any further thoughts on this? Is there anything I'd need to do to get this issue to advance? I had a pull request issued but it was turfed in lieu of this original trac ticket that I created.

comment:11 Changed 21 months ago by Even Rouault

Milestone: 2.1.0
Resolution: fixed
Status: newclosed

trunk r32863 "Avoid CPLEscapeString (CPLES_URL) encoding some characters unnecessarily (patch by Tim Astle, #5526)"

Note: See TracTickets for help on using tickets.