Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#7154 closed defect (fixed)

vsis3: certificate issue with bucket with dot in the bucket name

Reported by: tveastman Owned by: warmerdam
Priority: normal Milestone:
Component: default Version: unspecified
Severity: normal Keywords: vsis3
Cc: robert.coup@…

Description

Background: Buckets with a . in the bucket name need to be accessed with AWS_VIRTUAL_HOSTING set to NO; otherwise an SSL error occurs when the client tries to connect, as demonstrated:

In [17]: gdal.SetConfigOption(b'AWS_VIRTUAL_HOSTING', b'YES')
In [18]: gdal.VSICurlClearCache()
In [19]: gdal.ReadDir('/vsis3/bucket.with.dots.in/')

* Couldn't find host bucket.with.dots.in.s3.amazonaws.com in the .netrc file; using defaults
* Hostname was NOT found in DNS cache
*   Trying 52.216.0.232...
* TCP_NODELAY set
* Connected to bucket.with.dots.in.s3.amazonaws.com (52.216.0.232) port 443 (#14)
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* SSL connection using ECDHE-RSA-AES128-GCM-SHA256
* Server certificate:
* 	 subject: C=US; ST=Washington; L=Seattle; O=Amazon.com Inc.; CN=*.s3.amazonaws.com
* 	 start date: 2017-09-22 00:00:00 GMT
* 	 expire date: 2019-01-03 12:00:00 GMT
* 	 subjectAltName does not match bucket.with.dots.in.s3.amazonaws.com
* SSL: no alternative certificate subject name matches target host name 'bucket.with.dots.in.s3.amazonaws.com'
* Closing connection 14

That was expected behaviour, and you work around it by setting AWS_VIRTUAL_HOSTING to NO.
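To make the difference concrete, here is a minimal sketch of where each addressing style puts the bucket name. It assumes, as the two logs in this ticket show, that AWS_VIRTUAL_HOSTING=YES puts the bucket into the host name and AWS_VIRTUAL_HOSTING=NO puts it into the path. The function name `s3_url` is hypothetical, not part of GDAL, and object keys are not URL-encoded.

```python
def s3_url(bucket: str, key: str = "", virtual_hosting: bool = True,
           endpoint: str = "s3.amazonaws.com") -> str:
    """Build an HTTPS URL for an S3 bucket/object in either addressing style.

    virtual_hosting=True mirrors AWS_VIRTUAL_HOSTING=YES (bucket in the host name);
    virtual_hosting=False mirrors AWS_VIRTUAL_HOSTING=NO (bucket in the path).
    Hypothetical helper; the key is assumed to be already URL-safe.
    """
    if virtual_hosting:
        # The bucket name becomes extra DNS labels in front of the endpoint host.
        return f"https://{bucket}.{endpoint}/{key}"
    # Path style: the host stays the plain endpoint; the bucket is the first path segment.
    return f"https://{endpoint}/{bucket}/{key}"


print(s3_url("bucket.with.dots.in", virtual_hosting=True))
print(s3_url("bucket.with.dots.in", virtual_hosting=False))
```

In the virtual-hosted form the dotted bucket name adds four DNS labels in front of s3.amazonaws.com, which is what trips the certificate check in the first log.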

The trouble occurs when the bucket is in a non-standard region and the initial response is a redirect to another region:

In [20]: gdal.SetConfigOption(b'AWS_VIRTUAL_HOSTING', b'NO')
In [21]: gdal.VSICurlClearCache()
In [22]: gdal.ReadDir('/vsis3/bucket.with.dots.in/')

* Couldn't find host s3.amazonaws.com in the .netrc file; using defaults
* Hostname was NOT found in DNS cache
*   Trying 52.216.84.197...
* TCP_NODELAY set
* Connected to s3.amazonaws.com (52.216.84.197) port 443 (#15)
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* SSL connection using ECDHE-RSA-AES128-GCM-SHA256
* Server certificate:
* 	 subject: C=US; ST=Washington; L=Seattle; O=Amazon.com Inc.; CN=s3.amazonaws.com
* 	 start date: 2017-09-26 00:00:00 GMT
* 	 expire date: 2018-09-20 12:00:00 GMT
* 	 subjectAltName: s3.amazonaws.com matched
* 	 issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert Baltimore CA-2 G2
* 	 SSL certificate verify ok.
> GET /bucket.with.dots.in/?delimiter=%2F HTTP/1.1
Host: s3.amazonaws.com
Accept: */*
x-amz-date: 20171120T214123Z
x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Authorization: AWS4-HMAC-SHA256 Credential=AKIAJJA4D44G5LMQYQOQ/20171120/ap-southeast-2/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=8b300ce747c78a0c44d4e8d05318ff57e7a12a2748e46a1abd2767e66c66479a

< HTTP/1.1 301 Moved Permanently
< x-amz-bucket-region: ap-southeast-2
< x-amz-request-id: 33E45A7527EC2159
< x-amz-id-2: xB6JqOQ29pqPJdQkmjnFwAWT0K+oPdtLVYPpjJ6FGJz92Y80Xf8qw8cWSGcvJvevQuZK05cc4uE=
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Date: Mon, 20 Nov 2017 21:41:23 GMT
* Server AmazonS3 is not blacklisted
< Server: AmazonS3
<
* Connection #15 to host s3.amazonaws.com left intact
* Couldn't find host bucket.with.dots.in.s3.amazonaws.com in the .netrc file; using defaults
* Hostname was NOT found in DNS cache
*   Trying 54.231.82.162...
* TCP_NODELAY set
* Connected to bucket.with.dots.in.s3.amazonaws.com (54.231.82.162) port 443 (#16)
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* SSL connection using ECDHE-RSA-AES128-GCM-SHA256
* Server certificate:
* 	 subject: C=US; ST=Washington; L=Seattle; O=Amazon.com Inc.; CN=*.s3.amazonaws.com
* 	 start date: 2017-09-22 00:00:00 GMT
* 	 expire date: 2019-01-03 12:00:00 GMT
* 	 subjectAltName does not match bucket.with.dots.in.s3.amazonaws.com
* SSL: no alternative certificate subject name matches target host name 'bucket.with.dots.in.s3.amazonaws.com'
* Closing connection 16

The second request fails. It looks just like the original example at the top: a DNS-based, virtual-hosted request, sent to the wrong region (it should go to the ap-southeast-2 endpoint).

Amazon seems weird about this: the 301 redirect doesn't include a Location: header, but you can infer where the request should go from the x-amz-bucket-region response header.
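A client can still recover the target region from such a response. A minimal sketch, assuming the status code and response headers have already been read into a plain mapping; `region_from_redirect` is a hypothetical helper, not a GDAL function.

```python
from typing import Mapping, Optional


def region_from_redirect(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the region S3 points to in a region-redirect response, if any.

    As in the log above, the 301 carries no Location header, so the only hint
    is x-amz-bucket-region. Hypothetical helper for illustration.
    """
    if status != 301:
        return None
    # HTTP header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    region = lowered.get("x-amz-bucket-region", "").strip()
    return region or None


# Headers taken from the 301 response in the log above.
print(region_from_redirect(301, {
    "x-amz-bucket-region": "ap-southeast-2",
    "Content-Type": "application/xml",
}))
```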

Change History (6)

comment:1 by Even Rouault, 6 years ago

Summary: vsis3: region redirect causes AWS_VIRTUAL_HOSTING to not be honoured → vsis3: certificate issue with bucket with dot in the bucket name

comment:2 by Even Rouault, 6 years ago

The issue is a certificate issue.

* Server certificate:
* 	 subject: C=US; ST=Washington; L=Seattle; O=Amazon.com Inc.; CN=*.s3.amazonaws.com
* 	 start date: 2017-09-22 00:00:00 GMT
* 	 expire date: 2019-01-03 12:00:00 GMT
* 	 subjectAltName does not match bucket.with.dots.in.s3.amazonaws.com
* SSL: no alternative certificate subject name matches target host name 'bucket.with.dots.in.s3.amazonaws.com'

The non-virtual-hosting request ends up as a virtual-hosting request through the AWS 301 error response, and curl rejects the certificate of the virtual host. Boto also has the same issue: https://github.com/boto/boto/issues/2836

The workaround for GDAL is to define GDAL_HTTP_UNSAFESSL=1.

Apparently there's also no need for GDAL to default to non-virtual hosting when the bucket name has a dot in it, since AWS redirects it to a virtual host anyway. Perhaps this has changed since the initial implementation.

comment:3 by tveastman, 6 years ago

The AWS certificate issue is what GDAL needs to work around. It is a known issue that making an HTTPS call to a virtual-hosted S3 bucket with a . in the name results in a mismatched SSL certificate.

For security, neither plain-HTTP calls nor GDAL_HTTP_UNSAFESSL=1 is a sufficient workaround.

The required workaround, which preserves a verifiable SSL connection, is:

  1. Determine the bucket's region from the x-amz-bucket-region response header.
  2. Redirect the request to https://s3.<REGION>.amazonaws.com/<BUCKET>/ (or s3.amazonaws.com if the region is us-east-1)
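The two steps can be sketched as follows, assuming the region has already been obtained from the x-amz-bucket-region header. Both function names are hypothetical and this is not necessarily how GDAL implemented the fix; it only encodes the rule described in this comment.

```python
def needs_path_style(bucket: str) -> bool:
    """Per this comment, a '.' in the bucket name breaks certificate matching
    for virtual-hosted HTTPS requests, so path style is needed."""
    return "." in bucket


def regional_path_style_url(bucket: str, region: str, key: str = "") -> str:
    """Step 2: build https://s3.<REGION>.amazonaws.com/<BUCKET>/ (or the global
    s3.amazonaws.com host for us-east-1), keeping the bucket out of the host name
    so its dots cannot affect certificate matching."""
    host = "s3.amazonaws.com" if region == "us-east-1" else f"s3.{region}.amazonaws.com"
    return f"https://{host}/{bucket}/{key}"


bucket = "bucket.with.dots.in"
if needs_path_style(bucket):
    print(regional_path_style_url(bucket, "ap-southeast-2"))
```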

At the moment, a GDAL user who needs to interact securely with a bucket that's not in us-east-1 and has a . in the name must send a curl HEAD request to s3.amazonaws.com to determine the region, and then set both AWS_S3_ENDPOINT=https://s3.REGION.amazonaws.com and AWS_VIRTUAL_HOSTING=no.
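That manual procedure could be scripted roughly as below. This is a sketch under stated assumptions: S3 is assumed to return x-amz-bucket-region for an unauthenticated HEAD request (possibly with a 301 or 403 status), and AWS_S3_ENDPOINT is given as a bare host name, which is how GDAL's /vsis3/ documentation describes it (with AWS_HTTPS selecting the scheme), rather than with the https:// prefix written above. Both function names are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Dict, Optional


def discover_bucket_region(bucket: str, timeout: float = 10.0) -> Optional[str]:
    """Send an unauthenticated HEAD to the global path-style endpoint and read
    x-amz-bucket-region from whatever response comes back (assumed to be present)."""
    req = urllib.request.Request(f"https://s3.amazonaws.com/{bucket}", method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            headers = resp.headers
    except urllib.error.HTTPError as err:
        # urllib raises for a 301 without Location (and for 403); the headers are still available.
        headers = err.headers
    return headers.get("x-amz-bucket-region")  # case-insensitive lookup


def gdal_config_for_region(region: str) -> Dict[str, str]:
    """Config options from the comment above, for a bucket outside us-east-1.
    Assumes AWS_S3_ENDPOINT takes a host name without scheme."""
    return {
        "AWS_S3_ENDPOINT": f"s3.{region}.amazonaws.com",
        "AWS_VIRTUAL_HOSTING": "NO",
    }
```

With the GDAL Python bindings, each entry could then be applied with gdal.SetConfigOption(name, value) before calling gdal.ReadDir('/vsis3/bucket.with.dots.in/').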

If GDAL is configured for HTTPS and is trying to access a bucket with a . in the name, it makes sense for GDAL to handle this case itself rather than leave it to the user.

comment:4 by Even Rouault, 6 years ago

Resolution: fixed
Status: new → closed

In 40788:

/vsis3/: fix support of bucket names with dot in them (fixes #7154)

comment:5 by Even Rouault, 6 years ago

@tveastman Thanks for the latest explanation. The AWS error message is rather misleading, since it suggests an inappropriate endpoint.

comment:6 by Robert Coup, 6 years ago

It isn't that S3 SSL is broken; rather, it isn't currently valid to issue wildcard certificates that span multiple levels. A *.example.com certificate (like the one S3 uses) is valid for x.example.com but not for y.x.example.com. There's no way to issue a **.example.com certificate, or anything else that covers multiple levels, without explicitly listing every name in the certificate.
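The single-level rule can be illustrated with a simplified matcher in the spirit of RFC 6125. This is a swapped-in illustration, not curl's actual implementation: `*` is accepted only as the entire left-most label and matches exactly one label. `wildcard_matches` is a hypothetical name.

```python
def wildcard_matches(pattern: str, host: str) -> bool:
    """Simplified certificate name check: '*' may only be the whole left-most
    label and stands for exactly one DNS label (no partial or multi-level wildcards)."""
    p_labels = pattern.lower().split(".")
    h_labels = host.lower().split(".")
    if len(p_labels) != len(h_labels):
        # A wildcard never absorbs extra labels, so the label counts must agree.
        return False
    first, *rest = p_labels
    if first != "*" and first != h_labels[0]:
        return False
    # Every label after the first must match literally.
    return rest == h_labels[1:]


print(wildcard_matches("*.s3.amazonaws.com", "mybucket.s3.amazonaws.com"))
print(wildcard_matches("*.s3.amazonaws.com", "bucket.with.dots.in.s3.amazonaws.com"))
```

Under this rule a one-label bucket host such as mybucket.s3.amazonaws.com matches CN=*.s3.amazonaws.com, while bucket.with.dots.in.s3.amazonaws.com cannot.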
