Writing a Faster S3 URL Signer

Posted on: September 27, 2023 at 08:00 PM

TL;DR

After facing performance challenges with the AWS SDK for signing S3 URLs, we built a custom S3 URL signer that is 10x faster than the Boto3 signer. The library is open source and available on GitHub.

The Backstory

In one of our high-throughput production APIs, we added a feature that computes a large number of signed URLs, approximately 100 per API call, and returns them in the response. After the release, our servers’ CPU utilization approached 100% during peak hours, latency spiked 2x, and the servers became unresponsive.

After investigating, we saw that the servers were spending most of their time signing the URLs.

The S3 signing process

S3 signed URLs can be used to share S3 data (such as images, text, etc.) with others securely. A signed URL is valid for a limited time period, and whoever holds it can access the S3 data without having AWS credentials of their own. Signed URLs are generated using an AWS access key and secret key.

The S3 signing process runs locally on the CPU without the need for network calls.
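To make the "limited time period" concrete, here is a minimal sketch that pulls apart the query parameters of a presigned URL and computes when it stops working. The URL, credential, and signature below are made up for illustration, and `describe_presigned_url` is a hypothetical helper, not part of any library.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

# Illustrative URL only: bucket, key, credential, and signature are made up.
SAMPLE_URL = (
    "https://sample-bucket.s3.ap-south-1.amazonaws.com/sample-key"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Credential=AKIAEXAMPLE%2F20230927%2Fap-south-1%2Fs3%2Faws4_request"
    "&X-Amz-Date=20230927T143000Z"
    "&X-Amz-Expires=3600"
    "&X-Amz-SignedHeaders=host"
    "&X-Amz-Signature=0123abcd"
)


def describe_presigned_url(url: str) -> dict:
    """Hypothetical helper: summarize the signing parameters carried in a URL."""
    parts = urlsplit(url)
    # parse_qs percent-decodes values and returns lists; keep the first value.
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    signed_at = datetime.strptime(
        params["X-Amz-Date"], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    expires_at = signed_at + timedelta(seconds=int(params["X-Amz-Expires"]))
    return {
        "host": parts.netloc,
        "key": parts.path.lstrip("/"),
        "credential": params["X-Amz-Credential"],
        "signed_at": signed_at,
        "expires_at": expires_at,
    }


info = describe_presigned_url(SAMPLE_URL)
print(info["expires_at"].isoformat())  # 2023-09-27T15:30:00+00:00
```

Everything the service needs to validate the request travels in the URL itself, which is why no credentials are required on the client side.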

Here’s an example that uses the boto3 library to sign a URL:

boto_signer.py
import boto3

class BotoSigner:
    def __init__(self, region, access_key, secret_key):
        self.service = 's3'
        self.s3_client = self.__get_s3_client(region=region, access_key=access_key, secret_key=secret_key)

    def __get_s3_client(self, region, access_key, secret_key):
        # Build the boto3 S3 resource once; signing reuses its underlying client.
        return boto3.resource(
            's3',
            region_name=region,
            aws_access_key_id=access_key,
            aws_secret_access_key=secret_key,
        )

    def generate_signed_url(
        self, key: str, bucket_name: str = None, expiry_seconds: int = 3600
    ) -> str:
        return self.s3_client.meta.client.generate_presigned_url(
            ClientMethod='get_object',
            Params={'Bucket': bucket_name, 'Key': key},
            ExpiresIn=expiry_seconds,
        )

Example usage of the boto signer:

signer = BotoSigner(
  region="ap-south-1", access_key="access-key", secret_key="secret-key"
)

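# Pass keyword arguments: the method's first positional parameter is `key`, not the bucket
url = signer.generate_signed_url(bucket_name="my_bucket", key="my_object_key")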

We were running into performance issues with the above signing method when used in high throughput environments.

Finding out the solutions

Our first instinct was to cache these URLs in a persistent store such as Redis or a database. However, the expiry duration of a signed URL can be set to a maximum of 1 week, which meant we would have to run a periodic job to re-sign the URLs. We looked for alternatives.

The early-expiry problem can be solved with AWS CloudFront signed URLs, which support arbitrary expiry dates [2]. We could create a signed URL using CloudFront and store it in the database without having to refresh it. We didn’t go ahead with this solution because it required adding another component to the system.

Also, both of the above solutions are workarounds; the real issue lies in the signing process itself.

While looking for alternatives, we came across a blog post that had solved this problem by creating their own S3 URL signer [1]. We decided to explore this approach further.

Building a custom S3 URL signer

AWS provides documentation on the signing process [3]. Using the steps described there, along with the blog post [1] that implements a URL signer in Ruby, we created a custom S3 URL signer in Python. The library has been open sourced and is available on GitHub.

fast_s3_url_signer.py
import datetime
import hashlib
import hmac
import urllib.parse


class FastS3UrlSigner:
    """
    A performant version with plain vanilla implementation of AWS S3 URL signing; no dependencies are required.
    Signs the URLs with AWS SigV4 signing process

    The impl taken from here: https://docs.aws.amazon.com/general/latest/gr/sigv4-signed-request-examples.html
    """

    def __init__(self, region, access_key, secret_key):
        self.service = 's3'

        self.region = str(region).lower()
        self.access_key = access_key
        self.secret_key = secret_key

    def generate_signed_url(
        self, bucket_name: str, object_key: str, expiry_in_seconds=3600
    ):
        return self.__get_presigned_url(
            bucket_name=bucket_name,
            object_key=object_key,
            method_name='GET',
            expiry_in_seconds=expiry_in_seconds,
        )

    def generate_signed_put_url(
        self, bucket_name: str, object_key: str, expiry_in_seconds=3600
    ):
        return self.__get_presigned_url(
            bucket_name=bucket_name,
            object_key=object_key,
            method_name='PUT',
            expiry_in_seconds=expiry_in_seconds,
        )

    def __get_host(self, bucket_name, region):
        if region == "us-east-1":
            return f"{bucket_name}.s3.amazonaws.com"
        return f"{bucket_name}.s3.{region}.amazonaws.com"

    def __sign(self, key, msg):
        return hmac.new(key, msg.encode('utf-8'), hashlib.sha256).digest()

    def __get_signature_key(self, key, date_stamp, region_name, service_name):
        k_date = self.__sign(('AWS4' + key).encode('utf-8'), date_stamp)
        k_region = self.__sign(k_date, region_name)
        k_service = self.__sign(k_region, service_name)
        k_signing = self.__sign(k_service, 'aws4_request')
        return k_signing

    def __get_presigned_url(
        self,
        bucket_name: str,
        object_key: str,
        method_name: str,
        expiry_in_seconds: int,
    ):
        host = self.__get_host(bucket_name=bucket_name, region=self.region)
        _object_key = urllib.parse.quote(object_key)

        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)
        t = datetime.datetime.now(datetime.timezone.utc)
        amz_date = t.strftime(
            '%Y%m%dT%H%M%SZ'
        )  # Format date as YYYYMMDD'T'HHMMSS'Z'
        datestamp = t.strftime(
            '%Y%m%d'
        )  # Date w/o time, used in credential scope
        canonical_uri = '/' + _object_key
        canonical_headers = 'host:' + host + '\n'
        signed_headers = 'host'

        # Match the algorithm to the hashing algorithm you use, either SHA-1 or
        # SHA-256 (recommended)
        algorithm = 'AWS4-HMAC-SHA256'
        credential_scope = (
            datestamp
            + '/'
            + self.region
            + '/'
            + self.service
            + '/'
            + 'aws4_request'
        )

        canonical_querystring = ''
        canonical_querystring += 'X-Amz-Algorithm=AWS4-HMAC-SHA256'
        canonical_querystring += '&X-Amz-Credential=' + urllib.parse.quote_plus(
            self.access_key + '/' + credential_scope
        )
        canonical_querystring += '&X-Amz-Date=' + amz_date
        canonical_querystring += '&X-Amz-Expires=' + str(expiry_in_seconds)
        canonical_querystring += '&X-Amz-SignedHeaders=' + signed_headers

        canonical_request = (
            method_name
            + '\n'
            + canonical_uri
            + '\n'
            + canonical_querystring
            + '\n'
            + canonical_headers
            + '\n'
            + signed_headers
            + '\n'
            + 'UNSIGNED-PAYLOAD'
        )

        string_to_sign = (
            algorithm
            + '\n'
            + amz_date
            + '\n'
            + credential_scope
            + '\n'
            + hashlib.sha256(canonical_request.encode('utf-8')).hexdigest()
        )

        signing_key = self.__get_signature_key(
            self.secret_key, datestamp, self.region, self.service
        )

        signature = hmac.new(
            signing_key, string_to_sign.encode("utf-8"), hashlib.sha256
        ).hexdigest()

        canonical_querystring += '&X-Amz-Signature=' + signature

        request_url = (
            "https://" + host + canonical_uri + "?" + canonical_querystring
        )
        return request_url

Example usage of the S3 signer:

# Create an instance of the signer object
signer = FastS3UrlSigner(
  region="ap-south-1", access_key="access-key", secret_key="secret_key"
)

# Generate a presigned GET URL with expiry of 2 hours
url = signer.generate_signed_url(
  bucket_name="sample-bucket", object_key="sample-key", expiry_in_seconds=7200
)

# Generate a presigned PUT URL with default expiry (3600s or 1 hour)
put_url = signer.generate_signed_put_url(
  bucket_name="sample-bucket", object_key="sample-key"
)
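One possible further optimization, which the signer above does not do as written: the derived signing key depends only on the secret key, the date stamp, the region, and the service, so it changes at most once per day. A minimal sketch, assuming it is acceptable to memoize the key in process memory, is below. `derive_signing_key` is a hypothetical standalone function that mirrors the HMAC chain in `__get_signature_key`.

```python
import hashlib
import hmac
from functools import lru_cache


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


@lru_cache(maxsize=8)
def derive_signing_key(
    secret_key: str, date_stamp: str, region: str, service: str = "s3"
) -> bytes:
    """Hypothetical helper: SigV4 key derivation, memoized per argument tuple.

    Note: lru_cache keeps the arguments (including the secret) in memory.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# The second call with the same arguments is served from the cache.
key_a = derive_signing_key("secret-key", "20230927", "ap-south-1")
key_b = derive_signing_key("secret-key", "20230927", "ap-south-1")
print(len(key_a), key_a is key_b)  # 32 True
```

With the key cached, each URL would cost one SHA-256 hash of the canonical request plus one HMAC for the final signature, instead of five HMACs. Whether that matters in practice should be confirmed by measurement.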

Benchmarks

Now that we have both the URL signers, let’s compare the performance.

benchmark.py
import timeit

def benchmark(n):
    boto_signer = BotoSigner(
        region="ap-south-1", access_key="access-key", secret_key="secret-key"
    )
    time_taken_boto = timeit.timeit(
        lambda: boto_signer.generate_signed_url(
            bucket_name="bucket_name", key="object_key"
        ),
        number=n,
    )

    fast_s3_url_signer = FastS3UrlSigner(
        region="ap-south-1", access_key="access-key", secret_key="secret-key"
    )
    time_taken_f = timeit.timeit(
        lambda: fast_s3_url_signer.generate_signed_url(
            bucket_name="bucket_name", object_key="object_key"
        ),
        number=n,
    )

    print(f"########  Test run with n = {n}      #############\n")
    print(f"A: Time taken by boto:               {round(time_taken_boto, 2)}s")
    print(f"B: Time taken by fast_s3_url_signer: {round(time_taken_f, 2)}s")
    print(
      f"A/B Ratio:                           {round(time_taken_boto / time_taken_f, 2)}x"
    )
    print()

print("Benchmarking...")
benchmark(n=100)
benchmark(n=1000)
benchmark(n=10000)
benchmark(n=100000)
print("Done.")

Benchmark results showing a 10x improvement

We see a 10x improvement in the performance of the custom signer! The tests were performed on an M1 MacBook Pro, and similar results were observed on the production servers as well.
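When reproducing a comparison like this, single `timeit` runs can be noisy. A hedged sketch of one way to steady the numbers is to repeat each measurement and keep the best run. `per_call_microseconds` and the stand-in workload `one_hmac` are hypothetical names used for illustration; swap in either signer's `generate_signed_url` call to measure the real thing.

```python
import hashlib
import hmac
import timeit


def per_call_microseconds(func, number=1000, repeat=5):
    """Best-of-`repeat` average time per call, in microseconds."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number * 1e6


# Stand-in workload: a single HMAC-SHA256, the primitive the custom signer relies on.
def one_hmac():
    return hmac.new(b"key", b"message", hashlib.sha256).hexdigest()


print(f"{per_call_microseconds(one_hmac):.2f} us per call")
```

Taking the minimum rather than the mean filters out runs disturbed by other processes, which makes ratios between two implementations more repeatable.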

Conclusion

The custom signer has served us well and is now used in almost all of our S3 signing use cases. Optimizing systems isn’t always about major overhauls or complex architectural changes. Sometimes, looking into the minute details, like how we sign a URL, can lead to significant performance gains. Our journey with the S3 signer is a testament to that.

References