
AWS S3 Storage Backend

delta-rs offers native support for using AWS S3 as an object storage backend.

You don’t need to install any extra dependencies to read/write Delta tables to S3 with engines that use delta-rs. You do need to configure your AWS access credentials correctly.

Note for boto3 users

Many Python engines use boto3 to connect to AWS. This library supports reading credentials automatically from your local .aws/config or .aws/credentials file.

For example, if you're running locally with valid credentials in your local .aws/config or .aws/credentials file, you can write a Parquet file to S3 like this with pandas:

    import pandas as pd
    df = pd.DataFrame({'x': [1, 2, 3]})
    df.to_parquet("s3://avriiil/parquet-test-pandas")

The delta-rs writer does not use boto3 and therefore does not read credentials from your .aws/config or .aws/credentials file. If you're used to working with writers from Python engines like Polars, pandas, or Dask, this may mean a small change to your workflow.
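
If you want to keep resolving credentials through boto3's lookup chain, you can bridge them into delta-rs yourself. The sketch below shows one way to do this; the bucket path is hypothetical, and write_deltalake stands in for whichever engine you use:

    import pandas as pd
    import boto3
    from deltalake import write_deltalake

    # Resolve credentials the way boto3 does (.aws/credentials, environment
    # variables, instance metadata, ...), then hand them to delta-rs explicitly.
    session = boto3.Session()
    creds = session.get_credentials().get_frozen_credentials()

    storage_options = {
        "AWS_ACCESS_KEY_ID": creds.access_key,
        "AWS_SECRET_ACCESS_KEY": creds.secret_key,
        "AWS_REGION": session.region_name or "us-east-1",
        # plus the locking options described below (AWS_S3_LOCKING_PROVIDER, ...)
    }
    if creds.token:  # present when using temporary (STS) credentials
        storage_options["AWS_SESSION_TOKEN"] = creds.token

    df = pd.DataFrame({"x": [1, 2, 3]})
    write_deltalake("s3://my-bucket/delta-table", df, storage_options=storage_options)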

Passing AWS Credentials

You can pass your AWS credentials explicitly by using:

  • the storage_options kwarg
  • environment variables (see the sketch below)
  • EC2 metadata, if using EC2 instances
  • AWS profiles
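
For instance, the environment-variable route needs no change to your write calls at all. A minimal sketch with Polars; the values are placeholders, and in practice you'd export these in your shell or deployment environment rather than hard-coding them:

    import os
    import polars as pl

    # Placeholders -- set these outside your code in real workflows.
    os.environ["AWS_ACCESS_KEY_ID"] = "<key_id>"
    os.environ["AWS_SECRET_ACCESS_KEY"] = "<access_key>"
    os.environ["AWS_REGION"] = "<region_name>"
    os.environ["AWS_S3_LOCKING_PROVIDER"] = "dynamodb"
    os.environ["DELTA_DYNAMO_TABLE_NAME"] = "delta_log"

    # With credentials in the environment, no storage_options kwarg is needed.
    df = pl.DataFrame({"x": [1, 2, 3]})
    df.write_delta("s3://bucket/delta_table")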

Example

Let's work through an example with Polars. The same logic applies to other Python engines like pandas, Daft, and Dask.

Follow the steps below to use Delta Lake on S3 with Polars:

  1. Install Polars and deltalake:

         pip install polars deltalake

  2. Create a DataFrame with some toy data:

         import polars as pl
         df = pl.DataFrame({'x': [1, 2, 3]})

  3. Set your storage_options correctly:

         storage_options = {
             'AWS_REGION': <region_name>,
             'AWS_ACCESS_KEY_ID': <key_id>,
             'AWS_SECRET_ACCESS_KEY': <access_key>,
             'AWS_S3_LOCKING_PROVIDER': 'dynamodb',
             'DELTA_DYNAMO_TABLE_NAME': 'delta_log',
         }

  4. Write data to the Delta table using the storage_options kwarg:

         df.write_delta(
             "s3://bucket/delta_table",
             storage_options=storage_options,
         )
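
To verify the write, you can read the table straight back with Polars, reusing the same storage_options:

    print(
        pl.read_delta(
            "s3://bucket/delta_table",
            storage_options=storage_options,
        )
    )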

Delta Lake on AWS S3: Safe Concurrent Writes

You need a locking provider to ensure safe concurrent writes when writing Delta tables to AWS S3. This is because AWS S3 does not guarantee mutual exclusion.

A locking provider guarantees that only one writer can create a given file at a time, which prevents corrupted or conflicting data.

delta-rs uses DynamoDB to guarantee safe concurrent writes.

Run the command below in your terminal to create a DynamoDB table that will act as your locking provider.

    aws dynamodb create-table \
    --table-name delta_log \
    --attribute-definitions AttributeName=tablePath,AttributeType=S AttributeName=fileName,AttributeType=S \
    --key-schema AttributeName=tablePath,KeyType=HASH AttributeName=fileName,KeyType=RANGE \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

If for some reason you don't want to use DynamoDB as your locking mechanism, you can set the AWS_S3_ALLOW_UNSAFE_RENAME variable to true to enable unsafe writes to S3.
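
With Polars, that looks like the sketch below (reusing the df from the example above). Unsafe renames can corrupt the table if two writers commit at the same time, so only use this when you can guarantee a single writer:

    storage_options = {
        'AWS_REGION': <region_name>,
        'AWS_ACCESS_KEY_ID': <key_id>,
        'AWS_SECRET_ACCESS_KEY': <access_key>,
        'AWS_S3_ALLOW_UNSAFE_RENAME': 'true',
    }
    df.write_delta("s3://bucket/delta_table", storage_options=storage_options)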

Read more in the Usage section.

Delta Lake on AWS S3: Required permissions

You need permissions to get, put, and delete objects in the S3 bucket you're storing your data in. Note that you must be allowed to delete objects even if you're only appending to the Delta table, because temporary files are written to the log folder and deleted after use.

In AWS S3, you will need the following permissions:

  • s3:GetObject
  • s3:PutObject
  • s3:DeleteObject

In DynamoDB, you will need the following permissions:

  • dynamodb:GetItem
  • dynamodb:Query
  • dynamodb:PutItem
  • dynamodb:UpdateItem
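
As a sketch, the two lists above can be combined into a single IAM policy. The bucket name (my-delta-bucket), lock table name (delta_log), and policy name below are hypothetical, and creating the policy with boto3 is just one way to attach these permissions:

    import json
    import boto3

    # Hypothetical resource names -- replace with your bucket and lock table.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": "arn:aws:s3:::my-delta-bucket/*",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:Query",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                ],
                "Resource": "arn:aws:dynamodb:*:*:table/delta_log",
            },
        ],
    }

    boto3.client("iam").create_policy(
        PolicyName="delta-rs-s3-writer",
        PolicyDocument=json.dumps(policy),
    )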

Configuration Reference

The following table lists all available configuration options that can be passed via the storage_options parameter when working with AWS S3. These options correspond to the AmazonS3ConfigKey enum from the object_store crate.

| Configuration Key | Environment Variable | Description |
| --- | --- | --- |
| access_key_id | AWS_ACCESS_KEY_ID | AWS access key ID for authentication |
| secret_access_key | AWS_SECRET_ACCESS_KEY | AWS secret access key for authentication |
| region | AWS_REGION or AWS_DEFAULT_REGION | AWS region where the S3 bucket is located |
| endpoint | AWS_ENDPOINT_URL | Custom S3 endpoint URL (for S3-compatible services like MinIO or LocalStack) |
| token | AWS_SESSION_TOKEN | Session token for temporary credentials (STS) |
| imdsv1_fallback | AWS_EC2_METADATA_V1_DISABLED | Allow IMDSv1 fallback for EC2 metadata (set to true to disable) |
| virtual_hosted_style_request | AWS_VIRTUAL_HOSTED_STYLE_REQUEST | Use virtual hosted-style requests (true) or path-style (false) |
| aws_unsigned_payload | - | Skip payload signing for requests (set to true for unsigned uploads) |
| aws_checksum_algorithm | - | Checksum algorithm to use (e.g., sha256) |
| aws_metadata_endpoint | AWS_EC2_METADATA_SERVICE_ENDPOINT | EC2 metadata service endpoint URL |
| aws_container_credentials_relative_uri | AWS_CONTAINER_CREDENTIALS_RELATIVE_URI | URI for container credentials (ECS tasks) |
| aws_copy_if_not_exists | - | How to handle copy-if-not-exists operations |
| aws_conditional_put | - | Conditional put support mode (e.g., etag for S3-compatible stores) |
| aws_skip_signature | - | Skip request signing entirely (set to true for anonymous access) |
| aws_disable_tagging | - | Disable object tagging (set to true if not supported) |
| aws_s3_express | - | Enable S3 Express One Zone support |
| aws_request_payer | - | Request payer setting (for requester-pays buckets) |
| aws_web_identity_token_file | AWS_WEB_IDENTITY_TOKEN_FILE | Path to web identity token file for OIDC authentication |
| aws_role_arn | AWS_ROLE_ARN | IAM role ARN to assume via STS AssumeRole |
| aws_role_session_name | AWS_ROLE_SESSION_NAME | Session name for role assumption |
| aws_sts_endpoint | - | Custom STS endpoint URL |
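
For example, a write against an S3-compatible store such as a local MinIO instance might look like the sketch below. The endpoint, the minioadmin credentials, the AWS_ALLOW_HTTP client flag, and the aws_conditional_put setting are assumptions for a default local setup; check your store's documentation:

    import polars as pl

    # Assumed local MinIO defaults -- adjust for your setup.
    storage_options = {
        "AWS_ACCESS_KEY_ID": "minioadmin",
        "AWS_SECRET_ACCESS_KEY": "minioadmin",
        "AWS_REGION": "us-east-1",
        "AWS_ENDPOINT_URL": "http://localhost:9000",
        "AWS_ALLOW_HTTP": "true",  # client flag needed for plain-HTTP endpoints
        "aws_conditional_put": "etag",  # assumption: store supports ETag-conditional puts
    }

    df = pl.DataFrame({"x": [1, 2, 3]})
    df.write_delta("s3://bucket/delta_table", storage_options=storage_options)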

Delta Lake Specific Options

In addition to the standard S3 configuration options above, delta-rs provides these specific settings:

| Configuration Key | Environment Variable | Description |
| --- | --- | --- |
| AWS_S3_LOCKING_PROVIDER | AWS_S3_LOCKING_PROVIDER | Locking mechanism for safe concurrent writes (set to dynamodb) |
| DELTA_DYNAMO_TABLE_NAME | DELTA_DYNAMO_TABLE_NAME | DynamoDB table name for lock management |
| AWS_S3_ALLOW_UNSAFE_RENAME | AWS_S3_ALLOW_UNSAFE_RENAME | Allow unsafe writes without locking (set to true to skip locking; not recommended for production) |

Supported URL Schemes

Delta Lake on S3 supports the following URL schemes:

  • s3://bucket-name/path/to/table - Standard S3 URL
  • s3a://bucket-name/path/to/table - Hadoop S3A scheme
  • https://s3.<region>.amazonaws.com/bucket-name/path/to/table - HTTPS path-style URL
  • https://bucket-name.s3.<region>.amazonaws.com/path/to/table - HTTPS virtual hosted-style URL
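
All of these schemes point at the same underlying bucket and key. For instance, you can open a table directly with the deltalake Python API; a minimal sketch reusing the storage_options dictionary from the examples above:

    from deltalake import DeltaTable

    dt = DeltaTable("s3://bucket/delta_table", storage_options=storage_options)
    print(dt.version())    # current table version
    print(dt.to_pandas())  # materialize the table as a pandas DataFrame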

Note

For the complete and authoritative list of configuration options, refer to the object_store AmazonS3ConfigKey documentation.