Writer

Write to Delta Tables

deltalake.write_deltalake

write_deltalake(table_or_uri: str | Path | DeltaTable, data: ArrowStreamExportable | ArrowArrayExportable | Sequence[ArrowArrayExportable], *, partition_by: list[str] | str | None = None, mode: Literal['error', 'append', 'overwrite', 'ignore'] = 'error', name: str | None = None, description: str | None = None, configuration: Mapping[str, str | None] | None = None, schema_mode: Literal['merge', 'overwrite'] | None = None, storage_options: dict[str, str] | None = None, predicate: str | None = None, target_file_size: int | None = None, writer_properties: WriterProperties | None = None, post_commithook_properties: PostCommitHookProperties | None = None, commit_properties: CommitProperties | None = None) -> None

Write to a Delta Lake table.

If the table does not already exist, it will be created.

Parameters:

table_or_uri : str | Path | DeltaTable (required)
    URI of a table or a DeltaTable object.

data : ArrowStreamExportable | ArrowArrayExportable | Sequence[ArrowArrayExportable] (required)
    Data to write. If passing an iterable, the schema must also be given.

partition_by : list[str] | str | None (default: None)
    List of columns to partition the table by. Only required when creating a new table.

mode : Literal['error', 'append', 'overwrite', 'ignore'] (default: 'error')
    How to handle existing data. The default is to error if the table already exists. If 'append', new data is added to the table. If 'overwrite', the table is replaced with the new data. If 'ignore', nothing is written if the table already exists.

name : str | None (default: None)
    User-provided identifier for this table.

description : str | None (default: None)
    User-provided description for this table.

configuration : Mapping[str, str | None] | None (default: None)
    A map containing configuration options for the metadata action.

schema_mode : Literal['merge', 'overwrite'] | None (default: None)
    If set to "overwrite", allows replacing the schema of the table. Set to "merge" to merge the new data with the existing schema.

storage_options : dict[str, str] | None (default: None)
    Options passed to the native delta filesystem.

predicate : str | None (default: None)
    When using overwrite mode, replace only the data that matches the predicate. Only used by the Rust engine.

target_file_size : int | None (default: None)
    Override for the target file size of data files written to the delta table. If not passed, it is taken from delta.targetFileSize.

writer_properties : WriterProperties | None (default: None)
    Writer properties passed to the Rust parquet writer.

post_commithook_properties : PostCommitHookProperties | None (default: None)
    Properties for the post-commit hook. If None, default values are used.

commit_properties : CommitProperties | None (default: None)
    Properties of the transaction commit. If None, default values are used.
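
A minimal usage sketch. The table path tmp/events, the column names, and the predicate are placeholder values, not part of the API:

```python
import pyarrow as pa
from deltalake import write_deltalake

# Placeholder data; any Arrow-compatible table or stream works.
data = pa.table({
    "id": pa.array([1, 2, 3]),
    "country": pa.array(["US", "DE", "US"]),
})

# First write creates the table (the default mode 'error' refuses to
# write if the table already exists).
write_deltalake("tmp/events", data, partition_by=["country"])

# Append more rows to the existing table.
more = pa.table({"id": pa.array([4]), "country": pa.array(["FR"])})
write_deltalake("tmp/events", more, mode="append")

# Overwrite only the rows matching a predicate.
replacement = pa.table({"id": pa.array([10, 11]), "country": pa.array(["US", "US"])})
write_deltalake("tmp/events", replacement, mode="overwrite", predicate="country = 'US'")
```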

deltalake.BloomFilterProperties dataclass

BloomFilterProperties(set_bloom_filter_enabled: bool | None, fpp: float | None = None, ndv: int | None = None)

The Bloom Filter Properties instance for the Rust parquet writer.

Create a Bloom Filter Properties instance for the Rust parquet writer:

Parameters:

set_bloom_filter_enabled : bool | None (required)
    If True and no fpp or ndv are provided, the default values will be used.

fpp : float | None (default: None)
    The false positive probability for the bloom filter. Must be between 0 and 1 exclusive.

ndv : int | None (default: None)
    The number of distinct values for the bloom filter.
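
A small construction sketch; the fpp and ndv values are illustrative, not recommended defaults:

```python
from deltalake import BloomFilterProperties

# Enable bloom filters with an explicit false positive probability
# and an estimate of the number of distinct values.
bloom = BloomFilterProperties(set_bloom_filter_enabled=True, fpp=0.01, ndv=100_000)

# Enable bloom filters and let the writer pick default fpp/ndv values.
bloom_defaults = BloomFilterProperties(set_bloom_filter_enabled=True)
```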

deltalake.ColumnProperties dataclass

ColumnProperties(dictionary_enabled: bool | None = None, statistics_enabled: Literal['NONE', 'CHUNK', 'PAGE'] | None = None, bloom_filter_properties: BloomFilterProperties | None = None)

The Column Properties instance for the Rust parquet writer.

Create a Column Properties instance for the Rust parquet writer:

Parameters:

dictionary_enabled : bool | None (default: None)
    Enable dictionary encoding for the column.

statistics_enabled : Literal['NONE', 'CHUNK', 'PAGE'] | None (default: None)
    Statistics level for the column.

bloom_filter_properties : BloomFilterProperties | None (default: None)
    Bloom filter properties for the column.
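
A brief sketch combining the two dataclasses above; the chosen settings are illustrative:

```python
from deltalake import BloomFilterProperties, ColumnProperties

# Column-level settings: dictionary encoding, page-level statistics,
# and a bloom filter with default parameters.
col_props = ColumnProperties(
    dictionary_enabled=True,
    statistics_enabled="PAGE",
    bloom_filter_properties=BloomFilterProperties(set_bloom_filter_enabled=True),
)
```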

deltalake.WriterProperties dataclass

WriterProperties(data_page_size_limit: int | None = None, dictionary_page_size_limit: int | None = None, data_page_row_count_limit: int | None = None, write_batch_size: int | None = None, max_row_group_size: int | None = None, compression: Literal['UNCOMPRESSED', 'SNAPPY', 'GZIP', 'BROTLI', 'LZ4', 'ZSTD', 'LZ4_RAW'] | None = None, compression_level: int | None = None, statistics_truncate_length: int | None = None, default_column_properties: ColumnProperties | None = None, column_properties: dict[str, ColumnProperties] | None = None)

A Writer Properties instance for the Rust parquet writer.

Create a Writer Properties instance for the Rust parquet writer:

Parameters:

data_page_size_limit : int | None (default: None)
    Limit the size of each data page, in bytes.

dictionary_page_size_limit : int | None (default: None)
    Limit the size of each dictionary page, in bytes.

data_page_row_count_limit : int | None (default: None)
    Limit the number of rows in each data page.

write_batch_size : int | None (default: None)
    Split writes internally into batches of this size.

max_row_group_size : int | None (default: None)
    Maximum number of rows in a row group.

compression : Literal['UNCOMPRESSED', 'SNAPPY', 'GZIP', 'BROTLI', 'LZ4', 'ZSTD', 'LZ4_RAW'] | None (default: None)
    Compression type.

compression_level : int | None (default: None)
    Compression level. If None and the chosen compression supports levels, the default level is used. Only relevant for GZIP (levels 1-9), BROTLI (levels 1-11), and ZSTD (levels 1-22).

statistics_truncate_length : int | None (default: None)
    Maximum length to which min/max statistics values are truncated.

default_column_properties : ColumnProperties | None (default: None)
    Default column properties for the Rust parquet writer.

column_properties : dict[str, ColumnProperties] | None (default: None)
    Per-column properties for the Rust parquet writer, keyed by column name.
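
A sketch of passing writer properties through write_deltalake. The compression choice, the column names, and the path tmp/metrics are placeholders:

```python
import pyarrow as pa
from deltalake import (
    BloomFilterProperties,
    ColumnProperties,
    WriterProperties,
    write_deltalake,
)

wp = WriterProperties(
    compression="ZSTD",
    compression_level=3,
    max_row_group_size=128_000,
    # Applied to every column unless overridden below.
    default_column_properties=ColumnProperties(dictionary_enabled=True),
    # Per-column override: add a bloom filter on "id".
    column_properties={
        "id": ColumnProperties(
            bloom_filter_properties=BloomFilterProperties(set_bloom_filter_enabled=True)
        )
    },
)

data = pa.table({"id": pa.array([1, 2, 3]), "value": pa.array([0.1, 0.2, 0.3])})
write_deltalake("tmp/metrics", data, writer_properties=wp)
```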

Convert to Delta Tables

deltalake.convert_to_deltalake

convert_to_deltalake(uri: str | Path, mode: Literal['error', 'ignore'] = 'error', partition_by: Schema | None = None, partition_strategy: Literal['hive'] | None = None, name: str | None = None, description: str | None = None, configuration: Mapping[str, str | None] | None = None, storage_options: dict[str, str] | None = None, commit_properties: CommitProperties | None = None, post_commithook_properties: PostCommitHookProperties | None = None) -> None

Convert parquet tables to delta tables.

Currently only HIVE-partitioned tables are supported. The conversion creates a transaction log commit with add actions for the existing parquet files, together with any additional properties provided, such as configuration, name, and description.

Parameters:

uri : str | Path (required)
    URI of a table.

partition_by : Schema | None (default: None)
    Optional partitioning schema if the table is partitioned.

partition_strategy : Literal['hive'] | None (default: None)
    Optional partition strategy used to read and convert the table.

mode : Literal['error', 'ignore'] (default: 'error')
    How to handle existing data. The default is to error if the table already exists. If 'ignore', nothing is converted if the table already exists.

name : str | None (default: None)
    User-provided identifier for this table.

description : str | None (default: None)
    User-provided description for this table.

configuration : Mapping[str, str | None] | None (default: None)
    A map containing configuration options for the metadata action.

storage_options : dict[str, str] | None (default: None)
    Options passed to the native delta filesystem.

commit_properties : CommitProperties | None (default: None)
    Properties of the transaction commit. If None, default values are used.

post_commithook_properties : PostCommitHookProperties | None (default: None)
    Properties for the post-commit hook. If None, default values are used.
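
A minimal conversion sketch; the path tmp/parquet_table and the name and description strings are placeholders:

```python
from deltalake import convert_to_deltalake

# Convert an existing parquet table in place. With mode='ignore', nothing
# happens if the location is already a delta table.
convert_to_deltalake(
    "tmp/parquet_table",
    mode="ignore",
    name="events",
    description="Converted from a plain parquet table",
)
```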