Writer

Write to Delta Tables

deltalake.write_deltalake

write_deltalake(table_or_uri: Union[str, Path, DeltaTable], data: Union[pd.DataFrame, ds.Dataset, pa.Table, pa.RecordBatch, Iterable[pa.RecordBatch], RecordBatchReader, ArrowStreamExportable], *, schema: Optional[Union[pa.Schema, DeltaSchema]] = None, partition_by: Optional[Union[List[str], str]] = None, mode: Literal['error', 'append', 'overwrite', 'ignore'] = 'error', file_options: Optional[ds.ParquetFileWriteOptions] = None, max_partitions: Optional[int] = None, max_open_files: int = 1024, max_rows_per_file: int = 10 * 1024 * 1024, min_rows_per_group: int = 64 * 1024, max_rows_per_group: int = 128 * 1024, name: Optional[str] = None, description: Optional[str] = None, configuration: Optional[Mapping[str, Optional[str]]] = None, schema_mode: Optional[Literal['merge', 'overwrite']] = None, storage_options: Optional[Dict[str, str]] = None, partition_filters: Optional[List[Tuple[str, str, Any]]] = None, predicate: Optional[str] = None, target_file_size: Optional[int] = None, large_dtypes: bool = False, engine: Literal['pyarrow', 'rust'] = 'rust', writer_properties: Optional[WriterProperties] = None, custom_metadata: Optional[Dict[str, str]] = None, post_commithook_properties: Optional[PostCommitHookProperties] = None, commit_properties: Optional[CommitProperties] = None) -> None

Write to a Delta Lake table

If the table does not already exist, it will be created.

The pyarrow writer currently supports protocol version 2 and won't be updated. For higher protocol support, use engine='rust', which is now the default.

To enable safe concurrent writes when writing to S3, an additional locking mechanism must be supplied. For more information, see the guide on enabling concurrent writing to S3.

Parameters:

Name Type Description Default
table_or_uri Union[str, Path, DeltaTable]

URI of a table or a DeltaTable object.

required
data Union[DataFrame, Dataset, Table, RecordBatch, Iterable[RecordBatch], RecordBatchReader, ArrowStreamExportable]

Data to write. If passing an iterable, the schema must also be given.

required
schema Optional[Union[pa.Schema, DeltaSchema]]

Optional schema to write.

None
partition_by Optional[Union[List[str], str]]

List of columns to partition the table by. Only required when creating a new table.

None
mode Literal['error', 'append', 'overwrite', 'ignore']

How to handle existing data. Default is to error if table already exists. If 'append', will add new data. If 'overwrite', will replace table with new data. If 'ignore', will not write anything if table already exists.

'error'
file_options Optional[ParquetFileWriteOptions]

Optional write options for Parquet (ParquetFileWriteOptions). Can be provided with defaults using ds.ParquetFileFormat().make_write_options(). Please refer to https://github.com/apache/arrow/blob/master/python/pyarrow/_dataset_parquet.pyx#L492-L533 for the list of available options. Only used in pyarrow engine.

None
max_partitions Optional[int]

The maximum number of partitions that will be used. Only used in pyarrow engine.

None
max_open_files int

Limits the maximum number of files that can be left open while writing. If an attempt is made to open too many files then the least recently used file will be closed. If this setting is set too low you may end up fragmenting your data into many small files. Only used in pyarrow engine.

1024
max_rows_per_file int

Maximum number of rows per file. If greater than 0 then this will limit how many rows are placed in any single file. Otherwise there will be no limit and one file will be created in each output directory unless files need to be closed to respect max_open_files.

10 * 1024 * 1024
min_rows_per_group int

Minimum number of rows per group. When the value is set, the dataset writer will batch incoming data and only write the row groups to the disk when sufficient rows have accumulated. Only used in pyarrow engine.

64 * 1024
max_rows_per_group int

Maximum number of rows per group. If the value is set, then the dataset writer may split up large incoming batches into multiple row groups. If this value is set, then min_rows_per_group should also be set.

128 * 1024
name Optional[str]

User-provided identifier for this table.

None
description Optional[str]

User-provided description for this table.

None
configuration Optional[Mapping[str, Optional[str]]]

A map containing configuration options for the metadata action.

None
schema_mode Optional[Literal['merge', 'overwrite']]

If set to "overwrite", allows replacing the schema of the table. Set to "merge" to merge with existing schema.

None
storage_options Optional[Dict[str, str]]

Options passed to the native delta filesystem.

None
predicate Optional[str]

When using overwrite mode, replace data that matches a predicate. Only used in the rust engine.

None
target_file_size Optional[int]

Override for target file size for data files written to the delta table. If not passed, it's taken from delta.targetFileSize.

None
partition_filters Optional[List[Tuple[str, str, Any]]]

The partition filters that will be used for partition overwrite. Only used in pyarrow engine.

None
large_dtypes bool

Only used for the pyarrow engine.

False
engine Literal['pyarrow', 'rust']

Writer engine to write the delta table. The pyarrow engine is deprecated and will be removed in v1.0.

'rust'
writer_properties Optional[WriterProperties]

Pass writer properties to the Rust parquet writer.

None
custom_metadata Optional[Dict[str, str]]

Deprecated and will be removed in future versions. Use commit_properties instead.

None
post_commithook_properties Optional[PostCommitHookProperties]

Properties for the post-commit hook. If None, default values are used.

None
commit_properties Optional[CommitProperties]

Properties of the transaction commit. If None, default values are used.

None
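
A minimal usage sketch, not part of the reference itself: the local path "tmp/some_table" and the example DataFrame are illustrative. It shows creating a partitioned table, appending to it, and overwriting only the rows matching a predicate with the rust engine.

import pandas as pd
from deltalake import write_deltalake

df = pd.DataFrame({"id": [1, 2, 3], "country": ["US", "DE", "US"]})

# Create the table; mode defaults to 'error', so this fails if it already exists.
write_deltalake("tmp/some_table", df, partition_by=["country"])

# Add new rows to the existing table.
write_deltalake("tmp/some_table", df, mode="append")

# Replace only the rows matching the predicate (rust engine overwrite).
write_deltalake("tmp/some_table", df, mode="overwrite", predicate="country = 'US'")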

deltalake.BloomFilterProperties dataclass

BloomFilterProperties(set_bloom_filter_enabled: Optional[bool], fpp: Optional[float] = None, ndv: Optional[int] = None)

The Bloom Filter Properties instance for the Rust parquet writer.

Create a Bloom Filter Properties instance for the Rust parquet writer:

Parameters:

Name Type Description Default
set_bloom_filter_enabled Optional[bool]

Enable the bloom filter for the column. If True and no fpp or ndv are provided, the default values will be used.

required
fpp Optional[float]

The false positive probability for the bloom filter. Must be between 0 and 1 exclusive.

None
ndv Optional[int]

The number of distinct values for the bloom filter.

None
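
A small illustrative sketch, assuming the top-level import shown here: enabling a bloom filter with an explicit false positive probability and distinct-value count. Omitting fpp and ndv while set_bloom_filter_enabled is True falls back to the defaults.

from deltalake import BloomFilterProperties

# 1% false positive probability, sized for roughly 100,000 distinct values.
bloom = BloomFilterProperties(set_bloom_filter_enabled=True, fpp=0.01, ndv=100_000)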

deltalake.ColumnProperties dataclass

ColumnProperties(dictionary_enabled: Optional[bool] = None, max_statistics_size: Optional[int] = None, bloom_filter_properties: Optional[BloomFilterProperties] = None)

The Column Properties instance for the Rust parquet writer.

Create a Column Properties instance for the Rust parquet writer:

Parameters:

Name Type Description Default
dictionary_enabled Optional[bool]

Enable dictionary encoding for the column.

None
max_statistics_size Optional[int]

Maximum size of statistics for the column.

None
bloom_filter_properties Optional[BloomFilterProperties]

Bloom Filter Properties for the column.

None
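
An illustrative sketch building on the bloom filter example above: column properties that enable dictionary encoding and attach bloom filter properties. The top-level import path is assumed.

from deltalake import BloomFilterProperties, ColumnProperties

col_props = ColumnProperties(
    dictionary_enabled=True,
    bloom_filter_properties=BloomFilterProperties(set_bloom_filter_enabled=True, fpp=0.01, ndv=100_000),
)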

deltalake.WriterProperties dataclass

WriterProperties(data_page_size_limit: Optional[int] = None, dictionary_page_size_limit: Optional[int] = None, data_page_row_count_limit: Optional[int] = None, write_batch_size: Optional[int] = None, max_row_group_size: Optional[int] = None, compression: Optional[Literal['UNCOMPRESSED', 'SNAPPY', 'GZIP', 'BROTLI', 'LZ4', 'ZSTD', 'LZ4_RAW']] = None, compression_level: Optional[int] = None, statistics_truncate_length: Optional[int] = None, default_column_properties: Optional[ColumnProperties] = None, column_properties: Optional[Dict[str, ColumnProperties]] = None)

A Writer Properties instance for the Rust parquet writer.

Create a Writer Properties instance for the Rust parquet writer:

Parameters:

Name Type Description Default
data_page_size_limit Optional[int]

Limit the DataPage size to this amount in bytes.

None
dictionary_page_size_limit Optional[int]

Limit the size of each dictionary page to this amount in bytes.

None
data_page_row_count_limit Optional[int]

Limit the number of rows in each DataPage.

None
write_batch_size Optional[int]

Splits internal writes into batches of this size.

None
max_row_group_size Optional[int]

Maximum number of rows in a row group.

None
compression Optional[Literal['UNCOMPRESSED', 'SNAPPY', 'GZIP', 'BROTLI', 'LZ4', 'ZSTD', 'LZ4_RAW']]

Compression type.

None
compression_level Optional[int]

If None and the chosen compression has levels, the default level will be used. Only relevant for GZIP (levels 1-9), BROTLI (levels 1-11), and ZSTD (levels 1-22).

None
statistics_truncate_length Optional[int]

Maximum length of truncated min/max values in statistics.

None
default_column_properties Optional[ColumnProperties]

Default Column Properties for the Rust parquet writer.

None
column_properties Optional[Dict[str, ColumnProperties]]

Column Properties for the Rust parquet writer.

None
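
A sketch of wiring the pieces together: writer properties with ZSTD compression, a default column configuration, and a per-column override, passed to write_deltalake via writer_properties. The table path, DataFrame, and column names are illustrative.

import pandas as pd
from deltalake import ColumnProperties, WriterProperties, write_deltalake

df = pd.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})

wp = WriterProperties(
    compression="ZSTD",
    compression_level=3,
    default_column_properties=ColumnProperties(dictionary_enabled=True),
    column_properties={"value": ColumnProperties(dictionary_enabled=False)},
)

# writer_properties only affects the rust parquet writer.
write_deltalake("tmp/some_table", df, mode="append", writer_properties=wp)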

Convert to Delta Tables

deltalake.convert_to_deltalake

convert_to_deltalake(uri: Union[str, Path], mode: Literal['error', 'ignore'] = 'error', partition_by: Optional[pa.Schema] = None, partition_strategy: Optional[Literal['hive']] = None, name: Optional[str] = None, description: Optional[str] = None, configuration: Optional[Mapping[str, Optional[str]]] = None, storage_options: Optional[Dict[str, str]] = None, custom_metadata: Optional[Dict[str, str]] = None) -> None

Convert parquet tables to delta tables.

Currently only HIVE-partitioned tables are supported. Converting to Delta creates a transaction log commit with add actions and the additional properties provided, such as configuration, name, and description.

Parameters:

Name Type Description Default
uri Union[str, Path]

URI of a table.

required
partition_by Optional[Schema]

Optional partitioning schema if table is partitioned.

None
partition_strategy Optional[Literal['hive']]

Optional partition strategy to read and convert.

None
mode Literal['error', 'ignore']

How to handle existing data. Default is to error if table already exists. If 'ignore', will not convert anything if table already exists.

'error'
name Optional[str]

User-provided identifier for this table.

None
description Optional[str]

User-provided description for this table.

None
configuration Optional[Mapping[str, Optional[str]]]

A map containing configuration options for the metadata action.

None
storage_options Optional[Dict[str, str]]

Options passed to the native delta filesystem. Unused if 'filesystem' is defined.

None
custom_metadata Optional[Dict[str, str]]

Custom metadata that will be added to the transaction commit.

None
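
A minimal sketch, assuming a hive-partitioned parquet layout such as tmp/parquet_table/year=2024/part-0.parquet; the path, partition column, and table name are illustrative.

import pyarrow as pa
from deltalake import convert_to_deltalake

convert_to_deltalake(
    "tmp/parquet_table",
    partition_by=pa.schema([pa.field("year", pa.int64())]),
    partition_strategy="hive",
    name="parquet_table",
    description="Converted from an existing hive-partitioned parquet table",
)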