# DeltaLake 1.0.0 Migration Guide
DeltaLake 1.0.0 introduces significant changes, including the removal of the legacy Python writer engine. All write operations are now delegated to the Rust engine, which provides enhanced performance via streaming execution and multipart uploads for improved concurrency and reduced memory usage.
In addition, DeltaLake no longer has a hard dependency on PyArrow, making it more lightweight and suitable for use in minimal or size-constrained environments. Arrow data handling between Python and Rust is now powered by arro3, a lightweight Arrow implementation backed by arrow-rs.
For users who still require PyArrow-based read functionality, it remains available as an optional dependency. To continue using these features, install DeltaLake with the `pyarrow` extra: `pip install "deltalake[pyarrow]"`.
## Breaking changes
### arro3 adoption
DeltaLake 1.0.0 introduces support for the Arrow PyCapsule protocol using arro3, a lightweight Arrow implementation. As a result, the `write_deltalake` and `DeltaTable.merge` functions now only accept inputs that implement one of the following interfaces:

- `ArrowStreamExportable`
- `ArrowArrayExportable`
This means you can directly pass data from compatible libraries, such as:
- Polars DataFrame
- PyArrow Table or RecordBatchReader
- Pandas DataFrame (via its Arrow export support)
DeltaLake will consume these objects efficiently through the Arrow C Data Interface.
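For example, a Polars DataFrame can be written without any conversion step; a minimal sketch, assuming a local table at the hypothetical path `my_table`:

```python
import polars as pl
from deltalake import write_deltalake

# Polars DataFrames implement the Arrow PyCapsule protocol
# (ArrowStreamExportable), so they can be passed directly.
df = pl.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
write_deltalake("my_table", df, mode="overwrite")  # illustrative path
```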
### Schema, Types and Fields
Schema handling has also transitioned from PyArrow to arro3. Internally, schemas, types, and fields are no longer converted to PyArrow objects; they are defined and exchanged as arro3 schemas, types, and fields directly over the Arrow C Data Interface.
- `to_pyarrow` → replaced with `to_arrow`
- `from_pyarrow` → replaced with `from_arrow` (can accept any object implementing `ArrowSchemaExportable`)
All DeltaLake schema components, such as schema, field, and type objects, now implement the `__arrow_c_schema__` protocol. This means they are directly consumable by any library that supports the Arrow C Data Interface (e.g., Polars, PyArrow, pandas).
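For instance, a table's Delta schema can be handed straight to PyArrow; a minimal sketch, assuming an existing table at `my_table` and a PyArrow version new enough to support the Arrow PyCapsule interface:

```python
import pyarrow as pa
from deltalake import DeltaTable

dt = DeltaTable("my_table")  # hypothetical table path

# pa.schema() accepts any object implementing __arrow_c_schema__,
# so the Delta schema converts without an explicit to_pyarrow step.
pa_schema = pa.schema(dt.schema())
```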
### Arrow Object Outputs
Several DeltaLake APIs now return arro3-based Arrow objects instead of PyArrow objects. This includes:

- `DeltaTable.get_add_actions`
- `DeltaTable.load_cdf`
- `QueryBuilder.execute`
These returned objects are fully compatible with libraries like Polars and PyArrow. For example:

```python
import polars as pl
import pyarrow as pa
from deltalake import DeltaTable

dt = DeltaTable("test")

# With polars
df = pl.DataFrame(dt.load_cdf())

# With pyarrow (a RecordBatchReader is single-use, so load a fresh one)
tbl = pa.table(dt.load_cdf())
```
### `create` API
The `create` API now accepts schema definitions as either:

- A `DeltaSchema` object, or
- Any object implementing the `ArrowSchemaExportable` protocol (e.g., a PyArrow schema)
This ensures full interoperability with modern Arrow-compatible libraries while leveraging the performance and flexibility of the Arrow C Data Interface.
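A minimal sketch of creating a table from a PyArrow schema; the table path and column names are illustrative:

```python
import pyarrow as pa
from deltalake import DeltaTable

# PyArrow schemas implement ArrowSchemaExportable, so they can be
# passed to create() without conversion.
schema = pa.schema([("id", pa.int64()), ("value", pa.string())])
DeltaTable.create("my_table", schema=schema)  # illustrative path
```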
### `write_deltalake` API
#### `engine` parameter removed
The `engine` parameter has been removed. All writes now default to the Rust engine. The snippets below sketch the change with an illustrative table path and dataset.
Before:
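```python
import pyarrow as pa
from deltalake import write_deltalake

# Illustrative table path and data.
data = pa.table({"id": [1, 2, 3]})
write_deltalake("my_table", data, engine="rust")
```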
After:
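```python
import pyarrow as pa
from deltalake import write_deltalake

# Illustrative table path and data.
data = pa.table({"id": [1, 2, 3]})
write_deltalake("my_table", data)
```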
#### PyArrow-specific parameters removed
The following parameters related to the deprecated Python engine are no longer supported:
- `file_options: Optional[ds.ParquetFileWriteOptions]`
- `max_partitions: Optional[int]`
- `max_open_files: int`
- `max_rows_per_file: int`
- `min_rows_per_group: int`
- `max_rows_per_group: int`
#### `partition_filters` removed
The `partition_filters` parameter has been removed. Use the `predicate` parameter instead to overwrite a subset of the table during writes.
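A minimal sketch of replacing a subset with `predicate` during an overwrite; the table path, column, and predicate are illustrative:

```python
import pyarrow as pa
from deltalake import write_deltalake

new_data = pa.table({"region": ["EU", "EU"], "value": [1, 2]})

# Only rows matching the predicate are replaced on overwrite.
write_deltalake(
    "my_table",
    new_data,
    mode="overwrite",
    predicate="region = 'EU'",
)
```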
#### `large_dtypes` removed
The `large_dtypes` parameter has been removed. DeltaLake now always passes through all Arrow data types without modification for both `write_deltalake` and `DeltaTable.merge`.
### `QueryBuilder`
`QueryBuilder` no longer implements `show`, and `execute` now directly returns a `RecordBatchReader`. Additionally, the experimental flag has been removed.
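A minimal sketch of consuming the returned reader, assuming an existing table at `my_table` and that `QueryBuilder` is available from the top-level package:

```python
import polars as pl
from deltalake import DeltaTable, QueryBuilder

dt = DeltaTable("my_table")
reader = QueryBuilder().register("tbl", dt).execute("SELECT * FROM tbl")

# The reader streams record batches and can be consumed by any
# Arrow-compatible library.
df = pl.DataFrame(reader)
```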
### `CommitProperties`
#### `custom_metadata` replaced with `commit_properties`
The `custom_metadata` argument has been replaced by the `commit_properties` parameter on the following APIs:
- `convert_to_deltalake`
- `DeltaTable`
- `create`
- `vacuum`
- `update`
- `merge`
- `restore`
- `repair`
- `add_columns`
- `add_constraint`
- `set_table_properties`
- `compact`
- `z_order`
- `delete`
Before:
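```python
from deltalake import convert_to_deltalake

# Illustrative pre-1.0 call: metadata was passed via custom_metadata.
convert_to_deltalake(
    uri="mytable",
    custom_metadata={"foo": "bar"},
)
```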
After:

```python
from deltalake import convert_to_deltalake
from deltalake.transaction import CommitProperties

convert_to_deltalake(
    uri="mytable",
    commit_properties=CommitProperties(custom_metadata={"foo": "bar"}),
)
```
### `DeltaTable`
#### removed `from_data_catalog`
This method was previously unimplemented and has now been fully removed from the `DeltaTable` class.
#### removed `get_earliest_version`
This method is potentially very expensive, and we could not find a clear use case for it. If you have a specific use case, please open an issue on GitHub.
#### `transaction_versions` changed to `transaction_version`
`transaction_versions` has been renamed to `transaction_version` and now returns a single version number for a specific application instead of a dictionary for all applications. This allows for internal optimisations and aligns more closely with the use case of idempotent writes. To store more complex state as part of the delta log, use `domainMetadata`.
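A minimal sketch of the new lookup, assuming a table at `my_table` and a hypothetical application id `my_app`:

```python
from deltalake import DeltaTable

dt = DeltaTable("my_table")

# Returns the latest committed version for this app id (or None),
# which is the building block for idempotent writes.
version = dt.transaction_version("my_app")
```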
#### removed `get_num_index_cols`, `get_stats_columns` and `check_can_write_timestamp_ntz`
These methods were exposed for use by the `pyarrow` engine, which no longer exists.
#### deprecated `files`
The `files` method has been deprecated and will be removed in a future release. Use `file_uris` instead. The relative paths reported by `files` cannot be properly interpreted when using features like shallow clones; `file_uris` reports absolute paths, which support more dynamic path resolution.
## Internal changes
### `WriterProperties`, `ColumnProperties` and `BloomFilterProperties` moved to `deltalake.writer.properties`
These can now be imported from the new module path:
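```python
# Import path follows the new module location named above.
from deltalake.writer.properties import (
    BloomFilterProperties,
    ColumnProperties,
    WriterProperties,
)
```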
### `AddAction`, `CommitProperties` and `PostCommitHookProperties` moved to `deltalake.transaction`
These can now be imported from the new module path:
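```python
# Import path follows the new module location named above.
from deltalake.transaction import (
    AddAction,
    CommitProperties,
    PostCommitHookProperties,
)
```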
## New features
### Public APIs for transaction management
Functionality previously limited to internal PyArrow writer methods is now publicly available:

- `write_new_deltalake` has been renamed to `create_table_with_add_actions` and is now exposed under `deltalake.transaction`.
- You can now initiate a write transaction on an existing table using `DeltaTable.create_write_transaction(...)`, as sketched below.
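A hedged sketch of committing externally written Parquet files to an existing table; the `AddAction` field values are illustrative, and the keyword names are assumptions based on the pre-1.0 writer internals:

```python
from deltalake import DeltaTable
from deltalake.transaction import AddAction

dt = DeltaTable("my_table")  # hypothetical table path

# Describe a Parquet file that was already written to the table's
# storage location outside of write_deltalake (values illustrative).
action = AddAction(
    path="part-00001.parquet",
    size=1024,
    partition_values={},
    modification_time=1700000000000,
    data_change=True,
    stats="{}",
)

# Commit the file to the Delta log as a single transaction.
dt.create_write_transaction(
    [action],
    mode="append",
    schema=dt.schema(),
    partition_by=[],
)
```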
## Behavior changes with unsupported delta features ⚠️
Previously, it was possible to work with Delta tables that included unsupported features (e.g., Deletion Vectors) as long as those features were not actively used during an operation. As of version 1.0.0, this is no longer allowed: DeltaLake will raise an error if it detects unsupported features in a table. This change ensures safety and avoids inconsistencies stemming from partial or undefined behavior.