Home
deltalake is an open source library that makes working with tabular datasets easier, more robust and more performant. With deltalake you can add, remove or update rows in a dataset as new data arrives. You can time travel back to earlier versions of a dataset. You can optimize dataset storage by compacting many small files into fewer, larger files.
With deltalake you can manage data stored on a local file system or in the cloud. deltalake integrates with data manipulation libraries such as Pandas, Polars, DuckDB and DataFusion.
deltalake uses a lakehouse framework where you manage your datasets with a DeltaTable object and deltalake takes care of the underlying files.
Quick start
-
Install the Python dependencies with pip. pyarrow and pandas are needed for the DataFrame import; tabulate is needed to print the DataFrame in the final example.
-
Import the required dependencies:
-
Create a Pandas DataFrame and write it to a DeltaTable:
-
Create a DeltaTable object to track metadata for the Delta table:
-
Overwrite the DataFrame with new data:
-
Easily revert to the original version (version 0) of the table:
-
Confirm the reversion by printing the contents of the table using the Pandas to_markdown() function:
-
Output shows the original data from step 3:
Next steps
- Learn about Querying Delta Tables
- Learn about using deltalake with Polars
- Learn about using deltalake with DataFusion