Home

deltalake is an open source library that makes working with tabular datasets easier, more robust and more performant. With deltalake you can add, remove or update rows in a dataset as new data arrives. You can time travel back to earlier versions of a dataset. You can also optimize dataset storage by compacting many small files into fewer large files.

With deltalake you can manage data stored on a local file system or in the cloud. deltalake integrates with data manipulation libraries such as Pandas, Polars, DuckDB and DataFusion.

deltalake uses a lakehouse framework where you manage your datasets with a DeltaTable object and deltalake takes care of the underlying files.

Quick start

  1. Install the Python dependencies with pip:

    pip install deltalake pandas pyarrow tabulate
    
    • pyarrow and pandas are needed for the DataFrame import
    • tabulate is needed to print the DataFrame in the final example
  2. Import the required dependencies:

    from deltalake import write_deltalake, DeltaTable
    import pandas as pd
    
  3. Create a Pandas DataFrame and write it to a DeltaTable:

    df = pd.DataFrame({"num": [8, 9], "letter": ["aa", "bb"]})
    write_deltalake("tmp/some-table", df)
    
  4. Create a DeltaTable object to track metadata for the Delta table:

    dt = DeltaTable("tmp/some-table")
    
  5. Overwrite the table with new data:

    df = pd.DataFrame({"num": [11, 22], "letter": ["dd", "ee"]})
    write_deltalake("tmp/some-table", df, mode="overwrite")
    
  6. Time travel back to the original version (version 0) of the table:

    dt = DeltaTable("tmp/some-table", version=0)
    
  7. Confirm that version 0 was loaded by printing the contents of the table using the Pandas to_markdown() function:

    print(dt.to_pandas().to_markdown())
    
  8. The output shows the original data from step 3:

    |    |   num | letter   |
    |---:|------:|:---------|
    |  0 |     8 | aa       |
    |  1 |     9 | bb       |
    

Next steps