LakeFS

delta-rs offers native support for using LakeFS as an object storage backend. Each deltalake operation runs in its own transaction branch, which is then safely merged into your source branch.

You don’t need to install any extra dependencies to read/write Delta tables to LakeFS with engines that use delta-rs. You do need to configure your LakeFS access credentials correctly.

Passing LakeFS Credentials

You can pass your LakeFS credentials explicitly by using:

  • the storage_options kwarg
  • Environment variables
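
As a minimal sketch of keeping credentials out of source code, the helper below reads them from the environment and builds a storage_options dict with the same keys used in the example further down. The variable names (MY_LAKEFS_ENDPOINT and so on) and the function name are hypothetical, chosen only for illustration; they are not the names delta-rs reads on its own.

```python
import os

# Hypothetical variable names chosen for this sketch; they are NOT names
# that delta-rs is documented to read by itself.
ENV_TO_OPTION = {
    "MY_LAKEFS_ENDPOINT": "endpoint",
    "MY_LAKEFS_ACCESS_KEY_ID": "access_key_id",
    "MY_LAKEFS_SECRET_ACCESS_KEY": "secret_access_key",
}


def storage_options_from_env(env=None):
    """Build a storage_options dict from environment variables."""
    env = os.environ if env is None else env
    # Fail early with a clear message instead of sending empty credentials.
    missing = [name for name in ENV_TO_OPTION if name not in env]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {option: env[name] for name, option in ENV_TO_OPTION.items()}
```

The returned dict can be passed as the storage_options kwarg wherever a LakeFS table is read or written.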

Example

Let's work through an example with Polars. The same logic applies to other Python engines like Pandas, Daft, Dask, etc.

Follow the steps below to use Delta Lake on LakeFS with Polars:

  1. Install Polars and deltalake. For example, using:

pip install polars deltalake

  2. Create a dataframe with some toy data.

import polars as pl

df = pl.DataFrame({'x': [1, 2, 3]})

  3. Set your storage_options correctly.

storage_options = {
    "endpoint": "https://mylakefs.intranet.com",  # LakeFS endpoint
    "access_key_id": "LAKEFSID",
    "secret_access_key": "LAKEFSKEY",
}

  4. Write data to a Delta table using the storage_options kwarg. The subpath after the bucket is always the branch you want to write into.

df.write_delta(
    "lakefs://bucket/branch/table",
    storage_options=storage_options,
)
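
The URI layout used above (bucket first, then branch, then the table path) can be captured in a small helper so the branch is always explicit. This is only an illustrative sketch; lakefs_table_uri is a hypothetical name, not part of deltalake or Polars.

```python
def lakefs_table_uri(bucket, branch, table_path):
    """Compose a lakefs:// URI: bucket, then branch, then the table path."""
    parts = [bucket, branch, table_path.strip("/")]
    # Every component is required; an empty one would shift the branch slot.
    if not all(parts):
        raise ValueError("bucket, branch and table_path must all be non-empty")
    return "lakefs://" + "/".join(parts)


uri = lakefs_table_uri("bucket", "branch", "table")
```

The resulting string can be passed to df.write_delta, or to pl.read_delta to read the table back, together with the same storage_options.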

Cleaning up failed transaction branches

A deltalake operation can fail midway. When that happens, the LakeFS transaction branch has already been created but is never deleted. These branches are hidden in the UI, but each one has a name starting with delta-tx.

With the lakefs Python library you can list these branches and delete stale ones.
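
A rough sketch of such a cleanup is shown below. It assumes a branches API object with list_branches and delete_branch methods and a show_hidden flag, which is how the generated lakeFS SDK client is expected to look (reachable as client.sdk_client.branches_api on a lakefs.client.Client). Those names and parameters are assumptions here, so check them against your installed version. The function name cleanup_tx_branches is hypothetical, and the sketch does not decide which branches are stale: it defaults to a dry run that only lists candidates, so branches belonging to operations still in flight are not removed by accident.

```python
def cleanup_tx_branches(branches_api, repository, prefix="delta-tx", dry_run=True):
    """List leftover transaction branches and, if dry_run is False, delete them.

    branches_api is assumed to behave like the lakeFS SDK branches API
    (list_branches / delete_branch); verify this against your SDK version.
    """
    found = []
    after = ""
    while True:
        # show_hidden=True is assumed to be needed because the transaction
        # branches are hidden.
        page = branches_api.list_branches(
            repository=repository,
            prefix=prefix,
            after=after,
            show_hidden=True,
        )
        found.extend(branch.id for branch in page.results)
        if not page.pagination.has_more:
            break
        after = page.pagination.next_offset

    if not dry_run:
        for branch_id in found:
            branches_api.delete_branch(repository=repository, branch=branch_id)
    return found
```

Run it once with the default dry_run=True, review the returned branch names, and only then call it again with dry_run=False.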