API Reference

DeltaTable

class deltalake.table.DeltaTable(table_uri, version=None)

Create a DeltaTable instance.

Parameters
  • table_uri (str) – the path of the DeltaTable

  • version (Optional[int]) – version of the DeltaTable
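
For orientation, a minimal usage sketch (the local path "./my_table" below is a hypothetical example):

    from deltalake import DeltaTable

    # Load the latest version of a table stored at a local path.
    dt = DeltaTable("./my_table")

    # Or pin a specific historical version at construction time.
    dt_v0 = DeltaTable("./my_table", version=0)

    print(dt.version())
    print(dt.files())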

file_paths()

Get the list of files with an absolute path.

Returns

list of absolute URIs of the .parquet files referenced in the current version of the DeltaTable

Return type

List[str]

file_uris()

Get the list of files with an absolute path.

Returns

list of absolute URIs of the .parquet files referenced in the current version of the DeltaTable

Return type

List[str]

files()

Get the .parquet files of the DeltaTable.

Returns

list of the .parquet files referenced for the current version of the DeltaTable

Return type

List[str]

files_by_partitions(partition_filters)

Get the files that match a given list of partition filters. Partitions which do not match the filter predicate will be removed from the scanned data. Predicates are expressed in disjunctive normal form (DNF), like [("x", "=", "a"), ...]. DNF allows arbitrary boolean logical combinations of single partition predicates. The innermost tuples each describe a single partition predicate; the list of inner predicates is interpreted as a conjunction (AND), forming a more selective filter. Each tuple has the format (key, op, value) and compares the key with the value. The supported ops are =, !=, in, and not in. If the op is in or not in, the value must be a collection such as a list, a set or a tuple. The supported type for value is str. Use the empty string '' for a Null partition value.

Examples:
("x", "=", "a")
("x", "!=", "a")
("y", "in", ["a", "b", "c"])
("z", "not in", ["a", "b"])

Parameters

partition_filters (List[Tuple[str, str, Any]]) – the partition filters that will be used for getting the matched files

Returns

list of the .parquet files referenced for the current version of the DeltaTable, after applying the partition filters.

Return type

List[str]
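
A hedged sketch of the filter syntax on a previously constructed DeltaTable instance dt; the partition columns "year" and "month" are illustrative assumptions, not part of any particular table:

    # Files belonging to a single partition.
    files = dt.files_by_partitions([("year", "=", "2021")])

    # Inner tuples are AND-ed; "in" takes a collection of strings.
    files = dt.files_by_partitions(
        [("year", "in", ["2020", "2021"]), ("month", "!=", "01")]
    )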

classmethod from_data_catalog(data_catalog, database_name, table_name, data_catalog_id=None, version=None)

Create the Delta Table from a Data Catalog.

Parameters
  • data_catalog (deltalake.data_catalog.DataCatalog) – the Catalog to use for getting the storage location of the Delta Table

  • database_name (str) – the database name inside the Data Catalog

  • table_name (str) – the table name inside the Data Catalog

  • data_catalog_id (Optional[str]) – the identifier of the Data Catalog

  • version (Optional[int]) – version of the DeltaTable

Return type

deltalake.table.DeltaTable
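
A sketch of resolving a table through a catalog; the database and table names are hypothetical, and DataCatalog.AWS is used here under the assumption that an AWS Glue catalog is available:

    from deltalake import DeltaTable
    from deltalake.data_catalog import DataCatalog

    dt = DeltaTable.from_data_catalog(
        data_catalog=DataCatalog.AWS,
        database_name="analytics",
        table_name="events",
    )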

history(limit=None)

Run the history command on the DeltaTable. The operations are returned in reverse chronological order.

Parameters

limit (Optional[int]) – the commit info limit to return

Returns

list of the commit infos registered in the transaction log

Return type

List[Dict[str, Any]]
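
For example, a small sketch that prints the most recent commits of a previously constructed DeltaTable instance dt (the keys accessed below, such as "timestamp" and "operation", are assumptions about the commit info payload):

    # Newest commits first.
    for commit in dt.history(limit=3):
        print(commit.get("timestamp"), commit.get("operation"))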

load_version(version)

Load a DeltaTable with a specified version.

Parameters

version (int) – the identifier of the version of the DeltaTable to load

Return type

None

load_with_datetime(datetime_string)

Time travel the Delta table to the latest version that was created at or before the provided datetime_string argument. The datetime_string argument should be an RFC 3339 and ISO 8601 date and time string.

Examples:
2018-01-26T18:30:09Z
2018-12-19T16:39:57-08:00
2018-01-26T18:30:09.453+00:00

Parameters

datetime_string (str) – the identifier of the datetime point of the DeltaTable to load

Return type

None
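
A time-travel sketch combining load_version and load_with_datetime on a previously constructed DeltaTable instance dt:

    # Rewind the table to version 1 ...
    dt.load_version(1)

    # ... or to the latest version created at or before a point in time.
    dt.load_with_datetime("2018-12-19T16:39:57-08:00")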

metadata()

Get the current metadata of the DeltaTable.

Returns

the current Metadata registered in the transaction log

Return type

deltalake.table.Metadata

pyarrow_schema()

Get the current schema of the DeltaTable in PyArrow format.

Returns

the current Schema in PyArrow format

Return type

pyarrow.lib.Schema

schema()

Get the current schema of the DeltaTable.

Returns

the current Schema registered in the transaction log

Return type

deltalake.schema.Schema
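
A short sketch contrasting the two schema accessors on a previously constructed DeltaTable instance dt:

    # Delta schema as registered in the transaction log.
    delta_schema = dt.schema()

    # Equivalent schema in PyArrow format.
    arrow_schema = dt.pyarrow_schema()
    print(arrow_schema)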

to_pandas(partitions=None, columns=None, filesystem=None)

Build a pandas dataframe using data from the DeltaTable.

Parameters
  • partitions (Optional[List[Tuple[str, str, Any]]]) – A list of partition filters, see help(DeltaTable.files_by_partitions) for filter syntax

  • columns (Optional[List[str]]) – The columns to project. This can be a list of column names to include (order and duplicates will be preserved)

  • filesystem (Optional[Union[str, pyarrow._fs.FileSystem]]) – A concrete implementation of the Pyarrow FileSystem or a fsspec-compatible interface. If None, the first file path will be used to determine the right FileSystem

Returns

a pandas dataframe

Return type

pandas.DataFrame
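
A hedged sketch of reading a projection into pandas from a previously constructed DeltaTable instance dt; the column names and the partition column are hypothetical:

    df = dt.to_pandas(
        partitions=[("year", "=", "2021")],
        columns=["id", "value"],
    )
    print(df.head())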

to_pyarrow_dataset(partitions=None, filesystem=None)

Build a PyArrow Dataset using data from the DeltaTable.

Parameters
  • partitions (Optional[List[Tuple[str, str, Any]]]) – A list of partition filters, see help(DeltaTable.files_by_partitions) for filter syntax

  • filesystem (Optional[Union[str, pyarrow._fs.FileSystem]]) – A concrete implementation of the Pyarrow FileSystem or a fsspec-compatible interface. If None, the first file path will be used to determine the right FileSystem

Returns

the PyArrow Dataset

Return type

pyarrow._dataset.Dataset
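
A sketch of filtering at scan time through the PyArrow dataset API, given a previously constructed DeltaTable instance dt; the column name "value" is an assumption:

    import pyarrow.dataset as ds

    dataset = dt.to_pyarrow_dataset()
    # Only matching rows are materialized into the resulting table.
    table = dataset.to_table(filter=ds.field("value") > 10)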

to_pyarrow_table(partitions=None, columns=None, filesystem=None)

Build a PyArrow Table using data from the DeltaTable.

Parameters
  • partitions (Optional[List[Tuple[str, str, Any]]]) – A list of partition filters, see help(DeltaTable.files_by_partitions) for filter syntax

  • columns (Optional[List[str]]) – The columns to project. This can be a list of column names to include (order and duplicates will be preserved)

  • filesystem (Optional[Union[str, pyarrow._fs.FileSystem]]) – A concrete implementation of the Pyarrow FileSystem or a fsspec-compatible interface. If None, the first file path will be used to determine the right FileSystem

Returns

the PyArrow table

Return type

pyarrow.lib.Table

update_incremental()

Updates the DeltaTable to the latest version by incrementally applying newer versions.

Return type

None

vacuum(retention_hours=None, dry_run=True)

Run the Vacuum command on the Delta Table: list and delete files that are no longer referenced by the Delta table and are older than the retention threshold.

Parameters
  • retention_hours (Optional[int]) – the retention threshold in hours; if None, the value from configuration.deletedFileRetentionDuration is used, or a default of one week otherwise.

  • dry_run (bool) – when True, only list the files that would be deleted; when False, delete them

Returns

the list of files that are no longer referenced by the Delta Table and are older than the retention threshold.

Return type

List[str]
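
A cautious sketch on a previously constructed DeltaTable instance dt: run a dry run first, then delete only once the candidate list looks right:

    # List the files that would be removed (dry_run defaults to True).
    candidates = dt.vacuum(retention_hours=168)
    print(candidates)

    # Actually delete them.
    # dt.vacuum(retention_hours=168, dry_run=False)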

version()

Get the version of the DeltaTable.

Returns

The current version of the DeltaTable

Return type

int

class deltalake.table.Metadata(table)

Create a Metadata instance.

Parameters

table (RawDeltaTable) –

property configuration: List[str]

Return the DeltaTable properties.

property created_time: int

Return the time when this metadata action of the DeltaTable was created, in milliseconds since the Unix epoch.

property description: str

Return the user-provided description of the DeltaTable.

property id: int

Return the unique identifier of the DeltaTable.

property name: str

Return the user-provided identifier of the DeltaTable.

property partition_columns: List[str]

Return an array containing the names of the partitioned columns of the DeltaTable.

DeltaSchema

class deltalake.schema.ArrayType(element_type, contains_null)

Concrete class for array data types.

Parameters
  • element_type (deltalake.schema.DataType) – the type of the array elements

  • contains_null (bool) – whether the array may contain null values

class deltalake.schema.DataType(type)

Base class of all Delta data types.

Parameters

type (str) –

Return type

None

classmethod from_dict(json_dict)

Generate a DataType from a DataType in json format.

Parameters

json_dict (Dict[str, Any]) – the data type in json format

Returns

the Delta DataType

Return type

deltalake.schema.DataType

class deltalake.schema.Field(name, type, nullable, metadata=None)

Create a DeltaTable Field instance.

Parameters
  • name (str) – the name of the field

  • type (deltalake.schema.DataType) – the type of the field

  • nullable (bool) – whether the field may contain null values

  • metadata (Optional[Dict[str, Any]]) – the metadata of the field

Return type

None

class deltalake.schema.MapType(key_type, value_type, value_contains_null)

Concrete class for map data types.

Parameters
  • key_type (deltalake.schema.DataType) – the type of the map keys

  • value_type (deltalake.schema.DataType) – the type of the map values

  • value_contains_null (bool) – whether the map values may contain null values

class deltalake.schema.Schema(fields, json_value)

Create a DeltaTable Schema instance.

Parameters
  • fields (List[deltalake.schema.Field]) – the fields of the schema

  • json_value (Dict[str, Any]) – the schema in json format

Return type

None

classmethod from_json(json_data)

Generate a DeltaTable Schema from json format.

Parameters

json_data (str) – the schema in json format

Returns

the DeltaTable schema

Return type

deltalake.schema.Schema
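
A minimal sketch, assuming the standard Delta Lake schema-string format (a single-column struct):

    from deltalake.schema import Schema

    schema = Schema.from_json(
        '{"type": "struct", "fields": ['
        '{"name": "id", "type": "long", "nullable": true, "metadata": {}}]}'
    )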

class deltalake.schema.StructType(fields)

Concrete class for struct data types.

Parameters

fields (List[deltalake.schema.Field]) –

deltalake.schema.pyarrow_datatype_from_dict(json_dict)

Create a DataType in PyArrow format from a Schema json format.

Parameters

json_dict (Dict[str, Any]) – the DataType in json format

Returns

the DataType in PyArrow format

Return type

pyarrow.lib.DataType

deltalake.schema.pyarrow_field_from_dict(field)

Create a Field in PyArrow format from a Field in json format.

Parameters

field (Dict[str, Any]) – the field in json format

Returns

the Field in PyArrow format

Return type

pyarrow.lib.Field

deltalake.schema.pyarrow_schema_from_json(json_data)

Create a Schema in PyArrow format from a Schema in json format.

Parameters

json_data (str) – the schema in json format

Returns

the Schema in PyArrow format

Return type

pyarrow.lib.Schema

DataCatalog

class deltalake.data_catalog.DataCatalog(value)

List of the Data Catalogs

DeltaStorageHandler

class deltalake.fs.DeltaStorageHandler(table_uri)

DeltaStorageHandler is a concrete implementation of a PyArrow FileSystemHandler.

Parameters

table_uri (str) –

Return type

None
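
The handler is typically wrapped in a pyarrow.fs.PyFileSystem; a minimal sketch (the table URI and file name are hypothetical):

    from pyarrow.fs import PyFileSystem
    from deltalake.fs import DeltaStorageHandler

    fs = PyFileSystem(DeltaStorageHandler("./my_table"))
    infos = fs.get_file_info(["part-00000.parquet"])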

copy_file(src, dest)

Copy a file.

If the destination exists and is a directory, an error is returned. Otherwise, it is replaced.

Parameters
  • src (str) – The path of the file to be copied from.

  • dest (str) – The destination path where the file is copied to.

Return type

None

create_dir(path, *, recursive=True)

Create a directory and subdirectories.

This function succeeds if the directory already exists.

Parameters
  • path (str) – The path of the new directory.

  • recursive (bool) – Create nested directories as well.

Return type

None

delete_dir(path)

Delete a directory and its contents, recursively.

Parameters

path (str) – The path of the directory to be deleted.

Return type

None

delete_dir_contents(path)

Delete a directory’s contents, recursively.

Like delete_dir, but doesn’t delete the directory itself.

Parameters

path (str) – The path of the directory to be deleted.

Return type

None

delete_file(path)

Delete a file.

Parameters

path (str) – The path of the file to be deleted.

Return type

None

delete_root_dir_contents()

Delete a directory’s contents, recursively.

Like delete_dir_contents, but for the root directory (path is empty or “/”).

Return type

None

get_file_info(paths)

Get info for the given files.

Parameters

paths (List[str]) – List of file paths

Returns

list of file info objects

Return type

List[pyarrow._fs.FileInfo]

get_file_info_selector(selector)

Get info for the files defined by FileSelector.

Parameters

selector (pyarrow._fs.FileSelector) – FileSelector object

Returns

list of file info objects

Return type

List[pyarrow._fs.FileInfo]

get_type_name()

The filesystem’s type name.

Returns

The filesystem’s type name.

Return type

str

move(src, dest)

Move / rename a file or directory.

If the destination exists:

  • if it is a non-empty directory, an error is returned

  • otherwise, if it has the same type as the source, it is replaced

  • otherwise, behavior is unspecified (implementation-dependent).

Parameters
  • src (str) – The path of the file or the directory to be moved.

  • dest (str) – The destination path where the file or directory is moved to.

Return type

None

normalize_path(path)

Normalize filesystem path.

Parameters

path (str) – the path to normalize

Returns

the normalized path

Return type

str

open_append_stream(path, metadata=None)

DEPRECATED: Open an output stream for appending.

If the target doesn’t exist, a new empty file is created.

Parameters
  • path (str) – The source to open for writing.

  • metadata (Optional[Dict[str, Any]]) – If not None, a mapping of string keys to string values.

Returns

NativeFile

Return type

pyarrow.lib.NativeFile

open_input_file(path)

Open an input file for random access reading.

Parameters

path (str) – The source to open for reading.

Returns

NativeFile

Return type

pyarrow.lib.NativeFile

open_input_stream(path)

Open an input stream for sequential reading.

Parameters

path (str) – The source to open for reading.

Returns

NativeFile

Return type

pyarrow.lib.NativeFile

open_output_stream(path, metadata=None)

Open an output stream for sequential writing.

If the target already exists, existing data is truncated.

Parameters
  • path (str) – The source to open for writing.

  • metadata (Optional[Dict[str, Any]]) – If not None, a mapping of string keys to string values.

Returns

NativeFile

Return type

pyarrow.lib.NativeFile