Schema

Schema and field

Schemas, fields, and data types are provided in the deltalake.schema submodule.

deltalake.Schema

Schema(fields: list[Field])

invariants instance-attribute

invariants: list[tuple[str, str]]

The list of invariants on the table. Each invariant is a tuple of strings. The first string is the field path and the second is the SQL of the invariant.

from_arrow staticmethod

from_arrow(type: ArrowSchemaExportable) -> Schema

Create a Schema from a schema that implements Arrow C Data Interface.

Will raise TypeError if one of the Arrow types is not a primitive type.

Parameters:

Name Type Description Default
type ArrowSchemaExportable

an object that is ArrowSchemaExportable

required

Returns:

Type Description
Schema

a Schema

from_json staticmethod

from_json(json: str) -> Schema

Create a new Schema from a JSON string.

Parameters:

Name Type Description Default
json str

a JSON string

required
Example

A schema has the same JSON format as a StructType.

Schema.from_json('''{
    "type": "struct",
    "fields": [{"name": "x", "type": "integer", "nullable": true, "metadata": {}}]
    }'''
)
# Returns Schema([Field(x, PrimitiveType("integer"), nullable=True)])
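As a rough illustration of the JSON shape used in the example above, the following sketch assembles such a string with the standard library only. The helpers `field_dict` and `schema_json` are hypothetical names introduced here, not part of deltalake; the resulting string is the kind of input one would pass to Schema.from_json, which is not called so that the snippet runs without the library installed.

```python
import json


def field_dict(name, type_, nullable=True, metadata=None):
    """Build one field entry with the four keys shown in the example above.

    Hypothetical helper, not part of deltalake.
    """
    return {
        "name": name,
        "type": type_,
        "nullable": nullable,
        "metadata": metadata if metadata is not None else {},
    }


def schema_json(fields):
    """Serialize field entries into a struct-shaped JSON string (hypothetical helper)."""
    return json.dumps({"type": "struct", "fields": fields})


payload = schema_json(
    [field_dict("x", "integer"), field_dict("y", "string", nullable=False)]
)
print(payload)
```

Building the string from Python dicts avoids hand-written quoting mistakes in multi-line JSON literals.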

to_arrow

to_arrow(as_large_types: bool = False) -> ArrowSchema

Return equivalent arro3 schema

Parameters:

Name Type Description Default
as_large_types bool

Return the schema with all variable-size types (list, binary, string) as their large variants (with int64 offsets). This is for compatibility with systems like Polars that only support the large versions of Arrow types.

False

Returns:

Type Description
ArrowSchema

an arro3 Schema

to_json

to_json() -> str

Get the JSON string representation of the Schema.

Returns:

Type Description
str

a JSON string

Example

A schema has the same JSON format as a StructType.

Schema([Field("x", "integer")]).to_json()
# Returns '{"type":"struct","fields":[{"name":"x","type":"integer","nullable":true,"metadata":{}}]}'
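Because the result is plain JSON, it can be inspected with the standard library. The sketch below parses the string copied from the example above rather than calling to_json, so it runs without deltalake installed; the `summary` name is only illustrative.

```python
import json

# The string below is copied from the to_json() example above.
schema_str = '{"type":"struct","fields":[{"name":"x","type":"integer","nullable":true,"metadata":{}}]}'

parsed = json.loads(schema_str)

# Map each field name to its (type, nullable) pair for quick inspection.
summary = {f["name"]: (f["type"], f["nullable"]) for f in parsed["fields"]}
print(summary)  # {'x': ('integer', True)}
```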

deltalake.Field

Field(name: str, type: DataType, *, nullable: bool = True, metadata: dict[str, Any] | None = None)

A field in a Delta StructType or Schema

Example

Can create with just a name and a type:

Field("my_int_col", "integer")
# Returns Field("my_int_col", PrimitiveType("integer"), nullable=True, metadata=None)

Can also attach metadata to the field. Metadata should be a dictionary with string keys and JSON-serializable values (str, list, int, float, dict):

Field("my_col", "integer", metadata={"custom_metadata": {"test": 2}})
# Returns Field("my_col", PrimitiveType("integer"), nullable=True, metadata={"custom_metadata": {"test": 2}})
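One way to check the metadata requirement before constructing a field is to try encoding the dictionary with the standard `json` module. This is a minimal sketch; `is_json_serializable_metadata` is a hypothetical helper, and deltalake may apply different or stricter checks of its own.

```python
import json


def is_json_serializable_metadata(metadata):
    """Return True if metadata is a dict with string top-level keys that json can encode.

    Hypothetical helper, not part of deltalake.
    """
    if not isinstance(metadata, dict):
        return False
    if not all(isinstance(key, str) for key in metadata):
        return False
    try:
        json.dumps(metadata)
    except (TypeError, ValueError):
        return False
    return True


print(is_json_serializable_metadata({"custom_metadata": {"test": 2}}))  # True
print(is_json_serializable_metadata({"bad": {1, 2, 3}}))  # False: sets are not JSON
print(is_json_serializable_metadata({1: "x"}))  # False: non-string key
```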

metadata instance-attribute

metadata: dict[str, Any]

The metadata of the field

name instance-attribute

name: str

The name of the field

nullable instance-attribute

nullable: bool

Whether there may be null values in the field

type instance-attribute

type: DataType

The type of the field, of type: Union[ PrimitiveType, ArrayType, MapType, StructType ]

from_arrow staticmethod

from_arrow(field: ArrowSchemaExportable) -> Field

Create a Field from an object with an ArrowSchemaExportable field

Note: This currently doesn't preserve field metadata.

Parameters:

Name Type Description Default
field ArrowSchemaExportable

a Field object that is ArrowSchemaExportable

required

Returns:

Type Description
Field

a Field

from_json staticmethod

from_json(json: str) -> Field

Create a Field from a JSON string.

Parameters:

Name Type Description Default
json str

the JSON string.

required

Returns:

Type Description
Field

Field

Example
Field.from_json('''{
        "name": "col",
        "type": "integer",
        "nullable": true,
        "metadata": {}
    }'''
)
# Returns Field(col, PrimitiveType("integer"), nullable=True)

to_arrow

to_arrow() -> ArrowField

Convert to an equivalent arro3 field.

Note: This currently doesn't preserve field metadata.

Returns:

Type Description
ArrowField

an arro3 Field

to_json

to_json() -> str

Get the field as a JSON string.

Returns:

Type Description
str

a JSON string

Example
Field("col", "integer").to_json()
# Returns '{"name":"col","type":"integer","nullable":true,"metadata":{}}'

Data types

deltalake.schema.PrimitiveType

PrimitiveType(data_type: str)

A primitive datatype, such as a string or number.

Can be initialized with a string value:

PrimitiveType("integer")

Valid primitive data types include:

  • "string",
  • "long",
  • "integer",
  • "short",
  • "byte",
  • "float",
  • "double",
  • "boolean",
  • "binary",
  • "date",
  • "timestamp",
  • "timestamp_ntz",
  • "decimal(<precision>, <scale>)" (max: decimal(38,38))
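The names in the list above can be checked up front with a small helper. This sketch is not how deltalake validates its input. The function `looks_like_primitive` is hypothetical, the list is treated as exhaustive only for illustration, and the decimal bounds (precision between 1 and 38, scale no larger than precision) and the tolerance for spaces inside the parentheses are assumptions.

```python
import re

# Type names taken from the list above.
SIMPLE_TYPES = {
    "string", "long", "integer", "short", "byte", "float", "double",
    "boolean", "binary", "date", "timestamp", "timestamp_ntz",
}

# Matches strings such as "decimal(10,2)" or "decimal(10, 2)".
DECIMAL_RE = re.compile(r"decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def looks_like_primitive(name):
    """Return True if name is a listed primitive or a decimal within the assumed bounds."""
    if name in SIMPLE_TYPES:
        return True
    match = DECIMAL_RE.fullmatch(name)
    if match is None:
        return False
    precision, scale = int(match.group(1)), int(match.group(2))
    # Assumption: precision is 1..38 and scale does not exceed precision.
    return 1 <= precision <= 38 and 0 <= scale <= precision


print(looks_like_primitive("integer"))         # True
print(looks_like_primitive("decimal(10, 2)"))  # True
print(looks_like_primitive("decimal(39,2)"))   # False
```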

Parameters:

Name Type Description Default
data_type str

string representation of the data type

required

type instance-attribute

type: str

The inner type

from_arrow staticmethod

from_arrow(type: ArrowSchemaExportable) -> PrimitiveType

Create a PrimitiveType from an ArrowSchemaExportable datatype

Will raise TypeError if the arrow type is not a primitive type.

Parameters:

Name Type Description Default
type ArrowSchemaExportable

an object that is ArrowSchemaExportable

required

Returns:

Type Description
PrimitiveType

a PrimitiveType

from_json staticmethod

from_json(json: str) -> PrimitiveType

Create a PrimitiveType from a JSON string

The JSON representation for a primitive type is just a quoted string: PrimitiveType.from_json('"integer"')
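The quoting is easy to get wrong: the JSON document is the type name including its double quotes. A short standard-library sketch shows the difference between the Python string and its JSON encoding.

```python
import json

# Encoding the Python string "integer" yields a JSON document with quotes.
encoded = json.dumps("integer")
print(encoded)       # "integer"
print(len(encoded))  # 9: seven letters plus two quote characters

# Decoding returns the bare type name again.
decoded = json.loads(encoded)
print(decoded)       # integer
```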

Parameters:

Name Type Description Default
json str

a JSON string

required

Returns:

Type Description
PrimitiveType

a PrimitiveType type

to_arrow

to_arrow() -> ArrowDataType

Get the equivalent arro3 DataType (arro3.core.DataType)

deltalake.schema.ArrayType

ArrayType(element_type: DataType, *, contains_null: bool = True)

An Array (List) DataType

Example

Can either pass the element type explicitly or can pass a string if it is a primitive type:

ArrayType(PrimitiveType("integer"))
# Returns ArrayType(PrimitiveType("integer"), contains_null=True)

ArrayType("integer", contains_null=False)
# Returns ArrayType(PrimitiveType("integer"), contains_null=False)

contains_null instance-attribute

contains_null: bool

Whether the arrays may contain null values

element_type instance-attribute

element_type: DataType

The type of the element, of type: Union[ PrimitiveType, ArrayType, MapType, StructType ]

type instance-attribute

type: Literal['array']

The string "array"

from_arrow staticmethod

from_arrow(type: ArrowSchemaExportable) -> ArrayType

Create an ArrayType from an ArrowSchemaExportable datatype.

Will raise TypeError if a different arrow DataType is provided.

Parameters:

Name Type Description Default
type ArrowSchemaExportable

an object that is ArrowSchemaExportable

required

Returns:

Type Description
ArrayType

an ArrayType

from_json staticmethod

from_json(json: str) -> ArrayType

Create an ArrayType from a JSON string

Parameters:

Name Type Description Default
json str

a JSON string

required

Returns:

Type Description
ArrayType

an ArrayType

Example

The JSON representation for an array type is an object with type (set to "array"), elementType, and containsNull.

ArrayType.from_json(
    '''{
        "type": "array",
        "elementType": "integer",
        "containsNull": false
    }'''
)
# Returns ArrayType(PrimitiveType("integer"), contains_null=False)
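Since element_type may itself be a complex type, the JSON can nest. The sketch below builds an array-of-arrays document with the standard library, assuming nested types are serialized as nested objects under elementType, as in the Delta protocol's schema serialization format. `array_json` is a hypothetical helper, not a deltalake function.

```python
import json


def array_json(element_type, contains_null=True):
    """Return a dict describing an array type.

    element_type is either a primitive type name or a nested type dict.
    Hypothetical helper, not part of deltalake.
    """
    return {
        "type": "array",
        "elementType": element_type,
        "containsNull": contains_null,
    }


# An array whose elements are non-nullable integer arrays.
nested = array_json(array_json("integer", contains_null=False))
nested_text = json.dumps(nested)
print(nested_text)
```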

to_arrow

to_arrow() -> ArrowDataType

Get the equivalent arro3 type.

to_json

to_json() -> str

Get the JSON string representation of the type.

deltalake.schema.MapType

MapType(key_type: DataType, value_type: DataType, *, value_contains_null: bool = True)

A map data type

key_type and value_type should be PrimitiveType, ArrayType, or StructType. A string can also be passed, which will be parsed as a primitive type:

Example
MapType(PrimitiveType("integer"), PrimitiveType("string"))
# Returns MapType(PrimitiveType("integer"), PrimitiveType("string"), value_contains_null=True)

MapType("integer", "string", value_contains_null=False)
# Returns MapType(PrimitiveType("integer"), PrimitiveType("string"), value_contains_null=False)

key_type instance-attribute

key_type: DataType

The type of the keys, of type: Union[ PrimitiveType, ArrayType, MapType, StructType ]

value_contains_null instance-attribute

value_contains_null: bool

Whether the values in a map may be null

value_type instance-attribute

value_type: DataType

The type of the values, of type: Union[ PrimitiveType, ArrayType, MapType, StructType ]

from_arrow staticmethod

from_arrow(type: ArrowSchemaExportable) -> MapType

Create a MapType from an ArrowSchemaExportable datatype

Will raise TypeError if passed a different type.

Parameters:

Name Type Description Default
type ArrowSchemaExportable

an object that is ArrowSchemaExportable

required

Returns:

Type Description
MapType

a MapType

from_json staticmethod

from_json(json: str) -> MapType

Create a MapType from a JSON string

Parameters:

Name Type Description Default
json str

a JSON string

required

Returns:

Type Description
MapType

a MapType

Example

The JSON representation for a map type is an object with type (set to "map"), keyType, valueType, and valueContainsNull:

MapType.from_json(
    '''{
        "type": "map",
        "keyType": "integer",
        "valueType": "string",
        "valueContainsNull": true
    }'''
)
# Returns MapType(PrimitiveType("integer"), PrimitiveType("string"), value_contains_null=True)
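Before handing a string to from_json, it can help to confirm that the four keys named above are present. The following is a sketch using only the standard library. `missing_map_keys` is a hypothetical helper, and whether from_json itself tolerates an omitted key is not stated here.

```python
import json

# The four keys named in the description above.
REQUIRED_MAP_KEYS = {"type", "keyType", "valueType", "valueContainsNull"}


def missing_map_keys(text):
    """Parse text as JSON and return the sorted list of required keys it lacks.

    Hypothetical helper, not part of deltalake.
    """
    parsed = json.loads(text)
    if not isinstance(parsed, dict) or parsed.get("type") != "map":
        raise ValueError('expected a JSON object with "type" set to "map"')
    return sorted(REQUIRED_MAP_KEYS - parsed.keys())


complete = '{"type": "map", "keyType": "integer", "valueType": "string", "valueContainsNull": true}'
partial = '{"type": "map", "keyType": "integer", "valueType": "string"}'

print(missing_map_keys(complete))  # []
print(missing_map_keys(partial))   # ['valueContainsNull']
```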

to_arrow

to_arrow() -> ArrowDataType

Get the equivalent arro3 data type.

to_json

to_json() -> str

Get the JSON string representation of the map type.

Returns:

Type Description
str

a JSON string

deltalake.schema.StructType

StructType(fields: list[Field])

A struct datatype, containing one or more subfields

Example

Create with a list of Field objects:

StructType([Field("x", "integer"), Field("y", "string")])
# Creates: StructType([Field(x, PrimitiveType("integer"), nullable=True), Field(y, PrimitiveType("string"), nullable=True)])

fields instance-attribute

fields: list[Field]

The fields within the struct

from_arrow staticmethod

from_arrow(type: ArrowSchemaExportable) -> StructType

Create a new StructType from an ArrowSchemaExportable datatype

Will raise TypeError if a different data type is provided.

Parameters:

Name Type Description Default
type ArrowSchemaExportable

a struct type object that is ArrowSchemaExportable

required

Returns:

Type Description
StructType

a StructType

from_json staticmethod

from_json(json: str) -> StructType

Create a new StructType from a JSON string.

Parameters:

Name Type Description Default
json str

a JSON string

required

Returns:

Type Description
StructType

a StructType

Example
StructType.from_json(
    '''{
        "type": "struct",
        "fields": [{"name": "x", "type": "integer", "nullable": true, "metadata": {}}]
    }'''
)
# Returns StructType([Field(x, PrimitiveType("integer"), nullable=True)])

to_arrow

to_arrow() -> ArrowDataType

Get the equivalent arro3 DataType (arro3.core.DataType)

to_json

to_json() -> str

Get the JSON representation of the type.

Returns:

Type Description
str

a JSON string

Example
StructType([Field("x", "integer")]).to_json()
# Returns '{"type":"struct","fields":[{"name":"x","type":"integer","nullable":true,"metadata":{}}]}'