Schema

Schema and field

Schemas, fields, and data types are provided in the deltalake.schema submodule.

deltalake.Schema

Schema(fields: list[Field])

Bases: deltalake._internal.StructType

A Delta Lake schema

Create using a list of Field objects:

Schema([Field("x", "integer"), Field("y", "string")])
# Returns Schema([Field(x, PrimitiveType("integer"), nullable=True), Field(y, PrimitiveType("string"), nullable=True)])

Or create from an Arrow schema (here built with arro3):

from arro3.core import DataType, Schema as ArrowSchema
Schema.from_arrow(ArrowSchema({"x": DataType.int32(), "y": DataType.string()}))
# Returns Schema([Field(x, PrimitiveType("integer"), nullable=True), Field(y, PrimitiveType("string"), nullable=True)])

invariants

invariants: list[tuple[str, str]] = <attribute 'invariants' of 'deltalake._internal.Schema' objects>

from_arrow staticmethod

from_arrow(data_type) -> Schema

Create a Schema from a schema that implements the Arrow C Data Interface.

Will raise TypeError if one of the Arrow types is not a primitive type.

Parameters:

Name Type Description Default
data_type ArrowSchemaExportable

an object that is ArrowSchemaExportable

required

Returns:

Type Description
Schema

a Schema

from_json staticmethod

from_json(schema_json) -> Schema

Create a new Schema from a JSON string.

Parameters:

Name Type Description Default
schema_json str

a JSON string

required
Example

A schema has the same JSON format as a StructType.

Schema.from_json('''{
    "type": "struct",
    "fields": [{"name": "x", "type": "integer", "nullable": true, "metadata": {}}]
    }'''
)
# Returns Schema([Field(x, PrimitiveType("integer"), nullable=True)])
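When the column list is only known at runtime, the JSON document can be assembled with the standard library instead of being written by hand. The following is a minimal sketch; `build_schema_json` is a hypothetical helper, not part of deltalake, and it only covers primitive column types.

```python
import json


def build_schema_json(columns: dict[str, str], nullable: bool = True) -> str:
    """Serialize a mapping of column name -> primitive type name as a struct document.

    Hypothetical helper: it follows the field layout shown in the example above
    (name, type, nullable, metadata) and does not validate the type names.
    """
    fields = [
        {"name": name, "type": type_name, "nullable": nullable, "metadata": {}}
        for name, type_name in columns.items()
    ]
    return json.dumps({"type": "struct", "fields": fields})


schema_json = build_schema_json({"x": "integer", "y": "string"})
print(schema_json)
```

The resulting string has the struct layout shown above, so it should be suitable as the argument to Schema.from_json.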

to_arrow method descriptor

to_arrow(as_large_types: bool = False) -> ArrowSchema

Return equivalent arro3 schema

Parameters:

Name Type Description Default
as_large_types bool

Return the schema with all variable-size types (list, binary, string) as their large variants (with int64 indices). This is for compatibility with systems like Polars that only support the large versions of Arrow types.

False

Returns:

Type Description
ArrowSchema

an arro3 Schema
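For background on what "int64 indices" means: in the Arrow format the regular string, binary, and list types address their values with 32-bit offsets, while the large variants use 64-bit offsets. The sketch below only illustrates that size limit with plain arithmetic; `fits_in_int32_offsets` is a hypothetical helper and is not how deltalake or arro3 decide which variant to use.

```python
INT32_MAX = 2**31 - 1  # largest offset a signed 32-bit index can hold
INT64_MAX = 2**63 - 1  # largest offset a signed 64-bit index can hold


def fits_in_int32_offsets(value_lengths: list[int]) -> bool:
    """Return True if the cumulative length of all values fits in int32 offsets."""
    return sum(value_lengths) <= INT32_MAX


small = fits_in_int32_offsets([10, 20, 30])
huge = fits_in_int32_offsets([2**30, 2**30, 2**30])  # 3 GiB of string data
print(small, huge)
```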

to_json method descriptor

to_json() -> str

Get the JSON string representation of the Schema.

Returns:

Type Description
str

a JSON string

Example

A schema has the same JSON format as a StructType.

Schema([Field("x", "integer")]).to_json()
# Returns '{"type":"struct","fields":[{"name":"x","type":"integer","nullable":true,"metadata":{}}]}'
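Because the return value is plain JSON, it can be inspected with the standard library. The sketch below starts from the string shown in the example above; `column_types` is a hypothetical helper, not a deltalake function.

```python
import json

# Documented return value of Schema([Field("x", "integer")]).to_json()
schema_json = '{"type":"struct","fields":[{"name":"x","type":"integer","nullable":true,"metadata":{}}]}'


def column_types(schema_json: str) -> dict[str, object]:
    """Map each top-level column name to its type entry (a str for primitives)."""
    doc = json.loads(schema_json)
    if doc.get("type") != "struct":
        raise ValueError("expected a struct-typed schema document")
    return {field["name"]: field["type"] for field in doc["fields"]}


columns = column_types(schema_json)
print(columns)
```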

deltalake.Field

Field(name: str, type: DataType, *, nullable: bool = True, metadata: dict[str, Any] | None = None)

metadata

metadata: dict[str, Any] = <attribute 'metadata' of 'deltalake._internal.Field' objects>

name

name: str = <attribute 'name' of 'deltalake._internal.Field' objects>

nullable

nullable: bool = <attribute 'nullable' of 'deltalake._internal.Field' objects>

type

type: DataType = <attribute 'type' of 'deltalake._internal.Field' objects>

from_arrow staticmethod

from_arrow(field: ArrowSchemaExportable) -> Field

Create a Field from an object with an ArrowSchemaExportable field

Note: This currently doesn't preserve field metadata.

Parameters:

Name Type Description Default
field ArrowSchemaExportable

a Field object that is ArrowSchemaExportable

required

Returns:

Type Description
Field

a Field

from_json staticmethod

from_json(field_json) -> Field

Create a Field from a JSON string.

Parameters:

Name Type Description Default
field_json str

the JSON string.

required

Returns:

Type Description
Field

Field

Example
Field.from_json('''{
        "name": "col",
        "type": "integer",
        "nullable": true,
        "metadata": {}
    }'''
)
# Returns Field(col, PrimitiveType("integer"), nullable=True)

to_arrow method descriptor

to_arrow() -> ArrowField

Convert to an equivalent arro3 field.

Note: This currently doesn't preserve field metadata.

Returns:

Type Description
ArrowField

an arro3 Field

to_json method descriptor

to_json() -> str

Get the field as JSON string.

Returns:

Type Description
str

a JSON string

Example
Field("col", "integer").to_json()
# Returns '{"name":"col","type":"integer","nullable":true,"metadata":{}}'
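The string in the example is compact JSON with no whitespace after separators. If you need to produce the same shape without deltalake (for instance in a test fixture), a minimal sketch with the standard library looks like this; `field_json` is a hypothetical helper.

```python
import json
from typing import Any, Optional


def field_json(
    name: str,
    type_name: str,
    nullable: bool = True,
    metadata: Optional[dict[str, Any]] = None,
) -> str:
    """Serialize one field using the key order and compact separators shown above."""
    doc = {
        "name": name,
        "type": type_name,
        "nullable": nullable,
        "metadata": metadata or {},
    }
    # separators=(",", ":") removes the spaces json.dumps inserts by default
    return json.dumps(doc, separators=(",", ":"))


col_json = field_json("col", "integer")
print(col_json)
```

Comparing parsed documents (`json.loads` on both sides) is more robust than comparing strings, since key order and whitespace are not significant in JSON.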

Data types

deltalake.schema.PrimitiveType

PrimitiveType(data_type: str)

type

type: str = <attribute 'type' of 'deltalake._internal.PrimitiveType' objects>

from_arrow staticmethod

from_arrow(data_type) -> PrimitiveType

Create a PrimitiveType from an ArrowSchemaExportable datatype

Will raise TypeError if the arrow type is not a primitive type.

Parameters:

Name Type Description Default
data_type ArrowSchemaExportable

an object that is ArrowSchemaExportable

required

Returns:

Type Description
PrimitiveType

a PrimitiveType

from_json staticmethod

from_json(type_json) -> PrimitiveType

Create a PrimitiveType from a JSON string

The JSON representation for a primitive type is just a quoted string: PrimitiveType.from_json('"integer"')

Parameters:

Name Type Description Default
type_json str

a JSON string

required

Returns:

Type Description
PrimitiveType

a PrimitiveType
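The "quoted string" requirement is easy to get wrong: the argument must be a JSON string literal, quotes included, not the bare type name. A small sketch with the standard library shows the difference.

```python
import json

type_name = "integer"

# json.dumps on a Python str yields the JSON string literal, quotes included.
type_json = json.dumps(type_name)
print(type_json)

# The bare name is not valid JSON on its own, so parsing it fails.
try:
    json.loads(type_name)
    bare_name_is_valid_json = True
except json.JSONDecodeError:
    bare_name_is_valid_json = False
```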

to_arrow method descriptor

to_arrow() -> ArrowDataType

Get the equivalent arro3 DataType (arro3.core.DataType)

deltalake.schema.ArrayType

ArrayType(element_type: DataType, *, contains_null: bool = True)

contains_null

contains_null: bool = <attribute 'contains_null' of 'deltalake._internal.ArrayType' objects>

element_type

element_type: DataType = <attribute 'element_type' of 'deltalake._internal.ArrayType' objects>

type

type: Literal['array'] = <attribute 'type' of 'deltalake._internal.ArrayType' objects>

from_arrow staticmethod

from_arrow(data_type) -> ArrayType

Create an ArrayType from an ArrowSchemaExportable datatype.

Will raise TypeError if a different arrow DataType is provided.

Parameters:

Name Type Description Default
data_type ArrowSchemaExportable

an object that is ArrowSchemaExportable

required

Returns:

Type Description
ArrayType

an ArrayType

from_json staticmethod

from_json(type_json) -> ArrayType

Create an ArrayType from a JSON string

Parameters:

Name Type Description Default
type_json str

a JSON string

required

Returns:

Type Description
ArrayType

an ArrayType

Example

The JSON representation for an array type is an object with type (set to "array"), elementType, and containsNull.

ArrayType.from_json(
    '''{
        "type": "array",
        "elementType": "integer",
        "containsNull": false
    }'''
)
# Returns ArrayType(PrimitiveType("integer"), contains_null=False)
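In the Delta schema serialization format, elementType may itself be a nested type object rather than a primitive name. The sketch below builds an array-of-arrays document with the standard library; `array_type_doc` is a hypothetical helper, not part of deltalake.

```python
import json
from typing import Union

TypeDoc = Union[str, dict]  # a primitive type name or a nested type object


def array_type_doc(element_type: TypeDoc, contains_null: bool = True) -> dict:
    """Build the object form of an array type: type, elementType, containsNull."""
    return {
        "type": "array",
        "elementType": element_type,
        "containsNull": contains_null,
    }


inner = array_type_doc("integer", contains_null=False)
nested_json = json.dumps(array_type_doc(inner))
print(nested_json)
```

The resulting string describes a nullable-element array whose elements are non-nullable integer arrays, in the same layout the example above passes to ArrayType.from_json.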

to_arrow method descriptor

to_arrow() -> ArrowDataType

Get the equivalent arro3 type.

to_json method descriptor

to_json() -> str

Get the JSON string representation of the type.

deltalake.schema.MapType

MapType(key_type: DataType, value_type: DataType, *, value_contains_null: bool = True)

key_type

key_type: DataType = <attribute 'key_type' of 'deltalake._internal.MapType' objects>

type

type: Literal['map'] = <attribute 'type' of 'deltalake._internal.MapType' objects>

value_contains_null

value_contains_null: bool = <attribute 'value_contains_null' of 'deltalake._internal.MapType' objects>

value_type

value_type: DataType = <attribute 'value_type' of 'deltalake._internal.MapType' objects>

from_arrow staticmethod

from_arrow(data_type) -> MapType

Create a MapType from an ArrowSchemaExportable datatype

Will raise TypeError if passed a different type.

Parameters:

Name Type Description Default
data_type ArrowSchemaExportable

an object that is ArrowSchemaExportable

required

Returns:

Type Description
MapType

a MapType

from_json staticmethod

from_json(type_json) -> MapType

Create a MapType from a JSON string

Parameters:

Name Type Description Default
type_json str

a JSON string

required

Returns:

Type Description
MapType

a MapType

Example

The JSON representation for a map type is an object with type (set to "map"), keyType, valueType, and valueContainsNull:

MapType.from_json(
    '''{
        "type": "map",
        "keyType": "integer",
        "valueType": "string",
        "valueContainsNull": true
    }'''
)
# Returns MapType(PrimitiveType("integer"), PrimitiveType("string"), value_contains_null=True)
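When map type documents come from an external source, it can help to check the four keys listed above before handing the string to the library. This is a minimal sketch; `check_map_json` is a hypothetical helper, and whether MapType.from_json reports malformed input in the same way is not stated here.

```python
import json

REQUIRED_MAP_KEYS = {"type", "keyType", "valueType", "valueContainsNull"}


def check_map_json(type_json: str) -> dict:
    """Parse a map type document and verify the keys described above are present."""
    doc = json.loads(type_json)
    if not isinstance(doc, dict) or doc.get("type") != "map":
        raise ValueError('expected an object with "type" set to "map"')
    missing = REQUIRED_MAP_KEYS - doc.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return doc


map_doc = check_map_json(
    '{"type": "map", "keyType": "integer", "valueType": "string", "valueContainsNull": true}'
)
print(map_doc["keyType"], map_doc["valueType"])
```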

to_arrow method descriptor

to_arrow() -> ArrowDataType

Get the equivalent arro3 data type.

to_json method descriptor

to_json() -> str

Get JSON string representation of map type.

Returns:

Type Description
str

a JSON string

deltalake.schema.StructType

StructType(fields: list[Field])

fields

fields: list[Field] = <attribute 'fields' of 'deltalake._internal.StructType' objects>

type

type: Literal['struct'] = <attribute 'type' of 'deltalake._internal.StructType' objects>

The string "struct"

from_arrow staticmethod

from_arrow(data_type) -> StructType

Create a new StructType from an ArrowSchemaExportable datatype

Will raise TypeError if a different data type is provided.

Parameters:

Name Type Description Default
data_type ArrowSchemaExportable

a struct type object that is ArrowSchemaExportable

required

Returns:

Type Description
StructType

a StructType

from_json staticmethod

from_json(type_json) -> StructType

Create a new StructType from a JSON string.

Parameters:

Name Type Description Default
type_json str

a JSON string

required

Returns:

Type Description
StructType

a StructType

Example
StructType.from_json(
    '''{
        "type": "struct",
        "fields": [{"name": "x", "type": "integer", "nullable": true, "metadata": {}}]
    }'''
)
# Returns StructType([Field(x, PrimitiveType("integer"), nullable=True)])

to_arrow method descriptor

to_arrow() -> ArrowDataType

Get the equivalent arro3 DataType (arro3.core.DataType)

to_json method descriptor

to_json() -> str

Get the JSON representation of the type.

Returns:

Type Description
str

a JSON string

Example
StructType([Field("x", "integer")]).to_json()
# Returns '{"type":"struct","fields":[{"name":"x","type":"integer","nullable":true,"metadata":{}}]}'
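Struct, array, and map documents nest inside one another, so the JSON returned here can be walked recursively. The sketch below lists the primitive leaves of a parsed type document; `leaf_paths` and its path notation (`[]`, `.key`, `.value`) are hypothetical conventions chosen for this example.

```python
import json
from typing import Iterator, Union


def leaf_paths(type_doc: Union[str, dict], prefix: str = "") -> Iterator[tuple[str, str]]:
    """Yield (path, primitive_type_name) pairs from a parsed type document."""
    if isinstance(type_doc, str):
        # Primitive types are serialized as plain strings.
        yield prefix, type_doc
    elif type_doc["type"] == "struct":
        for field in type_doc["fields"]:
            path = f"{prefix}.{field['name']}" if prefix else field["name"]
            yield from leaf_paths(field["type"], path)
    elif type_doc["type"] == "array":
        yield from leaf_paths(type_doc["elementType"], f"{prefix}[]")
    elif type_doc["type"] == "map":
        yield from leaf_paths(type_doc["keyType"], f"{prefix}.key")
        yield from leaf_paths(type_doc["valueType"], f"{prefix}.value")
    else:
        raise ValueError(f"unexpected type document: {type_doc!r}")


struct_json = json.dumps({
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
        {"name": "tags", "type": {"type": "array", "elementType": "string", "containsNull": True},
         "nullable": True, "metadata": {}},
        {"name": "point", "type": {"type": "struct", "fields": [
            {"name": "x", "type": "double", "nullable": True, "metadata": {}}]},
         "nullable": True, "metadata": {}},
    ],
})

leaves = list(leaf_paths(json.loads(struct_json)))
print(leaves)
```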