Schema

Schema and field

Schemas, fields, and data types are provided in the deltalake.schema submodule.

deltalake.Schema

Schema(fields: List[Field])

Bases: deltalake._internal.StructType

A Delta Lake schema

Create using a list of Field objects:

Schema([Field("x", "integer"), Field("y", "string")])
# Returns Schema([Field(x, PrimitiveType("integer"), nullable=True), Field(y, PrimitiveType("string"), nullable=True)])

Or create from a PyArrow schema:

import pyarrow as pa
Schema.from_pyarrow(pa.schema({"x": pa.int32(), "y": pa.string()}))
# Returns Schema([Field(x, PrimitiveType("integer"), nullable=True), Field(y, PrimitiveType("string"), nullable=True)])

invariants

invariants: List[Tuple[str, str]] = <attribute 'invariants' of 'deltalake._internal.Schema' objects>

from_json staticmethod

from_json(schema_json) -> Schema

Create a new Schema from a JSON string.

Parameters:

Name Type Description Default
schema_json str

a JSON string

required
Example

A schema has the same JSON format as a StructType.

Schema.from_json('''{
    "type": "struct",
    "fields": [{"name": "x", "type": "integer", "nullable": true, "metadata": {}}]
    }'''
)
# Returns Schema([Field(x, PrimitiveType("integer"), nullable=True)])

from_pyarrow staticmethod

from_pyarrow(data_type) -> Schema

Create a Schema from a PyArrow schema.

Will raise TypeError if the argument is not a PyArrow schema.

Parameters:

Name Type Description Default
data_type pyarrow.Schema

A PyArrow Schema

required

Returns:

Type Description
Schema

a Schema

to_json method descriptor

to_json() -> str

Get the JSON string representation of the Schema.

Returns:

Type Description
str

a JSON string

Example

A schema has the same JSON format as a StructType.

Schema([Field("x", "integer")]).to_json()
# Returns '{"type":"struct","fields":[{"name":"x","type":"integer","nullable":true,"metadata":{}}]}'

to_pyarrow method descriptor

to_pyarrow(as_large_types: bool = False) -> pyarrow.Schema

Return the equivalent PyArrow schema.

Parameters:

Name Type Description Default
as_large_types bool

If True, return the schema with all variable-size types (list, binary, string) as their large variants (with int64 indices). This is for compatibility with systems like Polars that only support the large versions of Arrow types.

False

Returns:

Type Description
pyarrow.Schema

a PyArrow Schema

deltalake.Field

Field(name: str, type: DataType, *, nullable: bool = True, metadata: Optional[Dict[str, Any]] = None)

metadata

metadata: Dict[str, Any] = <attribute 'metadata' of 'deltalake._internal.Field' objects>

name

name: str = <attribute 'name' of 'deltalake._internal.Field' objects>

nullable

nullable: bool = <attribute 'nullable' of 'deltalake._internal.Field' objects>

type

type: DataType = <attribute 'type' of 'deltalake._internal.Field' objects>

from_json staticmethod

from_json(field_json) -> Field

Create a Field from a JSON string.

Parameters:

Name Type Description Default
field_json str

the JSON string.

required

Returns:

Type Description
Field

Field

Example
Field.from_json('''{
        "name": "col",
        "type": "integer",
        "nullable": true,
        "metadata": {}
    }'''
)
# Returns Field(col, PrimitiveType("integer"), nullable=True)

from_pyarrow staticmethod

from_pyarrow(field: pyarrow.Field) -> Field

Create a Field from a PyArrow field. Note: this currently doesn't preserve field metadata.

Parameters:

Name Type Description Default
field pyarrow.Field

a PyArrow Field

required

Returns:

Type Description
Field

a Field

to_json method descriptor

to_json() -> str

Get the field as a JSON string.

Returns:

Type Description
str

a JSON string

Example
Field("col", "integer").to_json()
# Returns '{"name":"col","type":"integer","nullable":true,"metadata":{}}'

to_pyarrow method descriptor

to_pyarrow() -> pyarrow.Field

Convert to an equivalent PyArrow field. Note: this currently doesn't preserve field metadata.

Returns:

Type Description
pyarrow.Field

a PyArrow Field

Data types

deltalake.schema.PrimitiveType

PrimitiveType(data_type: str)

type

type: str = <attribute 'type' of 'deltalake._internal.PrimitiveType' objects>

from_json staticmethod

from_json(type_json) -> PrimitiveType

Create a PrimitiveType from a JSON string

The JSON representation for a primitive type is just a quoted string: PrimitiveType.from_json('"integer"')

Parameters:

Name Type Description Default
type_json str

a JSON string

required

Returns:

Type Description
PrimitiveType

a PrimitiveType

from_pyarrow staticmethod

from_pyarrow(data_type) -> PrimitiveType

Create a PrimitiveType from a PyArrow datatype

Will raise TypeError if the PyArrow type is not a primitive type.

Parameters:

Name Type Description Default
data_type pyarrow.DataType

A PyArrow DataType

required

Returns:

Type Description
PrimitiveType

a PrimitiveType

to_pyarrow method descriptor

to_pyarrow() -> pyarrow.DataType

Get the equivalent PyArrow type (pyarrow.DataType)

deltalake.schema.ArrayType

ArrayType(element_type: DataType, *, contains_null: bool = True)

contains_null

contains_null: bool = <attribute 'contains_null' of 'deltalake._internal.ArrayType' objects>

element_type

element_type: DataType = <attribute 'element_type' of 'deltalake._internal.ArrayType' objects>

type

type: Literal['array'] = <attribute 'type' of 'deltalake._internal.ArrayType' objects>

from_json staticmethod

from_json(type_json) -> ArrayType

Create an ArrayType from a JSON string

Parameters:

Name Type Description Default
type_json str

a JSON string

required

Returns:

Type Description
ArrayType

an ArrayType

Example

The JSON representation for an array type is an object with type (set to "array"), elementType, and containsNull.

ArrayType.from_json(
    '''{
        "type": "array",
        "elementType": "integer",
        "containsNull": false
    }'''
)
# Returns ArrayType(PrimitiveType("integer"), contains_null=False)

from_pyarrow staticmethod

from_pyarrow(data_type) -> ArrayType

Create an ArrayType from a pyarrow.ListType.

Will raise TypeError if a different PyArrow DataType is provided.

Parameters:

Name Type Description Default
data_type pyarrow.ListType

The PyArrow ListType

required

Returns:

Type Description
ArrayType

an ArrayType

to_json method descriptor

to_json() -> str

Get the JSON string representation of the type.

to_pyarrow method descriptor

to_pyarrow() -> pyarrow.ListType

Get the equivalent PyArrow type.

deltalake.schema.MapType

MapType(key_type: DataType, value_type: DataType, *, value_contains_null: bool = True)

key_type

key_type: DataType = <attribute 'key_type' of 'deltalake._internal.MapType' objects>

type

type: Literal['map'] = <attribute 'type' of 'deltalake._internal.MapType' objects>

value_contains_null

value_contains_null: bool = <attribute 'value_contains_null' of 'deltalake._internal.MapType' objects>

value_type

value_type: DataType = <attribute 'value_type' of 'deltalake._internal.MapType' objects>

from_json staticmethod

from_json(type_json) -> MapType

Create a MapType from a JSON string

Parameters:

Name Type Description Default
type_json str

a JSON string

required

Returns:

Type Description
MapType

a MapType

Example

The JSON representation for a map type is an object with type (set to "map"), keyType, valueType, and valueContainsNull:

MapType.from_json(
    '''{
        "type": "map",
        "keyType": "integer",
        "valueType": "string",
        "valueContainsNull": true
    }'''
)
# Returns MapType(PrimitiveType("integer"), PrimitiveType("string"), value_contains_null=True)

from_pyarrow staticmethod

from_pyarrow(data_type) -> MapType

Create a MapType from a PyArrow MapType.

Will raise TypeError if passed a different type.

Parameters:

Name Type Description Default
data_type pyarrow.MapType

the PyArrow MapType

required

Returns:

Type Description
MapType

a MapType

to_json method descriptor

to_json() -> str

Get the JSON string representation of the map type.

Returns:

Type Description
str

a JSON string

to_pyarrow method descriptor

to_pyarrow() -> pyarrow.MapType

Get the equivalent PyArrow data type.

deltalake.schema.StructType

StructType(fields: List[Field])

fields

fields: List[Field] = <attribute 'fields' of 'deltalake._internal.StructType' objects>

type

type: Literal['struct'] = <attribute 'type' of 'deltalake._internal.StructType' objects>

The string "struct"

from_json staticmethod

from_json(type_json) -> StructType

Create a new StructType from a JSON string.

Parameters:

Name Type Description Default
type_json str

a JSON string

required

Returns:

Type Description
StructType

a StructType

Example
StructType.from_json(
    '''{
        "type": "struct",
        "fields": [{"name": "x", "type": "integer", "nullable": true, "metadata": {}}]
    }'''
)
# Returns StructType([Field(x, PrimitiveType("integer"), nullable=True)])

from_pyarrow staticmethod

from_pyarrow(data_type) -> StructType

Create a new StructType from a PyArrow struct type.

Will raise TypeError if a different data type is provided.

Parameters:

Name Type Description Default
data_type pyarrow.StructType

a PyArrow struct type.

required

Returns:

Type Description
StructType

a StructType

to_json method descriptor

to_json() -> str

Get the JSON representation of the type.

Returns:

Type Description
str

a JSON string

Example
StructType([Field("x", "integer")]).to_json()
# Returns '{"type":"struct","fields":[{"name":"x","type":"integer","nullable":true,"metadata":{}}]}'

to_pyarrow method descriptor

to_pyarrow() -> pyarrow.StructType

Get the equivalent PyArrow StructType

Returns:

Type Description
pyarrow.StructType

a PyArrow StructType