@Evolving public interface JsonHandler

Provides JSON handling functionality to Delta Kernel. Delta Kernel can use this handler to parse JSON strings into ColumnarBatch or read content from JSON files. Connectors can leverage this interface to provide their best implementation of the JSON parsing capability to Delta Kernel.

Modifier and Type | Method and Description |
---|---|
StructType | deserializeStructType(String structTypeJson): Deserialize the Delta schema from structTypeJson according to the Delta Protocol schema serialization rules. |
ColumnarBatch | parseJson(ColumnVector jsonStringVector, StructType outputSchema, java.util.Optional<ColumnVector> selectionVector): Parse the given JSON strings and return the fields requested by outputSchema as columns in a ColumnarBatch. |
CloseableIterator<ColumnarBatch> | readJsonFiles(CloseableIterator<FileStatus> fileIter, StructType physicalSchema, java.util.Optional<Predicate> predicate): Read and parse the JSON format files at the given locations and return the data as a ColumnarBatch with the columns requested by physicalSchema. |
void | writeJsonFileAtomically(String filePath, CloseableIterator<Row> data, boolean overwrite): Serialize each Row in the iterator as JSON and write it as a separate line in the destination file. |

ColumnarBatch parseJson(ColumnVector jsonStringVector, StructType outputSchema, java.util.Optional<ColumnVector> selectionVector)

Parse the given JSON strings and return the fields requested by outputSchema as columns in a ColumnarBatch.

There are a couple of special cases that should be handled for specific data types:
- FloatType and DoubleType: handle non-numeric values encoded as strings
  - NaN: "NaN"
  - Positive infinity: "+INF", "Infinity", "+Infinity"
  - Negative infinity: "-INF", "-Infinity"
- DateType: handle dates encoded as strings in the format "yyyy-MM-dd"
- TimestampType: handle timestamps encoded as strings in the format "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"

Parameters:
jsonStringVector - String ColumnVector of valid JSON strings.
outputSchema - Schema of the data to return from the parsed JSON. If any requested fields are missing in the JSON string, a null is returned for that particular field in the returned Row. The type for each given field is expected to match the type in the JSON.
selectionVector - Optional selection vector indicating which rows should have their JSON parsed. If present, only the selected rows should be parsed. Unselected rows should be all null in the returned batch.

Returns:
a ColumnarBatch of schema outputSchema with one row for each entry in jsonStringVector

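The special string encodings and the selection-vector rule for parseJson can be illustrated with a small JDK-only sketch. Everything here is an assumption made for illustration: the class ParseJsonSketch and its methods are hypothetical and not part of Delta Kernel, plain lists stand in for ColumnVector, and dates and timestamps are returned as java.time values rather than in whatever physical representation a real handler uses.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Delta Kernel. Lists stand in for ColumnVector.
final class ParseJsonSketch {
    private static final DateTimeFormatter TIMESTAMP_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");

    private ParseJsonSketch() {}

    /** Decodes a double that the JSON carried as a string. */
    static double parseDouble(String text) {
        switch (text) {
            case "NaN":
                return Double.NaN;
            case "+INF":
            case "Infinity":
            case "+Infinity":
                return Double.POSITIVE_INFINITY;
            case "-INF":
            case "-Infinity":
                return Double.NEGATIVE_INFINITY;
            default:
                return Double.parseDouble(text);
        }
    }

    /** Decodes a date in the "yyyy-MM-dd" format (the ISO local date format). */
    static LocalDate parseDate(String text) {
        return LocalDate.parse(text);
    }

    /** Decodes a timestamp in the "yyyy-MM-dd'T'HH:mm:ss.SSSXXX" format. */
    static Instant parseTimestamp(String text) {
        return OffsetDateTime.parse(text, TIMESTAMP_FORMAT).toInstant();
    }

    /** Decodes one column of string-encoded doubles; unselected rows stay null. */
    static List<Double> parseDoubleColumn(
            List<String> encoded, Optional<boolean[]> selectionVector) {
        List<Double> out = new ArrayList<>(encoded.size());
        for (int i = 0; i < encoded.size(); i++) {
            boolean selected = !selectionVector.isPresent() || selectionVector.get()[i];
            if (selected) {
                out.add(parseDouble(encoded.get(i)));
            } else {
                out.add(null); // unselected rows are never parsed
            }
        }
        return out;
    }
}
```

The output keeps one entry per input entry, so row positions in the result line up with jsonStringVector even when some rows are skipped.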
StructType deserializeStructType(String structTypeJson)

Deserialize the Delta schema from structTypeJson according to the Delta Protocol schema serialization rules.

Parameters:
structTypeJson - the JSON formatted schema string to parse

Returns:
StructType

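To make the expected input concrete, the sketch below builds a schema string in the shape the Delta Protocol uses for struct types: a "struct" object whose "fields" array holds name, type, nullable, and metadata entries. The class SchemaJsonExample and its Field type are hypothetical and exist only for this illustration. The sketch covers primitive field types only and does no string escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Delta Kernel: produces an example of the
// kind of string that deserializeStructType is expected to accept.
final class SchemaJsonExample {
    /** A primitive-typed field; nested types are out of scope for this sketch. */
    static final class Field {
        final String name;
        final String type;
        final boolean nullable;

        Field(String name, String type, boolean nullable) {
            this.name = name;
            this.type = type;
            this.nullable = nullable;
        }
    }

    private SchemaJsonExample() {}

    /** Builds the struct-type JSON for the given fields (names are not escaped). */
    static String structTypeJson(List<Field> fields) {
        List<String> parts = new ArrayList<>();
        for (Field f : fields) {
            parts.add("{\"name\":\"" + f.name + "\",\"type\":\"" + f.type
                + "\",\"nullable\":" + f.nullable + ",\"metadata\":{}}");
        }
        return "{\"type\":\"struct\",\"fields\":[" + String.join(",", parts) + "]}";
    }
}
```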
CloseableIterator<ColumnarBatch> readJsonFiles(CloseableIterator<FileStatus> fileIter, StructType physicalSchema, java.util.Optional<Predicate> predicate) throws java.io.IOException

Read and parse the JSON format files at the given locations and return the data as a ColumnarBatch with the columns requested by physicalSchema.

Parameters:
fileIter - Iterator of files to read data from.
physicalSchema - Select list of columns to read from the JSON file.
predicate - Optional predicate which the JSON reader can optionally use to prune rows that don't satisfy the predicate. Because pruning is optional and may be incomplete, the caller is still responsible for applying the predicate to the data returned by this method.

Returns:
an iterator of ColumnarBatches containing the data in columnar format. It is the responsibility of the caller to close the iterator. The data is returned in the same order as the files given in fileIter.

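Because pruning by the reader is optional, a caller has to filter the returned data again. A minimal sketch of that contract follows, under loud assumptions: OptionalPruningSketch is a hypothetical class, java.util.function.Predicate stands in for Kernel's Predicate type, lists of integers stand in for files and batches, and the readerPrunes flag simulates a reader that may or may not honor the predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical stand-in for the readJsonFiles contract, not Delta Kernel code.
final class OptionalPruningSketch {
    private OptionalPruningSketch() {}

    /**
     * Reads rows file by file, preserving the order in which files were given.
     * The reader may ignore the predicate entirely (readerPrunes == false).
     */
    static List<Integer> read(
            List<List<Integer>> files,
            Optional<Predicate<Integer>> predicate,
            boolean readerPrunes) {
        List<Integer> rows = new ArrayList<>();
        for (List<Integer> file : files) {
            for (Integer row : file) {
                if (readerPrunes && predicate.isPresent() && !predicate.get().test(row)) {
                    continue; // best-effort pruning inside the reader
                }
                rows.add(row);
            }
        }
        return rows;
    }

    /** Caller side: always re-applies the predicate to whatever the reader returned. */
    static List<Integer> readAndFilter(
            List<List<Integer>> files,
            Optional<Predicate<Integer>> predicate,
            boolean readerPrunes) {
        List<Integer> result = new ArrayList<>();
        for (Integer row : read(files, predicate, readerPrunes)) {
            if (!predicate.isPresent() || predicate.get().test(row)) {
                result.add(row);
            }
        }
        return result;
    }
}
```

With this split, the caller gets the same result whether or not the reader prunes; pruning only changes how much data crosses the interface.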
Throws:
java.io.IOException - if an I/O error occurs during the read.

void writeJsonFileAtomically(String filePath, CloseableIterator<Row> data, boolean overwrite) throws java.io.IOException

Serialize each Row in the iterator as JSON and write it as a separate line in the destination file. This call either succeeds in creating the file with the given contents or no file is created at all. It won't leave behind a partially written file.

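One common way to get all-or-nothing behavior on a local file system is to write to a temporary file in the destination directory and then rename it into place. The sketch below shows that technique only; it is not presented as how any Delta Kernel handler implements this method. AtomicJsonLinesWriter is a hypothetical class, rows are assumed to be already serialized to JSON strings, and the existence check for overwrite == false is not race-free against concurrent writers.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical local-file-system sketch, not Delta Kernel code.
final class AtomicJsonLinesWriter {
    private AtomicJsonLinesWriter() {}

    static void writeAtomically(Path destination, List<String> jsonLines, boolean overwrite)
            throws IOException {
        // Not race-free: another writer could create the file after this check.
        // A production implementation needs a put-if-absent primitive from the storage system.
        if (!overwrite && Files.exists(destination)) {
            throw new FileAlreadyExistsException(destination.toString());
        }
        Path dir = destination.toAbsolutePath().getParent();
        // The temp file lives in the same directory so the final rename stays on one file system.
        Path temp = Files.createTempFile(dir, ".tmp-", ".json");
        try {
            Files.write(temp, jsonLines, StandardCharsets.UTF_8); // one JSON document per line
            // Readers see either no new file or the complete file, never a partial one.
            // Whether an existing target is replaced is implementation specific for ATOMIC_MOVE.
            Files.move(temp, destination, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(temp); // clean up if the write or the move failed
        }
    }
}
```

If the write or the rename fails, only the temporary file is affected and it is removed, so the destination path never holds partial content.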
Following are the supported data types and their serialization rules. At a high level, the JSON serialization is similar to that of the Jackson JSON serializer.
- struct: any element whose value is null is not written to the file
- map: only a map with string key type is supported. If an entry value is null, it should be written to the file.
- array: null value elements are written to the file

Parameters:
filePath - Fully qualified destination file path
data - Iterator of Row objects where each row should be serialized as JSON and written as a separate line in the destination file.
overwrite - If true, the file is overwritten if it already exists. If false and a file exists, FileAlreadyExistsException is thrown.

Throws:
java.nio.file.FileAlreadyExistsException - if the file already exists and overwrite is false.
java.io.IOException - if any other I/O error occurs.
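
The null-handling rules for struct, map, and array listed above can be sketched with a small hand-written serializer. This is an illustration under stated assumptions, not the serializer any handler actually uses: JsonNullRulesSketch and its Struct type are hypothetical, java.util.Map and java.util.List stand in for map and array values, string escaping is minimal, and only finite numbers are handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the null-handling rules, not Delta Kernel code.
final class JsonNullRulesSketch {
    /** Hypothetical struct value: ordered field name to value pairs. */
    static final class Struct {
        final Map<String, Object> fields = new LinkedHashMap<>();

        Struct with(String name, Object value) {
            fields.put(name, value);
            return this;
        }
    }

    private JsonNullRulesSketch() {}

    static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Struct) {
            List<String> parts = new ArrayList<>();
            for (Map.Entry<String, Object> field : ((Struct) value).fields.entrySet()) {
                if (field.getValue() != null) { // struct: null fields are not written
                    parts.add(quote(field.getKey()) + ":" + toJson(field.getValue()));
                }
            }
            return "{" + String.join(",", parts) + "}";
        }
        if (value instanceof Map) {
            List<String> parts = new ArrayList<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                // map: keys must be strings; entries with null values are written
                parts.add(quote((String) entry.getKey()) + ":" + toJson(entry.getValue()));
            }
            return "{" + String.join(",", parts) + "}";
        }
        if (value instanceof List) {
            List<String> parts = new ArrayList<>();
            for (Object element : (List<?>) value) {
                parts.add(toJson(element)); // array: null elements are written
            }
            return "[" + String.join(",", parts) + "]";
        }
        if (value instanceof String) {
            return quote((String) value);
        }
        return String.valueOf(value); // finite numbers and booleans
    }

    /** Minimal escaping (backslash and double quote only); use a JSON library in real code. */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

For a struct with fields a = 1, b = null, m = a map holding "k" with a null value, and arr = [1, null, 2], this produces {"a":1,"m":{"k":null},"arr":[1,null,2]}: the null struct field is dropped while the null map value and null array element are kept.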