public class SparkSchemaUtil
extends java.lang.Object
Modifier and Type | Method and Description |
---|---|
static Type | convert(org.apache.spark.sql.types.DataType sparkType) Convert a Spark struct to a Type with new field ids. |
static org.apache.spark.sql.types.StructType | convert(Schema schema) Convert a Schema to a Spark type. |
static Schema | convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType) Convert a Spark struct to a Schema based on the given schema. |
static Schema | convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive) Convert a Spark struct to a Schema based on the given schema. |
static Schema | convert(org.apache.spark.sql.types.StructType sparkType) Convert a Spark struct to a Schema with new field ids. |
static org.apache.spark.sql.types.DataType | convert(Type type) Convert a Type to a Spark type. |
static Schema | convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType) Convert a Spark struct to a Schema based on the given schema. |
static Schema | convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive) Convert a Spark struct to a Schema based on the given schema. |
static long | estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords) Estimate approximate table size based on Spark schema and total records. |
static java.util.Map<java.lang.Integer,java.lang.String> | indexQuotedNameById(Schema schema) |
static Schema | prune(Schema schema, org.apache.spark.sql.types.StructType requestedType) Prune columns from a Schema using a Spark type projection. |
static Schema | prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive) Prune columns from a Schema using a Spark type projection. |
static Schema | prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, java.util.List<Expression> filters) Prune columns from a Schema using a Spark type projection. |
static Schema | schemaForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name) Returns a Schema for the given table with fresh field ids. |
static PartitionSpec | specForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name) Returns a PartitionSpec for the given table. |
static void | validateMetadataColumnReferences(Schema tableSchema, Schema readSchema) |
public static Schema schemaForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)

Returns a Schema for the given table with fresh field ids.

This creates a Schema for an existing table by looking up the table's schema with Spark and converting that schema. Spark/Hive partition columns are included in the schema.

Parameters:
spark - a Spark session
name - a table name and (optional) database
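A minimal usage sketch of schemaForTable, assuming a local SparkSession and a hypothetical catalog table named db.events:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.SparkSession;

public class SchemaForTableExample {
  public static void main(String[] args) {
    // Assumes a Spark session whose catalog contains "db.events" (hypothetical table name).
    SparkSession spark = SparkSession.builder()
        .master("local[*]")
        .appName("schema-for-table")
        .getOrCreate();

    // Looks up the table's schema through Spark and converts it; partition columns are included.
    Schema schema = SparkSchemaUtil.schemaForTable(spark, "db.events");
    System.out.println(schema);

    spark.stop();
  }
}
```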
public static PartitionSpec specForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name) throws org.apache.spark.sql.AnalysisException

Returns a PartitionSpec for the given table.

This creates a partition spec for an existing table by looking up the table's schema and creating a spec with identity partitions for each partition column.

Parameters:
spark - a Spark session
name - a table name and (optional) database

Throws:
org.apache.spark.sql.AnalysisException - if thrown by the Spark catalog
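A companion sketch for specForTable under the same assumptions (local session, hypothetical table db.events); AnalysisException is declared because the method propagates catalog lookup failures:

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.AnalysisException;
import org.apache.spark.sql.SparkSession;

public class SpecForTableExample {
  public static void main(String[] args) throws AnalysisException {
    SparkSession spark = SparkSession.builder()
        .master("local[*]")
        .appName("spec-for-table")
        .getOrCreate();

    // Builds a spec with an identity partition for each partition column of "db.events".
    PartitionSpec spec = SparkSchemaUtil.specForTable(spark, "db.events");
    System.out.println(spec);

    spark.stop();
  }
}
```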
public static org.apache.spark.sql.types.StructType convert(Schema schema)

Convert a Schema to a Spark type.

Parameters:
schema - a Schema

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted to Spark
public static org.apache.spark.sql.types.DataType convert(Type type)

Convert a Type to a Spark type.

Parameters:
type - a Type

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted to Spark
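A small sketch covering both directions above, Schema to StructType and Type to DataType; the field names and ids are illustrative only:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.StructType;

public class ToSparkTypeExample {
  public static void main(String[] args) {
    // An Iceberg schema with explicit field ids (names and ids are illustrative).
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "name", Types.StringType.get()));

    // Schema -> Spark StructType; the Spark type does not carry field ids.
    StructType sparkType = SparkSchemaUtil.convert(schema);
    System.out.println(sparkType.treeString());

    // A single Iceberg Type -> Spark DataType.
    DataType sparkTimestamp = SparkSchemaUtil.convert(Types.TimestampType.withZone());
    System.out.println(sparkTimestamp);
  }
}
```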
public static Schema convert(org.apache.spark.sql.types.StructType sparkType)

Convert a Spark struct to a Schema with new field ids.

This conversion assigns fresh ids.

Some data types are represented as the same Spark type. These are converted to a default type.

To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).

Parameters:
sparkType - a Spark StructType

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted
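A sketch of the reverse conversion, assuming an ad-hoc Spark struct with illustrative column names; fresh field ids are assigned because the Spark type has none:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class FromStructTypeExample {
  public static void main(String[] args) {
    // A Spark struct without field ids (column names are illustrative).
    StructType sparkType = new StructType()
        .add("id", DataTypes.LongType, false)
        .add("name", DataTypes.StringType, true);

    // Fresh ids are assigned; ambiguous Spark types map to default Iceberg types.
    Schema schema = SparkSchemaUtil.convert(sparkType);
    System.out.println(schema);
  }
}
```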
public static Type convert(org.apache.spark.sql.types.DataType sparkType)

Convert a Spark struct to a Type with new field ids.

This conversion assigns fresh ids.

Some data types are represented as the same Spark type. These are converted to a default type.

To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).

Parameters:
sparkType - a Spark DataType

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted
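For a single Spark DataType the call is the same shape; a one-line sketch with an arbitrary example type:

```java
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Type;
import org.apache.spark.sql.types.DataTypes;

public class FromDataTypeExample {
  public static void main(String[] args) {
    // A plain Spark DataType converts to the default Iceberg Type for that Spark type.
    Type icebergType = SparkSchemaUtil.convert(DataTypes.TimestampType);
    System.out.println(icebergType);
  }
}
```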
public static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)

Convert a Spark struct to a Schema based on the given schema.

This conversion does not assign new ids; it uses ids from the base schema.

Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.

Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted or there are missing ids
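A sketch of converting against a base schema, with illustrative ids and column names; every Spark field must resolve to a field of the base schema so its id can be reused:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ConvertWithBaseSchemaExample {
  public static void main(String[] args) {
    // Base schema that already owns the field ids (ids and names are illustrative).
    Schema baseSchema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "name", Types.StringType.get()));

    // A Spark struct whose fields all exist in the base schema, possibly reordered.
    StructType sparkType = new StructType()
        .add("name", DataTypes.StringType, true)
        .add("id", DataTypes.LongType, false);

    // Ids come from baseSchema; types, field order, and nullability come from sparkType.
    Schema converted = SparkSchemaUtil.convert(baseSchema, sparkType);
    System.out.println(converted);
  }
}
```

The three-argument overload below behaves the same way but can ignore field-name case when caseSensitive is false.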
public static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)

Convert a Spark struct to a Schema based on the given schema.

This conversion does not assign new ids; it uses ids from the base schema.

Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.

Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType
caseSensitive - when false, the case of schema fields is ignored

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted or there are missing ids
public static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)

Convert a Spark struct to a Schema based on the given schema.

This conversion will assign new ids for fields that are not found in the base schema.

Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.

Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted or there are missing ids
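A sketch of convertWithFreshIds, again with illustrative ids and names; fields present in the base schema keep their ids, and the extra Spark column gets a new id:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ConvertWithFreshIdsExample {
  public static void main(String[] args) {
    // The base schema knows only "id" (id and name are illustrative).
    Schema baseSchema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()));

    // The Spark struct has an extra column, "name", that is not in the base schema.
    StructType sparkType = new StructType()
        .add("id", DataTypes.LongType, false)
        .add("name", DataTypes.StringType, true);

    // "id" keeps its id from baseSchema; "name" is assigned a fresh id.
    Schema converted = SparkSchemaUtil.convertWithFreshIds(baseSchema, sparkType);
    System.out.println(converted);
  }
}
```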
public static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)

Convert a Spark struct to a Schema based on the given schema.

This conversion will assign new ids for fields that are not found in the base schema.

Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.

Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType
caseSensitive - when false, the case of field names in the schema is ignored

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted or there are missing ids
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType)

Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.

Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema

Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema
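A pruning sketch with an illustrative schema; the requested Spark type is derived from the schema itself so that nullability and types are guaranteed to match:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.StructType;

public class PruneExample {
  public static void main(String[] args) {
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "name", Types.StringType.get()),
        Types.NestedField.optional(3, "ts", Types.TimestampType.withZone()));

    // Build the requested projection from the schema so it is a valid projection of it.
    StructType requestedType = SparkSchemaUtil.convert(schema.select("id", "name"));

    // The pruned schema contains only the projected columns, with their original ids.
    Schema pruned = SparkSchemaUtil.prune(schema, requestedType);
    System.out.println(pruned);
  }
}
```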
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, java.util.List<Expression> filters)

Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.

The filters list of Expression is used to ensure that columns referenced by filters are projected.

Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
filters - a list of filters

Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema
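A sketch of the filter-aware overload, with illustrative columns and filter; the projection asks only for "name", but the filter references "id", so "id" is retained:

```java
import java.util.Collections;
import java.util.List;
import org.apache.iceberg.Schema;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.Expressions;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.StructType;

public class PruneWithFiltersExample {
  public static void main(String[] args) {
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "name", Types.StringType.get()),
        Types.NestedField.optional(3, "ts", Types.TimestampType.withZone()));

    // The read projection asks only for "name" ...
    StructType requestedType = SparkSchemaUtil.convert(schema.select("name"));

    // ... but the filter references "id", so "id" is kept in the pruned schema as well.
    List<Expression> filters = Collections.singletonList(Expressions.greaterThan("id", 100L));

    Schema pruned = SparkSchemaUtil.prune(schema, requestedType, filters);
    System.out.println(pruned);
  }
}
```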
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive)

Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.

The filter Expression is used to ensure that columns referenced by the filter are projected.

Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
filter - a filter
caseSensitive - when false, the case of schema fields is ignored

Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema
public static long estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords)

Estimate approximate table size based on Spark schema and total records.

Parameters:
tableSchema - Spark schema
totalRecords - total records in the table
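A short sketch of estimateSize; the schema and record count are illustrative, and the result is an approximation in bytes derived from the Spark schema:

```java
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class EstimateSizeExample {
  public static void main(String[] args) {
    StructType tableSchema = new StructType()
        .add("id", DataTypes.LongType, false)
        .add("name", DataTypes.StringType, true);

    // Approximate size for 1,000,000 records with this schema (record count is illustrative).
    long estimatedBytes = SparkSchemaUtil.estimateSize(tableSchema, 1_000_000L);
    System.out.println(estimatedBytes);
  }
}
```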
public static void validateMetadataColumnReferences(Schema tableSchema, Schema readSchema)

public static java.util.Map<java.lang.Integer,java.lang.String> indexQuotedNameById(Schema schema)