public class SparkSchemaUtil
extends java.lang.Object
Modifier and Type | Method and Description |
---|---|
static Type | convert(org.apache.spark.sql.types.DataType sparkType) Convert a Spark struct to a Type with new field ids. |
static org.apache.spark.sql.types.StructType | convert(Schema schema) Convert a Schema to a Spark type. |
static Schema | convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType) Convert a Spark struct to a Schema based on the given schema. |
static Schema | convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive) Convert a Spark struct to a Schema based on the given schema. |
static Schema | convert(org.apache.spark.sql.types.StructType sparkType) Convert a Spark struct to a Schema with new field ids. |
static org.apache.spark.sql.types.DataType | convert(Type type) Convert a Type to a Spark type. |
static Schema | convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType) Convert a Spark struct to a Schema based on the given schema. |
static Schema | convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive) Convert a Spark struct to a Schema based on the given schema. |
static long | estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords) Estimate approximate table size based on Spark schema and total records. |
static java.util.Map<java.lang.Integer,java.lang.String> | indexQuotedNameById(Schema schema) |
static Schema | prune(Schema schema, org.apache.spark.sql.types.StructType requestedType) Prune columns from a Schema using a Spark type projection. |
static Schema | prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive) Prune columns from a Schema using a Spark type projection. |
static Schema | prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, java.util.List<Expression> filters) Prune columns from a Schema using a Spark type projection. |
static Schema | schemaForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name) Returns a Schema for the given table with fresh field ids. |
static PartitionSpec | specForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name) Returns a PartitionSpec for the given table. |
static void | validateMetadataColumnReferences(Schema tableSchema, Schema readSchema) |
public static Schema schemaForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)

Returns a Schema for the given table with fresh field ids.

This creates a Schema for an existing table by looking up the table's schema with Spark and converting that schema. Spark/Hive partition columns are included in the schema.

Parameters:
spark - a Spark session
name - a table name and (optional) database
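A minimal usage sketch of schemaForTable, assuming a local SparkSession and a hypothetical catalog table named db.events:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.SparkSession;

public class SchemaForTableExample {
  public static void main(String[] args) {
    // Assumes a Spark session whose catalog contains "db.events" (hypothetical table name).
    SparkSession spark = SparkSession.builder()
        .master("local[*]")
        .appName("schema-for-table")
        .getOrCreate();

    // Looks up the table's schema through Spark and converts it; partition columns are included.
    Schema schema = SparkSchemaUtil.schemaForTable(spark, "db.events");
    System.out.println(schema);

    spark.stop();
  }
}
```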
public static PartitionSpec specForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name) throws org.apache.spark.sql.AnalysisException

Returns a PartitionSpec for the given table.

This creates a partition spec for an existing table by looking up the table's schema and creating a spec with identity partitions for each partition column.

Parameters:
spark - a Spark session
name - a table name and (optional) database

Throws:
org.apache.spark.sql.AnalysisException - if thrown by the Spark catalog
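A companion sketch for specForTable under the same assumptions (local session, hypothetical table db.events); AnalysisException is declared because the method propagates catalog lookup failures:

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.AnalysisException;
import org.apache.spark.sql.SparkSession;

public class SpecForTableExample {
  public static void main(String[] args) throws AnalysisException {
    SparkSession spark = SparkSession.builder()
        .master("local[*]")
        .appName("spec-for-table")
        .getOrCreate();

    // Builds a spec with an identity partition for each partition column of "db.events".
    PartitionSpec spec = SparkSchemaUtil.specForTable(spark, "db.events");
    System.out.println(spec);

    spark.stop();
  }
}
```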
public static org.apache.spark.sql.types.StructType convert(Schema schema)

Convert a Schema to a Spark type.

Parameters:
schema - a Schema

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted to Spark
public static org.apache.spark.sql.types.DataType convert(Type type)

Convert a Type to a Spark type.

Parameters:
type - a Type

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted to Spark
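A small sketch covering both directions above, Schema to StructType and Type to DataType; the field names and ids are illustrative only:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.StructType;

public class ToSparkTypeExample {
  public static void main(String[] args) {
    // An Iceberg schema with explicit field ids (names and ids are illustrative).
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "name", Types.StringType.get()));

    // Schema -> Spark StructType; the Spark type does not carry field ids.
    StructType sparkType = SparkSchemaUtil.convert(schema);
    System.out.println(sparkType.treeString());

    // A single Iceberg Type -> Spark DataType.
    DataType sparkTimestamp = SparkSchemaUtil.convert(Types.TimestampType.withZone());
    System.out.println(sparkTimestamp);
  }
}
```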
public static Schema convert(org.apache.spark.sql.types.StructType sparkType)

Convert a Spark struct to a Schema with new field ids.

This conversion assigns fresh ids.

Some data types are represented as the same Spark type. These are converted to a default type.

To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).

Parameters:
sparkType - a Spark StructType

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted
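A sketch of the reverse conversion, assuming an ad-hoc Spark struct with illustrative column names; fresh field ids are assigned because the Spark type has none:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class FromStructTypeExample {
  public static void main(String[] args) {
    // A Spark struct without field ids (column names are illustrative).
    StructType sparkType = new StructType()
        .add("id", DataTypes.LongType, false)
        .add("name", DataTypes.StringType, true);

    // Fresh ids are assigned; ambiguous Spark types map to default Iceberg types.
    Schema schema = SparkSchemaUtil.convert(sparkType);
    System.out.println(schema);
  }
}
```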
public static Type convert(org.apache.spark.sql.types.DataType sparkType)

Convert a Spark struct to a Type with new field ids.

This conversion assigns fresh ids.

Some data types are represented as the same Spark type. These are converted to a default type.

To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).

Parameters:
sparkType - a Spark DataType

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted
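For a single Spark DataType the call is the same shape; a one-line sketch with an arbitrary example type:

```java
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Type;
import org.apache.spark.sql.types.DataTypes;

public class FromDataTypeExample {
  public static void main(String[] args) {
    // A plain Spark DataType converts to the default Iceberg Type for that Spark type.
    Type icebergType = SparkSchemaUtil.convert(DataTypes.TimestampType);
    System.out.println(icebergType);
  }
}
```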
public static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)

Convert a Spark struct to a Schema based on the given schema.

This conversion does not assign new ids; it uses ids from the base schema.

Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.

Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted or there are missing ids
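A sketch of converting against a base schema, with illustrative ids and column names; every Spark field must resolve to a field of the base schema so its id can be reused:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ConvertWithBaseSchemaExample {
  public static void main(String[] args) {
    // Base schema that already owns the field ids (ids and names are illustrative).
    Schema baseSchema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "name", Types.StringType.get()));

    // A Spark struct whose fields all exist in the base schema, possibly reordered.
    StructType sparkType = new StructType()
        .add("name", DataTypes.StringType, true)
        .add("id", DataTypes.LongType, false);

    // Ids come from baseSchema; types, field order, and nullability come from sparkType.
    Schema converted = SparkSchemaUtil.convert(baseSchema, sparkType);
    System.out.println(converted);
  }
}
```

The three-argument overload below behaves the same way but can ignore field-name case when caseSensitive is false.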
public static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)

Convert a Spark struct to a Schema based on the given schema.

This conversion does not assign new ids; it uses ids from the base schema.

Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.

Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType
caseSensitive - when false, the case of schema fields is ignored

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted or there are missing ids
public static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)

Convert a Spark struct to a Schema based on the given schema.

This conversion will assign new ids for fields that are not found in the base schema.

Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.

Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted or there are missing ids
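A sketch of convertWithFreshIds, again with illustrative ids and names; fields present in the base schema keep their ids, and the extra Spark column gets a new id:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ConvertWithFreshIdsExample {
  public static void main(String[] args) {
    // The base schema knows only "id" (id and name are illustrative).
    Schema baseSchema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()));

    // The Spark struct has an extra column, "name", that is not in the base schema.
    StructType sparkType = new StructType()
        .add("id", DataTypes.LongType, false)
        .add("name", DataTypes.StringType, true);

    // "id" keeps its id from baseSchema; "name" is assigned a fresh id.
    Schema converted = SparkSchemaUtil.convertWithFreshIds(baseSchema, sparkType);
    System.out.println(converted);
  }
}
```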
public static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)

Convert a Spark struct to a Schema based on the given schema.

This conversion will assign new ids for fields that are not found in the base schema.

Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.

Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType
caseSensitive - when false, the case of field names in the schema is ignored

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted or there are missing ids
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType)

Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.

Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema

Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema
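A pruning sketch with an illustrative schema; the requested Spark type is derived from the schema itself so that nullability and types are guaranteed to match:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.StructType;

public class PruneExample {
  public static void main(String[] args) {
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "name", Types.StringType.get()),
        Types.NestedField.optional(3, "ts", Types.TimestampType.withZone()));

    // Build the requested projection from the schema so it is a valid projection of it.
    StructType requestedType = SparkSchemaUtil.convert(schema.select("id", "name"));

    // The pruned schema contains only the projected columns, with their original ids.
    Schema pruned = SparkSchemaUtil.prune(schema, requestedType);
    System.out.println(pruned);
  }
}
```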
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, java.util.List<Expression> filters)

Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.

The filters list of Expression is used to ensure that columns referenced by filters are projected.

Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
filters - a list of filters

Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema
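A sketch of the filter-aware overload, with illustrative columns and filter; the projection asks only for "name", but the filter references "id", so "id" is retained:

```java
import java.util.Collections;
import java.util.List;
import org.apache.iceberg.Schema;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.Expressions;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.StructType;

public class PruneWithFiltersExample {
  public static void main(String[] args) {
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "name", Types.StringType.get()),
        Types.NestedField.optional(3, "ts", Types.TimestampType.withZone()));

    // The read projection asks only for "name" ...
    StructType requestedType = SparkSchemaUtil.convert(schema.select("name"));

    // ... but the filter references "id", so "id" is kept in the pruned schema as well.
    List<Expression> filters = Collections.singletonList(Expressions.greaterThan("id", 100L));

    Schema pruned = SparkSchemaUtil.prune(schema, requestedType, filters);
    System.out.println(pruned);
  }
}
```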
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive)

Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.

The filter Expression is used to ensure that columns referenced by the filter are projected.

Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
filter - a filter
caseSensitive - when false, the case of schema fields is ignored

Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema
public static long estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords)

Estimate approximate table size based on Spark schema and total records.

Parameters:
tableSchema - Spark schema
totalRecords - total records in the table
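A short sketch of estimateSize; the schema and record count are illustrative, and the result is an approximation in bytes derived from the Spark schema:

```java
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class EstimateSizeExample {
  public static void main(String[] args) {
    StructType tableSchema = new StructType()
        .add("id", DataTypes.LongType, false)
        .add("name", DataTypes.StringType, true);

    // Approximate size for 1,000,000 records with this schema (record count is illustrative).
    long estimatedBytes = SparkSchemaUtil.estimateSize(tableSchema, 1_000_000L);
    System.out.println(estimatedBytes);
  }
}
```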
public static void validateMetadataColumnReferences(Schema tableSchema, Schema readSchema)

public static java.util.Map<java.lang.Integer,java.lang.String> indexQuotedNameById(Schema schema)