public class Partitioning
extends java.lang.Object
Modifier and Type | Method and Description |
---|---|
static Types.StructType | groupingKeyType(Schema schema, java.util.Collection<PartitionSpec> specs): Builds a grouping key type considering the provided schema and specs. |
static boolean | hasBucketField(PartitionSpec spec): Check whether the spec contains a bucketed partition field. |
static Types.StructType | partitionType(Table table): Builds a unified partition type considering all specs in a table. |
static SortOrder | sortOrderFor(PartitionSpec spec): Create a sort order that will group data for a partition spec. |
public static boolean hasBucketField(PartitionSpec spec)
spec - a partition spec
public static SortOrder sortOrderFor(PartitionSpec spec)
If the partition spec contains bucket columns, the sort order will also include a field that sorts by one of the columns bucketed in the spec. When several columns are bucketed, the column whose bucket transform has the highest number of buckets is selected.
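The selection rule for the bucketed column can be illustrated with a minimal, self-contained sketch. It does not use Iceberg classes: `BucketColumnSketch`, `BucketField`, and `columnToSortBy` are hypothetical names introduced here, and the tie-breaking behavior (first field wins) is an assumption, since the description does not specify it.

```java
import java.util.List;
import java.util.Optional;

// Simplified model for illustration only; these are not Iceberg classes.
final class BucketColumnSketch {

  // A bucketed partition field: its source column name plus its bucket count.
  static final class BucketField {
    final String sourceColumn;
    final int numBuckets;

    BucketField(String sourceColumn, int numBuckets) {
      this.sourceColumn = sourceColumn;
      this.numBuckets = numBuckets;
    }
  }

  // Returns the source column of the bucket field with the most buckets,
  // or an empty Optional when there are no bucket fields.
  // Assumption: on a tie, the field that appears first is kept.
  static Optional<String> columnToSortBy(List<BucketField> bucketFields) {
    BucketField best = null;
    for (BucketField field : bucketFields) {
      if (best == null || field.numBuckets > best.numBuckets) {
        best = field;
      }
    }
    return best == null ? Optional.empty() : Optional.of(best.sourceColumn);
  }
}
```

In this model, a spec bucketed by `id` into 16 buckets and by `user` into 64 buckets would yield `user` as the extra sort column.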
spec - a partition spec
public static Types.StructType groupingKeyType(Schema schema, java.util.Collection<PartitionSpec> specs)
A grouping key defines how data is split between files and consists of partition fields with non-void transforms that are present in each provided spec. Iceberg guarantees that records with different values for the grouping key are disjoint and are stored in separate files.
If there is only one spec, the grouping key will include all partition fields with non-void transforms from that spec. Whenever there are multiple specs, the grouping key will represent an intersection of all partition fields with non-void transforms. If a partition field is present only in a subset of specs, Iceberg cannot guarantee data distribution on that field. That's why it will not be part of the grouping key. Unpartitioned tables or tables with non-overlapping specs have empty grouping keys.
When partition fields are dropped in v1 tables, they are replaced with new partition fields that have the same field ID but use a void transform under the hood. Such fields cannot be part of the grouping key as void transforms always return null.
If the provided schema is not null, this method will only take into account partition fields on top of columns present in the schema. Otherwise, all partition fields will be considered.
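The rules above (intersection across specs, exclusion of void transforms, optional filtering by schema columns) can be sketched as plain set operations. This is a simplified model, not Iceberg's implementation: `GroupingKeySketch`, `Field`, and `groupingKeyFieldIds` are hypothetical names, transforms are represented as plain strings, and matching fields across specs by field ID alone is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified model for illustration only; these are not Iceberg classes.
final class GroupingKeySketch {

  // A partition field: field ID, source column name, and transform name.
  static final class Field {
    final int fieldId;
    final String sourceColumn;
    final String transform;

    Field(int fieldId, String sourceColumn, String transform) {
      this.fieldId = fieldId;
      this.sourceColumn = sourceColumn;
      this.transform = transform;
    }
  }

  // Returns the IDs of the fields that form the grouping key.
  // schemaColumns may be null, which means "consider all source columns".
  static Set<Integer> groupingKeyFieldIds(Set<String> schemaColumns, List<List<Field>> specs) {
    Set<Integer> key = null;
    for (List<Field> spec : specs) {
      Set<Integer> eligible = new LinkedHashSet<>();
      for (Field field : spec) {
        boolean isVoid = field.transform.equals("void");
        boolean inSchema = schemaColumns == null || schemaColumns.contains(field.sourceColumn);
        if (!isVoid && inSchema) {
          eligible.add(field.fieldId);
        }
      }
      if (key == null) {
        key = eligible;          // first spec: start from its eligible fields
      } else {
        key.retainAll(eligible); // later specs: keep only the intersection
      }
    }
    return key == null ? new LinkedHashSet<>() : key;
  }
}
```

For example, if one spec partitions by a day transform on `ts` (field 1000) and a bucket transform on `id` (field 1001), and a second spec keeps field 1000 but has replaced field 1001 with a void transform, the model returns only field 1000.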
schema - a schema specifying a set of source columns to consider (null to consider all)
specs - one or many specs
public static Types.StructType partitionType(Table table)
If there is only one spec, the partition type is that spec's partition type. Whenever there are multiple specs, the partition type is a struct containing all fields that have ever been a part of any spec in the table. In other words, the struct fields represent a union of all known partition fields.
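The union described above can be modeled with a map keyed by field ID. This is a rough sketch under stated assumptions rather than Iceberg's implementation: `PartitionTypeSketch`, `Field`, and `unifiedFields` are hypothetical names, and both the ordering by field ID and the choice to keep the first name seen for a given ID are assumptions made for the illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model for illustration only; these are not Iceberg classes.
final class PartitionTypeSketch {

  // A partition field reduced to its field ID and name.
  static final class Field {
    final int fieldId;
    final String name;

    Field(int fieldId, String name) {
      this.fieldId = fieldId;
      this.name = name;
    }
  }

  // Collects every field that appears in any spec, keyed by field ID.
  // Assumption: results are ordered by field ID, and the first name seen
  // for a given ID is the one that is kept.
  static Map<Integer, String> unifiedFields(List<List<Field>> specs) {
    Map<Integer, String> fields = new TreeMap<>();
    for (List<Field> spec : specs) {
      for (Field field : spec) {
        fields.putIfAbsent(field.fieldId, field.name);
      }
    }
    return fields;
  }
}
```

With a single spec, the result contains exactly that spec's fields, which mirrors the single-spec case described above.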
table - a table with one or many specs