public class SparkTableUtil
extends java.lang.Object
| Modifier and Type | Class and Description |
|---|---|
| static class | SparkTableUtil.SparkPartition: Class representing a table partition. |
| Modifier and Type | Method and Description |
|---|---|
| static java.lang.String | determineWriteBranch(org.apache.spark.sql.SparkSession spark, java.lang.String branch): Determine the write branch. |
| static java.util.List<SparkTableUtil.SparkPartition> | filterPartitions(java.util.List<SparkTableUtil.SparkPartition> partitions, java.util.Map<java.lang.String,java.lang.String> partitionFilter) |
| static java.util.List<SparkTableUtil.SparkPartition> | getPartitions(org.apache.spark.sql.SparkSession spark, java.lang.String table): Returns all partitions in the table. |
| static java.util.List<SparkTableUtil.SparkPartition> | getPartitions(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier tableIdent, java.util.Map<java.lang.String,java.lang.String> partitionFilter): Returns all partitions in the table. |
| static java.util.List<SparkTableUtil.SparkPartition> | getPartitionsByFilter(org.apache.spark.sql.SparkSession spark, java.lang.String table, java.lang.String predicate): Returns partitions that match the specified 'predicate'. |
| static java.util.List<SparkTableUtil.SparkPartition> | getPartitionsByFilter(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier tableIdent, org.apache.spark.sql.catalyst.expressions.Expression predicateExpr): Returns partitions that match the specified 'predicate'. |
| static void | importSparkPartitions(org.apache.spark.sql.SparkSession spark, java.util.List<SparkTableUtil.SparkPartition> partitions, Table targetTable, PartitionSpec spec, java.lang.String stagingDir): Import files from given partitions to an Iceberg table. |
| static void | importSparkPartitions(org.apache.spark.sql.SparkSession spark, java.util.List<SparkTableUtil.SparkPartition> partitions, Table targetTable, PartitionSpec spec, java.lang.String stagingDir, boolean checkDuplicateFiles): Import files from given partitions to an Iceberg table. |
| static void | importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, java.lang.String stagingDir): Import files from an existing Spark table to an Iceberg table. |
| static void | importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, java.lang.String stagingDir, boolean checkDuplicateFiles): Import files from an existing Spark table to an Iceberg table. |
| static void | importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, java.lang.String stagingDir, java.util.Map<java.lang.String,java.lang.String> partitionFilter, boolean checkDuplicateFiles): Import files from an existing Spark table to an Iceberg table. |
| static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> | loadMetadataTable(org.apache.spark.sql.SparkSession spark, Table table, MetadataTableType type) |
| static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> | loadMetadataTable(org.apache.spark.sql.SparkSession spark, Table table, MetadataTableType type, java.util.Map<java.lang.String,java.lang.String> extraOptions) |
| static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> | partitionDF(org.apache.spark.sql.SparkSession spark, java.lang.String table): Returns a DataFrame with a row for each partition in the table. |
| static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> | partitionDFByFilter(org.apache.spark.sql.SparkSession spark, java.lang.String table, java.lang.String expression): Returns a DataFrame with a row for each partition that matches the specified 'expression'. |
| static boolean | wapEnabled(Table table) |
public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> partitionDF(org.apache.spark.sql.SparkSession spark, java.lang.String table)

Returns a DataFrame with a row for each partition in the table. The DataFrame has 3 columns: partition key (a=1/b=2), partition location, and format (avro or parquet).

Parameters:
spark - a Spark session
table - a table name and (optional) database
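For illustration, a minimal sketch of calling partitionDF; the session settings and the table name db.sample are placeholder assumptions, and the table is assumed to be a partitioned table visible to the session's catalog.

```java
import org.apache.iceberg.spark.SparkTableUtil;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PartitionDFExample {
  public static void main(String[] args) {
    // Placeholder session setup; assumes a metastore containing
    // a partitioned table named "db.sample".
    SparkSession spark = SparkSession.builder()
        .appName("partition-df-example")
        .enableHiveSupport()
        .getOrCreate();

    // One row per partition: key (e.g. a=1/b=2), location, and format.
    Dataset<Row> partitions = SparkTableUtil.partitionDF(spark, "db.sample");
    partitions.show(false);
  }
}
```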
public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> partitionDFByFilter(org.apache.spark.sql.SparkSession spark, java.lang.String table, java.lang.String expression)

Returns a DataFrame with a row for each partition that matches the specified 'expression'.

Parameters:
spark - a Spark session
table - name of the table
expression - the expression whose matching partitions are returned
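A sketch of the filtered variant, assuming the same `spark` session as in the previous example; the table name and predicate string are placeholders, with the predicate written over partition columns.

```java
// Assumes an active SparkSession `spark` as set up above;
// "db.sample" and the dt predicate are placeholder values.
Dataset<Row> recent = SparkTableUtil.partitionDFByFilter(
    spark, "db.sample", "dt >= '2023-01-01'");
recent.show(false);
```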
public static java.util.List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark, java.lang.String table)

Returns all partitions in the table.

Parameters:
spark - a Spark session
table - a table name and (optional) database
public static java.util.List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier tableIdent, java.util.Map<java.lang.String,java.lang.String> partitionFilter)

Returns all partitions in the table.

Parameters:
spark - a Spark session
tableIdent - a table identifier
partitionFilter - partition filter, or null if no filter
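A sketch of both overloads, assuming the `spark` session from the examples above. The table name, database, and filter value are placeholders, and the two-argument TableIdentifier constructor is an assumption that may vary across Spark versions.

```java
import java.util.Collections;
import java.util.List;
import org.apache.spark.sql.catalyst.TableIdentifier;
import scala.Option;

// All partitions of a (placeholder) table, looked up by name.
List<SparkTableUtil.SparkPartition> all =
    SparkTableUtil.getPartitions(spark, "db.sample");

// Only partitions where dt=2023-01-01, looked up by identifier.
TableIdentifier ident = new TableIdentifier("sample", Option.apply("db"));
List<SparkTableUtil.SparkPartition> some = SparkTableUtil.getPartitions(
    spark, ident, Collections.singletonMap("dt", "2023-01-01"));
```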
public static java.util.List<SparkTableUtil.SparkPartition> getPartitionsByFilter(org.apache.spark.sql.SparkSession spark, java.lang.String table, java.lang.String predicate)

Returns partitions that match the specified 'predicate'.

Parameters:
spark - a Spark session
table - a table name and (optional) database
predicate - a predicate on partition columns
public static java.util.List<SparkTableUtil.SparkPartition> getPartitionsByFilter(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier tableIdent, org.apache.spark.sql.catalyst.expressions.Expression predicateExpr)

Returns partitions that match the specified 'predicate'.

Parameters:
spark - a Spark session
tableIdent - a table identifier
predicateExpr - a predicate expression on partition columns
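A sketch of the string-predicate overload, again assuming the `spark` session and placeholder table from above; the predicate uses Spark SQL syntax over partition columns.

```java
// Assumes an active SparkSession `spark`; table and predicate are placeholders.
List<SparkTableUtil.SparkPartition> matched =
    SparkTableUtil.getPartitionsByFilter(spark, "db.sample", "dt >= '2023-01-01'");
```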
public static void importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, java.lang.String stagingDir, java.util.Map<java.lang.String,java.lang.String> partitionFilter, boolean checkDuplicateFiles)

Import files from an existing Spark table to an Iceberg table.

The import uses the Spark session to get table metadata. It assumes that no concurrent operations modify the source or target table, and is therefore not thread-safe.

Parameters:
spark - a Spark session
sourceTableIdent - an identifier of the source Spark table
targetTable - an Iceberg table where to import the data
stagingDir - a staging directory to store temporary manifest files
partitionFilter - only import partitions whose values match those in the map; can be partially defined
checkDuplicateFiles - if true, throw an exception if the import results in a duplicate data file
public static void importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, java.lang.String stagingDir, boolean checkDuplicateFiles)

Import files from an existing Spark table to an Iceberg table.

The import uses the Spark session to get table metadata. It assumes that no concurrent operations modify the source or target table, and is therefore not thread-safe.

Parameters:
spark - a Spark session
sourceTableIdent - an identifier of the source Spark table
targetTable - an Iceberg table where to import the data
stagingDir - a staging directory to store temporary manifest files
checkDuplicateFiles - if true, throw an exception if the import results in a duplicate data file
public static void importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, java.lang.String stagingDir)

Import files from an existing Spark table to an Iceberg table.

The import uses the Spark session to get table metadata. It assumes that no concurrent operations modify the source or target table, and is therefore not thread-safe.

Parameters:
spark - a Spark session
sourceTableIdent - an identifier of the source Spark table
targetTable - an Iceberg table where to import the data
stagingDir - a staging directory to store temporary manifest files
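To tie the overloads together, a sketch of a one-time import into a path-based Iceberg table. Loading the target via HadoopTables is just one option, and all identifiers, paths, and the staging directory are placeholder assumptions.

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.hadoop.HadoopTables;
import org.apache.iceberg.spark.SparkTableUtil;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.catalyst.TableIdentifier;
import scala.Option;

public class ImportSparkTableExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("import-spark-table")
        .enableHiveSupport()
        .getOrCreate();

    // Placeholder source identifier and target table location.
    TableIdentifier source = new TableIdentifier("sample", Option.apply("db"));
    Table target = new HadoopTables(spark.sparkContext().hadoopConfiguration())
        .load("hdfs://namenode:8020/warehouse/db/sample_iceberg");

    // Register the source table's data files in the Iceberg table,
    // failing if the import would add a duplicate data file.
    SparkTableUtil.importSparkTable(spark, source, target, "/tmp/iceberg-staging", true);
  }
}
```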
public static void importSparkPartitions(org.apache.spark.sql.SparkSession spark, java.util.List<SparkTableUtil.SparkPartition> partitions, Table targetTable, PartitionSpec spec, java.lang.String stagingDir, boolean checkDuplicateFiles)

Import files from given partitions to an Iceberg table.

Parameters:
spark - a Spark session
partitions - partitions to import
targetTable - an Iceberg table where to import the data
spec - a partition spec
stagingDir - a staging directory to store temporary manifest files
checkDuplicateFiles - if true, throw an exception if the import results in a duplicate data file
public static void importSparkPartitions(org.apache.spark.sql.SparkSession spark, java.util.List<SparkTableUtil.SparkPartition> partitions, Table targetTable, PartitionSpec spec, java.lang.String stagingDir)

Import files from given partitions to an Iceberg table.

Parameters:
spark - a Spark session
partitions - partitions to import
targetTable - an Iceberg table where to import the data
spec - a partition spec
stagingDir - a staging directory to store temporary manifest files

public static java.util.List<SparkTableUtil.SparkPartition> filterPartitions(java.util.List<SparkTableUtil.SparkPartition> partitions, java.util.Map<java.lang.String,java.lang.String> partitionFilter)
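A sketch combining getPartitions, filterPartitions, and importSparkPartitions to import only a subset of partitions; `spark` and `target` are assumed to be set up as in the previous example, and the table name and filter value are placeholders.

```java
// List, filter, then import only the matching partitions.
List<SparkTableUtil.SparkPartition> partitions =
    SparkTableUtil.getPartitions(spark, "db.sample");
List<SparkTableUtil.SparkPartition> toImport = SparkTableUtil.filterPartitions(
    partitions, Collections.singletonMap("dt", "2023-01-01"));

// Reuse the target table's own partition spec; fail on duplicate files.
SparkTableUtil.importSparkPartitions(
    spark, toImport, target, target.spec(), "/tmp/iceberg-staging", true);
```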
public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(org.apache.spark.sql.SparkSession spark, Table table, MetadataTableType type)

public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(org.apache.spark.sql.SparkSession spark, Table table, MetadataTableType type, java.util.Map<java.lang.String,java.lang.String> extraOptions)
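A sketch of reading one of Iceberg's metadata tables as a DataFrame; MetadataTableType.FILES is just one of the available types, and `spark` and `target` are assumed from the examples above.

```java
import org.apache.iceberg.MetadataTableType;

// Inspect the table's data files through the FILES metadata table.
Dataset<Row> files =
    SparkTableUtil.loadMetadataTable(spark, target, MetadataTableType.FILES);
files.show(false);
```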
public static java.lang.String determineWriteBranch(org.apache.spark.sql.SparkSession spark, java.lang.String branch)

Validate WAP config and determine the write branch.

Parameters:
spark - a Spark session
branch - write branch if there is no WAP branch configured

public static boolean wapEnabled(Table table)
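A sketch of the write-audit-publish (WAP) helpers, with `spark` and `target` assumed from the examples above; the "main" argument is a placeholder for the branch the caller intends to write to.

```java
// True if write-audit-publish is enabled on the table.
boolean wap = SparkTableUtil.wapEnabled(target);

// Resolve the branch writes should go to, given the session's WAP config.
String writeBranch = SparkTableUtil.determineWriteBranch(spark, "main");
```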