RewriteDataFilesSparkAction

java.lang.Object
- org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction

All Implemented Interfaces:

Action<RewriteDataFiles,RewriteDataFiles.Result>, RewriteDataFiles, SnapshotUpdate<RewriteDataFiles,RewriteDataFiles.Result>
```
public class RewriteDataFilesSparkAction
extends java.lang.Object
implements RewriteDataFiles
```

Nested Class Summary
- Nested classes/interfaces inherited from interface org.apache.iceberg.actions.RewriteDataFiles
  RewriteDataFiles.FileGroupFailureResult, RewriteDataFiles.FileGroupInfo, RewriteDataFiles.FileGroupRewriteResult, RewriteDataFiles.Result

Field Summary

Fields
Modifier and Type	Field and Description
`protected static org.apache.iceberg.relocated.com.google.common.base.Joiner`	`COMMA_JOINER`
`protected static org.apache.iceberg.relocated.com.google.common.base.Splitter`	`COMMA_SPLITTER`
`protected static java.lang.String`	`FILE_PATH`
`protected static java.lang.String`	`LAST_MODIFIED`
`protected static java.lang.String`	`MANIFEST`
`protected static java.lang.String`	`MANIFEST_LIST`
`protected static java.lang.String`	`OTHERS`
`protected static java.lang.String`	`STATISTICS_FILES`

Fields inherited from interface org.apache.iceberg.actions.RewriteDataFiles
MAX_CONCURRENT_FILE_GROUP_REWRITES, MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT, MAX_FILE_GROUP_SIZE_BYTES, MAX_FILE_GROUP_SIZE_BYTES_DEFAULT, PARTIAL_PROGRESS_ENABLED, PARTIAL_PROGRESS_ENABLED_DEFAULT, PARTIAL_PROGRESS_MAX_COMMITS, PARTIAL_PROGRESS_MAX_COMMITS_DEFAULT, REWRITE_JOB_ORDER, REWRITE_JOB_ORDER_DEFAULT, TARGET_FILE_SIZE_BYTES, USE_STARTING_SEQUENCE_NUMBER, USE_STARTING_SEQUENCE_NUMBER_DEFAULT

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`protected org.apache.spark.sql.Dataset<FileInfo>`	`allReachableOtherMetadataFileDS(Table table)`
`RewriteDataFilesSparkAction`	`binPack()` Choose BINPACK as a strategy for this rewrite operation
`protected void`	`commit(SnapshotUpdate<?> update)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`contentFileDS(Table table)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`contentFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)`
`protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary`	`deleteFiles(java.util.concurrent.ExecutorService executorService, java.util.function.Consumer<java.lang.String> deleteFunc, java.util.Iterator<FileInfo> files)` Deletes files and keeps track of how many files were removed for each file type.
`protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary`	`deleteFiles(SupportsBulkOperations io, java.util.Iterator<FileInfo> files)`
`RewriteDataFiles.Result`	`execute()` Executes this action.
`RewriteDataFilesSparkAction`	`filter(Expression expression)` A user provided filter for determining which files will be considered by the rewrite strategy.
`protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`loadMetadataTable(Table table, MetadataTableType type)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`manifestDS(Table table)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`manifestDS(Table table, java.util.Set<java.lang.Long> snapshotIds)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`manifestListDS(Table table)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`manifestListDS(Table table, java.util.Set<java.lang.Long> snapshotIds)`
`protected JobGroupInfo`	`newJobGroupInfo(java.lang.String groupId, java.lang.String desc)`
`protected Table`	`newStaticTable(TableMetadata metadata, FileIO io)`
`ThisT`	`option(java.lang.String name, java.lang.String value)`
`protected java.util.Map<java.lang.String,java.lang.String>`	`options()`
`ThisT`	`options(java.util.Map<java.lang.String,java.lang.String> newOptions)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`otherMetadataFileDS(Table table)`
`protected RewriteDataFilesSparkAction`	`self()`
`ThisT`	`snapshotProperty(java.lang.String property, java.lang.String value)`
`RewriteDataFilesSparkAction`	`sort()` Choose SORT as a strategy for this rewrite operation using the table's sortOrder
`RewriteDataFilesSparkAction`	`sort(SortOrder sortOrder)` Choose SORT as a strategy for this rewrite operation and manually specify the sortOrder to use
`protected org.apache.spark.sql.SparkSession`	`spark()`
`protected org.apache.spark.api.java.JavaSparkContext`	`sparkContext()`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`statisticsFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)`
`protected <T> T`	`withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)`
`RewriteDataFilesSparkAction`	`zOrder(java.lang.String... columnNames)` Choose Z-ORDER as a strategy for this rewrite operation with a specified list of columns to use

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.iceberg.actions.SnapshotUpdate
snapshotProperty

Methods inherited from interface org.apache.iceberg.actions.Action
option, options

Field Detail

MANIFEST

protected static final java.lang.String MANIFEST

See Also: Constant Field Values

MANIFEST_LIST

protected static final java.lang.String MANIFEST_LIST

See Also: Constant Field Values

STATISTICS_FILES

protected static final java.lang.String STATISTICS_FILES

See Also: Constant Field Values

OTHERS

protected static final java.lang.String OTHERS

See Also: Constant Field Values

FILE_PATH

protected static final java.lang.String FILE_PATH

See Also: Constant Field Values

LAST_MODIFIED

protected static final java.lang.String LAST_MODIFIED

See Also: Constant Field Values

COMMA_SPLITTER

protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER

COMMA_JOINER

protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER

Method Detail

self

protected RewriteDataFilesSparkAction self()

binPack
```
public RewriteDataFilesSparkAction binPack()
```
Description copied from interface: RewriteDataFiles

Choose BINPACK as a strategy for this rewrite operation

Specified by:

binPack in interface RewriteDataFiles

Returns:

this for method chaining
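
The idea behind a bin-pack rewrite can be illustrated with a small, self-contained sketch. This is not Iceberg's implementation: the class `BinPackSketch`, the method `pack`, and the greedy first-fit rule are assumptions chosen only to show how small files might be combined into groups that stay under a target size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a greedy first-fit packer, not the strategy Iceberg uses internally.
class BinPackSketch {

  // Groups file sizes (in bytes) into bins whose totals do not exceed targetBytes.
  // A file larger than targetBytes ends up alone in its own bin.
  static List<List<Long>> pack(List<Long> fileSizes, long targetBytes) {
    List<List<Long>> bins = new ArrayList<>();
    List<Long> totals = new ArrayList<>();
    for (long size : fileSizes) {
      int chosen = -1;
      // Find the first existing bin with enough remaining room.
      for (int i = 0; i < bins.size(); i++) {
        if (totals.get(i) + size <= targetBytes) {
          chosen = i;
          break;
        }
      }
      // No bin fits, so open a new one.
      if (chosen < 0) {
        bins.add(new ArrayList<>());
        totals.add(0L);
        chosen = bins.size() - 1;
      }
      bins.get(chosen).add(size);
      totals.set(chosen, totals.get(chosen) + size);
    }
    return bins;
  }

  public static void main(String[] args) {
    // Hypothetical file sizes and target, chosen for readability.
    System.out.println(pack(Arrays.asList(40L, 70L, 30L, 60L), 100L));
  }
}
```

With the hypothetical sizes above and a target of 100, the sketch produces three groups: `[40, 30]`, `[70]`, and `[60]`.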

sort
```
public RewriteDataFilesSparkAction sort(SortOrder sortOrder)
```
Description copied from interface: RewriteDataFiles

Choose SORT as a strategy for this rewrite operation and manually specify the sortOrder to use

Specified by:

sort in interface RewriteDataFiles

Parameters:

sortOrder - user defined sortOrder

Returns:

this for method chaining

sort
```
public RewriteDataFilesSparkAction sort()
```
Description copied from interface: RewriteDataFiles

Choose SORT as a strategy for this rewrite operation using the table's sortOrder

Specified by:

sort in interface RewriteDataFiles

Returns:

this for method chaining

zOrder
```
public RewriteDataFilesSparkAction zOrder(java.lang.String... columnNames)
```
Description copied from interface: RewriteDataFiles

Choose Z-ORDER as a strategy for this rewrite operation with a specified list of columns to use

Specified by:

zOrder in interface RewriteDataFiles

Parameters:

columnNames - Columns to be used to generate Z-Values

Returns:

this for method chaining
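
As a rough illustration of what a Z-value is, the sketch below interleaves the bits of two integer column values into a single sortable number (a Morton code). It is a simplification under stated assumptions: `ZOrderSketch` and `interleave` are hypothetical names, only two non-negative `int` columns are handled, and the way Iceberg actually encodes column types into Z-values is not shown here.

```java
// Illustrative only: two-column bit interleaving, not Iceberg's Z-value encoding.
class ZOrderSketch {

  // Bit i of x lands at bit 2i of the result, bit i of y at bit 2i + 1.
  static long interleave(int x, int y) {
    long z = 0L;
    for (int i = 0; i < 32; i++) {
      z |= ((long) (x >>> i) & 1L) << (2 * i);
      z |= ((long) (y >>> i) & 1L) << (2 * i + 1);
    }
    return z;
  }

  public static void main(String[] args) {
    // x = 0b011 and y = 0b101 interleave to 0b100111.
    System.out.println(interleave(3, 5));
  }
}
```

Here `interleave(3, 5)` returns 39. Sorting rows by such a value tends to keep rows that are close in both columns near each other, which is the motivation for using several columns at once rather than a plain lexicographic sort.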

filter
```
public RewriteDataFilesSparkAction filter(Expression expression)
```
Description copied from interface: RewriteDataFiles

A user-provided filter for determining which files will be considered by the rewrite strategy. This filter is applied in addition to whatever rules the rewrite strategy generates. For example, it can restrict the rewrite to a specific partition.

Specified by:

filter in interface RewriteDataFiles

Parameters:

expression - An Iceberg expression used to determine which files will be considered for rewriting

Returns:

this for chaining

execute
```
public RewriteDataFiles.Result execute()
```
Description copied from interface: Action

Executes this action.

Specified by:

execute in interface Action<RewriteDataFiles,RewriteDataFiles.Result>

Returns:

the result of this action

snapshotProperty

public ThisT snapshotProperty(java.lang.String property,
                              java.lang.String value)

commit

protected void commit(SnapshotUpdate<?> update)

spark

protected org.apache.spark.sql.SparkSession spark()

sparkContext

protected org.apache.spark.api.java.JavaSparkContext sparkContext()

option

public ThisT option(java.lang.String name,
                    java.lang.String value)

options

public ThisT options(java.util.Map<java.lang.String,java.lang.String> newOptions)

options

protected java.util.Map<java.lang.String,java.lang.String> options()

withJobGroupInfo

protected <T> T withJobGroupInfo(JobGroupInfo info,
                                 java.util.function.Supplier<T> supplier)

newJobGroupInfo

protected JobGroupInfo newJobGroupInfo(java.lang.String groupId,
                                       java.lang.String desc)

newStaticTable

protected Table newStaticTable(TableMetadata metadata,
                               FileIO io)

contentFileDS

protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table)

contentFileDS

protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table,
                                                               java.util.Set<java.lang.Long> snapshotIds)

manifestDS

protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table)

manifestDS

protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table,
                                                            java.util.Set<java.lang.Long> snapshotIds)

manifestListDS

protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table)

manifestListDS

protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table,
                                                                java.util.Set<java.lang.Long> snapshotIds)

statisticsFileDS

protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table,
                                                                  java.util.Set<java.lang.Long> snapshotIds)

otherMetadataFileDS

protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)

allReachableOtherMetadataFileDS

protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)

loadMetadataTable

protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table,
                                                                                   MetadataTableType type)

deleteFiles

protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(java.util.concurrent.ExecutorService executorService,
                                                                                     java.util.function.Consumer<java.lang.String> deleteFunc,
                                                                                     java.util.Iterator<FileInfo> files)

Deletes files and keeps track of how many files were removed for each file type.

Parameters:

executorService - an executor service to use for parallel deletes

deleteFunc - a delete function

files - an iterator of Spark rows of the structure (path: String, type: String)

Returns:

stats on which files were deleted
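
The pattern described here, deleting in parallel while counting removals per file type, could look roughly like the following. This is a sketch, not the body of `deleteFiles`: `DeleteSummarySketch`, `FileEntry` (a stand-in for `FileInfo`), `deleteAndCount`, and the choice to skip failed deletes silently are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Illustrative only: parallel deletes with per-type counters.
class DeleteSummarySketch {

  // Hypothetical stand-in for FileInfo: a path plus a file type label.
  static final class FileEntry {
    final String path;
    final String type;

    FileEntry(String path, String type) {
      this.path = path;
      this.type = type;
    }
  }

  // Submits one delete task per file and counts successful deletes per type.
  static Map<String, Long> deleteAndCount(
      ExecutorService executor, Consumer<String> deleteFunc, Iterator<FileEntry> files) {
    Map<String, AtomicLong> counts = new ConcurrentHashMap<>();
    List<Future<?>> pending = new ArrayList<>();
    while (files.hasNext()) {
      FileEntry file = files.next();
      pending.add(executor.submit(() -> {
        deleteFunc.accept(file.path);
        // Reached only if the delete did not throw.
        counts.computeIfAbsent(file.type, t -> new AtomicLong()).incrementAndGet();
      }));
    }
    for (Future<?> task : pending) {
      try {
        task.get();
      } catch (ExecutionException e) {
        // Assumption of this sketch: a failed delete is simply not counted.
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while deleting files", e);
      }
    }
    Map<String, Long> summary = new TreeMap<>();
    counts.forEach((type, n) -> summary.put(type, n.get()));
    return summary;
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    List<FileEntry> files = Arrays.asList(
        new FileEntry("a.parquet", "Data"),
        new FileEntry("m.avro", "Manifest"));
    // A no-op delete function keeps the example free of real I/O.
    System.out.println(deleteAndCount(pool, path -> { }, files.iterator()));
    pool.shutdown();
  }
}
```

Passing the delete operation as a `Consumer<String>` keeps the counting logic independent of the storage layer, so the same loop can be exercised with a no-op or a failing function in tests.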

deleteFiles

protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io,
                                                                                     java.util.Iterator<FileInfo> files)
