DeleteReachableFilesSparkAction

java.lang.Object
- org.apache.iceberg.spark.actions.DeleteReachableFilesSparkAction

All Implemented Interfaces:

Action<DeleteReachableFiles,DeleteReachableFiles.Result>, DeleteReachableFiles
```
public class DeleteReachableFilesSparkAction
extends java.lang.Object
implements DeleteReachableFiles
```
An implementation of DeleteReachableFiles that uses metadata tables in Spark to determine which files should be deleted.

Nested Class Summary
- Nested classes/interfaces inherited from interface org.apache.iceberg.actions.DeleteReachableFiles
  DeleteReachableFiles.Result

Field Summary

Fields
Modifier and Type	Field and Description
`protected static org.apache.iceberg.relocated.com.google.common.base.Joiner`	`COMMA_JOINER`
`protected static org.apache.iceberg.relocated.com.google.common.base.Splitter`	`COMMA_SPLITTER`
`protected static java.lang.String`	`FILE_PATH`
`protected static java.lang.String`	`LAST_MODIFIED`
`protected static java.lang.String`	`MANIFEST`
`protected static java.lang.String`	`MANIFEST_LIST`
`protected static java.lang.String`	`OTHERS`
`protected static java.lang.String`	`STATISTICS_FILES`
`static java.lang.String`	`STREAM_RESULTS`
`static boolean`	`STREAM_RESULTS_DEFAULT`

Method Summary

Modifier and Type	Method and Description
`protected org.apache.spark.sql.Dataset<FileInfo>`	`allReachableOtherMetadataFileDS(Table table)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`contentFileDS(Table table)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`contentFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)`
`protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary`	`deleteFiles(java.util.concurrent.ExecutorService executorService, java.util.function.Consumer<java.lang.String> deleteFunc, java.util.Iterator<FileInfo> files)` Deletes files and keeps track of how many files were removed for each file type.
`protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary`	`deleteFiles(SupportsBulkOperations io, java.util.Iterator<FileInfo> files)`
`DeleteReachableFilesSparkAction`	`deleteWith(java.util.function.Consumer<java.lang.String> newDeleteFunc)` Passes an alternative delete implementation that will be used for files.
`DeleteReachableFiles.Result`	`execute()` Executes this action.
`DeleteReachableFilesSparkAction`	`executeDeleteWith(java.util.concurrent.ExecutorService executorService)` Passes an alternative executor service that will be used for file removal.
`DeleteReachableFilesSparkAction`	`io(FileIO fileIO)` Sets the `FileIO` to be used for file removal.
`protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`loadMetadataTable(Table table, MetadataTableType type)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`manifestDS(Table table)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`manifestDS(Table table, java.util.Set<java.lang.Long> snapshotIds)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`manifestListDS(Table table)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`manifestListDS(Table table, java.util.Set<java.lang.Long> snapshotIds)`
`protected JobGroupInfo`	`newJobGroupInfo(java.lang.String groupId, java.lang.String desc)`
`protected Table`	`newStaticTable(TableMetadata metadata, FileIO io)`
`ThisT`	`option(java.lang.String name, java.lang.String value)`
`protected java.util.Map<java.lang.String,java.lang.String>`	`options()`
`ThisT`	`options(java.util.Map<java.lang.String,java.lang.String> newOptions)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`otherMetadataFileDS(Table table)`
`protected DeleteReachableFilesSparkAction`	`self()`
`protected org.apache.spark.sql.SparkSession`	`spark()`
`protected org.apache.spark.api.java.JavaSparkContext`	`sparkContext()`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`statisticsFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)`
`protected <T> T`	`withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.iceberg.actions.Action
option, options

Field Detail

STREAM_RESULTS

public static final java.lang.String STREAM_RESULTS

See Also:

Constant Field Values

STREAM_RESULTS_DEFAULT
```
public static final boolean STREAM_RESULTS_DEFAULT
```
See Also:

Constant Field Values

MANIFEST

protected static final java.lang.String MANIFEST

See Also:

Constant Field Values

MANIFEST_LIST

protected static final java.lang.String MANIFEST_LIST

See Also:

Constant Field Values

STATISTICS_FILES

protected static final java.lang.String STATISTICS_FILES

See Also:

Constant Field Values

OTHERS

protected static final java.lang.String OTHERS

See Also:

Constant Field Values

FILE_PATH

protected static final java.lang.String FILE_PATH

See Also:

Constant Field Values

LAST_MODIFIED

protected static final java.lang.String LAST_MODIFIED

See Also:

Constant Field Values

COMMA_SPLITTER

protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER

COMMA_JOINER

protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER

Method Detail

self

protected DeleteReachableFilesSparkAction self()

io
```
public DeleteReachableFilesSparkAction io(FileIO fileIO)
```
Description copied from interface: DeleteReachableFiles

Sets the FileIO to be used for file removal.

Specified by:

io in interface DeleteReachableFiles

Parameters:

fileIO - FileIO to use for files removal

Returns:

this for method chaining

deleteWith
```
public DeleteReachableFilesSparkAction deleteWith(java.util.function.Consumer<java.lang.String> newDeleteFunc)
```
Description copied from interface: DeleteReachableFiles

Passes an alternative delete implementation that will be used for files.

Specified by:

deleteWith in interface DeleteReachableFiles

Parameters:

newDeleteFunc - a function that will be called to delete files. The function accepts the path to a file as its argument.

Returns:

this for method chaining
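
A delete function is an ordinary `Consumer<String>` that receives one file path per call. The following is a minimal sketch of a dry-run function that records paths instead of deleting anything, which can be useful for previewing what an action would remove. The class `DryRunDeleteSketch` and its methods are hypothetical names introduced for illustration; they are not part of Iceberg. The sketch uses only the JDK, so the call into the action appears as a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Hypothetical helper, not part of Iceberg: collects paths instead of deleting them.
class DryRunDeleteSketch {
  // A concurrent queue, because the function may be invoked from several threads.
  private final Queue<String> recorded = new ConcurrentLinkedQueue<>();

  /** Returns a delete function that only records the path it is given. */
  Consumer<String> deleteFunc() {
    return recorded::add;
  }

  /** Returns a snapshot of the paths recorded so far, in arrival order. */
  List<String> recordedPaths() {
    return new ArrayList<>(recorded);
  }

  public static void main(String[] args) {
    DryRunDeleteSketch sketch = new DryRunDeleteSketch();
    Consumer<String> func = sketch.deleteFunc();

    // With Iceberg on the classpath, the function would be handed to the action:
    //   action.deleteWith(func).execute();
    // Here the calls are simulated with made-up paths.
    for (String path : Arrays.asList("data/a.parquet", "metadata/m0.avro")) {
      func.accept(path);
    }
    System.out.println(sketch.recordedPaths());
  }
}
```

Because the function replaces the default delete, nothing is removed from storage while it is in use.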

executeDeleteWith
```
public DeleteReachableFilesSparkAction executeDeleteWith(java.util.concurrent.ExecutorService executorService)
```
Description copied from interface: DeleteReachableFiles

Passes an alternative executor service that will be used for file removal. This service will only be used if a custom delete function is provided by DeleteReachableFiles.deleteWith(Consumer) or if the FileIO does not support bulk deletes. Otherwise, parallelism should be controlled by the IO-specific deleteFiles method.

Specified by:

executeDeleteWith in interface DeleteReachableFiles

Parameters:

executorService - the service to use

Returns:

this for method chaining
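
The condition under which the executor service is used can be restated as a small predicate. The sketch below does that and also builds a fixed pool of named daemon threads that could be passed to `executeDeleteWith`. `DeleteExecutorSketch`, `usesExecutorService`, `newDeletePool`, and the thread name prefix are hypothetical and exist only for illustration; the sketch also assumes the caller owns the pool and shuts it down.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Iceberg.
class DeleteExecutorSketch {
  /**
   * Restates the documented rule: the executor service is used when a custom
   * delete function is set or when the FileIO does not support bulk deletes.
   */
  static boolean usesExecutorService(boolean customDeleteFunc, boolean ioSupportsBulkDeletes) {
    return customDeleteFunc || !ioSupportsBulkDeletes;
  }

  /** Builds a fixed-size pool of daemon threads with recognizable names. */
  static ExecutorService newDeletePool(int threads) {
    AtomicInteger counter = new AtomicInteger();
    ThreadFactory factory = runnable -> {
      Thread thread = new Thread(runnable, "delete-reachable-files-" + counter.getAndIncrement());
      thread.setDaemon(true);
      return thread;
    };
    return Executors.newFixedThreadPool(threads, factory);
  }

  public static void main(String[] args) {
    ExecutorService pool = newDeletePool(4);
    try {
      // With Iceberg on the classpath, the pool would be handed to the action:
      //   action.executeDeleteWith(pool).execute();
      System.out.println(usesExecutorService(true, true));
      System.out.println(usesExecutorService(false, true));
    } finally {
      // Assumption of this sketch: the caller is responsible for shutdown.
      pool.shutdown();
    }
  }
}
```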

execute
```
public DeleteReachableFiles.Result execute()
```
Description copied from interface: Action

Executes this action.

Specified by:

execute in interface Action<DeleteReachableFiles,DeleteReachableFiles.Result>

Returns:

the result of this action

spark

protected org.apache.spark.sql.SparkSession spark()

sparkContext

protected org.apache.spark.api.java.JavaSparkContext sparkContext()

option

public ThisT option(java.lang.String name,
                    java.lang.String value)

options

public ThisT options(java.util.Map<java.lang.String,java.lang.String> newOptions)

options

protected java.util.Map<java.lang.String,java.lang.String> options()

withJobGroupInfo

protected <T> T withJobGroupInfo(JobGroupInfo info,
                                 java.util.function.Supplier<T> supplier)

newJobGroupInfo

protected JobGroupInfo newJobGroupInfo(java.lang.String groupId,
                                       java.lang.String desc)

newStaticTable

protected Table newStaticTable(TableMetadata metadata,
                               FileIO io)

contentFileDS

protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table)

contentFileDS

protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table,
                                                               java.util.Set<java.lang.Long> snapshotIds)

manifestDS

protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table)

manifestDS

protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table,
                                                            java.util.Set<java.lang.Long> snapshotIds)

manifestListDS

protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table)

manifestListDS

protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table,
                                                                java.util.Set<java.lang.Long> snapshotIds)

statisticsFileDS

protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table,
                                                                  java.util.Set<java.lang.Long> snapshotIds)

otherMetadataFileDS

protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)

allReachableOtherMetadataFileDS

protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)

loadMetadataTable

protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table,
                                                                                   MetadataTableType type)

deleteFiles

protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(java.util.concurrent.ExecutorService executorService,
                                                                                     java.util.function.Consumer<java.lang.String> deleteFunc,
                                                                                     java.util.Iterator<FileInfo> files)

Deletes files and keeps track of how many files were removed for each file type.

Parameters:

executorService - an executor service to use for parallel deletes

deleteFunc - a delete function that accepts a file path

files - an iterator of Spark rows of the structure (path: String, type: String)

Returns:

stats on which files were deleted
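
The general idea of deleting in parallel while counting removals per file type can be sketched with the JDK alone. This is not the Iceberg implementation: `DeleteSummarySketch`, `PathAndType` (a stand-in for `FileInfo`), `deleteAndCount`, and the type labels are hypothetical, and the choice to skip failed deletes rather than abort is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical model of "delete files and count removals per type"; not Iceberg code.
class DeleteSummarySketch {
  /** Hypothetical stand-in for FileInfo: a path plus a file type label. */
  static final class PathAndType {
    final String path;
    final String type;

    PathAndType(String path, String type) {
      this.path = path;
      this.type = type;
    }
  }

  /** Runs deleteFunc for every file on the executor and returns successful deletes per type. */
  static Map<String, Long> deleteAndCount(
      ExecutorService executorService, Consumer<String> deleteFunc, Iterator<PathAndType> files) {
    Map<String, AtomicLong> counts = new ConcurrentHashMap<>();
    List<Future<?>> pending = new ArrayList<>();
    while (files.hasNext()) {
      PathAndType file = files.next();
      pending.add(executorService.submit(() -> {
        deleteFunc.accept(file.path);
        // Reached only when the delete function did not throw.
        counts.computeIfAbsent(file.type, type -> new AtomicLong()).incrementAndGet();
      }));
    }
    for (Future<?> task : pending) {
      try {
        task.get();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for deletes", e);
      } catch (ExecutionException e) {
        // Assumption of this sketch: a failed delete is skipped and not counted.
      }
    }
    Map<String, Long> summary = new TreeMap<>();
    counts.forEach((type, count) -> summary.put(type, count.get()));
    return summary;
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      List<PathAndType> files = Arrays.asList(
          new PathAndType("data/a.parquet", "Data"),
          new PathAndType("data/b.parquet", "Data"),
          new PathAndType("metadata/m0.avro", "Manifest"));
      // A no-op delete function keeps the example free of side effects.
      System.out.println(deleteAndCount(pool, path -> { }, files.iterator()));
    } finally {
      pool.shutdown();
    }
  }
}
```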

deleteFiles

protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io,
                                                                                     java.util.Iterator<FileInfo> files)
